eLife. 2020 May 29;9:e58556. doi: 10.7554/eLife.58556

Spatial inter-centromeric interactions facilitated the emergence of evolutionary new centromeres

Krishnendu Guin 1, Yao Chen 2, Radha Mishra 1, Siti Rawaidah BM Muzaki 2, Bhagya C Thimmappa 1, Caoimhe E O'Brien 3, Geraldine Butler 3, Amartya Sanyal 2, Kaustuv Sanyal 1
Editors: Job Dekker4, Kevin Struhl5
PMCID: PMC7292649  PMID: 32469306

Abstract

Centromeres of Candida albicans form on unique and different DNA sequences but a closely related species, Candida tropicalis, possesses homogenized inverted repeat (HIR)-associated centromeres. To investigate the mechanism of centromere type transition, we improved the fragmented genome assembly and constructed a chromosome-level genome assembly of C. tropicalis by employing PacBio sequencing, chromosome conformation capture sequencing (3C-seq), chromoblot, and genetic analysis of engineered aneuploid strains. Further, we analyzed the 3D genome organization using 3C-seq data, which revealed spatial proximity among the centromeres as well as telomeres of seven chromosomes in C. tropicalis. Intriguingly, we observed evidence of inter-centromeric translocations in the common ancestor of C. albicans and C. tropicalis. Identification of putative centromeres in closely related Candida sojae, Candida viswanathii and Candida parapsilosis indicates loss of ancestral HIR-associated centromeres and establishment of evolutionary new centromeres (ENCs) in C. albicans. We propose that spatial proximity of the homologous centromere DNA sequences facilitated karyotype rearrangements and centromere type transitions in human pathogenic yeasts of the CUG-Ser1 clade.

Research organism: Other

Introduction

The efficient maintenance of the genetic material and its propagation to subsequent generations determine the fitness of an organism. Genomic rearrangements are often associated with the development of multiple diseases, including cancer. Chromosomal rearrangements, on the other hand, are often observed during speciation (Searle, 1998). Such structural changes begin with the formation of at least one DNA double-strand break (DSB), which is generally repaired by homologous recombination (HR) or non-homologous end joining (NHEJ) in vivo. Studies using engineered in vivo model systems suggested that the success of DSB repair through HR depends upon an efficient identification of a template donor. This process of ‘homology search’ is facilitated by the physical proximity and the extent of DNA sequence homology (Lee et al., 2016; Agmon et al., 2013; Burgess and Kleckner, 1999). Multi-invasion-induced rearrangements (MIRs) involving more than one template donor have recently been shown to be influenced by physical proximity and homology (Piazza et al., 2017). Therefore, the nature of genomic rearrangements is mostly dependent on the type of spatial genome organization. In yeasts, apicomplexans, and certain plants, centromeres cluster inside the nucleus (Muller et al., 2019), which may facilitate translocations between two chromosomes involving their centromeric and adjacent pericentromeric loci.

The centromere, one of the guardians of genome stability, assembles a large DNA-protein complex to form the kinetochore, which ensures fidelity of chromosome segregation by correctly attaching chromosomes to the spindle. Paradoxically, this conserved process of chromosome segregation is carried out by highly diverse species-specific centromere DNA sequences. For example, the length of centromere DNA is ~125 bp in budding yeast Saccharomyces cerevisiae (Clarke and Carbon, 1980), but it can be as long as a few megabases in humans (Mahtani and Willard, 1990). Centromeres have been cloned and characterized from a large number of fungal species. The only factor that remains common to most fungal centromeres is the presence of histone H3 variant CENP-ACse4 except in some Mucorales like Mucor circinelloides (Navarro-Mendoza et al., 2019). Many kinetochore proteins are believed to have evolved from pre-eukaryotic lineages and remained conserved within closely related species complexes or expanded through gene duplication (Meraldi et al., 2006; Tromer et al., 2019; van Hooff et al., 2017). It remains a paradox that despite the rapid evolution of centromere DNA, the kinetochore structure remains relatively well-conserved (Ekwall, 2007). Therefore, an examination of the evolutionary processes driving species-specific changes in centromere DNA is essential for a better understanding of centromere biology.

The first cloned centromere, that of the budding yeast S. cerevisiae, carries conserved genetic elements capable of forming a functional centromere de novo when cloned into a yeast replicative plasmid (Clarke and Carbon, 1980). Such genetic regulation of centromere function also exists in the fission yeast Schizosaccharomyces pombe, where centromeres possess inverted repeat-associated structures of 40–100 kb (Clarke and Baum, 1990). Other closely related budding and fission yeasts were also found to harbor a DNA sequence-dependent regulation of centromere function (Gordon et al., 2011; Tong et al., 2019; Kobayashi et al., 2015), but the advantage of having such genetic regulation is not well understood. In fact, the majority of species with known centromeres are thought to be regulated by an epigenetic mechanism (Ekwall, 2007). A truly epigenetically-regulated fungal centromere carrying a 3–5 kb long CENP-ACse4-bound unique DNA sequence exists in another budding yeast, C. albicans (Sanyal et al., 2004), a CUG-Ser1 clade species in the fungal phylum of Ascomycota. Subsequently, such unique centromeres were also discovered in the closely related Candida dubliniensis (Padmanabhan et al., 2008) and Candida lusitaniae (Kapoor et al., 2015). Strikingly, all seven centromeres of C. tropicalis, another CUG-Ser1 clade species, carry 3–4 kb long inverted repeats (IR) flanking a ~3 kb long CENP-ACse4-rich central core (CC). The centromere sequences are highly identical to each other in C. tropicalis. Intriguingly, centromere DNA of C. tropicalis can facilitate de novo recruitment of CENP-ACse4 to some extent (Chatterjee et al., 2016). In contrast, centromeres of C. albicans completely lack such a DNA sequence-dependent mechanism (Baum et al., 2006). Such a rapid transition in the structural and functional properties of centromeres within two closely related species offers a unique opportunity to study the process of centromere type transition.

Kinetochore proteins appeared as a single punctum at the periphery of a nucleus, indicating the presence of constitutively clustered centromeres in C. tropicalis (Chatterjee et al., 2016). Our previous analysis also showed that centromeres of C. tropicalis were located near interchromosomal synteny breakpoints (ICSBs) as relics of ancient translocations in the common ancestor of C. tropicalis and C. albicans (Chatterjee et al., 2016). Do homologous centromere DNA regions in close spatial proximity facilitate chromosomal translocation events? Due to the nature of the then-available fragmented genome assembly, the genome-wide distribution of the ICSBs and the spatial organization of the genome in C. tropicalis remained unexplored. However, the near-complete C. albicans genome assembly was available. Therefore, examining whether the spatial proximity of clustered centromeres drives interchromosomal translocation events guiding speciation in the CUG-Ser1 clade required a chromosome-level complete genome assembly of C. tropicalis.

In this study, we constructed a chromosome-level gapless genome assembly of the C. tropicalis type strain MYA-3404 by combining information from previously available contigs, NGS reads and high-throughput 3C-seq data. Using this assembly and 3C-seq data, we studied the spatial genome organization in C. tropicalis. Next, we mapped the ICSBs in the C. tropicalis genome with reference to that of C. albicans (ASM18296v3) to test whether the frequency of ICSB correlated with the spatial genome organization. In addition, we performed Oxford Nanopore and Illumina sequencing and assembled the genome of Candida sojae (strain NCYC-2607), a sister species of C. tropicalis in the CUG-Ser1 clade (Shen et al., 2018). Finally, using this genome assembly of C. sojae and publicly available genome assembly of C. viswanathii (ASM332773v1), we identified the putative centromeres of these two species as HIR-associated loci syntenic to the centromeres of C. tropicalis. Based on our results, we propose a model that suggests homology and proximity guided centromere-proximal translocations facilitated karyotype evolution and possibly aided in rapid transition from HIR-associated to unique centromere types in the members of the CUG-Ser1 clade.

Results

A chromosome-level gapless assembly of the C. tropicalis genome in seven chromosomes

C. tropicalis has seven pairs of chromosomes (Chatterjee et al., 2016; Butler et al., 2009). However, the current publicly available genome assembly (ASM633v3) has 23 nuclear contigs and one mitochondrial contig. To completely assemble the nuclear genome of C. tropicalis in seven chromosomes, we combined results of short-read Illumina sequencing and long-read single molecule real-time sequencing (SMRT-seq) with a high-throughput 3C-seq (simplified Hi-C) experiment (Figure 1A, Figure 1—figure supplement 1A–D; Sexton et al., 2012). We started from the publicly available genome assembly of C. tropicalis strain MYA-3404 in 23 nuclear contigs (ASM633v3, Assembly A) (Butler et al., 2009). We used Illumina sequencing reads to scaffold them into 16 contigs to get Assembly B (Figure 1A). Next, we used the SMRT-seq long reads to join these contigs, which resulted in an assembly of 12 contigs (Assembly C, Supplementary file 1). Based on the contour-clamped homogeneous electric field (CHEF)-gel karyotyping (Figure 1B) and 3C-seq data (Figure 1—figure supplement 1E–G), we joined two contigs and rectified a misjoin in Assembly C to produce an assembly of seven chromosomes and five short orphan haplotigs (OHs). We suspected that the OHs are heterozygous loci in the diploid genome of C. tropicalis. Analysis of the de novo contigs (Figure 1—figure supplement 1H, Materials and methods), sequence coverage data (Figure 1—figure supplement 2A–B), and Southern hybridization of engineered aneuploid strains demonstrated that the small OHs mapped to heterozygous regions of the genome (Figure 1—figure supplement 2C–I, Materials and methods). Next, we used de novo contigs to fill the 104 pre-existing N-gaps and scaffolded 14 sub-telomeres (Figure 1—figure supplement 3A–C, Supplementary file 2). Finally, we used 3C-seq reads to polish the complete genome assembly of C. tropicalis constituting 14,609,527 bp in seven telomere-to-telomere long gapless chromosomes (Figure 1B). We call this new assembly Assembly2020.

Figure 1. Construction of the gapless assembly of C. tropicalis type strain MYA-3404 in seven chromosomes.

(A) Schematic showing the stepwise construction of the gapless chromosome-level assembly (Assembly2020) of C. tropicalis (also see Figure 1—figure supplement 1 and Figure 1—figure supplement 2). (B) An ethidium bromide (EtBr)-stained CHEF gel image of separated chromosomes of C. tropicalis (strain MYA-3404) and C. albicans (strain SC5314) (Materials and methods). C. albicans chromosomes are used as size markers for estimation and validation of lengths and identities of C. tropicalis chromosomes in the newly constructed Assembly2020. (C) An ideogram of seven chromosomes of C. tropicalis as deduced from Assembly2020 and drawn to scale. The genomic locations of the three loci showing copy number variations (CNVs), DUP4, DUP5 and DUPR, located on Chr4, Chr5 and ChrR respectively, are marked and depicted as striped boxes. The CNVs for which the correct homolog-wise distribution of the duplicated copy is unknown are marked with asterisks. Homolog-specific differences for Chr1 and Chr4, which occurred due to an exchange of chromosomal parts in a balanced heterozygous translocation between Chr1B and Chr4B, are highlighted with black borders (also see Figure 1—figure supplement 4C). (D) A circos plot showing the genome-wide distribution of various sequence features. Very high sequence coverage at the rDNA locus is clipped for more precise representation and marked with an asterisk.

Figure 1—figure supplement 1. Schematic of the strategies used for construction of the gapless chromosome-level assembly of C. tropicalis.

(A - D) The outline of the steps followed for the construction of the genome assembly. (A) Major steps followed for 3C-sequencing in this study were (I) crosslinking, (II) restriction digestion, and (III) ligation, library preparation, and sequencing. (B) A cartoon explaining the use of contact probability values for establishing contiguity between two DNA fragments. Pink arrows denote coordinates of the anchor bin, with respect to which the contact probability scores (represented as the red dots) are determined. (C) Long reads generated by SMRT-seq were used to construct de novo genome assemblies using Canu as well as the FALCON pipeline. (D) Use of de novo contigs to fill N-gaps, finding the alleles of the orphan contigs in the genome and scaffolding of the sub-telomeres. (E) The 3C profile (bin size = 10 kb) of 3′-terminal bin of contig6 (anchor; gray vertical line) showing its contact probabilities (blue dots) with bins on contig5 and contig6. (F) The 3C profile (bin size = 10 kb) of 3′ terminal bin of contig5 (anchor; gray vertical line) showing its contact probabilities (blue dots) with bins on contig5 and contig6. (G) A schematic representation of chromosome 2 assembled by fusion of contig5 and contig6 in a tail-to-tail orientation based on the 3C profile results. (H) Orphan contigs OHs are mapped to the chromosomes by BLAST analysis of the ORFs, which are located on the Canu assembled de novo contigs. The allelic difference in the OH loci is depicted by color-coded ORFs (orange and green arrows).
Figure 1—figure supplement 2. Orphan contigs are alleles in the diploid genome of C. tropicalis.

(A) IGV track images showing the coverage of 3C-seq data on the y-axis (number of reads mapped per bin for each million of the total reads) over the orphan contigs and a control locus from Chr1. (B) Violin plots showing the distribution of read coverage of 3C-seq data across the orphan contigs (bin size = 5 bp) and the control region on Chr1 generated using the deepTools2 bamCoverage script. (C) Schematic showing the positions of HindIII sites (vertical black lines) and the length of the expected bands detectable by the probe used (red bars) for Southern hybridization to confirm sch9 deletion strains (CtKG001). (D) Phosphorimage of the blot for confirmation of sch9△/sch9△ mutant strains. (E) Schematic showing various numbers of cells (10⁵, 10⁴, 10³, and 10²) of CtKG002 (left) and CtKG003 (right) plated on a CM+FOA plate. The plate image shows the appearance of FOA-resistant colonies in the CtKG002 but not in the CtKG003 strain. (F) The FOA-resistant colonies thus obtained were picked up, patched on CM-URA (left) and YPDU media (right), and imaged after 48 hr of growth at 30°C. (G) The EtBr stained gel image for multiplex PCR products to detect the loss of MTLa or MTLα alleles in the FOA-resistant colonies along with the wild-type control (primers are listed in Supplementary file 9). (H and I) Experimental validation of the allelic nature of contig16 and contig14, respectively, using Southern blot analysis. The restriction fragment length polymorphisms between the alleles after digesting with ClaI and EcoRI (restriction enzyme sites are indicated using blue arrowheads) are graphically represented for contig16 and contig14, respectively. The lanes in gels represent the wild-type MYA-3404 (2nd lane) and the monosomic aneuploid strains (CtKG101 - 105) where one homolog of Chr5 is absent. The probes used in this experiment are denoted using red bars.
Figure 1—figure supplement 3. Schematic outline of the strategy followed for N-gap filling and scaffolding of sub-telomeres.

(A - B) Strategy-I and strategy-II (Materials and methods) for filling N-gaps without flanking repeats or with flanking repeats, respectively. Repeats are presented as black arrows. (C) Schematic for scaffolding of sub-telomeres using the de novo assembled contigs.
Figure 1—figure supplement 4. Identification of CNVs in the C. tropicalis strain MYA-3404.

(A) EtBr stained gel images and phosphorimages obtained from Southern hybridization experiments using a centromere-proximal (Probe A) and a centromere-distal probe (Probe B) from Chr4 (Supplementary file 9). (B) Violin plots for the number of reads mapped per bin (2 bp) on ChrR (excluding the rDNA locus), and other chromosomes or loci as indicated. The average number of reads mapped on the chromosomes and DUP4, DUP5, and DUPR loci are presented. (C) CNAtra output of read depth signals calculated from 3C-seq reads (black dots; bin size = 1 kb) and estimated copy numbers (red lines) of each chromosome in C. tropicalis. For regions whose estimated copy numbers are <1.5 or >2.5, their respective start and end coordinates (shown in black vertical text) as well as estimated copy numbers (shown in red horizontal text) are manually labelled. Black box represents a zoom in view of a region on ChrR (600–1000 kb) where a duplicated region shows an estimated copy number of ~4.
Figure 1—figure supplement 5. Chromoblot, sequence coverage analysis, and haplotyping for validation of the chromosome-level genome assembly of C. tropicalis.

(A) Schematic of the balanced heterozygous translocation between Chr1B and Chr4B. The DUP4 locus is highlighted with the black striped box. The junctions between Chr1 and Chr4 on Chr1B and Chr4B are marked with black and purple arrows, respectively. (B) Contact probability heatmaps (bin size = 10 kb) of Chr1 and Chr4 of C. tropicalis showing a balanced translocation as evidenced by a butterfly-like pattern (chromatin contacts split into two blocks) in the interchromosomal area. The 3C-seq reads were mapped to Assembly2020 (top; original) with Chr1A and Chr4A genomic sequences (Figure 1C). We have also mapped the 3C-seq reads to an alternate assembly (bottom) with Chr1B and Chr4B sequences. The alternate assembly was generated by exchanging the genomic sequences at the translocation breakpoint in Chr1 and Chr4. The coordinate of the translocation was mapped using two de novo assembled contigs supporting the junctions. Chromosome labels and their corresponding ideograms are shown on the heatmap. Colorbar represents the contact probability in log2 scale. (C) An ethidium bromide stained gel image and phosphorimages obtained from Southern hybridization using a probe from the part of Chr1 that is exchanged with Chr4 (Probe F) and a second probe from the part of Chr4 that is exchanged with Chr1 (Probe E) (Supplementary file 9). Black triangles point to the genomic coordinates of the probes used. (D) IGV tracks showing 3C-seq (blue) and SMRT-seq coverage (yellow) across the translocation junctions on each of the unaltered homologs of Chr1 (black border) and Chr4 (purple border), respectively. (E) An ethidium bromide stained gel image and phosphorimages obtained from Southern hybridization using centromere-proximal probes from Chr1 (Probe C) and ChrR (Probe D). (F) A synteny dot-plot comparing the colinearity between the chromosomes and the FALCON-generated contigs (labeled as a-l). Five very short contigs are denoted by an asterisk. The enlarged version of the dot plot for these contigs is shown on the right panel. The dot-plot was generated using Symap.
Figure 1—figure supplement 6. Partial conservation of a LOH block in each of the C. albicans, C. tropicalis and C. sojae genome.

The circos tracks represent the SNP density, positions of the centromeres, Indel density, Illumina sequence coverage (the sequence coverage at the rDNA loci is clipped for clearer representation and marked with an asterisk) as indicated. The ribbon plot was drawn by connecting the genomic coordinates of the conserved single copy orthologs between C. tropicalis and C. sojae (teal), and C. tropicalis and C. albicans (purple).

We numbered the chromosomes according to their length, from the longest, chromosome 1 (Chr1), to the shortest, chromosome 6 (Chr6). The remaining chromosome, the one containing the rDNA locus, was named chromosome R (ChrR) (Figure 1C). Accordingly, centromeres on each chromosome were named after the respective chromosome number. Additionally, we oriented the DNA sequence of each chromosome to consistently maintain the short arm at the 5′ end. The statistics of these genome assemblies of C. tropicalis are summarized in Supplementary file 3. In Assembly2020, 1278 out of 1315 Ascomycota-specific BUSCO gene sets could be identified compared to 1255 identified using Assembly A (Supplementary file 4, Materials and methods). The inclusion of 23 additional BUSCO gene sets suggests significantly improved contiguity and completeness of Assembly2020.
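The BUSCO comparison above can be restated as completeness percentages. The following is a minimal sketch that uses only the counts quoted in the text; the function and variable names are illustrative and are not part of any BUSCO tooling.

```python
# Counts are taken from the text; all names here are illustrative only.
TOTAL_BUSCOS = 1315          # Ascomycota-specific BUSCO gene sets searched
FOUND_ASSEMBLY_A = 1255      # identified in Assembly A (ASM633v3)
FOUND_ASSEMBLY_2020 = 1278   # identified in Assembly2020


def completeness(found: int, total: int) -> float:
    """Return the percentage of BUSCO gene sets that were identified."""
    return 100.0 * found / total


gained = FOUND_ASSEMBLY_2020 - FOUND_ASSEMBLY_A
pct_a = completeness(FOUND_ASSEMBLY_A, TOTAL_BUSCOS)
pct_2020 = completeness(FOUND_ASSEMBLY_2020, TOTAL_BUSCOS)

print(f"Assembly A:   {pct_a:.1f}%")
print(f"Assembly2020: {pct_2020:.1f}%")
print(f"Additional BUSCO gene sets identified: {gained}")
```

Under these counts, the identified fraction rises from roughly 95.4% to roughly 97.2% of the 1315 gene sets searched.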

Previously, using centromere-proximal probes, we could distinctly identify five chromosomes (Chr1, Chr2, Chr3, Chr5, and Chr6) in chromoblot analysis (Chatterjee et al., 2016). However, the lengths of Chr4 and ChrR could not be determined. To validate the correct assembly of these two chromosomes (Chr4 and ChrR), we performed additional chromoblot analysis. We observed that Chr4 homologs differed in size (Figure 1—figure supplement 4A). Analysis of the sequence coverage across Chr4 identified an internal duplication of a ~235 kb region, which could explain the size difference between the homologs Chr4A and Chr4B (Figure 1C, Figure 1—figure supplement 4B). We named this duplicated locus DUP4. Subsequently, we scanned the entire genome for the presence of copy number variations (CNVs), which led to the identification of two additional large-scale duplication events: one each on Chr5 (DUP5, ~23 kb) and ChrR (DUPR, ~80 kb) (Figure 1C, Figure 1—figure supplement 4B). Further, using CNAtra software (Khalil et al., 2020), we confirmed these duplication events and identified additional small-scale CNV loci with copy number <1.5 or >2.5 (Figure 1—figure supplement 4C). Additionally, we detected a balanced heterozygous translocation event between Chr1 and Chr4 (Figure 1—figure supplement 5A) through analyses of 3C-seq data and de novo contigs (Figure 1—figure supplement 5B). This translocation was validated using chromoblot analysis (Figure 1—figure supplement 5C) as well as Illumina and SMRT-seq read mapping (Figure 1—figure supplement 5D). While chromoblot analysis suggests that the actual length of ChrR is ~2.8 Mb (Figure 1—figure supplement 5E), the assembled length is 2.1 Mb (Figure 1C). Considering that the length of the rDNA locus is ~700 kb in C. albicans (Jones et al., 2004), we reason that the difference between the assembled length and the actual length (derived from chromoblot analysis) of ChrR in C. tropicalis can be attributed to the presence of the repetitive rDNA locus of ~700 kb, which is not completely assembled in Assembly2020.
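To illustrate how the copy-number thresholds quoted above (<1.5 or >2.5 in a diploid genome) separate candidate CNV bins from the rest, here is a minimal sketch. It is not the CNAtra algorithm: the median-based scaling, the function names, and the read depths are all assumptions made for illustration.

```python
from statistics import median


def estimate_copy_number(depths, ploidy=2):
    """Scale per-bin read depth so that the median bin equals the ploidy.

    Assumes that most bins are at the baseline (diploid) copy number.
    """
    baseline = median(depths)
    return [ploidy * d / baseline for d in depths]


def flag_cnv_bins(copy_numbers, low=1.5, high=2.5):
    """Return indices of bins whose estimated copy number is <low or >high."""
    return [i for i, cn in enumerate(copy_numbers) if cn < low or cn > high]


# Hypothetical read depths for ten bins: bins 4 and 5 mimic a duplication
# on one homolog, and bin 8 mimics a region present on one homolog only.
depths = [100, 98, 103, 101, 152, 149, 99, 102, 51, 100]
copy_numbers = estimate_copy_number(depths)
flagged = flag_cnv_bins(copy_numbers)
print(flagged)
```

In practice, read depth is noisy and segment-level smoothing is needed before thresholding, so this should be read only as a picture of what the two cut-offs mean.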

Next, we performed phasing of the diploid genome of C. tropicalis using SMRT-seq and 3C-seq data to identify the homolog-specific variations (Materials and methods). This analysis produced 16 nuclear contigs, which were colinear with the chromosomes of Assembly2020, except for the previously validated heterozygous translocation between Chr1 and Chr4 (Figure 1—figure supplement 5F). To characterize the sequence variations in the diploid genome of C. tropicalis, we identified the single nucleotide polymorphisms (SNPs) and insertion-deletion (indel) mutations (Materials and methods). Intriguingly, we detected a long chromosomal region depleted of SNPs and indels on the left arm of ChrR (Figure 1D). We named this region that lost heterozygosity on ChrR LOHR. Strikingly, we found parts of the syntenic region of LOHR to be SNP and indel depleted in the C. sojae strain NCYC-2607, a closely related species of C. tropicalis, as well as in the C. albicans reference strain SC5314 (Figure 1—figure supplement 6). We also identified the genome-wide distribution of transposons and simple repeats but could not detect preferential enrichment of these sequence elements at any specific genomic location in C. tropicalis (Figure 1D). Together, we demonstrate, for the first time, multiple CNVs, a long-tract LOH, and evidence of a heterozygous reciprocal translocation event in the diploid genome of C. tropicalis. Possible implications of these events in conferring virulence and drug resistance in this successful human fungal pathogen remain to be explored.

Conserved principle of the spatial genome organization in C. tropicalis and C. albicans

Indirect immunofluorescence imaging of the C. tropicalis strain (CtKS102) expressing Protein-A tagged CENP-ACse4 suggested that centromeres are clustered and localized at the periphery of the DAPI-stained nuclear DNA mass as a single punctum (Figure 2A–B). We mapped 3C-seq data generated using DpnII (Materials and methods) to Assembly2020 to construct the genome-wide chromatin contact map of C. tropicalis. The resultant heatmap depicts high signal intensities along the diagonal, indicating that the intrachromosomal interactions are generally stronger than interchromosomal interactions, as observed before (Figure 2C; Duan et al., 2010). However, the most striking feature of the heatmap is the presence of conspicuous puncta in the interchromosomal areas, which signify strong spatial proximity between centromeres (Figure 2C–D). The aggregate signal analysis further confirmed the enrichment of centromere-centromere interactions (Figure 2E). Strikingly, we also noted the enrichment of telomere-telomere interactions as compared to the neighboring regions (Figure 2C–E). Statistical comparison was then performed between these telomere-telomere interactions and bulk chromatin, which revealed that the interchromosomal telomeric interactions were significantly greater than all interchromosomal interactions (Mann-Whitney U test P value = 1.129 × 10⁻¹¹) (Figure 2—figure supplement 1A). In addition, cis interactions between the two telomeres of an individual chromosome (intrachromosomal telomeric interactions) were also significantly enhanced compared to all intrachromosomal long-range (>100 kb) interactions (Mann-Whitney U test P value = 7.374 × 10⁻¹¹) (Figure 2—figure supplement 1B). All these lines of evidence prompted us to propose that C. tropicalis chromosomes adopt the Rabl-like configuration, a characteristic feature of the higher-order genome organization in yeasts (Duan et al., 2010; Descorps-Declère et al., 2015; Burrack et al., 2016).
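The comparisons above rely on the Mann-Whitney U test. As a minimal sketch of what such a test computes, the function below ranks the pooled values, derives U for the first group, and converts it to a two-sided P value with the normal approximation (tie-corrected, without continuity correction). The contact values are hypothetical and are not taken from the 3C-seq matrices; a statistics library would normally be used instead of this hand-written version.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic of x, approximate two-sided P value).
    Tied values receive average ranks and the variance is tie-corrected.
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both groups, remembering the group of origin (0 = x, 1 = y).
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0          # mean of ranks i+1 .. j
        t = j - i                             # size of this group of ties
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_x - mean_u) / math.sqrt(var_u)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_value


# Hypothetical contact probabilities, for illustration only.
telomeric = [0.8, 0.9, 1.1, 1.3, 1.2]
background = [0.1, 0.2, 0.15, 0.3, 0.25, 0.4, 0.35]
u, p = mann_whitney_u(telomeric, background)
print(u, p)
```

Because the test works on ranks rather than raw values, it suits contact probabilities, whose distribution is strongly skewed.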

Figure 2. Spatial genome organization reveals centromere-centromere and telomere-telomere contacts in C. tropicalis.

(A) A representative field image of C. tropicalis (strain CtKS102) cells expressing Protein-A tagged CENP-ACse4. CENP-A signals (red) were obtained using anti-Protein A antibodies by indirect immuno-fluorescence microscopy. Nuclei of the corresponding cells were stained by DAPI (blue). The images were acquired using a DeltaVision imaging system (GE) and processed using FIJI software (Schindelin et al., 2012). Scale, 2 µm. (B) A 3D reconstruction showing clustered kinetochores marked by CENP-ACse4 (red) at the periphery of the DAPI-stained nucleus (blue) using Imaris software (Oxford Instruments) in C. tropicalis. Scale, 2 µm. (C) A genome-wide contact probability heatmap (bin size = 10 kb) generated using 3C-seq data. Chromosome labels and their corresponding ideograms are shown on the axes of the heatmap. Colorbar represents the contact probability in the log2 scale. (D) Zoom in view of heatmap showing Chr4 and Chr5 from panel C (blue box). (E) Heatmaps plotted from aggregate signal analysis of matrices (bin size = 2 kb) surrounding centromere-centromere (top) or telomere-telomere interactions (bottom). Top, genomic loci containing mid-points of centromeres are aligned at the center; bottom, genomic loci from 5′ or 3′ ends of chromosomes are aligned at the bottom right corner.

Figure 2—figure supplement 1. Analysis of 3C-seq data reveals interchromosomal and intrachromosomal telomeric contacts in C. tropicalis genome.

(A) Histogram of all interchromosomal interactions (excluding zero values; gray) was plotted from the 3C-seq contact probability matrix (bin size = 2 kb) of C. tropicalis. The mean value of all interchromosomal interactions is indicated by the black vertical line. A cartoon of interchromosomal telomeric interactions (dotted curves) is depicted above the histogram as the interactions between telomeres of different chromosomes. The mean value of interchromosomal telomeric interactions is indicated by the blue vertical line in the histogram. The interchromosomal telomeric interactions are significantly greater than all interchromosomal interactions (Mann-Whitney U test P value = 1.129 × 10⁻¹¹). (B) Histogram of all intrachromosomal long-range (>100 kb) interactions (excluding zero values; gray) was plotted from chromosome-wide contact probability matrices (bin size = 2 kb) of C. tropicalis. A cartoon of intrachromosomal telomeric interactions (dotted curves) is depicted above the histogram as the interactions between two telomeres of the same chromosome. Inset, a cartoon depicting long-range interactions (gray blocks) and intrachromosomal telomeric interactions (blue blocks) in a chromosome-wide matrix. A long-range interaction is defined as the cis interaction between two loci separated by a distance of >100 kb. Intrachromosomal telomeric interactions are computed as cis interactions between loci whose distances to two telomeres have a sum of ≤10 kb. Note that the distances indicated in the cartoon for 100 kb and 10 kb are not drawn to scale. The mean values of all long-range and intrachromosomal telomeric interactions are indicated by black and blue vertical lines, respectively. The intrachromosomal telomeric interactions are significantly greater than all long-range interactions (Mann-Whitney U test P value = 7.374 × 10⁻¹¹).

Previously, microscopic and Hi-C studies revealed similar centromere clustering and strong physical interactions among centromeres in C. albicans (Burrack et al., 2016; Sreekumar et al., 2019a; Sreekumar et al., 2019b). This study now reveals that despite substantial karyotypic changes, a conserved principle of genome organization exists in two yeast species, C. albicans and C. tropicalis, with diverged centromere features.

Centromere and telomere proximal loci are hotspots for complex translocations

Using the chromosome-level assemblies of C. tropicalis type strain MYA-3404 and C. albicans type strain SC5314 (ASM18296v3), we performed a detailed genome-wide synteny analysis employing four different approaches. We used two analytical tools, Symap (Soderlund et al., 2011) and Satsuma synteny (Grabherr et al., 2010), and a custom approach to identify the ICSBs based on the synteny of the conserved orthologs (Figure 3A). Next, we compared and validated the results obtained from our custom approach with another published tool, Synchro (Drillon et al., 2014). Considering the C. albicans genome as the reference, all four methods of analysis suggest that six out of seven centromeres (except CEN6) of C. tropicalis are located proximal to multiple ICSBs (Figure 3A, Figure 3—figure supplement 1A). Although it appears that CtCEN6 escaped inter-centromeric translocations, synteny analysis suggested that a chromosomal region carrying three consecutive CtCEN6-proximal ORFs was lost in the C. albicans genome (Figure 3—figure supplement 1B). Strikingly, these ICSBs are rare at the chromosomal arms (Figure 3A). ORF-level synteny analysis further revealed that four out of seven centromeres (CEN2, CEN3, CEN5, and CENR) in C. tropicalis are precisely located at the ICSBs (Figure 3—figure supplement 1C), while multiple ICSBs are located within ~100 kb of the other two centromeres (Figure 3A). Additionally, a convergence of orthoblocks from as many as four different chromosomes of C. albicans was detected within 100 kb of C. tropicalis centromeres (Figure 3B). It is important to note that by using the C. tropicalis genome as the reference, all centromeres of C. albicans, except CaCEN2, were found to be associated with ICSBs (Figure 3—figure supplement 1D). Taken together, centromeres of both these species are found to be associated with chromosomal translocations.
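The idea of calling ICSBs from ortholog order can be sketched in a few lines. The example below follows the orthoblock definition given in the Figure 3 legend (more than two ORFs from the same reference chromosome) but simplifies it: it only checks which reference chromosome each ORF maps to, not the ORF order within the reference. It is not the custom pipeline used in this study, and the chromosome labels and ORF order are hypothetical.

```python
from itertools import groupby


def orthoblocks(ref_chroms, min_orfs=3):
    """Collapse runs of ORFs sharing a reference chromosome into blocks.

    ref_chroms: reference-chromosome label of each ORF, in target order.
    Returns (label, first_index, last_index) for every run of at least
    min_orfs ORFs, i.e. more than two ORFs per orthoblock.
    """
    blocks = []
    pos = 0
    for label, run in groupby(ref_chroms):
        length = len(list(run))
        if length >= min_orfs:
            blocks.append((label, pos, pos + length - 1))
        pos += length
    return blocks


def icsbs(blocks):
    """Return adjacent orthoblock pairs that come from different chromosomes."""
    return [(a, b) for a, b in zip(blocks, blocks[1:]) if a[0] != b[0]]


# Hypothetical ORF order along one target chromosome; labels are made up.
orfs = ["Ca1"] * 4 + ["Ca3"] * 3 + ["Ca5"] + ["Ca3"] * 3 + ["CaR"] * 5
blocks = orthoblocks(orfs)
breaks = icsbs(blocks)
print(len(blocks), len(breaks))
```

In this toy input, the single "Ca5" ORF is too short to form an orthoblock, so the two "Ca3" runs flanking it do not generate a breakpoint between them.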

Figure 3. Genome-wide mapping of interchromosomal synteny breakpoints in C. tropicalis identifies a spatial cue for karyotype evolution.

(A) Scaled representation of the color-coded orthoblocks (relative to C. albicans chromosomes) and ICSBs (white lines) in C. tropicalis (Materials and methods). Orthoblocks are defined as stretches of the target genome (C. tropicalis) carrying more than two syntenic ORFs from the same chromosome of the reference genome (C. albicans). The centromeres are represented with black arrowheads. (B) Zoom in view of the C. tropicalis centromere-specific ICSBs on CEN2, CEN3, CEN5 and CENR showing the color-coded (relative to C. albicans chromosomes) ORFs flanking each centromere. C. tropicalis-specific unique ORFs proximal to CEN3 and CEN5 are shown in red. (C) A plot showing the chromosome-wise ICSB density, calculated as number of ICSBs per 100 kb of the C. tropicalis genome (y-axis), as a function of the linear distance from the centromere in nine bins. These bins are (a) 0–100 kb on both sides of centromere (bin I), (b) 100–200 kb (bin II), (c) 200–300 kb (bin III), (d) 300–400 kb (bin IV), (e) 400–500 kb (bin V), (f) 500–600 kb (bin VI), (g) 600–700 kb (bin VII), (h) >700 kb to 200 kb from telomere ends (bin VIII), and (i) 200 kb from the telomere ends (bin IX). Chr6 was excluded from this analysis, as it does not harbor any ICSB. (D) A violin plot comparing the distribution of lengths of orthoblocks (y-axis) at three different genomic zones: (a) the centromere-proximal zone (CP), (b) the centromere-distal zone (CD), and (c) the telomere-proximal zone (TP). Orthoblocks, which span over more than one zone, were assigned to the zone with maximum overlap. The centromere-distal dataset was compared with the other two groups using the Mann-Whitney U test and the respective P values are mentioned. (E–F) Circos plots representing the convergence of centromere-proximal ORFs of C. tropicalis chromosomes near the centromeres (CEN4 and CEN7) of C. albicans. Chromosomes of C. tropicalis and C. albicans are marked with black and purple filled circles at the beginning of each chromosome, respectively.

Figure 3—figure supplement 1. Genome-wide synteny analysis between C. albicans and C. tropicalis suggests evidence of inter-centromere translocations in the last common ancestor.

(A) Synteny maps of C. tropicalis chromosomes (the lowermost line of each panel, marked by filled black circles numbered from 1 to R), with respect to C. albicans chromosomes (lines above the C. tropicalis chromosomes), in the order of Chr1 to ChrR (top to bottom) for all panels. Centromeres, black triangles. The ORFs (represented as beads) are color-coded: inverted, red and non-inverted, green. The more conserved the reciprocal best hits (RBH) are, the darker are the shades of red/green color. (B) Zoom in view of the synteny relationship between the centromere proximal ORFs of C. tropicalis Chr6 with C. albicans Chr7. (C) The zoom in view of RBH ORFs proximal to the centromeres of C. tropicalis as indicated in the figure, where each centromere is located at an ICSB. (D) A scaled representation of the color-coded orthoblocks (relative to C. tropicalis chromosomes) and ICSBs (white lines) in C. albicans (Materials and methods). Orthoblocks are defined as stretches of the target genome (C. albicans) carrying more than two syntenic ORFs from the same chromosome of the reference genome (C. tropicalis). The centromeres are represented with black arrowheads. (E) Circos plot showing the ICSBs (purple lines on the outer-most circle) on C. tropicalis chromosomes (marked with black filled circles). The centromere proximal ORFs (10 ORFs on both sides) present in C. tropicalis are connected to their homologs present on C. albicans chromosomes (marked by purple filled circles) by color-coded lines (based on their origin). The positions of centromeres are marked with black lines of the inner-most circle in each chromosome. The genomic locations in C. albicans chromosomes showing the convergence of ORFs from at least two centromere-proximal loci of C. tropicalis are marked with red (proximal to the C. albicans centromere) and purple (a non-centromere locus) triangles. Note that all centromeres of C. albicans are proximal to ORFs, homologs of which are proximal to centromeres of C. tropicalis. (F–G) Circos plots showing the convergence of centromere proximal ORFs of C. tropicalis chromosomes near the centromeres on C. albicans chromosomes 3 (CaChr3) and 7 (CaChr7), respectively. Chromosomes of C. tropicalis and C. albicans are marked with black and purple filled circles at the beginning of each chromosome, respectively.

To correlate the frequency of translocations with the spatial genome organization, we quantified ICSB density (the number of ICSBs per 100 kb of the genome) for different zones across the chromosome for all chromosomes except CtChr6 (Figure 3C). Our analysis reveals that the ICSB density is maximum at the centromere-proximal zones for all six chromosomes, but drops sharply at the chromosomal arms. However, the ICSB density near the telomere-proximal zone for Chr2, Chr4, and ChrR shows an increase compared to the chromosomal arms, albeit at a lower magnitude than centromeres. We also compared the lengths of orthoblocks across three different genomic zones - the centromere-proximal (0–300 kb from the centromere on both sides), centromere-distal (>300 kb from the centromere to 200 kb away from the telomere ends), and telomere-proximal (0–200 kb from the telomere ends) zones. This analysis further reveals that the lengths of the orthoblocks located proximal to centromeres and telomeres are significantly smaller than orthoblocks located at the centromere-/telomere-distal zones (Figure 3D).
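The ICSB density calculation can be sketched as follows, using the nine bins listed in the Figure 3 legend. This is an illustrative approximation rather than the script used for Figure 3C. It assumes that the centromere is represented by a single coordinate, that telomere-proximal positions (within 200 kb of either chromosome end) take precedence over the distance-based bins, and that bin lengths are measured by tiling the chromosome in 1 kb steps. All names are hypothetical.

```python
KB = 1_000


def assign_bin(pos, cen, chrom_len, tel_zone=200 * KB, step=100 * KB):
    """Return a bin index from 0 (bin I) to 8 (bin IX) for one position.

    Bin IX: within tel_zone of either chromosome end.
    Bins I-VII: successive 100 kb distance classes from the centromere.
    Bin VIII: everything farther than 700 kb that is not telomere-proximal.
    """
    if pos < tel_zone or pos >= chrom_len - tel_zone:
        return 8
    return min(abs(pos - cen) // step, 7)


def icsb_density(icsb_positions, cen, chrom_len, resolution=KB):
    """ICSBs per 100 kb in each of the nine bins of one chromosome."""
    bin_bp = [0] * 9
    # Approximate the length of each bin by tiling the chromosome.
    for start in range(0, chrom_len, resolution):
        width = min(resolution, chrom_len - start)
        bin_bp[assign_bin(start, cen, chrom_len)] += width
    counts = [0] * 9
    for pos in icsb_positions:
        counts[assign_bin(pos, cen, chrom_len)] += 1
    return [
        count / (bp / (100 * KB)) if bp else 0.0
        for count, bp in zip(counts, bin_bp)
    ]


# Toy chromosome: 2 Mb long, centromere at 1 Mb, four hypothetical ICSBs.
density = icsb_density(
    [950_000, 1_020_000, 1_500_000, 100_000],
    cen=1_000_000,
    chrom_len=2_000_000,
)
```

Normalizing by the length actually covered by each bin matters because short chromosomes may contain little or no sequence in the outer bins; empty bins are reported here as zero density.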

We further probed into the consequences of strong inter-centromeric interactions, as described above. Synteny analysis across centromere-proximal regions of the two species hints that inter-centromeric translocations may have occurred in the common ancestor of C. albicans and C. tropicalis. If such is the case, the centromere-proximal ORFs of different chromosomes in C. tropicalis should have converged on the C. albicans genome. Indeed, we identified at least ten loci where a convergence of C. tropicalis ORFs from different chromosomes had taken place in C. albicans (Figure 3—figure supplement 1E). Intriguingly, we found four such loci that are proximal to the centromeres (CEN3, CEN4, CEN7, and CENR) in C. albicans (Figure 3E–F, Figure 3—figure supplement 1F–G). This observation strongly supports the possibility of inter-centromeric translocation events in the common ancestor of C. albicans and C. tropicalis. Additionally, the other four centromeres in C. albicans are located proximal to ORFs, orthologs of which are also proximal to the centromeres in C. tropicalis (Figure 3—figure supplement 1E). We posit that the ancestral HIR-associated centromeres were lost in C. albicans, and ENCs formed proximal to the ancestral centromere loci on unique DNA sequences. A similar centromere type transition within two isolates of C. parapsilosis, another species of the CUG-Ser1 clade, has been recently reported (Ola et al., 2020).
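The convergence test described above can be expressed as a small sketch. It assumes that each centromere-proximal C. tropicalis ORF has already been mapped to the coordinate of its C. albicans homolog, and it groups homologs that lie within a maximum gap of one another on the same C. albicans chromosome. The 50 kb gap, the function names, and the toy coordinates are assumptions made for illustration and are not taken from this study.

```python
from collections import defaultdict


def _emit(chrom, cluster, min_sources, loci):
    """Keep a cluster only if it draws on enough distinct source centromeres."""
    sources = sorted({source for _, source in cluster})
    if len(sources) >= min_sources:
        loci.append((chrom, cluster[0][0], cluster[-1][0], sources))


def convergence_loci(hits, max_gap=50_000, min_sources=2):
    """Find loci where homologs of ORFs near different centromeres converge.

    hits: iterable of (source_centromere, target_chromosome, target_position).
    Returns (target_chromosome, start, end, sorted source centromeres).
    """
    by_chrom = defaultdict(list)
    for source, chrom, pos in hits:
        by_chrom[chrom].append((pos, source))

    loci = []
    for chrom, entries in sorted(by_chrom.items()):
        entries.sort()
        cluster = [entries[0]]
        for entry in entries[1:]:
            if entry[0] - cluster[-1][0] <= max_gap:
                cluster.append(entry)
            else:
                _emit(chrom, cluster, min_sources, loci)
                cluster = [entry]
        _emit(chrom, cluster, min_sources, loci)
    return loci


# Hypothetical homolog positions, for illustration only.
toy_hits = [
    ("CtCEN3", "CaChrR", 100_000),
    ("CtCEN4", "CaChrR", 120_000),
    ("CtCEN3", "CaChr1", 500_000),
    ("CtCEN3", "CaChr1", 510_000),
]
loci = convergence_loci(toy_hits)
```

A locus reported by this sketch would still need to be compared with the position of the C. albicans centromere to decide whether the convergence is centromere-proximal.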

Rapid transition in the centromere type within the members of the CUG-Ser1 clade

Since multiple translocation events near centromeric regions of the C. tropicalis genome could be detected, we hypothesized that complex translocations between HIR-associated centromeres in the common ancestor of C. albicans and C. tropicalis led to the loss of HIR and the evolution of unique centromere types observed in C. albicans and C. dubliniensis. However, genomic rearrangements are rare events, even on an evolutionary time scale. Therefore, if HIR-associated centromeres are the ancestral state from which unique centromeres were derived, some other closely related species should have retained HIR-associated centromeres. Indeed, we identified eight HIR-associated structures in the reference genome of C. parapsilosis strain CDC317 (ASM18276v2). Identification of the HIR-associated structures present at the intergenic and transcription-poor regions, one each on all eight chromosomes, suggests that these loci are the putative centromeres of C. parapsilosis. Indeed, it was recently reported that all eight CENP-ACse4 enriched centromeres in the CLIB214 strain of C. parapsilosis are located at HIR-associated loci (Ola et al., 2020). Based on these lines of evidence, we conclude that the common ancestor of C. albicans and C. tropicalis possibly carried HIR-associated centromeres. Surprisingly, two centromeres in another isolate (90-137) of C. parapsilosis have been shown to be formed on non-HIR-associated loci (Ola et al., 2020). However, the driving force triggering polymorphisms in centromere locations within the same species is yet to be understood.

Although IRs are present in CEN4, CEN5, and CENR of C. albicans, these sequences are not homogenized like the HIR-associated centromeres in C. tropicalis (Figure 4A). To study the presence of HIRs in C. sojae (NCYC-2607), a sister species of C. tropicalis (Shen et al., 2018), we assembled its genome into 42 contigs, including seven chromosome-length contigs (Materials and methods). Using this assembly, we identified seven putative centromeres in C. sojae as intergenic and HIR-associated loci syntenic to the centromeres in C. tropicalis (Figure 4—figure supplement 1A–C). Each of these seven putative centromeres in C. sojae consists of a ~2 kb long CC region flanked by 3–12 kb long inverted repeats (Supplementary file 5). Using a similar approach, we identified six HIR-associated centromeres in the publicly available genome assembly (ASM332773v1) of Candida viswanathii, another species closely related to C. tropicalis (Figure 4—figure supplement 1D–E, Supplementary file 6; Tsui et al., 2008). A dot-plot analysis identified the presence of homologous sequences shared across IRs but not among the CC elements (Figure 4A) of the HIR-associated centromeres present in C. tropicalis and the putative centromeres of C. sojae and C. viswanathii (Supplementary file 7). Moreover, we detected extensive structural conservation in centromere DNA elements, especially among IRs within an individual species (Figure 4—figure supplement 2A). These structural features of IRs are also significantly conserved across the three species, C. tropicalis, C. sojae, and C. viswanathii (Figure 4—figure supplement 2B).

Figure 4. Genome-wide analysis of centromere DNA sequences across the CUG-Ser1 clade reveals the emergence of unique centromeres from an ancestral homogenized inverted repeat-associated centromere type.

(A) A dot-plot matrix representing the sequence and structural homology among species of the CUG-Ser1 clade was generated using Gepard (Materials and methods). (B) A logo plot showing the 12-bp-long IR-motif, identified using the MEME Suite (Materials and methods). (C) The distribution of IR-motif density on centromere DNA sequences and across the entire genome of each species was calculated as the number of motifs per kb of DNA (Materials and methods). Note that C. albicans and C. dubliniensis centromeres that form on unique and different DNA sequences do not contain the IR-motif. (D) IGV track images showing the IR-motif density across seven chromosomes of C. tropicalis. The location of the centromere on each chromosome is marked with a black arrowhead. (E) IGV track images showing the IR-motif distribution across seven HIR-associated centromeres of C. tropicalis.

Figure 4—figure supplement 1. Identification of HIR-associated centromeres in the CUG-Ser1 clade.

(A) Schematic of the method used for the identification of putative centromeres in C. sojae, C. viswanathii, and C. parapsilosis. Putative centromeric loci in these species were tested for gene synteny with C. tropicalis (for C. sojae and C. viswanathii), presence of IRs, and overlap with intergenic/ORF-free regions. (B) Genome-wide synteny of conserved ortholog pairs between C. sojae and C. tropicalis. (C) Circos plot similar to that of B, showing 10 ORFs on both sides of each centromere of C. tropicalis connected to the corresponding genomic loci carrying homologs in C. sojae. (D) Genome-wide synteny between conserved ortholog pairs between C. viswanathii and C. tropicalis. (E) Circos plot similar to that of D, showing 10 ORFs on both sides of each centromere of C. tropicalis connected to the corresponding genomic loci carrying homologs in C. viswanathii. The location of the centromere on each chromosome on subpanel B, C, D, and E is marked with a black line. Chromosomes/contigs are marked at the beginning of each chromosome/contig with colored filled circles (a to g are tig00000002, tig00000008, tig00000017, tig00000038, tig00000050, tig00016100, tig00000001, for C. sojae and h to n are NW_020797881.1, NW_020797885.1, NW_020797858.1, NW_020797886.1, NW_020797884.1, NW_020797877.1 and NW_020797878.1 for C. viswanathii). Contigs that are either <100 kb in length or do not carry putative centromeres (for C. sojae) or duplicated in the genome assembly (for C. viswanathii) were excluded from this analysis. The chromosomal coordinates of the ortholog pairs are connected using color-coded lines.
Figure 4—figure supplement 2. Inter-species conservation of centromere DNA sequences of closely related Candida species.

(A) Bean plots showing the distribution of the percent sequence identity among the centromeric left repeat (LR), the central core (CC), and right repeat (RR) elements in C. tropicalis (Ct), C. sojae (Cs), C. viswanathii (Cv), and C. parapsilosis (Cp). The extent of sequence identity between all possible pairs of CC, LR, RR, and LR-RR pairs was calculated in blastn analysis using Clustal Omega. The means are depicted as a horizontal black bar on each of the bean pods. (B) Bean plots showing distribution and means (horizontal black bar on each of the bean pods) of percent sequence identity values obtained from pairwise DNA sequence alignment of all possible combinations for each of seven random loci (Random), centromeric left repeats (LR), central cores (CC) and right repeats (RR) between species-pairs as indicated, using Clustal Omega. The significance of difference between percent sequence identity of centromere elements and random loci for all three species pairs was tested using the Mann-Whitney U test (p<0.05) and the P value summary for each comparison is represented with asterisks. (C) Heatmap plot across the contigs carrying the putative centromeres showing the extent of enrichment of IR-motifs in C. sojae (contigs a-g, as described in Figure 4—figure supplement 1) and C. viswanathii (contigs h-n, as described in Figure 4—figure supplement 1). The locations of the centromeres are pointed with red arrowheads. (D) IGV track images showing the extent of enrichment of IR-motif on the putative centromeres of C. sojae and C. viswanathii. (E) A representative figure showing a zoom in view of the IR-motif distribution on C. tropicalis CEN1 DNA. The motifs on the Crick strand (red) and Watson strand (blue) are color coded. (F) Heatmap showing the percent of IR-motifs present in converging and diverging orientation with respect to the central core region for each of the HIR associated putative centromeres present in C. sojae, C. tropicalis, and C. viswanathii. (G) The average number of IR-motifs per 250 bp on the IRs is plotted in the y-axis as a function of the distance from the start of CC (x-axis) for C. tropicalis (red), C. sojae (yellow), and C. viswanathii (blue). Bean plots were generated using BoxplotR.

Cloning of a full-length centromere of C. tropicalis in a replicative plasmid facilitated de novo CENP-ACse4 deposition but failed to do so when the native IRs were replaced with CaCEN5 IRs (Chatterjee et al., 2016). This result indicated that DNA sequence specificity is required for centromere function in C. tropicalis. To identify the DNA sequence acting as a putative genetic element, we analyzed centromere DNA sequences of all three Candida species with HIR-associated centromeres and the unique centromeres of C. albicans for the presence of any conserved motif(s) (Materials and methods). This analysis identified a highly conserved 12-bp motif (dubbed the IR-motif) (Figure 4B) clustered specifically at centromeres but not anywhere else in the entire genome of C. tropicalis, C. sojae and C. viswanathii (Figure 4C–D, Figure 4—figure supplement 2C). On the contrary, the IR-motif density at centromeres in C. albicans remains approximately an order of magnitude lower than that of C. tropicalis (Figure 4C). This observation indicates a potential function of IR-motifs in the regulation of de novo CENP-ACse4 loading in C. tropicalis. Moreover, this CEN-enriched motif found at IRs is absent from the central core region in C. tropicalis (Figure 4E) and at the putative centromeres in C. sojae and C. viswanathii (Figure 4—figure supplement 2D). Additionally, we noted that the direction of the IR-motif is diverging away from the central core in C. tropicalis (Figure 4—figure supplement 2E) as well as in the other two species (Figure 4—figure supplement 2F). The conserved structure and organization of the IR-motif sequences in the HIR-associated centromeres of three Candida species suggest an inter-species conserved function of the IR DNA sequence. However, the clusters of IR-motifs are located at a variable distance from CC in these species (Figure 4—figure supplement 2G). The importance of the sequence and the density of IR-motifs on the centromere function is yet to be determined.
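The motif density measure (number of motifs per kb of DNA) and the strand assignment of each motif copy can be illustrated with a minimal sketch. It is a simplification of the analysis reported here: it searches for exact matches to a single consensus string on both strands, whereas motif discovery in this study relied on the MEME Suite. The 12-bp sequence used below is an arbitrary placeholder and not the IR-motif itself, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def motif_hits(seq, motif):
    """Return (position, strand) for every exact match on either strand.

    "+" marks a match to the motif as written and "-" a match to its
    reverse complement. A palindromic motif is reported once, as "+".
    """
    seq, motif = seq.upper(), motif.upper()
    rc_motif = reverse_complement(motif)
    hits = []
    for i in range(len(seq) - len(motif) + 1):
        window = seq[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc_motif:
            hits.append((i, "-"))
    return hits


def motif_density_per_kb(seq, motif):
    """Number of motif copies (both strands) per kb of sequence."""
    if not seq:
        return 0.0
    return len(motif_hits(seq, motif)) / (len(seq) / 1000)


# Placeholder 12-bp motif and a toy sequence carrying one copy per strand.
placeholder_motif = "AAACCCGGGTTA"
toy_seq = (
    "TT" + placeholder_motif + "GG"
    + reverse_complement(placeholder_motif) + "CC"
)
hits = motif_hits(toy_seq, placeholder_motif)
density = motif_density_per_kb(toy_seq, placeholder_motif)
```

Comparing the strand of each hit with its position relative to the central core would then indicate whether motif copies converge on or diverge from the core.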

Discussion

In this study, we improved the current genome assembly of the human fungal pathogen C. tropicalis by employing SMRT-seq, 3C-seq, and chromoblot experiments, and present Assembly2020, the first chromosome-level gapless genome assembly of this organism. We further identified three large-scale duplication events and a few small-scale CNV loci in its genome, phased the diploid genome of C. tropicalis, and mapped SNPs and indels. We constructed a genome-wide chromatin contact map and identified significant centromere-centromere as well as telomere-telomere spatial interactions. Comparative genome analysis between C. albicans and C. tropicalis reveals that six out of seven centromeres of C. tropicalis are mapped precisely at or proximal to ICSBs. Strikingly, ORFs proximal to the centromeres of C. tropicalis are converged into specific regions on the C. albicans genome, suggesting that inter-centromeric translocations may have occurred in their common ancestor. Moreover, the presence of HIR-associated putative centromeres in C. sojae and C. viswanathii, like in C. tropicalis, suggests that such a centromere structure is plausibly the ancestral form in the CUG-Ser1 clade but was lost in both C. albicans and C. dubliniensis. We propose that loss of such a centromere structure might have occurred during translocation events involving centromeres of homologous DNA sequences in the common ancestor, giving rise to ENCs on unique DNA sequences and facilitating speciation.

Unlike other centromeres, CEN6 of C. tropicalis did not seem to undergo inter-centromeric translocations. A closer analysis revealed that three CEN6-associated ORFs of C. tropicalis are absent in the C. albicans genome while the other flanking ORFs remain conserved. This observation can be explained by a double-stranded DNA break at the centromere followed by the fusion of broken ends resulting in the loss of those ORFs.

The availability of the chromosome-level genome assembly and improved annotations of genomic variants and genes absent in the publicly available fragmented genome assembly of C. tropicalis should greatly facilitate genome-wide association studies to understand the pathobiology of this organism including the cause of antifungal drug resistance. Besides, this study sheds light on how genetic elements required for de novo centromere establishment in an ancestral species could be lost in the derived lineages to give rise to epigenetically-regulated centromeres.

C. tropicalis is a human pathogenic ascomycete, closely related to the well-studied model fungal pathogen C. albicans (Legrand et al., 2019). These two species diverged from their common ancestor ~39 million years ago (Kumar et al., 2017) and evolved with distinct karyotypes (Chatterjee et al., 2016), having different phenotypic traits (Cavalheiro and Teixeira, 2018), and ecological niches (Pappas et al., 2018). While C. albicans remains the primary cause of candidiasis worldwide, systemic ICU-acquired candidiasis is primarily (30.5–41.6%) caused by C. tropicalis in tropical countries including India (Chakrabarti et al., 2015), Pakistan (Farooqi et al., 2013), and Brazil (da Costa et al., 2014). Moreover, the occurrence of drug resistance, particularly multidrug resistance, in C. tropicalis is on the rise (Chakrabarti et al., 2015; Xiao et al., 2015; Gonçalves et al., 2016). Therefore, relatively less-studied C. tropicalis is emerging as a major threat for nosocomial candidemia with 29–72% broad spectrum mortality rate (Lamoth et al., 2018). Fluconazole resistance in C. albicans can be gained due to segmental aneuploidy of Chr5 containing long IRs at the centromere, by the formation of isochromosomes (Selmecki et al., 2006), which was also identified in Chr4 with IRs at its centromere (Todd et al., 2019). All seven centromeres in C. tropicalis are associated with long IRs with the potential to form isochromosomes.

Since the mechanism of homology search during HR is positively influenced by spatial proximity and the extent of DNA sequence homology (Agmon et al., 2013; Seeber et al., 2018), at least in the engineered model systems, it is expected that spatially clustered homologous DNA sequences undergo more translocation events than other loci. Although these factors were not shown to be involved in karyotypic rearrangements during speciation, a retrospective survey in light of spatial proximity and homology now offers a better explanation. For example, the bipolar to the tetrapolar transition of the mating type locus in the Cryptococcus species complex was associated with inter-centromeric recombination following pericentric inversion (Sun et al., 2017). Similar inter-centromeric recombination has been reported in the common ancestor of two fission yeast species, Schizosaccharomyces cryophilus and Schizosaccharomyces octosporus (Tong et al., 2019). These examples raise an intriguing notion that centromeres serve as sites of recombination, which may lead to centromere loss and/or the emergence of ENCs. This notion is supported by the fact that DSBs at centromeres following fusion of the acentric fragments to other chromosomes led to chromosome number reduction in Ashbya species (Gordon et al., 2011) and Malassezia species (Sankaranarayanan et al., 2020). Genomic instability at the centromere can also lead to fluconazole resistance, as in the case of isochromosome formation on Chr5 of C. albicans (Selmecki et al., 2006). Additionally, breaks at the centromeres were reported to be associated with cancers in humans (Barra and Fachinetti, 2018).

What would be the consequence of the spatial proximity of chromosomal regions with high DNA sequence homology in other domains of life? Interchromosomal contacts between chromosome pairs have been correlated with the number of translocation events in both naturally occurring populations and experimentally induced mammalian cells (Arsuaga et al., 2004; Bickmore and Teague, 2002; Branco and Pombo, 2006; Canela et al., 2017; Engreitz et al., 2012; Hlatky et al., 2002; Holley et al., 2002; Klein et al., 2011; Roukos et al., 2013; Zhang et al., 2012). It has been suggested that contacts between various chromosomal territories as well as their relative positions in the nucleus influence the sites and frequency of translocation events both in flies and mammals (Engreitz et al., 2012; Aten et al., 2004; Foster et al., 2013; Savage, 1998; Savage, 2000; Meaburn, 2016). While centromeres remain clustered either throughout the cell cycle or for most of it in many fungal species, such is not the case in metazoan cells. Nevertheless, one of the well-studied translocation events, Robertsonian translocation (RT) involving fusion between arms of two different chromosomes near a centromere, is the most frequently detected chromosomal abnormality in humans (Therman et al., 1989). The occurrence of RT was first reported in grasshoppers (Robertson, 1916) and subsequently it has been implicated in the karyotype evolution in humans (Therman et al., 1989), mice (Castiglia and Capanna, 2002; Dumas and Britton-Davidian, 2002), and wheat (Friebe et al., 2005). Moreover, RTs cause sterility in humans (Guichaoua et al., 1990), are often linked with the heterogeneity of carcinomas (Hermsen et al., 2005), and are implicated in genetic disorders (Mattei et al., 1984). 
Intriguingly, cytological and Hi-C based evidence (Imakaev et al., 2012) of spatial proximity (reviewed in Muller et al., 2019) among the repeat-associated centromere DNA sequences (Kalitsis et al., 2006) in these species supports a possibility that RTs may have been guided by spatial proximity. Similarly, chromoplexy, involving a series of translocation events among multiple chromosomes without alterations in the copy number, was identified in prostate cancers (Zhang et al., 2013; Baca et al., 2013). Although fine mapping of translocation events at the repetitive regions in human cancer cells is challenging, the growing evidence that such events are associated with the formation of micronuclei (Crasta et al., 2012) supports the idea that the spatial genome organization may influence chromoplexy as well (Meaburn et al., 2007).

The identification of HIR-associated putative centromeres in C. parapsilosis, C. sojae, and C. viswanathii supports the idea that the unique centromeres might have evolved from an ancestral HIR-associated centromere (Coughlan et al., 2016; Figure 5A). While HIR-associated centromeres of C. tropicalis, C. sojae, and C. viswanathii form on different DNA sequences, a well-conserved IR-motif was identified in this study that is present in multiple copies on the centromeric IR sequences across these three species. Some centromeres in C. albicans carry chromosome-specific IRs but lack IR-motifs. Besides, CaCEN5 IRs could not functionally complement the centromere function in C. tropicalis for the de novo CENP-ACse4 recruitment. This indicates a possible role of the conserved IR-motifs on species-specific centromere function (Chatterjee et al., 2016). Therefore, the loss of HIR-associated centromeres in C. albicans that are only epigenetically propagated (Baum et al., 2006) clearly shows how the ability of de novo establishment of kinetochore assembly in an ancestral lineage can be lost in a derived lineage. However, the mechanism through which IR-motifs may regulate centromere identity remains to be explored.

Figure 5. The spatial genome organization remained conserved in the CUG-Ser1 clade despite centromere type diversity.

(A) A maximum likelihood-based phylogenetic tree of closely related CUG-Ser1 species analyzed in this study. The centromere structure of each species is shown and drawn to scale. (B) A model showing possible events during the loss of HIR-associated centromeres and emergence of the unique centromere type through inter-centromeric translocations that possibly occurred in the common ancestor of C. tropicalis and C. albicans. The model is drawn to show translocation events involving two C. tropicalis chromosomes (CtChr3 and CtChr4) as representatives, which can be mapped proximal to the centromere on C. albicans ChrR (CaChrR) as shown in Figure 3F. (C) Rabl-like chromosomal conformation is maintained despite inter-centromeric translocations that facilitated centromere type transition.

Loss of HIR-associated centromeres during inter-centromeric translocations or MIR must have been catastrophic for the cell, and the survivor was obligated to activate another centromere at an alternative locus. How is such a location determined? Artificial removal of a native centromere in C. albicans leads to the activation of a neocentromere (Thakur and Sanyal, 2013; Ketel et al., 2009), which then becomes part of the centromere cluster (Burrack et al., 2016). This evidence supports the existence of a spatial determinant, known as the CENP-A cloud or CENP-A-rich zone (Thakur and Sanyal, 2013; Fukagawa and Earnshaw, 2014), influencing the preferential formation of a neocentromere at loci proximal to the native centromere (Thakur and Sanyal, 2013; Scott and Sullivan, 2014). We found that the unique and different centromeres of C. albicans are located proximal to the ORFs, which are also proximal to the centromeres in C. tropicalis. This observation indicates that the formation of the new centromeres in C. albicans may have been influenced by spatial proximity to the ancestral centromere cluster. However, new centromeres of C. albicans are formed on loci with completely unique and different DNA sequences. Similar to centromeres of C. albicans, centromere repositioning events may lead to the formation of ENCs, which are often associated with speciation in mammals (Rocchi et al., 2012; Stanyon et al., 2008). It was found that the location of one centromere in the horse varies across individuals (Wade et al., 2009; Purgato et al., 2015). Although there are cases where ENCs formed without genomic rearrangements, the driving force facilitating centromere relocation was proposed to be associated with chromosomal inversion and translocation in certain cases (Schubert, 2018). For these reasons, it may be logical to consider the centromeres of C. albicans as ENCs (Figure 5B). Intriguingly, even after the catastrophic chromosomal rearrangements, the ENCs in C. albicans remain clustered similar to C. tropicalis (Figure 5C). This observation identifies spatial clustering of centromeres as a matter of cardinal importance for the fungal genome organization.

Materials and methods

Media

C. tropicalis and C. sojae strains (Supplementary file 8) used in this study were grown in non-selective YPDU (2% dextrose, 2% peptone, 1% yeast extract, and 0.01% uracil), and incubated at 30°C at 180 rpm. For growing C. albicans strains, YPD media was supplemented with 0.1 mg/mL of uridine. The transformation of C. tropicalis was performed as described previously (Chatterjee et al., 2016). The selection of transformants was based on prototrophy for the metabolic markers used. In the case of selection for the antibiotic marker (CaSAT1), conferring nourseothricin (NTC) resistance, growth media was supplemented with 100 μg/mL NTC (Werner Bioagents, CAS No. 96736-11-7). Recycling of the CaSAT1 marker was done by growing the NTC-resistant strains in YPMU (4% maltose, 2% peptone, 1% yeast extract, and 0.01% uracil), and NTC-sensitive segregants were selected by patching them on YPDU and YPDU supplemented with NTC. For counter selection against CaURA3, 5-fluoroorotic acid (5-FOA; Sigma-Aldrich, CAS No. 207291-81-4) was used at 1 mg/mL concentration. The strains, primers, and plasmids used in this study are listed in Supplementary files 8, 9, and 10, respectively.

Pulsed-field gel electrophoresis

C. tropicalis strain MYA-3404 and C. albicans strain SC5314 were grown until the exponential phase (~2 × 10⁷ cells/mL). Cells were washed with 50 mM EDTA and counted with a hemocytometer. Approximately 6 × 10⁸ cells were used for the preparation of 1 mL genomic DNA plugs. The plugs were made according to the instruction manual protocol (Bio-Rad, Cat No. 170–3593) with CleanCut Agarose (0.6%) and the lyticase enzyme provided by the kit. A 0.6% pulsed field certified agarose gel was prepared using 0.5x TBE buffer (0.1 M Tris, 0.09 M Boric acid, 0.01 M EDTA, pH 8.0) and PFGE was performed on a contour-clamped homogeneous electric field (CHEF) system using the CHEF-DR II (Bio-Rad) module. The running conditions used were as follows: block-I at 100–200 s for 24 hr at 4.5 V/cm/120°, block-II at 200–400 s for 48 hr at 2.5 V/cm/120°, block-III at 600–800 s for 120 hr at 2.5 V/cm/120°. The gel was stained with ethidium bromide (EtBr) and analyzed by Quantity One software (Bio-Rad).

Indirect immunofluorescence microscopy

Subcellular localization of Protein A-tagged CENP-ACse4 relative to the DAPI (4′,6-diamidino-2-phenylindole)-stained nuclear mass was examined in C. tropicalis strain CtKS102 following the method described previously for C. albicans (Sanyal and Carbon, 2002). Asynchronously grown C. tropicalis cells were fixed with 1/10th volume of 37% formaldehyde for 1 hr at room temperature. The primary antibody, rabbit anti-Protein A (Sigma, Cat No. P3775), was used at a 1:1000 dilution. The secondary antibody, Alexa Fluor 568 goat anti-rabbit IgG (Invitrogen, Cat No. A11011), was also used at a 1:1000 dilution. Antibody dilutions were prepared in 5% skimmed milk (HiMedia, Cat No. GRM1254) solution in 1x phosphate buffered saline (PBS) pH 7.4 (137 mM NaCl, 2.7 mM KCl, 10 mM Na2HPO4, 1.8 mM KH2PO4).

Preparation of high molecular weight genomic DNA

Briefly, 50 OD600 equivalent (1 OD600 = ~2 × 10⁷ cells) cells were collected, washed with chilled 50 mM EDTA pH 8.0 and flash-frozen in liquid nitrogen. Next, the cell pellet was lyophilized. A volume equivalent to 5 mL of glass beads was then added to the tube, which was vortexed until the pellet turned powdery. Then 20 mL Cetyltrimethyl ammonium bromide (CTAB) extraction buffer (100 mM Tris-HCl pH 7.5, 0.7 M NaCl, 10 mM EDTA, 1% CTAB powder, 1% 2-Mercaptoethanol) was added, and the tube was incubated at 65°C for ~30 min with occasional mixing by inverting the tube. Subsequently the tube was chilled on ice for 10 min, and the supernatant was transferred into another tube. An equal volume of chloroform was mixed with the supernatant by gentle inversion for 5 to 10 min. The mix was then centrifuged at 3200 rpm for 10 min, and the aqueous phase was carefully pipetted out using cut tips to a fresh tube. An equal volume of isopropanol was added to the recovered aqueous phase and mixed gently until white thread-like structures appeared. The mix was incubated at −20°C for 1 hr and centrifuged at 3200 rpm for 10 min to pellet the DNA. The pellet was washed twice with freshly prepared 70% ethanol and air-dried. The dried pellet was dissolved in 1 mL of 1x TE containing RNase A to a final concentration of 100 µg/mL and incubated at 37°C for 30 to 45 min. Sodium acetate solution was added to the mix to a final concentration of 0.5 M, and the solution was transferred to several 1.5 mL tubes in aliquots of 0.4 mL each. An equal volume of isopropanol was added to each tube, mixed gently, and centrifuged at 13,000 rpm for 15 min. The supernatant was decanted, and the DNA pellet was washed with 70% ethanol. The pellet was air-dried and finally dissolved in 200 µL of 1x TE buffer. The quality of the isolated DNA was determined by performing PFGE analysis (switching time 1–25 s, at 5.8 V/cm/120° for 24 hr, 1% agarose gel) on the CHEF-DR II module (Bio-Rad).

Oxford Nanopore sequencing of C. sojae strain NCYC-2607

High molecular weight genomic DNA was isolated from yeast cells, and the average length of the DNA fragments of the genomic DNA was checked on a CHEF gel using a CHEF-DR II system (Bio-Rad). Next, the DNA sample was quantified by NanoDrop (ND-1000 Spectrophotometer, NanoDrop Technologies) and Qubit 3 fluorometer (Thermo Fisher Scientific) using dsDNA HS assay kit (Thermo Fisher Scientific, Cat No. Q33230). An appropriate amount of DNA was taken forward for library preparation as per the manufacturer’s instructions using reagents included in SQK-LSK109 and EXP-NBD103/EXP-NBD104 kits. DNA samples were then pooled together on a single R9 flow-cell, and sequenced by the MinION system (Oxford Nanopore Technologies). The fragmentation step was skipped to retain the longer fragments. The raw reads were taken forward for base calling using Guppy version 3.1.5. A total of 92320 reads containing 530421800 bp were generated.

Illumina sequencing of C. sojae strain NCYC-2607

DNA was quantified by Qubit 3 fluorometer (Thermo Fisher Scientific) using a dsDNA HS assay kit (Thermo Fisher Scientific, Cat No. Q33230). Approximately 100 ng of intact DNA was enzymatically fragmented by targeting 250–500 bp fragment size. The DNA fragments with overhangs resulting from fragmentation were end-filled. The 3’ to 5’ exonuclease activity of end-repair mix removed the 3’ overhangs, and polymerase activity filled in the 5’ overhangs. To the blunt-ended fragments, adenylation was performed by adding a single ‘A’ nucleotide to the 3’ ends. To the adenylated fragments, loop adapters were ligated and cleaved with uracil-specific excision reagent enzyme. The sample was further purified using AMPure XP beads (Beckman Coulter, Cat No. A63880), and DNA was then enriched by PCR with six cycles using NEBNext Ultra II Q5 master mix (NEB, Cat No. M0544S), Illumina universal primers, and sample-specific indexed Illumina primers. The amplified products were cleaned up by using AMPure XP beads, and the final DNA library was eluted in 15 µL of 0.1x TE buffer. One µL of the library was used to quantify the DNA concentration by Qubit 3 fluorometer using the dsDNA HS reagent. The fragment analysis was performed on Agilent 2100 Bioanalyzer (Agilent, Model G2939B), by loading 1 µL of the library into Agilent DNA 7500 chip. In this experiment, we generated 3501768 paired-end reads of 2 × 301 bp length.

De novo genome assembly of C. sojae strain NCYC-2607

A total of 92320 reads containing 530421800 bp were used for the construction of a de novo assembly using Canu (Koren et al., 2017). Canu was run using default parameters in the trimming and the correction mode with the genome size set to 15 Mb (‘genomeSize=15m’), which produced the genome assembly of C. sojae in 42 contigs. Next, to rectify the base-pair level errors, we performed five rounds of polishing of the contigs using Illumina reads with Pilon (Walker et al., 2014).
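The read statistics above translate directly into a mean read length and an approximate sequencing depth. The following is a minimal sketch of that arithmetic, assuming the 15 Mb genome size passed to Canu is a reasonable estimate; the function name `depth_summary` is hypothetical and is not part of any tool used in this study.

```python
def depth_summary(n_reads, total_bp, genome_size_bp):
    """Return (mean read length in bp, approximate fold coverage)."""
    mean_read_length = total_bp / n_reads
    fold_coverage = total_bp / genome_size_bp
    return mean_read_length, fold_coverage


# Nanopore read statistics reported for C. sojae NCYC-2607;
# the 15 Mb genome size is the estimate supplied to Canu (an assumption here).
mean_len, depth = depth_summary(92320, 530421800, 15_000_000)
print(f"mean read length: {mean_len:.0f} bp; depth: {depth:.1f}x")
```

With these inputs the mean read length is roughly 5.7 kb and the depth is roughly 35-fold.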

SMRT sequencing on PacBio sequel system

The genomic DNA fragments of ~20 kb length were size-selected and taken forward for library preparation using SMRTbell Template Prep Kit (Part No. 100-259-100). PacBio sequencing of the C. tropicalis MYA-3404 genome was performed by Sequel SMRT Cell 1M (Part No. 101-008-000) using Sequel Binding Kit 2.0 (Part no. 100-862-200) and SMRT Link version 5.0.1.9585. This run generated 996041 reads with an average read length of 5.8 kb.

Construction of assembly B

Gepard (Krumsiek et al., 2007) was used to generate dot matrix plots and identify areas of overlap between supercontigs. Supercontigs whose ends overlapped were identified and their sequences merged. Whole genome Illumina sequencing data of the C. tropicalis strain MYA-3404 were used to verify these predictions. We submitted the reads to NCBI under the BioProject accession number PRJNA604451.

Construction of the de novo SMRT assembly and contig stitching using SMIS

The de novo SMRT assembly was generated from 996041 PacBio raw reads using Canu 1.6 (Koren et al., 2017). The program was run in the trimming and correction mode with the ‘-pacbio-raw <input.fastq>’ option, which produced 135 contigs. For stitching the contigs from Assembly B using the PacBio raw reads, we used Single Molecular Integrative Scaffolding (SMIS; Ning, 2014) with the default options to obtain a 12-contig assembly (Assembly C). Details of the assemblies produced by Canu and SMIS are presented in Supplementary file 3.

Filling N-gaps

The de novo SMRT contigs were used to fill the existing N-gaps in Assembly A. We used the 500 bases immediately upstream and downstream of each N-gap as queries against a custom BLAST (Altschul et al., 1990) database, generated using Geneious software from the de novo assembled contigs, and filled an N-gap when both the upstream and downstream query sequences mapped to the same contig with 100% coverage and more than 95% identity. Using this approach, we filled 78 out of 104 gaps, leaving 26 gaps on seven chromosomes (Supplementary file 2, Figure 1—figure supplement 3A). We suspected that the remaining gaps lay in repetitive regions of the genome because their immediate flanking regions produced multiple hits. To circumvent this, we used a second strategy in which a 1 kb query sequence taken from 10 kb upstream or downstream of the N-gap was used in a BLAST analysis against the de novo contigs generated using FALCON (Chin et al., 2016). All the remaining 26 gaps could be filled using this strategy (Supplementary file 2, Figure 1—figure supplement 3B). Further, to validate the filled sequences, we confirmed the mapping of the Illumina and PacBio reads over the newly filled sequence.
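The first gap-filling strategy can be illustrated with a short sketch that locates N-gaps in a sequence, extracts the flanking query sequences, and applies the acceptance rule stated above. This is an illustration only: the BLAST search itself is not reproduced, and the function names and the dictionary layout used for a BLAST hit (`contig`, `coverage`, `identity`) are hypothetical.

```python
import re


def find_n_gaps(seq):
    """Return half-open (start, end) coordinates of every run of N bases."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)]


def flank_queries(seq, flank=500):
    """Extract the upstream and downstream flank of each N-gap as BLAST queries."""
    queries = []
    for start, end in find_n_gaps(seq):
        queries.append({
            "gap": (start, end),
            "up": seq[max(0, start - flank):start],
            "down": seq[end:end + flank],
        })
    return queries


def gap_is_fillable(up_hit, down_hit, min_identity=95.0):
    """Acceptance rule: both flanks hit the same contig with 100% query
    coverage and more than `min_identity` percent identity.
    The hit dictionaries are a hypothetical summary of one BLAST hit each."""
    return (
        up_hit["contig"] == down_hit["contig"]
        and up_hit["coverage"] == 100.0
        and down_hit["coverage"] == 100.0
        and up_hit["identity"] > min_identity
        and down_hit["identity"] > min_identity
    )
```

In practice each flank would be written to a FASTA file and searched against the contig database; only the best hit per flank is assumed to be passed to `gap_is_fillable`.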

Assembly of sub-telomeric regions

To assemble the sub-telomere regions, we performed a BLAST search using the terminal 5000 bp sequence of each chromosome as queries against the de novo SMRT contigs and identified the contigs containing the 23 bp telomeric repeats specific for C. tropicalis (5′-TGATCGTGACATCCTTACACCAA-3′) as reported previously (Butler et al., 2009). A schematic of the sub-telomere scaffolding is shown in Figure 1—figure supplement 3C.
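Screening contigs for the telomeric repeat can be sketched as a simple exact-match search on both strands. The sketch below is an assumption-laden simplification: it counts only perfect copies of the 23 bp unit within a terminal window, and the thresholds `min_copies` and `window` as well as the function names are hypothetical choices rather than values used in this study.

```python
TELOMERE_REPEAT = "TGATCGTGACATCCTTACACCAA"  # 23 bp unit from Butler et al., 2009

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def count_repeat(seq, repeat=TELOMERE_REPEAT):
    """Count perfect, non-overlapping copies on the forward and reverse strand."""
    seq = seq.upper()
    return seq.count(repeat), seq.count(reverse_complement(repeat))


def telomeric_contigs(contigs, min_copies=2, window=5000):
    """Return names of contigs with at least `min_copies` perfect repeat
    copies (either strand) within `window` bp of either contig end."""
    hits = []
    for name, seq in contigs.items():
        for end_seq in (seq[:window], seq[-window:]):
            if sum(count_repeat(end_seq)) >= min_copies:
                hits.append(name)
                break
    return hits
```

Real telomeric tracts may contain degenerate copies, so an alignment-based search such as the BLAST approach described above is more tolerant than this exact-match count.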

Mapping of the orphan haplotigs using the de novo SMRT assembly

Canu is a diploid-aware genome assembler (Koren et al., 2017), which generates two contigs from a heterozygous locus. Therefore, we used the Canu generated contigs (SMRT assembly) to map the orphan haplotigs as heterozygous regions of the genome (see Figure 1—figure supplement 1H). Heterozygosity of the orphan haplotigs was demonstrated by the Illumina read coverage (Figure 1—figure supplement 2B). For this analysis, the 3C-seq reads were mapped on the OHs and a control locus of Chr1 using Bowtie2 (Langmead and Salzberg, 2012). The number of mapped reads was counted using the bamCoverage utility from deepTools2 (Ramírez et al., 2016) and plotted using boxplotR (Spitzer et al., 2014).

Pilon polishing of the genome assembly

The final telomere-to-telomere assembled chromosomes were polished through Pilon (Walker et al., 2014) using the Illumina reads obtained from the 3C-seq experiment. Pilon corrected base-pair level assembly errors and validated 99.5–99.8% bases of the seven chromosomes. The polishing step was repeated until the improvement stalled, which took six rounds.

Construction of aneuploids for confirmation of heterozygosity of the OHs

We constructed C. tropicalis strains monosomic for Chr5 and used them to demonstrate that loss of one homolog of Chr5 leads to loss of one of the two alleles of the orphan contigs contig14 and contig16, which map to Chr5. Since the sch9 mutants in C. albicans were viable but lost chromosomes at a significantly higher rate than the wild-type (Varshney et al., 2015), we adopted the same strategy and deleted both copies of the SCH9 homolog in C. tropicalis. Next, a reporter strain was created in this sch9 mutant strain background of C. tropicalis to assay for loss of a Chr5 homolog. These strains (2n-1) that lacked one homolog of Chr5 were used to confirm the heterozygosity of the orphan haplotigs (OHs) of CtChr5.

A. Deletion of SCH9 in C. tropicalis

The SCH9 homolog in C. tropicalis was identified in a BLAST search using CaSCH9 as the query sequence against the C. tropicalis proteome. A putative homolog of SCH9 was located on Chr1:1994521–1996662 and encoded by the Crick strand. A deletion cassette (pKG1) for double homologous recombination-mediated deletion of SCH9 ORF was constructed by cloning upstream and downstream homology regions in pSFS2a plasmid (Reuss et al., 2004). This construct was transformed into CtKS102 for the deletion of both copies of SCH9 ORF by recycling the CaSAT1 marker after the deletion of the first copy of SCH9 gene. Independent colonies of the sch9/sch9 null mutant strain (CtKG001) were confirmed using Southern hybridization (Figure 1—figure supplement 2C–D). Primers used in this study are mentioned in Supplementary file 9.

B. Construction of a reporter strain for construction of strains with Chr5 monosomy by integration of URA3 on Chr5

Upstream and downstream homology regions of the target intergenic locus (Chr5_497_kb) in Chr5 were amplified, and cloned into pBSCaURA3 plasmid (Chatterjee et al., 2016) to construct pKG2 (Supplementary file 10). This cassette was released by restriction digestion with BamHI and ApaI and transformed into the sch9 mutant strain CtKG001 to construct the reporter strain CtKG002. Similarly, we integrated CaURA3 into the target intergenic locus (Chr5_497_kb) of CtKS102 to create a control strain CtKG003. In both the strains (CtKG002 and CtKG003) the short arm (5’ end) of one of the two homologs is marked with CaURA3 marker and the long arm (3’ end) carries the heterozygous MTL locus (MTLa or MTLα) with two distinct alleles present on two homologs. Concomitant loss of one of the MTL alleles together with CaURA3 marker would indicate loss of one homolog of Chr5.

C. Isolation and confirmation of the 2n-1 aneuploids for Chr5

Different cell numbers (10⁵, 10⁴, 10³, and 10²) of the reporter strain (CtKG002) and the wild-type control strain (CtKG003) were plated on complete media (CM) + 5-FOA and incubated for 48–72 hr at 30°C. Multiple 5-FOA-resistant (FOA^R) colonies appeared for the CtKG002 strain but no colonies appeared for the control strain CtKG003. The colonies were then patched on YPDU and CM-URA plates for confirmation of the loss of the CaURA3 marker. Next, PCR was performed to confirm the loss of one of the MTL loci (MTLa or MTLα) in these colonies using a multiplex PCR strategy described previously (Figure 1—figure supplement 2G; Porman et al., 2011).

Library preparation and sequencing of the library DNA for chromosome conformation capture (3C-seq)

Wild-type C. tropicalis strain MYA-3404 was cultured in non-selective YPDU media and 500 OD600 equivalent cells were harvested for crosslinking. The cells were cross-linked with formaldehyde to a final concentration of 1.5% for 10 min and the cross-linking reaction was quenched by adding glycine to a final concentration of 400 mM. The crosslinked cells were centrifuged and the cell pellet was stored at −80°C till further use.

For making the 3C library of C. tropicalis, the cross-linked cell pellet was first resuspended in 5 mL of ice-cold 1x NEBuffer DpnII (50 mM Bis-Tris-HCl, 100 mM NaCl, 10 mM MgCl2, 1 mM DTT; pH 6 @ 25°C) and then lysed by liquid nitrogen grinding in a chilled mortar using a pestle to a fine powder. The powdered sample was scraped using a spatula into a pre-chilled tube and resuspended in 15 mL cold 1x NEBuffer DpnII. Cell lysate containing ~3 × 10⁸ cells (4 mL) was processed for 3C library preparation. This lysate was centrifuged and the pellet was resuspended in 1.5 mL of cold 1x NEBuffer DpnII and then aliquoted equally into four 1.5 mL microcentrifuge tubes. Next, the chromatin was solubilized by adding SDS to a final concentration of 0.1% in each microcentrifuge tube and the sample was incubated at 65°C for exactly 10 min. The reaction was quenched by adding 45 µL of 10% Triton X-100 per tube with gentle mixing. Chromatin was then digested with 750 units of DpnII (NEB, Cat No. R0543M; 50,000 units/mL) per tube and incubated at 37°C overnight with gentle agitation (300 rpm) on a heating block. Next day, the restriction enzyme was heat-inactivated at 65°C for 20 min. The digested chromatin fraction in each tube was ligated with 50 U of T4 DNA ligase (Invitrogen Cat No.15224090; 1 U/µL) at 16°C for 6 hr in a diluted condition (reaction volume 8 mL) to favor intra-molecular ligation of cross-linked restriction fragments. Reverse cross-linking was performed by adding 100 µL of 10 mg/mL Proteinase K (Invitrogen Cat No.25530031) per tube and incubating at 65°C overnight. Next, DNA, which constitutes the 3C library, was purified using conventional phenol-chloroform extraction and concentrated using Amicon Ultra-0.5 mL 30K centrifugal filters. About 1 µg of 3C library was used for size selection using Agencourt AMPure XP beads (Beckman Coulter) to select DNA fragments of 500–700 bp in length.
The paired-end NGS library was prepared using NEBNext Ultra II kit, and sequencing was carried out using the Illumina HiSeq 2500 2 × 101 bp platform by a third party service provider.

3C-seq data analysis

FASTQ files containing ~75 million 2 × 101 bp paired-end 3C-seq reads were initially processed using the hiclib package (http://mirnylab.bitbucket.org/hiclib/) (Imakaev et al., 2012). The resultant genome-wide chromatin interaction matrix was converted to a contact probability matrix. Code associated with the downstream analysis can be found at the GitHub repository (https://github.com/Yao-Chen/candida-tropicalis-analysis; copy archived at https://github.com/elifesciences-publications/candida-tropicalis-analysis; Chen, 2020).

A. Mapping of reads and generation of the contact probability matrix

First, two sides of paired-end reads were separated and iteratively aligned to Assembly2020 using Bowtie2 (Langmead and Salzberg, 2012) with the ‘--very-sensitive’ option. The iteration started from the first 20 bases of each read and continued with an increment of 5 bases in the subsequent iteration. Next, the aligned read pairs whose both sides had MAPQ score ≥1 were processed through the fragment filter, where self-circles, dangling ends, extra-dangling ends (maximum molecular length = 500), error pairs and PCR duplicates were excluded from downstream analysis. A genome-wide interaction matrix was generated using the remaining unique valid pairs (bin size = 2–10 kb). The bin filter removed bins with <50% sequence information in the reference genome and 1% bins with low read coverage. The matrix was iteratively corrected for biases and eventually converted to a contact probability matrix where the sum of each row/column approximates 1.
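The final step, iterative correction followed by conversion to a contact probability matrix, can be sketched in a few lines. This is a generic illustration of the iterative correction idea (Imakaev et al., 2012), not the hiclib implementation: it assumes a small symmetric matrix held as nested lists with positive off-diagonal contacts, ignores the bin and fragment filters described above, and uses a hypothetical function name.

```python
def iterative_correction(matrix, n_iter=50, tol=1e-6):
    """Balance a symmetric contact matrix so that each non-empty row
    (and column) sums to approximately 1. Returns (balanced, bias)."""
    n = len(matrix)
    m = [row[:] for row in matrix]          # work on a copy
    bias = [1.0] * n                        # accumulated per-bin bias
    for _ in range(n_iter):
        row_sums = [sum(row) for row in m]
        nonzero = [s for s in row_sums if s > 0]
        mean_sum = sum(nonzero) / len(nonzero)
        # Correction factor of each bin relative to the mean coverage;
        # empty bins are left untouched.
        factors = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        bias = [b * f for b, f in zip(bias, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    # Rescale so that row sums approximate 1 (contact probabilities).
    row_sums = [sum(row) for row in m]
    nonzero = [s for s in row_sums if s > 0]
    scale = sum(nonzero) / len(nonzero)
    balanced = [[v / scale for v in row] for row in m]
    return balanced, bias


raw = [[10.0, 4.0, 2.0], [4.0, 8.0, 6.0], [2.0, 6.0, 12.0]]
balanced, bias = iterative_correction(raw)
print([round(sum(row), 4) for row in balanced])
```

After balancing, every row of the toy matrix sums to approximately 1, which is the property the text describes for the contact probability matrix.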

B. Aggregate signal analysis

Sub-matrices for all combinations of centromere-centromere interactions across different chromosomes were extracted from the genome-wide contact probability matrix. Genomic loci containing mid-points of centromeres were first aligned and then all the sub-matrices were stacked on top of each other and averaged. Similar analysis was performed for all telomere-telomere interactions (both intra and interchromosomal telomeric interactions) where sub-matrices for telomere-telomere interactions across all chromosomes were extracted, stacked and averaged.
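The stacking and averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions: the contact probability matrix is a genome-wide square matrix held as nested lists, each centromere is represented by the genome-wide index of the bin containing its mid-point, and windows are only checked against the matrix edges rather than chromosome boundaries. The function names are hypothetical.

```python
from itertools import combinations


def extract_window(matrix, row_center, col_center, half):
    """Return the (2*half+1) x (2*half+1) sub-matrix centred on the given
    bins, or None if the window would run off the matrix."""
    n = len(matrix)
    if (row_center - half < 0 or col_center - half < 0
            or row_center + half >= n or col_center + half >= n):
        return None
    return [
        matrix[r][col_center - half:col_center + half + 1]
        for r in range(row_center - half, row_center + half + 1)
    ]


def aggregate_signal(matrix, centromere_bins, half=2):
    """Average the windows around every centromere-centromere pair on
    different chromosomes. `centromere_bins` maps chromosome -> bin index."""
    windows = []
    for chrom_a, chrom_b in combinations(sorted(centromere_bins), 2):
        window = extract_window(
            matrix, centromere_bins[chrom_a], centromere_bins[chrom_b], half)
        if window is not None:
            windows.append(window)
    if not windows:
        raise ValueError("no complete windows could be extracted")
    size = 2 * half + 1
    return [
        [sum(w[i][j] for w in windows) / len(windows) for j in range(size)]
        for i in range(size)
    ]
```

The same function could be applied to telomere-telomere pairs by passing the telomeric bin indices instead of the centromeric ones.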

C. Analysis of telomeric interactions

To investigate interchromosomal telomeric interactions, a histogram of all interchromosomal interactions (excluding zero values) was generated (bulk chromatin). Mean contact probabilities of all interchromosomal interactions as well as all interchromosomal telomeric interactions were computed, where 5’ and 3’ end bins on each chromosome were taken as telomeric bins. Similarly, a histogram of all intrachromosomal long-range interactions (excluding zero values) was generated (bulk chromatin), where long-range interactions were defined as interactions between loci separated by a distance of >100 kb. Intrachromosomal telomeric interactions were taken as interactions between two loci that were close to the two telomeres of an individual chromosome, respectively (sum of distances between each locus and the nearest telomere is ≤10 kb). The Mann-Whitney U test was used to compare interchromosomal or intrachromosomal telomeric interactions to the bulk chromatin.
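The comparison of telomeric interactions against bulk chromatin relies on the Mann-Whitney U test. The sketch below shows how the U statistic and an approximate two-sided p-value can be computed from two lists of contact probabilities. It is a simplified illustration: it uses the normal approximation without tie or continuity correction, so a statistics package should be preferred for reported values, and the function name is hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Return (U statistic for x, approximate two-sided p-value).
    Uses average ranks for ties and the normal approximation."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1      # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value
```

Here `x` would hold the non-zero telomeric contact probabilities and `y` the non-zero bulk values; the normal approximation is only reasonable when both lists are large.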

D. Contig scaffolding

3C-seq reads were aligned to contig sequences and a contact probability matrix was generated as described above. The 3C profile of a bin was plotted using values in a single row from the contact probability matrix. It is well-established that contact frequency generally shows a distance-dependent decay (Dekker et al., 2002). Therefore, the connectivity between two contigs can be inferred by investigating the contact probabilities between the terminal bin of a contig and loci on the other contig.
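The reasoning above can be made concrete with a toy decision rule: because contact probability decays with distance, the end of a neighbouring contig that lies closer to a terminal bin should show higher contacts with it. The sketch below is a hypothetical simplification for illustration only; the choice of `k` bins, the function name, and the rule itself are assumptions, and the inference in this study was based on inspection of the 3C profiles.

```python
def infer_adjacent_end(profile, k=3):
    """Given the 3C profile of a terminal bin of contig A across the bins
    of contig B (ordered along B), return 'start' or 'end' for the end of
    B with the higher mean contact probability, or None if they are equal."""
    if len(profile) < 2 * k:
        raise ValueError("profile is too short for the chosen k")
    head = sum(profile[:k]) / k
    tail = sum(profile[-k:]) / k
    if head > tail:
        return "start"
    if tail > head:
        return "end"
    return None


# Toy profile decaying away from the start of contig B.
print(infer_adjacent_end([0.09, 0.05, 0.03, 0.01, 0.005, 0.004, 0.003]))
```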

Identification of SNPs, indels and CNVs

A. Detection of SNPs and indels

The SNPs and indels were identified using GATK software (McKenna et al., 2010) with the paired-end Illumina reads obtained from the 3C-seq experiment on a 12-core Ubuntu 16.04 system with 96 GB memory. Briefly, the 3C-seq reads were mapped to Assembly2020 using Bwa-mem (Li, 2013) in paired-end alignment mode, followed by sorting of the resulting SAM file with Picard (https://broadinstitute.github.io/picard/), SAM to BAM conversion using SAMtools (Li et al., 2009), and duplicate marking using the ‘MarkDuplicates’ utility of Picard. Next, we used GenomeAnalysisTK.jar (version 3.8.0) to call the variants with the ‘-ploidy 2’ option. SNPs were extracted and filtered with the ‘--filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || SOR > 4.0" --filterName "basic_snp_filter"’ option following base quality score recalibration. Similarly, indels were extracted and filtered with the ‘--filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0 || SOR > 10.0" --filterName "basic_indel_filter"’ option following base quality score recalibration. The data tracks were visualized using IGV (Robinson et al., 2011) and presented using Circa software.
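The hard-filter expressions above are logical ORs over per-variant annotations, so a variant is flagged as soon as any one threshold is violated. The following sketch restates the two expressions as Python predicates to make that logic explicit. It does not call GATK; the function name and the dictionary representation of a variant's INFO annotations are hypothetical, and skipping annotations that are absent is an assumption of this sketch.

```python
import operator

# (annotation, comparison, threshold) triples taken from the filter expressions.
SNP_RULES = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
    ("SOR", operator.gt, 4.0),
]
INDEL_RULES = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 200.0),
    ("ReadPosRankSum", operator.lt, -20.0),
    ("SOR", operator.gt, 10.0),
]


def fails_hard_filter(annotations, rules):
    """Return True if any rule is violated (the variant would be flagged).
    Annotations missing from the record are skipped (an assumption)."""
    for key, compare, threshold in rules:
        value = annotations.get(key)
        if value is not None and compare(value, threshold):
            return True
    return False


snp = {"QD": 25.1, "FS": 1.2, "MQ": 60.0, "MQRankSum": 0.1,
       "ReadPosRankSum": 0.4, "SOR": 0.8}
print(fails_hard_filter(snp, SNP_RULES))  # a variant passing every threshold
```

Keeping the thresholds in a table makes it easy to see that the indel filter is more permissive for strand bias (FS, SOR) than the SNP filter.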

B. Read coverage plot and CNV detection

To generate a genome-wide read coverage plot, the 3C-seq reads were mapped to Assembly2020 using Bowtie2 (Langmead and Salzberg, 2012) in paired-end alignment mode with the ‘--end-to-end’ and ‘--very-sensitive’ options. The resultant SAM file was converted to BAM format and sorted using SAMtools (Li et al., 2009). Next, the mapped reads were counted using the deepTools2 (Ramírez et al., 2016) bamCoverage utility with the BPM normalization method, and the resulting BED file was used for downstream calculations or visualization in IGV (Robinson et al., 2011). To detect CNVs throughout the C. tropicalis genome, the sorted BAM file generated from the coverage analysis above was processed by the ‘CNAtraLite’ option of the CNAtra tool (https://github.com/AISKhalil/CNAtra) (Khalil et al., 2020), where the MAPQ filter was disabled. The read depth signal (bin size = 1 kb) and the estimated copy numbers were then plotted for each chromosome by the ‘CNVsTrackPlot’ function of ‘CNAtraLite’. In this analysis, regions whose estimated copy numbers are <1.5 or >2.5 are considered as CNVs.
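The final thresholding step can be illustrated with a short sketch that labels each 1 kb bin by its estimated copy number and merges consecutive bins with the same label into regions. This only illustrates the <1.5 or >2.5 rule for a diploid genome; it is not the CNAtra algorithm, which estimates the copy numbers in the first place, and the function names are hypothetical.

```python
def classify_copy_number(copy_number, low=1.5, high=2.5):
    """Label an estimated copy number relative to the diploid state."""
    if copy_number < low:
        return "loss"
    if copy_number > high:
        return "gain"
    return "neutral"


def call_cnv_regions(copy_numbers, bin_size=1000, low=1.5, high=2.5):
    """Merge consecutive bins with the same non-neutral label into
    (start_bp, end_bp, label) regions; coordinates are 0-based, half-open."""
    regions = []
    current = None                           # [start, end, label]
    for index, copy_number in enumerate(copy_numbers):
        label = classify_copy_number(copy_number, low, high)
        start = index * bin_size
        if current is not None and current[2] == label:
            current[1] = start + bin_size    # extend the open region
        else:
            if current is not None and current[2] != "neutral":
                regions.append(tuple(current))
            current = [start, start + bin_size, label]
    if current is not None and current[2] != "neutral":
        regions.append(tuple(current))
    return regions
```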

Haplotype analysis

FALCON, FALCON-Unzip (Chin et al., 2016), and FALCON-Phase (Kronenberg, 2018) from the pb-assembly suite were run locally on a 12-core Ubuntu 16.04 system with 96 GB memory according to the instructions provided (https://github.com/PacificBiosciences/pb-assembly) (Chin et al., 2016; Wenger et al., 2019). The configuration files used for running FALCON, FALCON-Unzip and FALCON-Phase will be available upon request. Briefly, FALCON was run using a modified fcrun.cfg with the input options ‘pa_DBdust_option = true’ and ‘pa_fasta_filter_option = streamed-internal-median’. Next, data partitioning was performed with ‘pa_DBsplit_option=-x500 -s100’ and ‘ovlp_DBsplit_option=-x500 -s100’, and repeat masking was performed using the ‘pa_HPCTANmask_option = -k18 -h480 -w8 -e.8 -s100’, ‘pa_HPCREPmask_option = -k18 -h480 -w8 -e.8 -s100’, and ‘pa_REPmask_code = 0,300;0,300;0,300’ options. Preassembly was generated using the following parameters: ‘genome_size = 15000000’, ‘seed_coverage = 20’, ‘length_cutoff = 100’, ‘pa_HPCdaligner_option=-v -B128 -M24’, ‘pa_daligner_option=-e.8 -l1000 -k18 -h480 -w8 -s100 -T10’, ‘falcon_sense_option=--output-multi --min-idt 0.70 --min-cov 2 --max-n-read 1800’, ‘falcon_sense_greedy = False’. Next, Pread overlapping was performed using ‘ovlp_daligner_option=-e.96 -l1000 -k24 -h1024 -w6 -s100’ and ‘ovlp_HPCdaligner_option=-v -B128 -M24’. Next, the final assembly was generated using ‘overlap_filtering_setting=--max-diff 100 --max-cov 100 --min-cov 2’ and ‘length_cutoff_pr = 500’. Finally, phasing of haplotypes was performed using FALCON-Unzip and FALCON-Phase as described (https://github.com/PacificBiosciences/pb-assembly) (Chin et al., 2016; Wenger et al., 2019).

Assessment of the genome assembly completeness using BUSCO

BUSCO (Simão et al., 2015) version 3.0.2 was run against the ascomycota_odb9 database using the following command: python ./scripts/run_BUSCO.py -i genome.fasta -o BUSCO_output -l /Path_to_llineage_dir/ -m genome -c 1 -sp candida_tropicalis.

Synteny analysis

Genome-wide synteny analysis was performed using Symap (Soderlund et al., 2011) with the following parameters: (a) Min. dots 3 (minimum number of anchors required to define a synteny block), (b) top N 2 (retain the top N hits for every sequence region, as well as all hits with score at least 80% of the Nth), (c) BLAT args: ‘-minScore=30 -minIdentity=70 -tileSize=10 -qMask=lower -maxIntron=10000’. The Satsuma synteny and Synchro software were run using default parameters. For the custom approach to map the interchromosomal synteny breakpoints (ICSBs), the single-copy orthologs were first identified using OrthoFinder (Emms and Kelly, 2015); the genomic coordinates of the ortholog pairs were then sorted and the ICSBs were identified. For comparing the FALCON generated contigs with the Assembly2020 chromosomes, the dot-plot between the two assemblies was generated using the default options of Symap version 4.2.
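The custom ICSB approach described above can be sketched as follows: once single-copy ortholog pairs are sorted by their position in one species, an interchromosomal breakpoint is implied wherever two neighbouring genes on the same chromosome have orthologs on different chromosomes of the other species. This is a minimal sketch under that interpretation; the tuple layout and function name are hypothetical, and no filtering of isolated orthologs is attempted.

```python
def find_icsbs(ortholog_pairs):
    """Identify candidate interchromosomal synteny breakpoints.

    `ortholog_pairs` is a list of (chrom_a, pos_a, chrom_b, pos_b) tuples,
    one per single-copy ortholog pair. Returns a list of
    (chrom_a, left_pos_a, right_pos_a, left_chrom_b, right_chrom_b)."""
    ordered = sorted(ortholog_pairs, key=lambda pair: (pair[0], pair[1]))
    breakpoints = []
    for left, right in zip(ordered, ordered[1:]):
        same_chrom_in_a = left[0] == right[0]
        different_chrom_in_b = left[2] != right[2]
        if same_chrom_in_a and different_chrom_in_b:
            breakpoints.append((left[0], left[1], right[1], left[2], right[2]))
    return breakpoints


pairs = [
    ("Chr1", 300, "ChrB", 20),
    ("Chr1", 100, "ChrA", 50),
    ("Chr1", 200, "ChrA", 150),
    ("Chr2", 100, "ChrB", 500),
]
print(find_icsbs(pairs))
```

In this toy input the only breakpoint lies between positions 200 and 300 of Chr1, where the orthologs switch from ChrA to ChrB.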

Identification of the putative centromeres in the members of CUG-Ser1 clade

The putative centromeres of C. sojae and C. viswanathii were identified as HIR-associated intergenic regions syntenic to the centromeres of C. tropicalis. Briefly, the genomic loci in C. sojae and C. viswanathii that are syntenic to the centromeres of C. tropicalis were scanned for the presence of inverted repeats falling in ORF-free regions using YASS (Noé and Kucherov, 2005) with the default parameters. Pair-wise alignments between seven random genomic loci of ~10 kb length, LR, CC, or RR DNA elements were performed using Clustal Omega (Sievers and Higgins, 2014). Synteny dot-plot analysis for centromere DNA sequences including the flanking ORF-free region in C. albicans, C. dubliniensis and the HIR sequences of C. tropicalis, C. sojae, C. viswanathii, and C. parapsilosis was performed using Gepard (Krumsiek et al., 2007) by running it in the simple mode with default parameters. The IR sequences from centromeres of C. tropicalis and the putative centromeres of C. sojae and C. viswanathii were analysed to identify conserved motifs using the motif discovery tool MEME with the default parameters, with ‘ZOOPS: zero or one site per sequence’ as the motif site distribution algorithm and the maximum motif width set to 12 bp. Next, we scanned for the presence of IR-motifs across the chromosomes including centromere DNA and flanking ORF-free regions in C. albicans, C. dubliniensis, and putative centromeres of C. parapsilosis using FIMO with default parameters (Bailey et al., 2009).

Construction of the phylogenetic tree

The publicly available genomes and the protein fasta files (when available) of C. albicans (ASM254v2), C. dubliniensis (ASM2694v1), C. viswanathii (ASM332773v1), and C. parapsilosis (ASM18276v2) were downloaded from the NCBI database. The protein fasta files for C. tropicalis and C. sojae were generated using the Augustus ab initio protein prediction software and the python script getAnnoFasta.py (Stanke and Morgenstern, 2005). Because of the partially diploid nature of the C. viswanathii genome assembly, duplicated contigs, i.e. those sharing >100 kb of DNA sequence with another contig, were identified from a self dot-plot analysis using Symap (Soderlund et al., 2011) and excluded from the analysis. The protein fasta files were then used as input files for running OrthoFinder V2.3.1 (Emms and Kelly, 2015). OrthoFinder was run using the default parameters except for the -M msa option for the construction of maximum-likelihood trees using MAFFT (Katoh et al., 2002) and FastTree (Price et al., 2010). The tree topology was visualized using Evolview (Subramanian et al., 2019).

Data access

All sequencing data of C. tropicalis and C. sojae reported in this study have been submitted to NCBI under the BioProject accession numbers PRJNA596050 and PRJNA604451. The sequences of seven chromosomes of C. tropicalis in Assembly2020 (GCA_013177555.1 ASM1317755v1) are available through GenBank accession numbers CP047869-CP047875. The contig sequences of C. sojae genome assembly (GCA_013177575.1 ASM1317757v1) are available through GenBank accession number WWPN00000000.

Acknowledgements

We thank all the members of KS laboratory and AS laboratory for stimulating discussions and critical reading of the manuscript. We acknowledge S Sun and J Heitman for helping with SMRT-seq of C. tropicalis at the PacBio sequencing facility at Duke University. We also thank AIS Khalil for helping with CNAtra software for CNV analysis. Illumina sequencing experiments for the C. sojae genome were performed at Clevergene Biocorp, Bangalore, India. We also thank B Suma for confocal microscopy, JNCASR. KG acknowledges Shyama Prasad Mukherjee Fellowship from Council of Scientific and Industrial Research (CSIR), Govt. of India [07/733 (0181)/2013-EMR-I] and financial assistance from JNCASR. This project is supported by a grant (BT/PR27490/Med/29/1323/2018) from the Department of Biotechnology (DBT), Govt. of India to KS. KS acknowledges TATA innovation fellowship (BT/HRD/35/01/03/2017) and Department of Biotechnology grant in Life Science Research, Education and Training at JNCASR (BT/INF/22/SP27679/2018). Intramural funding from JNCASR is acknowledged. This work is also supported by Nanyang Technological University’s Nanyang Assistant Professorship grant and Singapore Ministry of Education Academic Research Fund Tier 1 grant [RG39/18] to AS.

Funding Statement

The funders had no role in study design, data collection and interpretation, or the decision to submit the work for publication.

Contributor Information

Amartya Sanyal, Email: asanyal@ntu.edu.sg.

Kaustuv Sanyal, Email: sanyal@jncasr.ac.in.

Job Dekker, University of Massachusetts Medical School, United States.

Kevin Struhl, Harvard Medical School, United States.

Funding Information

This paper was supported by the following grants:

  • Council of Scientific and Industrial Research Shyama Prasad Mukherjee Fellowship 07/733(0181)/2013-EMR-I to Krishnendu Guin.

  • Department of Biotechnology, Ministry of Science and Technology BT/PR27490/Med/29/1323/2018 to Kaustuv Sanyal.

  • Ministry of Education - Singapore RG39/18 to Amartya Sanyal.

  • Department of Biotechnology, Ministry of Science and Technology BT/INF/22/SP27679/2018 to Kaustuv Sanyal.

  • Department of Biotechnology, Ministry of Science and Technology BT/HRD/35/01/03/2017 to Kaustuv Sanyal.

  • Nanyang Technological University Nanyang Assistant Professorship grant to Amartya Sanyal.

Additional information

Competing interests

No competing interests declared.

Author contributions

Conceptualization, Formal analysis, Writing - original draft, N-gap filling, Pilon polishing, haplotype analysis, genome-wide synteny analyses, identification of the putative HIR-associated centromeres and motif analysis, chromoblot analysis, Southern blotting, and subcellular localization experiments, Oxford Nanopore library preparation for C. sojae and generated its genome assembly.

Conceptualization, Formal analysis, Validation, Writing - original draft.

Constructed aneuploid strains.

Performed the 3C-seq library preparation.

Performed SMIS and Canu run and scaffolding of telomeres.

Performed scaffolding of C. tropicalis genome in 16 contigs (Assembly B).

Performed scaffolding of C. tropicalis genome in 16 contigs (Assembly B).

Conceptualization, Supervision, Writing - original draft, Writing - review and editing.

Conceptualization, Supervision, Writing - original draft, Writing - review and editing.

Additional files

Source data 1. Source_data_combined.
elife-58556-data1.xlsx (378.5KB, xlsx)
Supplementary file 1. Assembly C with 12 contigs.
elife-58556-supp1.pptx (38.3KB, pptx)
Supplementary file 2. Assembly of sub-telomeres and filling up N-gaps in the genome assembly of C. tropicalis using de novo contigs.
elife-58556-supp2.pptx (37.1KB, pptx)
Supplementary file 3. Statistics for different versions of the genome assembly of C. tropicalis (MYA-3404) generated in this study.
elife-58556-supp3.pptx (41.8KB, pptx)
Supplementary file 4. A comparative analysis of Assembly A and the improved Assembly2020 of C. tropicalis.
elife-58556-supp4.pptx (39.7KB, pptx)
Supplementary file 5. Features of centromere DNA elements in C. sojae.
elife-58556-supp5.pptx (39.3KB, pptx)
Supplementary file 6. Features of centromere DNA elements in C. viswanathii.
elife-58556-supp6.pptx (36.3KB, pptx)
Supplementary file 7. Centromere coordinates used for identifying conserved DNA sequence motifs in Candida species.
elife-58556-supp7.pptx (49.6KB, pptx)
Supplementary file 8. List of strains used in this study.
elife-58556-supp8.pptx (40KB, pptx)
Supplementary file 9. List of primers used in this study.
elife-58556-supp9.pptx (52.8KB, pptx)
Supplementary file 10. List of plasmids used in this study.
elife-58556-supp10.pptx (34.6KB, pptx)
Transparent reporting form

Data availability

All sequencing data reported in the study and the genome assembly of C. tropicalis and C. sojae have been submitted to NCBI under the BioProject accession numbers PRJNA596050 and PRJNA604451.

The following datasets were generated:

Guin K, Chen Y, Mishra R, Muzaki SRBM, Thimmappa BC, O'Brien CE, Butler G, Sanyal A, Sanyal K. 2020. Candida tropicalis and Candida sojae. NCBI BioProject. PRJNA596050

Guin K, Chen Y, Mishra R, Muzaki SRBM, Thimmappa BC, O'Brien CE, Butler G, Sanyal A, Sanyal K. 2020. Whole genome sequencing of Candida tropicalis isolates. NCBI BioProject. PRJNA604451

  120. Wenger AM, Peluso P, Rowell WJ, Chang PC, Hall RJ, Concepcion GT, Ebler J, Fungtammasan A, Kolesnikov A, Olson ND, Töpfer A, Alonge M, Mahmoud M, Qian Y, Chin CS, Phillippy AM, Schatz MC, Myers G, DePristo MA, Ruan J, Marschall T, Sedlazeck FJ, Zook JM, Li H, Koren S, Carroll A, Rank DR, Hunkapiller MW. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology. 2019;37:1155–1162. doi: 10.1038/s41587-019-0217-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  121. Xiao M, Fan X, Chen SC, Wang H, Sun ZY, Liao K, Chen SL, Yan Y, Kang M, Hu ZD, Chu YZ, Hu TS, Ni YX, Zou GL, Kong F, Xu YC. Antifungal susceptibilities of candida glabrata species complex, Candida krusei, candida parapsilosis species complex and Candida tropicalis causing invasive candidiasis in China: 3 year national surveillance. Journal of Antimicrobial Chemotherapy. 2015;70:802–810. doi: 10.1093/jac/dku460. [DOI] [PubMed] [Google Scholar]
  122. Zhang Y, McCord RP, Ho YJ, Lajoie BR, Hildebrand DG, Simon AC, Becker MS, Alt FW, Dekker J. Spatial organization of the mouse genome and its role in recurrent chromosomal translocations. Cell. 2012;148:908–921. doi: 10.1016/j.cell.2012.02.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  123. Zhang CZ, Leibowitz ML, Pellman D. Chromothripsis and beyond: rapid genome evolution from complex chromosomal rearrangements. Genes & Development. 2013;27:2513–2530. doi: 10.1101/gad.229559.113. [DOI] [PMC free article] [PubMed] [Google Scholar]

Decision letter

Editor: Job Dekker

In the interests of transparency, eLife publishes the most substantive revision requests and the accompanying author responses.

[Editors' note: this paper was reviewed by Review Commons.]

Acceptance summary:

A high-accuracy gapless assembly of Candida tropicalis is presented and compared to other Candida species, revealing evolutionary changes to centromeres. The authors present evidence for interchromosomal rearrangement events near centromeres and telomeres that could have contributed to centromere evolution. 3C data show that, as in other fungi, centromeres and telomeres cluster in 3D space, providing a possible mechanism by which rearrangements could be favored to occur between centromeres and between telomeres.

eLife. 2020 May 29;9:e58556. doi: 10.7554/eLife.58556.sa2

Author response


Reviewer #1 (Evidence, reproducibility and clarity (Required)):

The authors have tried to address the sudden evolutionary jump in centromere formation between two closely related species, i.e. Candida tropicalis (Ct) and Candida albicans (Ca). While Ct has homogenized inverted repeat (HIR)-associated centromeres, Ca evolved to form centromeres on unique DNA elements. To address this, the authors first generated a chromosome-level genome assembly (Assembly2020) of Ct using whole-genome Illumina sequencing, 3C sequencing, PacBio SMRT sequencing, chromoblot analysis, and genetic analysis of engineered aneuploid strains to improve the previously existing fragmented genome assembly. Interestingly, even though Ca and Ct have different mechanisms to form centromeres, their spatial organization of colocalizing centromeres at the nuclear periphery remains conserved. This is also demonstrated by 3C data, which show significantly higher centromere-centromere contact probability. The authors speculate that spatial proximity of homologous regions of centromeres can facilitate genomic rearrangements. They demonstrate this by performing genome-wide synteny analysis of Ct using Ca as the reference genome. Results show that all the centromeres in Ct are either present at an interchromosomal synteny breakpoint (ICSB) or near an ICSB (except CEN6). The authors have also shown the presence of HIR-associated putative centromeres in species closely related to Ct, i.e. C. sojae and C. viswanathii, which are lost in Ca and C. dubliniensis (a species closely related to Ca). Overall, this study proposes that close spatial proximity of homologous regions of centromeres in Ct led to inter-centromeric translocation events, which possibly led to the formation of evolutionary new centromeres on unique DNA sequences in Ca.

Strengths:

1) Very coherently demonstrated a possible mechanism for evolution of HIR-associated centromeres to centromere formation on unique DNA sequences in two closely related species Ct and Ca.

We thank the reviewer for encouraging words.

2) Generated chromosome level assembly of Ct and fragmented level assembly of C. sojae. This will act as an important resource for the scientific community.

Thanks for pointing out one of the critical aspects of this study. Indeed, our primary motivation was to generate a chromosome-level genome assembly of Candida tropicalis, an emerging medically relevant but poorly studied human pathogen. We hope that the genomic resources made available for the scientific community through this work will facilitate future studies.

3) In Assembly2020, the authors have identified various key structural features such as long copy number variations, long-tract loss of heterozygosity and heterozygous translocation events, providing a holistic genomic map.

We appreciate that the reviewer highlighted this point. In fact, as a follow-up study for the next paper, we are trying to understand the importance of some of these genomic features in the drug resistance of the organism.

Minor Comments:

4) Title: Please correct: “inter-centromeric” instead of “intercentromeric”

We have made this change throughout the manuscript.

5) Please share the codes used to run the programs or share the GitHub link for the codes.

We have uploaded the code we wrote for the 3C-seq data analysis to a GitHub repository and added the link in the revised manuscript (https://github.com/YaoChen/candida-tropicalis-analysis). For publicly available programs and software, we have cited the references and listed the parameters used in the Materials and methods section.

6) Figure 1—figure supplement 2C-G: Please explain the assay used to determine aneuploidy in more details. The rationale and the explanation are not provided anywhere in the paper or in the supplementary information. What is the purpose of deleting SCH9?

We thank the reviewer for the suggestion. We have now described the rationale in detail and explained the steps followed to generate the monosomic strains (CtKG101 – 105) (see Materials and methods). The deletion of SCH9 alleles increases the rate of chromosome loss in C. albicans (Varshney et al., 2015). To exploit this property of sch9 mutants, we deleted both copies of SCH9 in C. tropicalis and created a reporter strain (CtKG002) in the sch9 mutant background to assay for loss of Chr5. In this reporter strain, the short arm (5ʹ end) of one of the two homologs of Chr5 is marked with URA3, and the long arm (3ʹ end) carries the MTLa or MTLα locus, such that the two homologs carry distinct MTL alleles. The URA3 and MTL loci are thus unlinked. Next, we plated different dilutions of CtKG002 cells on media containing 5-fluoroorotic acid (5-FOA) to counter-select cells that had lost the URA3 marker and grew as 5-FOA-resistant (5-FOAR) single colonies. We then confirmed the simultaneous loss of the URA3 marker and one of the MTL alleles. Strains showing concomitant loss of these two unlinked loci must have lost an entire homolog of Chr5 and are therefore monosomic for Chr5. These strains (2n-1), which lack one homolog of Chr5, were used to confirm the heterozygosity of the orphan haplotigs (OHs) of CtChr5.
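The inference behind this assay can be written as a small decision rule. The following Python sketch is illustrative only: the `Colony` record, its field names, and the category labels are hypothetical and are not taken from the manuscript. It assumes that growth on 5-FOA indicates loss of URA3 and that detecting a single MTL allele indicates loss of the other one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Colony:
    """Hypothetical record of the two marker read-outs for one colony."""
    grows_on_5foa: bool     # True if the colony is 5-FOA resistant (URA3 lost)
    mtl_alleles: frozenset  # MTL alleles detected, e.g. {"a", "alpha"}


def classify(colony):
    """Apply the rule: loss of both markers suggests loss of a Chr5 homolog."""
    ura3_lost = colony.grows_on_5foa
    one_mtl_left = len(colony.mtl_alleles) == 1
    if ura3_lost and one_mtl_left:
        return "candidate Chr5 monosomy (2n-1)"
    if ura3_lost:
        return "URA3 lost, both MTL alleles retained"
    return "no URA3 loss detected"
```

A colony that loses URA3 but keeps both MTL alleles is not scored as monosomic under this rule, which is why the second marker on the opposite arm matters.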

7) Explain Figure 1—figure supplement 2E in more details.

We have now explained Figure 1—figure supplement 2E in more detail.

8) Figure 1—figure supplement 2D: Shouldn't the higher band be marked as 4580bp and lower band be marked as 3312bp?

We thank the reviewer for pointing out this error. We have corrected Figure 1—figure supplement 2D in the revised manuscript.

9) Results paragraph one: Introduce what are orphan haplotigs in the text like we suspect orphan haplotigs to be regions of aneuploidy in the Ct genome.

We thank the reviewer for the suggestion. In the revised manuscript, we have defined “orphan haplotigs” as “suspected heterozygous loci in the diploid genome of C. tropicalis”.

10) Subsection “Centromere and telomere proximal loci are hotspots for complex translocations”: Any comments on why Cen6 did not have any inter-centromeric translocation? Anything special about its structure or location of any conserved element or essential genes near the centromere that might prevent gene rearrangement event?

We do not have an explanation for why CtCEN6 did not show any inter-centromeric translocation with respect to the C. albicans genome. However, synteny analysis of the ORFs proximal to CEN6 of C. tropicalis (CtCEN6) reveals that three CtCEN6-associated ORFs are absent in C. albicans, while the other flanking ORFs are present. We have mentioned these points in the revised manuscript. It is important to note that when the C. tropicalis genome is used as the reference, all centromeres of C. albicans except CaCEN2 are found to be associated with ICSBs (Figure 3—figure supplement 1B). Taken together, centromeres of both species are found to be associated with chromosomal translocations.

11) Paragraph three subsection “Rapid transition in the centromere type within the members of the CUG-Ser1 clade”: It is not clear what species were used for this analysis. Was the IR-motif recognized in inverted repeats of all four species Ct, C. sojae, C. viswanathii, and Ca or just the first three? From the subsequent analysis, it is clear that IR-motifs are enriched in near centromeric regions of Ct, C. sojae, and C. viswanathii. If Ca was used for analysis, then what is the distribution of IR-motif in Ca.

We have now clearly explained the method used to identify the IR-motifs. Briefly, the centromeric IR sequences from C. tropicalis and those from the putative centromeres of C. sojae and C. viswanathii were analysed for the presence of conserved motifs using the motif discovery tool MEME (Bailey et al., 2009). Having identified the reported IR-motif, we scanned the chromosomes using FIMO (Bailey et al., 2009). Subsequently, we compared the average IR-motif density at the centromeres with the genomic average in all the Candida species studied here. We do not see an enrichment of IR-motifs at C. albicans or C. dubliniensis centromeres compared to the genome average, while the IR-motif is at least ~10-fold enriched at C. tropicalis, C. sojae, and C. viswanathii centromeres (Figure 4C).
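The density comparison described above reduces to a ratio of two densities. The following Python sketch is a minimal illustration under stated assumptions, not the analysis code used in the study: it treats a single chromosome as a stand-in for the genome, takes motif hit positions as plain integers (for example, parsed from a motif scanner's output), and uses hypothetical function names and toy coordinates.

```python
def motif_density(hit_positions, intervals):
    """Motif hits per bp within a set of half-open [start, end) intervals."""
    total_length = sum(end - start for start, end in intervals)
    hits_inside = sum(
        1
        for pos in hit_positions
        if any(start <= pos < end for start, end in intervals)
    )
    return hits_inside / total_length


def fold_enrichment(hit_positions, centromere_intervals, genome_length):
    """Centromeric motif density divided by the genome-wide average density."""
    centromere_density = motif_density(hit_positions, centromere_intervals)
    genome_density = len(hit_positions) / genome_length
    return centromere_density / genome_density


# Toy data (hypothetical): 5 motif hits on a 100 kb sequence, 3 of them
# inside a 1 kb centromere.
toy_hits = [100, 150, 200, 5000, 90000]
toy_centromere = [(0, 1000)]
toy_enrichment = fold_enrichment(toy_hits, toy_centromere, 100_000)
```

In this toy case the centromeric density is 3 hits per 1,000 bp and the genome-wide density is 5 hits per 100,000 bp, giving a 60-fold enrichment. A value near 1 would correspond to no enrichment, as reported above for C. albicans and C. dubliniensis.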

12) It was interesting to see the conservation of IR-motif across species. Any comments on what can be the significance of this conserved motif? Does it have any close resemblance to any known protein binding sites?

Previously, we tested the contribution of the DNA sequence of the centromeric IRs in C. tropicalis to mitotic stability and de novo CENP-A loading by replacing the native IRs with IR sequences of the same length from C. albicans CEN5 in the plasmid context. That study clearly showed the inability of the CaCEN5 IR to complement CEN function (Chatterjee et al., 2016). Our observation that the IR-motifs are absent in CaCEN5 suggests that these motifs may be required for de novo centromere establishment in C. tropicalis. It is possible that these motifs serve as binding sites for a specific protein, the centromere-specific enrichment of which contributes to de novo CEN function. However, the exact mechanism remains unexplored and can be addressed only by more specific experiments, which are beyond the scope of this study.

Reviewer #1 (Significance (Required)):

Candida are pathogenic yeasts that present a growing threat to human health. The diversity of species contributes to the difficulty in treatment. Gapless assembly will further aid research and allowed the authors to delve further into the evolutionary history of genome arrangements.

Thank you for the positive comments and encouraging words of appreciation.

Reviewer #2 (Evidence, reproducibility and clarity (Required)):

Summary:

This work demonstrates whole-genome assembly of Candida tropicalis, a species related to the well-studied Candida albicans. While centromeres are formed on unique DNA sequences in C. albicans, C. tropicalis possesses homogenized inverted repeat (HIR)-associated centromeres. By comparing the whole-genome sequences of both species, the authors found evidence for intercentromeric translocations in the common ancestor of both species. They also predicted that HIR sequences were lost during intercentromeric translocations in the lineage of C. albicans, which acquired evolutionary new centromeres (ENCs). Consistent with this idea, they found HIR-associated centromeres in C. parapsilosis, C. sojae, and C. viswanathii, but not in C. dubliniensis or C. albicans.

Major comments:

1) Genome sequence and informatics analyses have been done well. The methodology in this paper is reliable. Identification of centromeres of C. tropicalis at ICSB is interesting observation.

We thank the reviewer for encouraging comments.

2) However, additional data for centromere identification in C. parapsilosis, C. sojae, and C. viswanathii may be needed. CENP-A ChIP experiments will strengthen quality of their conclusion.

We thank the reviewer for this suggestion. Coincidentally, the genomic coordinates of the putative centromeres reported in this study match perfectly with the genomic locations of CENP-A-rich centromeres in C. parapsilosis strain CLIB214 identified in a recent study (Ola et al., 2020). While we agree that an additional CENP-A ChIP experiment can validate the putative centromeres in the other two species, neither antibodies against CENP-ACse4 nor epitope-tagged CENP-ACse4 expressing strains of these species are available. Once the transformation protocol of these two species is established, such experiments can be done in the future. In addition, common features such as (a) the conserved IR-associated structure (Figure 4A, Figure 4—figure supplement 1A), (b) overall conservation in DNA sequence compared to random genomic loci (Figure 4A, Figure 4—figure supplement 2A-B), (c) the presence of conserved motifs (Figure 4D-E, Figure 4—figure supplement 2C-D) with the same direction (Figure S10F), and (d) fully or partially conserved gene synteny across the centromeres of C. tropicalis with the putative centromeres of C. sojae (Figure 4—figure supplement 1B, D) and C. viswanathii (Figure 4—figure supplement 1C, E) indicate that these HIR-associated loci are probably authentic centromeres.

3) Furthermore, if addition of C. tropicalis cen sequence to the replicating plasmid facilitates de novo CENP-A assembly, centromeres of C. parapsilosis, C. sojae, and C. viswanathii may also have similar activity. This would suggest that HIR may be a genetic element for centromere specification. If these experiments can be done, it would be good.

We thank the reviewer for this suggestion. Previously, we tested the role of centromeric IRs in de novo CENP-A recruitment on the centromeric plasmid of C. tropicalis (Chatterjee et al., 2016). A similar experiment can be performed by cloning the HIR-associated centromere DNA of C. sojae and C. viswanathii and assaying for improvement of mitotic stability and de novo CENP-A loading. We would like to try these experiments by constructing strains expressing an epitope-tagged CENP-ACse4 once a transformation protocol for these two species has been standardized.

4) Does Cen6 in C. tropicalis have a HIR sequence? As it is a bit unclear, please state this point clearly in the revised version.

Our previous study reported that all seven centromeres are HIR-associated, which is stated in the original submission: “Strikingly, all seven centromeres of another CUG-Ser1 clade species C. tropicalis…highly identical to each other.” Our current analysis also showed that all seven centromeres of C. tropicalis, including CEN6, carry homogenized inverted repeats (Figure 4A) and contain the IR-motifs (Figure 4D-E). However, the central core of CEN6 harbors two ORFs, unlike the ORF-free CCs of the other six centromeres. We have mentioned these points in the revised version (Introduction).

5) If their intercentromeric translocation theory is correct, how was the HIR of CEN7 in C. albicans (corresponding to CEN6 in C. tropicalis) lost? Furthermore, CEN6 in C. tropicalis also clusters with the other centromeres, so why did CEN6 escape intercentromeric translocation? As I understand that it is hard to explain these points experimentally, please discuss them in the revised version.

Although it appears that CtCEN6 escaped inter-centromeric translocations, synteny analysis suggested that a chromosomal region carrying three consecutive CtCEN6-proximal ORFs was lost in the C. albicans genome. This suggests that CtCEN6 has gone through chromosomal rearrangements. This can be explained by a double-stranded DNA break at CtCEN6 followed by a fusion of the broken ends. This event might have led to the loss of HIRs and the emergence of an ENC on Chr7 in C. albicans. We speculate that more intracentromeric rearrangements might have taken place during the transition from HIR-associated centromeres to unique repeat-less ENCs.

Reviewer #2 (Significance (Required)):

This is a solid work and should be published in an appropriate journal. However, as this work is on centromere evolution within Candida species, people outside the field may not have wide interest. Even in the centromere research field, it is hard to evaluate the generality of this finding. If similar events happen in other species outside fungi, this would be of wide interest.

Thanks for the appreciation. With centromere features known in more than 50 species, fungal centromeres are not only well-characterized but also show a wide diversity of centromere features found in other forms of life such as plants and animals. Strikingly, the only feature that has remained conserved among all fungal centromeres is their spatial clustering (Guin, Sreekumar, Sanyal, in press. Annual Review of Microbiology). In this work, we provide mechanistic evidence to support how spatial clustering favors centromere type transition, which may also be found in animal and plant systems, as centromeres are shown to be one of the most rapidly evolving loci in all domains of life (Henikoff et al., 2001; Padmanabhan et al., 2008).

We found genomic evidence of inter-centromeric translocation, which may have led to the loss of the HIR-associated ancestral type and the emergence of evolutionary new centromeres in C. albicans. However, even during such dramatic karyotype reorganization, the clustering of centromeres remained unperturbed (Figure 5C). This observation indicates that centromere clustering in fungi is a matter of cardinal importance. Based on our results (Figures 3 and Figure 3—figure supplement 1), we proposed that spatial proximity and DNA sequence homology aided karyotypic rearrangements and possibly aided speciation events in the CUG-Ser1 clade. This conserved principle of spatial proximity and DNA sequence homology favoring recombination holds true among multiple domains of life and can be related to other well-studied phenomena. Interchromosomal contacts between chromosome pairs have been correlated with the number of translocation events in both naturally occurring populations and experimentally induced mammalian cells (Arsuaga et al., 2004; Bickmore and Teague, 2002; Branco and Pombo, 2006; Canela et al., 2017; Engreitz et al., 2012; Hlatky et al., 2002; Holley et al., 2002; Klein et al., 2011; Roukos et al., 2013; Zhang et al., 2012). It has been suggested that contacts between various chromosomal territories as well as their relative positions in the nucleus influence the sites and frequency of translocation events both in flies and mammals (Roukos et al., 2013; Soutoglou and Misteli, 2008). One such well-studied translocation event, Robertsonian translocation (RT), involving the fusion between arms of two different chromosomes near the centromere, is the most frequently detected chromosomal abnormality in humans (Therman, Susman et al., 1989).
The occurrence of RT was first reported in grasshoppers (Robertson 1916) and has subsequently been implicated in karyotype evolution in humans (Therman, Susman et al., 1989), mice (Castiglia and Capanna 2002, Dumas and Britton-Davidian 2002), and wheat (Friebe, Zhang et al., 2005) among others. Although significantly different from centromere clustering in fungi, cytological and Hi-C based evidence of spatial proximity (reviewed in Muller, Gil et al., 2019; Imakaev et al., 2012) among the repeat-associated centromere DNA sequences (Kalitsis, Griffiths et al., 2006) in humans, mice and wheat supports the possibility that RT may have been guided by spatial proximity. Similarly, chromoplexy, involving a series of translocations among multiple chromosomes without alteration in copy number, was identified in prostate cancers (Baca, Prandi et al., 2013; Zhang, Leibowitz et al., 2013). Although fine mapping of translocation events at the repetitive regions in human cancer cells is difficult, growing evidence suggests that such events are associated with the formation of micronuclei (Crasta, Ganem et al., 2012). This further supports the idea that spatial genome organization may influence chromoplexy (Meaburn, Misteli et al., 2007). Therefore, we strongly believe that the evolutionary implications of our observations will be of interest to a broader group of researchers studying not only centromere biology but also mechanisms of genome evolution and speciation in fungi and beyond.

Reviewer #3 (Evidence, reproducibility and clarity (Required)):

This study uses multiple different technologies to improve the genome assembly of Ct. They also end up resolving haplotype-specific differences, copy number variations, a translocation, and an LoH event while doing so. The interesting hypothesis relates to differences in centromere formation of different Candida species, mainly focusing on the well-studied Ca genome and the newly generated Ct assembly here. In my opinion, this is a complete piece of work with one clear deliverable (new assembly) and several interesting hypotheses, some of which would require further studies for a definitive conclusion.

We thank the reviewer for encouraging remarks.

I have only a few major concerns but the manuscript needs a good grammar and consistency check before publication.

We have tried to the best of our ability to check for grammatical errors and consistency throughout the manuscript.

1) Could the authors generate 3Cseq data for one other species from the CUG-Ser1 clade (C. sojae) to show the expected centromere locations (HIRs) cluster in 3D?

The species studied to date in the subphylum Saccharomycotina show clustered centromeres at all stages of the cell cycle (reviewed in Muller et al., 2019). Among the members of the CUG-Ser1 clade, clustering of the centromere-kinetochore complex was first studied in C. albicans (Sanyal and Carbon, 2002) and was later confirmed by Hi-C analysis (Burrack et al., 2016). In fact, the centromeres of C. albicans share highly enriched transchromosomal contacts (Sreekumar et al., 2019). This property of centromere clustering in fungi was used to develop analytical techniques for predicting centromere locations from genome-wide contact probability data (Varoquaux et al., 2015), which showed clustering of centromeres in the CUG-Ser1 clade member Scheffersomyces stipitis. Similarly, Hi-C data revealed clustering of centromeres in another CUG-Ser1 clade member, Debaryomyces hansenii (Marie-Nelly et al., 2014). Based on these known facts, we expect similar clustering of centromeres in C. sojae. This can be tested by sub-cellular localization of CENP-ACse4 and by ChIP-qPCR to validate CENP-A enrichment on the HIR-associated putative centromere loci. However, we will be able to perform this experiment only once a transformation protocol is established for this organism. We also plan to perform Hi-C experiments to improve the genome assembly and observe genomic features of C. sojae in a future study.

2) The link to chromothripsis (defined in its originally proposed form) is not clear to me. This has to be either elaborated more or just removed. The events mentioned are consistent with multiple interchromosomal translocations, not shattering of a chromosome.

In the initial draft of our manuscript, we used the phrase “chromothripsis-like event” to explain the inter-centromeric translocations observed in the last common ancestor of C. albicans and C. tropicalis. In this revised version, we have removed the same.

3) Results from Figure 3C and 3D need to be consolidated. I am not sure how ICSB density can be higher at CP compared to TP (Figure 3C) but yet the lengths of orthoblocks are in general shorter for TP compared to CP (Figure 3D).

Figure 3C shows the ICSB density on six chromosomes of C. tropicalis (except Chr6, which does not carry any ICSB) as a function of the distance from the centromere. On the other hand, Figure 3D compares the lengths of orthoblocks present in the centromere-proximal (CP), centromere-distal (CD), and telomere-proximal (TP) zones. These two genomic features are independent of each other. For example, the ICSB density at CP is higher than at TP (Figure 3C), but the orthoblocks are shorter at TP than at CP (Figure 3D). This is due to the clustering of the majority (28/39) of ICSBs at CP, while a few (7/39) smaller blocks are present at TP.
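To make the independence of the two measurements concrete, the following Python sketch computes them separately on toy data. It is an illustration only: the function names, the 100-bp bin size, the zone labels attached to each block, and all coordinates are hypothetical and do not reproduce the analysis in Figure 3.

```python
from collections import Counter
from statistics import median


def icsb_counts_by_distance(breakpoints, centromere_mid, bin_size):
    """Count breakpoints in bins of absolute distance from the centromere."""
    return Counter(abs(pos - centromere_mid) // bin_size for pos in breakpoints)


def median_block_length_by_zone(blocks):
    """blocks: iterable of (zone, start, end). Returns median length per zone."""
    lengths = {}
    for zone, start, end in blocks:
        lengths.setdefault(zone, []).append(end - start)
    return {zone: median(values) for zone, values in lengths.items()}


# Toy chromosome (hypothetical): three breakpoints near the centromere at 500,
# one breakpoint far away near the chromosome end.
toy_breakpoints = [480, 510, 530, 1900]
toy_counts = icsb_counts_by_distance(toy_breakpoints, centromere_mid=500, bin_size=100)

# Toy orthoblocks: the centromere-proximal (CP) blocks are long, while the
# telomere-proximal (TP) blocks are short.
toy_blocks = [("CP", 400, 900), ("CP", 900, 1500), ("TP", 1900, 1950), ("TP", 1950, 2000)]
toy_medians = median_block_length_by_zone(toy_blocks)
```

In this toy example most breakpoints fall in the bin closest to the centromere, yet the blocks assigned to the TP zone are still the shortest, because the first quantity counts breakpoints per distance bin and the second measures the sizes of the blocks in each zone.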

4) Several other narrower spikes in 3C coverage for chr1 and chr2 (Figure 1D). How were these determined to not be duplicated regions? The methodological details of how these decisions were made are missing.

We thank the reviewer for the suggestion. We have now used a more comprehensive approach to detect CNVs across the genome with a published tool, CNAtra (Khalil et al., 2020). An estimated copy number is computed for each region, and regions with an estimated copy number >2.5 are considered duplicated. Details of the CNV detection method are described in Materials and methods.
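The thresholding step stated above is simple to express in code. The following Python sketch covers only that final filter and does not reproduce CNAtra itself. The tuple layout, function name, and toy regions are hypothetical, and the copy-number estimates are assumed to come from an upstream tool.

```python
DUPLICATION_THRESHOLD = 2.5  # cutoff stated in the response above


def duplicated_regions(regions, threshold=DUPLICATION_THRESHOLD):
    """Keep regions whose estimated copy number exceeds the threshold.

    regions: list of (chromosome, start, end, estimated_copy_number) tuples.
    """
    return [region for region in regions if region[3] > threshold]


# Toy input (hypothetical coordinates and estimates for a diploid genome).
toy_regions = [
    ("Chr1", 0, 50_000, 2.0),
    ("Chr1", 50_000, 60_000, 3.1),
    ("Chr2", 10_000, 20_000, 2.5),  # exactly at the cutoff, so not flagged
    ("Chr2", 20_000, 30_000, 2.9),
]
toy_duplicated = duplicated_regions(toy_regions)
```

Because the comparison is strict, a region estimated at exactly 2.5 copies is not called duplicated in this sketch.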

5) How can the 3Cseq data be generated using HindIII for digestion and can be binned at 5bp bins or 2kb bins?

We performed the 3C-seq experiment using DpnII and not HindIII (see Materials and methods). To determine the copy number of orphan haplotigs, paired-end 3C-seq reads were mapped to orphan haplotigs and a control locus (Figure 1—figure supplement 2A) using paired-end alignment mode. The reads were mapped per 5-bp bin and normalized to per million mapped reads. These values were then used to show that specific regions of the orphan haplotigs are present in one copy, while the control locus is present in two copies (Figure 1—figure supplement 2A – B).
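The binning and normalization described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: each read is assigned to a bin by its mapped start position (the source does not say whether starts or full coverage were counted), and the function name and toy values are hypothetical.

```python
from collections import Counter


def binned_reads_per_million(read_starts, total_mapped_reads, bin_size=5):
    """Count reads per fixed-width bin and scale to reads per million mapped.

    Returns a dict mapping each bin's start coordinate to its normalized count.
    """
    counts = Counter(pos // bin_size for pos in read_starts)
    scale = 1_000_000 / total_mapped_reads
    return {bin_index * bin_size: n * scale for bin_index, n in counts.items()}


# Toy input (hypothetical): six reads on one locus, 2 million mapped reads.
toy_rpm = binned_reads_per_million([0, 3, 4, 7, 12, 13], total_mapped_reads=2_000_000)
```

With this normalization, a single-copy region is expected to show roughly half the signal of a two-copy control locus sequenced in the same library, which is the comparison drawn in the response.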

Minor:

6) Figure 1 caption mentions B in place of C, and vice versa, compared to the figure.

We have corrected it in the revised manuscript.

7) Figure 2C and other heatmaps. The exact values of the color scale need to be reported rather than high vs low.

We have now modified the figures to show the exact values of the color scale.

- scatter-pot – scatter-plot, conseqence – consequence

They have been corrected in the revised manuscript.

8) Replace roman numerals in Figure 3C with distance ranges, it is confusing. Figure 3D in the caption is marked as "E.", replace it with D.

We have now replaced the Roman numerals with the distance range. Figure 3D in the caption was marked as "E.," now we have replaced it with D.

9) How large are the genomic regions in Figure 4E? This needs a scale bar to show the size in kb

We have now mentioned the length of each locus in the modified Figure 4E and Figure 4—figure supplement 2D.

10) Figure 5 caption: repeat associated-associated.

We have corrected this error.

Reviewer #3 (Significance (Required)):

- A new chr-level assembly for C. tropicalis

- Hypotheses and some supporting information about the evolution of centromere formation in related species.

- Important for yeast biologists

- I have expertise in computational analysis of conformation capture data and analysis of such data in related species

Thanks for highlighting the significant findings of this study.

Associated Data


    Data Citations

    1. Guin K, Chen Y, Mishra R, Muzaki SRBM, Thimmappa BC, O'Brien CE, Butler G, Sanyal A, Sanyal K. 2020. Candida tropicalis and Candida sojae. NCBI BioProject. PRJNA596050
    2. Guin K, Chen Y, Mishra R, Muzaki SRBM, Thimmappa BC, O'Brien CE, Butler G, Sanyal A, Sanyal K. 2020. Whole genome sequencing of Candida tropicalis isolates. NCBI BioProject. PRJNA604451

    Supplementary Materials

    Source data 1. Source_data_combined.
    elife-58556-data1.xlsx (378.5KB, xlsx)
    Supplementary file 1. Assembly C with 12 contigs.
    elife-58556-supp1.pptx (38.3KB, pptx)
    Supplementary file 2. Assembly of sub-telomeres and filling up N-gaps in the genome assembly of C. tropicalis using de contigs.
    elife-58556-supp2.pptx (37.1KB, pptx)
    Supplementary file 3. Statistics for different versions of the genome assembly of C. tropicalis (MYA-3404) generated in this study.
    elife-58556-supp3.pptx (41.8KB, pptx)
    Supplementary file 4. A comparative analysis of Assembly A and the improved Assembly2020 of C. tropicalis.
    elife-58556-supp4.pptx (39.7KB, pptx)
    Supplementary file 5. Features of centromere DNA elements in C. sojae.
    elife-58556-supp5.pptx (39.3KB, pptx)
    Supplementary file 6. Features of centromere DNA elements in C. viswanathii.
    elife-58556-supp6.pptx (36.3KB, pptx)
    Supplementary file 7. Centromere coordinates used for identifying conserved DNA sequence motifs in Candida species.
    elife-58556-supp7.pptx (49.6KB, pptx)
    Supplementary file 8. List of strains used in this study.
    elife-58556-supp8.pptx (40KB, pptx)
    Supplementary file 9. List of primers used in this study.
    elife-58556-supp9.pptx (52.8KB, pptx)
    Supplementary file 10. List of plasmids used in this study.
    elife-58556-supp10.pptx (34.6KB, pptx)
    Transparent reporting form

    Data Availability Statement

    All sequencing data reported in the study and the genome assembly of C. tropicalis and C. sojae have been submitted to NCBI under the BioProject accession numbers PRJNA596050 and PRJNA604451.


