Abstract
Distillation of fermented sugarcane juice produces both rum and cachaça, significant sources of revenue in Brazil and elsewhere. In this study, we provide a genomic analysis of a Saccharomyces cerevisiae strain isolated from a cachaça distillery in Brazil. We determined the complete genome sequence of a strain with high flocculation capacity, high tolerance to ethanol, osmotic and heat shock stress and high fermentation rates and compared the sequence with that of the reference S288c genome as well as those of two other cachaça strains. Single-nucleotide polymorphism analysis identified alterations in genes involved in nitrogen and organic compound metabolism, integrity of organelles and ion homeostasis. The strain exhibited fragmentation of several flocculation genes relative to the reference genome, as well as loss of a stop codon in the FLO8 gene, which encodes a transcription factor required for FLO gene expression. The strain contained no genes not present in the reference genome strain but did lack several genes, including asparaginase genes, maltose utilization loci, and several genes from the tandem array of the DUP240 family. The three cachaça strains lacked different sets of genes, but the asparaginase genes and several of the DUP240 genes were common deficiencies. This study provides new insights regarding the selective pressure of sugarcane fermentation on the genome of yeast strains and offers additional genetic resources for modern synthetic biology and genome editing tools.
Supplementary Information
The online version contains supplementary material available at 10.1007/s42770-021-00444-z.
Keywords: Cachaça production, Fermentative process, Yeast, Whole-genome sequencing
Introduction
Distilled spirits derived from fermentation of sugarcane juice, primarily rum and cachaça, represent a significant fraction of the worldwide spirits market, which is expected to grow by 8.1% annually [1]. Brazil currently produces 1.3 billion litres of cachaça annually, and cachaça is the third most popular distilled beverage in the world [2]. More than fifty companies in Brazil exported 7.26 million litres of cachaça to 83 countries in 2019, generating revenues of more than 14 million USD [3].
Traditional cachaça fermentation begins with yeasts that occur naturally in sugarcane juice. The sugarcane juice added daily is a source of new populations of yeast and bacteria, and as these microorganisms multiply, they change the characteristics of the must. The natural starter culture is unique to each alembic, which contributes to the variations in the production, yield, and quality of cachaça in different regions [4]. Each cycle of cachaça fermentation lasts from 13 to 30 h and reaches temperatures up to 41 °C and ethanol concentration up to 8% [5, 6]. Cycles occur repeatedly over several months, with addition of fresh sugarcane extract and water at each cycle.
The heat and ethanol stresses during fermentation impose strong selective pressure on the strains present. Since cachaça characteristics and quality are correlated with the particular microbial community present in the vats, identifying and characterizing the specific organisms responsible for the fermentation process is important. To date, identification of the fermenting yeast has relied on morphological and physiological characteristics. These methods are limited, however, and molecular techniques such as whole-genome analyses and comparative genomics allow a more accurate identification and description of the fermenting yeast strains [2, 7]. A recent phylogenetic study using whole-genome sequencing found that cachaça strains from traditional distilleries are polyphyletic. Rather than forming a single phylogenetic clade, most cachaça strains reside in two clades, with a few other cachaça yeasts scattered at isolated positions in the phylogenetic tree [8]. These results reinforce the idea that different yeast strains evolved under the different selective pressures present in each individual alembic.
Here, we describe the whole-genome sequencing analysis of a cachaça strain used for traditional cachaça production in Brazil. This strain, BT0510, was previously isolated from fermentation vats of a cachaça distillery in the State of Espírito Santo, Brazil, and exhibited high flocculation capacity, high tolerance to ethanol, osmotic and heat shock stress, and high fermentation rates [9]. Moreover, several studies conducted with BT0510 documented its advantages in biotechnological processes such as beverage, baking and biofuel production [9–11]. Through de novo assembly, annotation and alignment to the reference strain S288c, we determined gene content, genomic variants and gene gains and losses in the BT0510 strain. We also evaluated genomic similarities and differences with respect to two other cachaça strains, one selected to represent each of the two main cachaça clades and each isolated in a different region of Brazil. This study provides baseline knowledge on the effects of selective pressure involved in the cachaça fermentation process. In addition, these analyses of the BT0510 yeast strain will provide genetic information to improve the quality of cachaça, a beverage of great cultural and economic importance in Brazil.
Materials and methods
Yeast Strain and Reference Sequences
The cachaça yeast strain used in this study (BT0510) was identified as S. cerevisiae by Bravim et al. [9] and is currently stored at the Federal University of Pernambuco Culture Collection (Recife, PE, Brazil) under the code 6670. The R64-2-1 release of the reference S. cerevisiae S288c genome was downloaded from the Saccharomyces Genome Database website (http://sgd-archive.yeastgenome.org/sequence/S288C_reference/genome_releases/) and used as a reference throughout this work.
DNA Isolation, Genome Sequencing and Assembly
Cells from frozen glycerol stocks were grown on YPD plates at 30 °C for 24 h. A single colony was grown in 50 ml YPD at 30 °C for 24 h. Fresh cells were collected, and genomic DNA was prepared using the DNeasy® Blood & Tissue Kit (Qiagen, Germantown, MD, USA) with a protocol designed for purification of DNA from yeast cells. DNA/RNA quantity was determined using a NanoDrop 1000 (Thermo Scientific, MA, USA), and integrity was determined with an Agilent 2100 Bioanalyzer (Agilent, Santa Clara, CA, USA).
The genome sequence of S. cerevisiae BT0510 was obtained on an Illumina MiSeq500, and the data for this study have been deposited in the European Nucleotide Archive (ENA) at the EMBL-EBI under accession number PRJEB36870 (https://www.ebi.ac.uk/ena/data/view/PRJEB36870) and sample accession SAMEA6593250. Sequencing primers and low-quality read regions were trimmed using Cutadapt [12]. Illumina reads were assembled de novo using SPAdes 3.7.1 [13], and contigs shorter than 200 bp were discarded. We assessed assembly quality using BUSCO v3.0.2 [14].
Protein-coding genes were predicted using Augustus 3.0.3 [15] trained on the S. cerevisiae S288c dataset. Annotation of protein-coding genes was performed using BLASTP search against S. cerevisiae S288c proteins and a non-redundant protein sequence database.
For comparative genomic analysis, we used Illumina reads from other cachaça yeast strains deposited at the European Nucleotide Archive (ENA) under the accession code PRJEB24932 [8]. Illumina reads were downloaded from the archive database and then assembled de novo as above into contigs using SPAdes 3.7.1 [13].
Variation Identification and Genome Diversity Analysis
Illumina reads were mapped to S. cerevisiae strain S288c reference genome using Bowtie 2 [16]. We used FreeBayes [17] to find genetic variants (including single-nucleotide polymorphisms (SNPs) and insertions/deletions) and SnpEff [18] to identify genomic variant annotations, functional effects and dN/dS ratios.
Genes of S. cerevisiae S288c Missing in the Analysed Yeast Strains
Illumina sequencing reads obtained for strain BT0510 were mapped to the reference genome using Bowtie 2 [16], and the coverage percent of each gene was calculated using Bedtools genome coverage [19]. A gene was considered missing if sequence covered less than 50% of the gene in the reference sample.
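The missing-gene criterion can be illustrated with a short Python sketch. This is not the pipeline used in the study (which relied on Bowtie 2 and Bedtools); the function names, the toy depth values and the choice of a minimum depth of one read for a base to count as covered are all assumptions made for illustration.

```python
def covered_fraction(depths, min_depth=1):
    """Fraction of positions in a gene with read depth >= min_depth.

    `depths` is a list of per-base read depths across the gene.
    ASSUMPTION: a base counts as covered when depth >= 1; the source
    does not state the depth cutoff.
    """
    if not depths:
        return 0.0
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def missing_genes(gene_depths, threshold=0.5):
    """Return sorted names of genes covered over less than `threshold` of their length."""
    return sorted(
        gene for gene, depths in gene_depths.items()
        if covered_fraction(depths) < threshold
    )


# Hypothetical per-base depths for three toy genes (not real data).
toy_depths = {
    "geneA": [30, 28, 31, 29],  # fully covered
    "geneB": [0, 0, 0, 12],     # 25% covered, called missing
    "geneC": [0, 0, 15, 14],    # exactly 50% covered, retained
}

print(missing_genes(toy_depths))
```

With a strict "less than 50%" rule, a gene covered over exactly half of its length is retained, as in the toy `geneC`.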
Non-reference Genes Present in the Analysed Yeast Strains
All genes annotated in the de novo assembly genome were compared with S. cerevisiae S288c genes using BLASTN search. A gene was considered “new” if more than 80% of the gene was less than 70% identical to the reference genome.
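One possible reading of this criterion is sketched below in Python: a gene is called new when the portion of its length that is not covered by any alignment of at least 70% identity exceeds 80%. The hit tuples `(start, end, percent_identity)`, the function names and this interpretation of the thresholds are assumptions for illustration and may differ from how the BLASTN output was actually filtered.

```python
def fraction_matched(gene_len, hits, min_identity=70.0):
    """Fraction of a gene covered by hits with identity >= min_identity.

    `hits` holds (start, end, percent_identity) tuples in 1-based,
    inclusive query coordinates. Overlapping hits are merged so that
    no base is counted twice.
    """
    intervals = sorted((s, e) for s, e, ident in hits if ident >= min_identity)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged interval and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / gene_len


def is_new_gene(gene_len, hits, min_identity=70.0, max_unmatched=0.8):
    """True when more than `max_unmatched` of the gene lacks a sufficiently identical match."""
    return (1.0 - fraction_matched(gene_len, hits, min_identity)) > max_unmatched


# Hypothetical hits for a 1000-bp gene (not real data).
print(is_new_gene(1000, [(1, 100, 95.0), (50, 150, 90.0)]))  # 15% matched
print(is_new_gene(1000, [(1, 900, 99.0)]))                   # 90% matched
```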
Gene Ontology (GO) Enrichment Analysis and List Comparison
Gene sets and ORFs were analysed using YeastMine [20]. GO analyses were performed using the DAVID algorithm [21] with the Benjamini correction. Differences were considered statistically significant when the P value was smaller than 0.1.
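To make the multiple-testing step concrete, the sketch below applies the Benjamini-Hochberg step-up adjustment to a list of P values and keeps terms below the 0.1 cutoff. It assumes that the Benjamini correction reported by DAVID corresponds to this standard procedure; DAVID's own implementation may differ in detail, and the P values shown are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw P values for four GO terms (not real data).
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [i for i, p in enumerate(adj) if p < 0.1]
print(adj, significant)
```

In this toy case the first three terms pass the 0.1 threshold after adjustment and the fourth does not.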
Analysis of the Genome Variations Between BT0510 and Cachaça Strains
The recent genome-based phylogenetic analysis of cachaça strains by Barbosa et al. [8] showed that most cachaça strains belong to two clades (C1 and C2), and one cachaça strain from each clade was selected to represent it. The genome sequences of cachaça strains UFMG-CM-Y627 (belonging to the C1 clade) and UFMG-CM-Y632 (belonging to the C2 clade) were downloaded from ENA (European Nucleotide Archive) under the accession code PRJEB24932 (19 June 2019, date last accessed). The UFMG-CM-Y627 strain was isolated in the state of Rio de Janeiro, Brazil, and the UFMG-CM-Y632 strain in Tocantins state, Brazil. The differences between strain BT0510 and strains UFMG-CM-Y627 and UFMG-CM-Y632 were analysed at the level of sequencing reads. The genome sequences of strains UFMG-CM-Y627 and UFMG-CM-Y632 were aligned to and compared with the laboratory strain S288c using Bowtie 2, and the results in SAM format were visualized and manually inspected in Integrative Genomics Viewer. In this work, we refer to strain UFMG-CM-Y627 as Y627 and strain UFMG-CM-Y632 as Y632.
Other Analysis Tools
Routine sequence visualization and manipulation of nucleotide sequences were performed with Qlucore (Qlucore Omics Explorer, Lund, Sweden), ClicO FS [22] and IGV [23].
Results
Genome Sequencing and Assembly of the Cachaça Yeast Strain BT0510 Genome
Genomic DNA from S. cerevisiae strain BT0510 was sequenced on an Illumina MiSeq platform at 263X coverage. Sequencing of the DNA library generated 17,190,402 single-end reads with lengths between 35 and 251 bp and a GC content of 38%.
De novo assembly of Illumina reads was performed using the SPAdes assembler version 3.13.1 and resulted in 2900 contigs with an N50 value of 27,902 bp for all contigs. The assembly results are summarized in Table 1. Assembly completeness was assessed with Benchmarking Universal Single-Copy Orthologs (BUSCO), which identified the majority of genes from the saccharomycetales_odb9 database. From a total of 1711 BUSCO groups searched, we found 1622 complete BUSCOs (C), of which 1611 were complete and single-copy (S) and 11 complete and duplicated (D), along with 55 fragmented (F) and 34 missing (M).
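The BUSCO counts can be converted to percentages of the 1711 groups searched. The sketch below does this arithmetic and formats the result in a compact C/S/D/F/M notation similar to BUSCO's one-line summary; the function name and the exact string layout are illustrative assumptions.

```python
# BUSCO category counts as reported in the text.
busco_counts = {"S": 1611, "D": 11, "F": 55, "M": 34}
busco_total = 1711


def busco_summary(counts, total):
    """Format BUSCO counts as percentages of the total groups searched."""
    def pct(n):
        return f"{100 * n / total:.1f}%"

    complete = counts["S"] + counts["D"]
    return (
        f"C:{pct(complete)}[S:{pct(counts['S'])},D:{pct(counts['D'])}],"
        f"F:{pct(counts['F'])},M:{pct(counts['M'])},n:{total}"
    )


# The four categories should account for every group searched.
assert sum(busco_counts.values()) == busco_total
summary = busco_summary(busco_counts, busco_total)
print(summary)
```

For the counts reported here, this gives 94.8% complete (94.2% single-copy and 0.6% duplicated), 3.2% fragmented and 2.0% missing.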
Table 1.
Statistics of sequencing, assembly and annotation of S. cerevisiae BT0510
| Attributes | Value |
|---|---|
| Coverage | 263X |
| Number of reads | 17,190,402 |
| N50 | 27,902 bp |
| Total assembly length | 12,612,565 bp |
| Total number of contigs | 2900 |
| Largest contig | 281,813 bp |
| Largest alignment | 17,033 bp |
| Protein-coding genes | 5687 |
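For readers unfamiliar with the N50 statistic in Table 1, it is the contig length such that contigs of that length or longer contain at least half of the assembled bases. A minimal Python sketch follows; the contig lengths are toy values, not BT0510 contigs, and the function name is an assumption.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Accumulate contigs from longest to shortest until half the total is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


# Toy contig lengths in bp (not real data): total 290, half 145.
toy_contigs = [100, 80, 50, 30, 20, 10]
print(n50(toy_contigs))
```

In the toy example the two longest contigs (100 + 80 bp) already hold more than half of the 290 bp, so the N50 is 80 bp.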
Genome Variations in BT0510 Compared with the Model Strain S288c
To examine genetic variation in BT0510, the Illumina reads were mapped to the S288c reference genome, which identified 66,327 SNPs and 6037 insertions/deletions (InDels) in BT0510 (Table 2). The BT0510 strain is diploid, and sequencing the strain in its diploid state allowed analysis of multiallelic sites. Although the strain is almost completely homozygous, 1316 heterozygous sites were found, 98 of which were SNPs.
Table 2.
Number of variants by type in the strain BT0510
| Variant by type | Total |
|---|---|
| Single-nucleotide polymorphisms | 66,327 |
| Multinucleotide polymorphisms | 5018 |
| Insertions | 2967 |
| Deletions | 3070 |
| Mixed | 1068 |
The variant rate throughout the genome was on average one variant every 154 bases but varied in each chromosome (Supplementary Table S1). We noted lower variant rates in chromosome I (1 in 97) and the mitochondrial chromosome (1 in 72). We identified deleted regions and areas with lower coverage depth in these chromosomes after the alignment with the reference genome. Some reference genes were fully or partially located in these regions (Fig. 1). We found significant SNV (SNPs and Indels) enrichment in some “hot spots” including subtelomeric regions of several chromosomes.
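The genome-wide average can be checked with simple arithmetic on the counts in Table 2. The sketch below assumes that all five variant categories count toward the rate and that the reference length is 12,157,105 bp (nuclear plus mitochondrial S288c R64 sequence); neither assumption is stated in the text, and the integer division mirrors how such rates are commonly reported as "one variant every N bases".

```python
# Variant counts by type, copied from Table 2.
variant_counts = {
    "SNP": 66_327,
    "MNP": 5_018,
    "insertion": 2_967,
    "deletion": 3_070,
    "mixed": 1_068,
}

# ASSUMPTION: total S288c R64 reference length (nuclear + mtDNA), not given in the text.
REFERENCE_LENGTH_BP = 12_157_105


def bases_per_variant(genome_length, n_variants):
    """Average spacing between variants, as whole bases per variant."""
    return genome_length // n_variants


total_variants = sum(variant_counts.values())
indels = variant_counts["insertion"] + variant_counts["deletion"]
rate = bases_per_variant(REFERENCE_LENGTH_BP, total_variants)
print(total_variants, indels, rate)
```

Under these assumptions the sketch gives 78,450 variants in total, 6037 InDels and one variant every 154 bases, in line with the reported values. With this measure, a smaller number of bases per variant corresponds to more closely spaced variants.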
Fig. 1.
Sequencing coverage depth after alignment with S288c. The two BT0510 chromosomes that displayed lower variant rate, mitochondrial chromosome (a) and chromosome I (b) were represented using IGV software. Coverage depth in the mitochondrial chromosome (a) was irregular throughout the chromosome with areas without coverage. Chromosome I (b) contains an irregular coverage with low coverage depth in specific areas and in the extremities. The data range is located at the top left of the coverage track. Reference genes that were fully or partially localized in deleted/low coverage areas are mapped underneath it
The regions affected by the variants were classified using SnpEff. Approximately 41% of all detected variants lie in downstream non-coding regions of genes, 43% were in upstream non-coding regions and 6% were intergenic. More than 37% of intragenic SNPs resulted in changes to the encoded protein sequence and the dS/dN ratio was 0.59. We identified 514 genes with variants that eliminated or added stop or start codons or that created a frameshift. Because a gained stop codon or a frameshift is more likely to have an impact than a change in a start codon, we selected the 392 genes carrying these two types of variant and performed a GO analysis via DAVID. The enrichment analysis returned significant alterations in the “cell component”, “biological process” and “molecular function” categories. The “biological process” category included “nucleic acid phosphodiester bond hydrolysis”, “transposition, RNA-mediated”, “transmembrane transport”, “DNA recombination”, “RNA splicing” and “transcription from RNA polymerase II promoter”. The “cell component” category included “integral component of membrane”, “membrane” and “retrotransposon nucleocapsid”. The “molecular function” category included “ATP binding”, “DNA binding”, “cytochrome-c oxidase activity”, “sequence-specific DNA binding”, “transporter activity”, “transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-specific binding” and others. This analysis points to processes related to transcription, nucleic acid binding and transport, all of them essential for cellular functioning, that may be altered in BT0510.
We also identified a set of 33 genes involved in regulation of gene expression that contain one or more high-impact variants (Supplementary Table S2). One of these genes, FLO8, encodes a transcription factor required to orchestrate flocculation during the late stages of fermentation [24]. Moreover, the FLO10 gene, which encodes a cell wall protein that participates directly in cell-cell adhesion during flocculation [25], lacks a stop codon and contains two frameshift variants. Several other genes from the FLO family (FLO9, FLO1 and FLO5) have also undergone substantial alterations: full-size FLO9, FLO1 and FLO5 genes were not found in the BT0510 strain. In BT0510, FLO9 encodes a 643-amino-acid (a.a.) protein rather than the 1322 a.a. protein in S288c, FLO1 encodes a 1120 a.a. protein rather than 1537 a.a., and FLO5 encodes a 684 a.a. protein rather than 1075 a.a. (Fig. 2). These changes could reflect selection for increased and improved flocculation in this strain, which exhibits a high flocculation profile. Substantial alterations were also observed in genes related to maltose metabolism, including maltase (YGR292W) and isomaltase (YOL157C), and in a variety of transposable element genes.
Fig. 2.
Structural variations in genes from the FLO family. The annotations of three genes from the FLO family, FLO1, FLO9 and FLO5, are represented in the figure. All of them are smaller than the corresponding genes in strain S288c
Gene Loss and Gain in the BT0510 Relative to S288c
We searched the 5687 protein-coding gene sequences in BT0510 to identify genes not found in S288c and concluded that none exist. In contrast, 38 genes present in the S288c strain were absent in the BT0510 strain (Table 3). The majority of the missing genes were classified as genes of unknown function or dubious open reading frames. However, we also observed the absence of the cell wall-associated asparaginase genes, several MAL genes (transcriptional factor MAL13 and maltose transporter MAL11), and genes of the DUP240 family. We also observed the absence of genes related to vesicle formation, COPII binding, enolase regulation and sulphate metabolism, suggesting alterations in carbon metabolism, intracellular transport and sulphur metabolism.
Table 3.
Genes in S288c that are missing in strain BT0510
| Group | List of genes |
|---|---|
| Protein of unknown function | YAL064W YFR026C YIR041W (PAU15) YIR042C YLR156W YLR157W-D YLR159W YLR157W-E YLR161W YLR162W YOL164W-A YOL164W YOL163W YOL162W Q0075 (AI5_BETA) Q0255 |
| Dubious open reading frame | YAR030C YAR053W YEL074W YGL052W YGR290W YGR291C |
| DUP240 Family | YAR029W YGL051W (MST27) YAR033W (MST28) YAR027W (UIP3) YAR031W (PRM9) YGL053W (PRM8) |
| Acetyltransferase activity | YJL218W |
| Enolase regulation | YJL217W (REE1) |
| Asparaginase activity | YLR155C (ASP3-1) YLR157C (ASP3-2) YLR158C (ASP3-3) YLR160C (ASP3-4) |
| Maltose fermentation | YGR288W (MAL13) YGR289C (MAL11) |
| Sulphate metabolism | YOL164W (BDS1) |
The Genome Differences Between BT0510 Strain and Two Other Cachaça Strains
We compared the genome of BT0510 with those of two other cachaça strains that were isolated in different geographic locations in Brazil and that belong to two different clades, to analyse differences in gene content. Strain Y627 was isolated in the state of Rio de Janeiro and Y632 in Tocantins state. The lists of missing genes in each strain were compared to identify similarities and unique differences in gene content (Fig. 3). Nineteen of the 38 genes missing in BT0510 relative to S288c were also absent in the genomes of both Y627 and Y632. These included genes of unknown function, two genes from the DUP240 family, one gene related to intracellular transport and the cell wall-associated asparaginase gene cluster. Six genes were missing in both BT0510 and Y632, three of which are involved in vesicle organization or are part of the DUP240 family.
Fig. 3.

Missing genes in the genome of the strains BT0510, UFMG-CM-Y627 and UFMG-CM-Y632. Shown is the Venn diagram indicating the number of unique and common gene deletions in the three cachaça strains
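The list comparison behind a three-way Venn diagram of this kind reduces to set operations. The following Python sketch partitions three sets of missing genes into the seven Venn regions; the gene labels `g1` to `g6` and the strain variable names are hypothetical placeholders, not genes from this study.

```python
def venn_regions(a, b, c):
    """Split three sets into the seven mutually exclusive regions of a Venn diagram."""
    return {
        "only_a": a - b - c,
        "only_b": b - a - c,
        "only_c": c - a - b,
        "a_b_only": (a & b) - c,
        "a_c_only": (a & c) - b,
        "b_c_only": (b & c) - a,
        "all_three": a & b & c,
    }


# Hypothetical missing-gene lists for three strains (placeholder labels).
strain_a = {"g1", "g2", "g3", "g4"}
strain_b = {"g1", "g5"}
strain_c = {"g1", "g2", "g5", "g6"}

region_counts = {
    name: len(genes)
    for name, genes in venn_regions(strain_a, strain_b, strain_c).items()
}
print(region_counts)
```

Because the seven regions are mutually exclusive, their sizes sum to the number of distinct genes missing in at least one strain, which gives a quick consistency check on the counts shown in a diagram.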
Six genes were missing in both Y627 and Y632 but not in BT0510. Most of these genes are dubious open reading frames or genes of unknown function, while one gene encodes a putative aryl-alcohol dehydrogenase (AAD15/YOL165C) involved in cellular aldehyde metabolism. AAD15 is a non-essential gene, and a null diploid mutant exhibited increased innate thermotolerance [26], a characteristic important for cachaça production. Among the thirteen genes missing exclusively in BT0510 were those involved in maltose metabolism and Golgi-to-endosome transport. Strain Y627 uniquely lacks thirteen genes, which are related to transposition, especially RNA-mediated transposition. Finally, strain Y632 uniquely lacks 64 genes, which are involved in base pairing with mRNA, iron chelate transport, mitochondrial translation elongation, mitochondrial electron transport and ATP metabolic process. In addition, and consistent with the missing nuclear-encoded mitochondrial genes, the mitochondrial genome in this strain is substantially truncated. We also mapped the scaffolds from the three yeasts onto the S288c mtDNA and noted the loss of several mitochondrial genes, especially in the Y632 strain (Fig. 4).
Fig. 4.
mtDNA genomes in strains BT0510, Y627 and Y632. The assembled contigs from the three strains were mapped onto the S288c mtDNA to illustrate the mitochondrial coverage in these strains. Genes (green) and coverage (red) are indicated. The grey line represents the whole S288c mitochondrial chromosome
Discussion
Cachaça is a quintessential Brazilian distilled spirit, and its fermentation exhibits several unique features: a short fermentative cycle (18–30 h), high temperatures, high alcohol content and daily additions of fresh sugarcane juice throughout the entire 4–6-month cycle. These conditions exert strong selective pressure on the yeast strains used during the process [27]. However, to our knowledge, complete genome assemblies of cachaça strains have not been reported, making it difficult to identify features that would allow improvements in the quality of this spirit and the fermentation process. The purpose of the current study was to analyse the genome of a cachaça strain, BT0510, isolated in Espírito Santo State, and compare its characteristics with those of other strains that belong to different clades and were isolated in different states in Brazil.
De novo assembly of BT0510 yielded a genome size and gene organization quite similar to those of the laboratory strain S288c. However, our genomic analysis revealed a complex panorama of genetic variants in BT0510 compared with S288c. Whether the observed differences resulting from SNPs, InDels and gene loss in pathways related to carbohydrate metabolism, regulation and response to stress and nitrogen starvation are associated with the adaptation of these strains to cachaça production awaits further analysis.
Flocculation is controlled by members of the flocculation gene family (mainly FLO genes) and a series of regulatory genes [28]. The high flocculation capacity of BT0510 might be explained by the significant changes observed in genes of the FLO family. A shift in the spectrum of the FLO family genes has also been found in flor yeast strains, a specialized group of S. cerevisiae yeasts that are used for biological wine aging and have a high capacity to form biofilms [29]. The CAT-1 yeast strain, widely used in the Brazilian fuel ethanol industry, is known to be defective in flocculation and foam production. Analysis of its genome shows gaps in FLO10, FLO5 and FLO11, and the absence of FLO1 and FLO9 [30]. This set of characteristics, principally the missing genes, explains the lack of flocculation and foam-forming phenotypes in this industrial strain. These rearrangements in genes from the FLO family should be further analysed to better determine their influence on different phenotypes. Identifying strains with different abilities, and the genetic bases for these traits, is therefore important for strain and process improvement [31–34]. Biofilm formation and flocculation are not recommended features in yeast strains used in the fuel ethanol industry; in the spirit production industry, however, both are desirable traits because they help separate the yeast from the must at the end of the fermentation.
Industrial yeast strains are recognized for their robustness and high fermentative capacity. Nevertheless, the LBCM1047 strain, isolated from cachaça fermentation vats, shows ethanol yield and cellular viability similar to those of PE-2, the most commercialized strain in the Brazilian fuel ethanol industry, during all fermentative cycles [35]. Brexó et al. [36] presented a cachaça starter strain with biomass yield, ethanol yield and productivity similar to or higher than those of PE-2 and CAT-1. Therefore, other characteristics, in addition to high fermentative capacity, have been observed in cachaça strains, suggesting their applicability in the bioethanol industry. Genome comparison among the three cachaça strains revealed differences in gene loss that could affect carbon metabolism, tolerance to fermentative stress and mitochondrial functionality. Of note, the absence of several nuclear genes encoding mitochondrial proteins, as well as the substantial deletion of the mitochondrial genome in strain Y632, precludes the strain from performing oxidative phosphorylation and aerobic respiration. This may have arisen from selection for maintaining high levels of ethanol following fermentation by ensuring that ethanol produced during fermentation is not subsequently metabolized by the yeast.
Laboratory strains of S. cerevisiae carry up to five MAL loci, and genetic mapping assigned each MAL locus to a separate chromosome. The presence of any one of these loci enables the yeast to ferment maltose [37]. The MAL1 locus contains a cluster of three genes (MAL11, MAL12 and MAL13). Although MAL13 and MAL11 were absent in strain BT0510, deletions in this locus are common in natural populations and may not result in phenotypic effects on maltose utilization [38]. However, the absence of MAL11 (AGT1) does have phenotypic consequences for fermentation of other sugars transported by this permease, including maltotriose [39]. Maltotriose is the second most abundant sugar in beer wort (15–20%) and is of special importance to the brewing industry. Studies with S. cerevisiae strains indicate that the AGT1 permease is required for efficient maltotriose consumption and fermentation, highlighting the importance of this membrane transporter protein for improving sluggish maltotriose fermentations [40, 41]. The absence of the MAL11 gene in the BT0510 strain precludes its use in the brewing industry but does not prevent its use in other processes, such as bioethanol production, as mentioned in other studies with cachaça strains [5, 35]. While the loss of the MAL genes might have been an evolutionary feature of a strain adapted to ferment sugarcane juice, the loss of these genes is restricted to the BT0510 strain and was not found in the other two cachaça strains analysed. These genome differences highlight how the same fermentative process occurring in different geographic locations can be accomplished by strains with quite distinct genomic endowments. Whether the differences among the strains are due to genetic drift or distinct selective pressures in the separate locales will require additional studies.
Only a limited number of missing genes are shared by all three cachaça strains, and these losses are not exclusive to this group of yeasts. Deletion of the cell wall-associated asparaginase gene cluster is observed in many yeast strains, including those used in wine aging [29]. Likewise, the expansion and contraction of the DUP240 family is well documented, although this chromosomal instability is poorly understood [42]. The DUP240 family consists of 10 members of unknown function in the reference strain S288c. Seven genes are arranged as tandem repeats on chromosomes I and VII and seem to be privileged sites of gene birth and death [42]. BT0510 lost six genes that belong to the tandemly repeated DUP240 genes present on chromosomes I and VII, and the DUP240 genes missing in all three strains are located on chromosome VII. Identifying both differences and similarities among the different cachaça strains is important for defining their individual evolutionary trajectories and for understanding how distinct environments affect the genotypes of these strains.
A genealogy constructed by population-level sequencing of strains used for production of rum, a beverage also made with sugarcane juice, clusters these strains with a cachaça strain isolated in Brazil, as well as with other strains isolated from bioethanol distilleries in South America [43]. The genealogical tree constructed by Legras et al. [43] was inferred from SNPs of the strains analysed; our results, besides contributing more variation data (SNPs) for this specific group, also provide gene content information for future evolutionary analysis. These data could also be important for transferring genetic traits between these yeast strains to improve the sugarcane juice fermentation process. Such improvements could draw on the gene loss information that we identified in a strain resistant to multiple stresses.
Cachaça strains are polyphyletic [8], and since different strains appear to be associated with specific geographical locations, they might be used as a unique identifier to define protected denominations of origin (PDO) [33]. This designation represents an opportunity to protect local culture and specify a product with recognized qualities and characteristics [33, 44]. The use of microsatellite amplification to detect polymorphisms in yeast strains with the aim of bio-geographical identification has been described [33]. Here we demonstrated that whole-genome analysis is also a tool with high specificity to determine subtle changes between strains that could be used to obtain the PDO.
In conclusion, next-generation sequencing (NGS) has allowed detailed characterization of a cachaça strain that is used for different biotechnological products. The results of our whole-genome analysis will be useful for understanding critical characteristics of cachaça fermentation strains and provide additional genetic resources suitable for modern synthetic biology and genome editing tools.
Acknowledgements
P.M.B. Fernandes acknowledges Conselho Nacional de Pesquisa (CNPq) for her research productivity award (303432/2018-7). A.C.T. Costa acknowledges Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) for her scholarship (88882.385022/2009-01). T.F.S. Antunes acknowledges CAPES for her fellowship.
Code availability
Not applicable
Author contribution
All authors contributed to the study conception and design. Material preparation, data collection and first analysis were performed by ACTC, JH, TFSA and AMCS. The first draft of the manuscript was written by ACTC, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.
Funding
This work was supported by CNPq grant 458029/2014-9 to PMBF and by NIH grant GM076562 to JRB.
Data Availability
The genome sequences of S. cerevisiae BT0510 strain have been deposited in the European Nucleotide Archive (ENA) at the EMBL-EBI under accession number PRJEB36870 (https://www.ebi.ac.uk/ena/data/view/PRJEB36870) and sample accession SAMEA6593250.
Declarations
Ethics approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Consent to participate
Not applicable
Consent for publication
Not applicable
Conflict of interest
The authors declare that they have no conflict of interest.
References
- 1. Statista (2020) Spirits – worldwide. https://www.statista.com/outlook/10020000/100/spirits/worldwide. Accessed 21 May 2020
- 2. Badotti F, Gomes FCO, Rosa CA. Brazilian cachaça: fermentation and production. In: Hui YH, Evranuz EÖ, editors. Handbook of Plant-Based Fermented Food and Beverage Technology. Second. Florida: CRC Press; 2012. pp. 639–648.
- 3. Ministry of Industry FT and S (2020) Summary Export 2019 – NCM 22084000. http://comexstat.mdic.gov.br/pt/geral/12679. Accessed 21 May 2020
- 4. Campos CR, Silva CF, Dias DR, Basso LC, Amorim HV, Schwan RF (2009) Features of Saccharomyces cerevisiae as a culture starter for the production of the distilled sugar cane beverage, cachaça in Brazil. J Appl Microbiol. doi: 10.1111/j.1365-2672.2009.04587.x.
- 5. Vianna CR, Silva CLC, Neves MJ, Rosa CA. Saccharomyces cerevisiae strains from traditional fermentations of Brazilian cachaça: trehalose metabolism, heat and ethanol resistance. Antonie Van Leeuwenhoek. 2008;93:205–217. doi: 10.1007/s10482-007-9194-y.
- 6. Pataro C, Guerra JB, Petrillo-Peixoto ML, Mendonca-Hagler LC, Linardi VR, Rosa CA. Yeast communities and genetic polymorphism of Saccharomyces cerevisiae strains associated with artisanal fermentation in Brazil. J Appl Microbiol. 2000;89:24–31. doi: 10.1046/j.1365-2672.2000.01092.x.
- 7. Oliveira VA, Vicente MA, Fietto LG, de Miranda CI, Coutrim MX, Schuller D, Alves H, Casal M, de Oliveira SJ, Araujo LD, da Silva PHA, Brandao RL. Biochemical and molecular characterization of Saccharomyces cerevisiae strains obtained from sugar-cane juice fermentations and their impact in cachaca production. Appl Environ Microbiol. 2008;74:693–701. doi: 10.1128/AEM.01729-07.
- 8. Barbosa R, Pontes A, Santos RO, Montandon GG, de Ponzzes-Gomes CM, Morais PB, Gonçalves P, Rosa CA, Sampaio JP. Multiple rounds of artificial selection promote microbe secondary domestication—the case of cachaça yeasts. Genome Biol Evol. 2018;10:1939–1955. doi: 10.1093/gbe/evy132.
- 9. Bravim F, Palhano FL, Fernandes AAR, Fernandes PMB. Biotechnological properties of distillery and laboratory yeasts in response to industrial stresses. J Ind Microbiol Biotechnol. 2010;37:1071–1079. doi: 10.1007/s10295-010-0755-0.
- 10. Bravim F, Lippman SI, da Silva LF, Souza DT, Fernandes AAR, Masuda CA, Broach JR, Fernandes PMB. High hydrostatic pressure activates gene expression that leads to ethanol production enhancement in a Saccharomyces cerevisiae distillery strain. Appl Microbiol Biotechnol. 2013;97:2093–2107. doi: 10.1007/s00253-012-4356-x.
- 11. Bravim F, Mota MM, Fernandes AAR, Fernandes PMB. High hydrostatic pressure leads to free radicals accumulation in yeast cells triggering oxidative stress. FEMS Yeast Res. 2016;16:fow052. doi: 10.1093/femsyr/fow052.
- 12. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17:10–12. doi: 10.14806/ej.17.1.200.
- 13. Bankevich A, Nurk S, Antipov D, Gurevich AA, Dvorkin M, Kulikov AS, Lesin VM, Nikolenko SI, Pham S, Prjibelski AD, Pyshkin AV, Sirotkin AV, Vyahhi N, Tesler G, Alekseyev MA, Pevzner PA. SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 2012;19:455–477. doi: 10.1089/cmb.2012.0021.
- 14. Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–3212. doi: 10.1093/bioinformatics/btv351.
- 15. Stanke M, Morgenstern B. AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res. 2005;33:W465–W467. doi: 10.1093/nar/gki458.
- 16. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 17. Garrison E, Marth G (2012) Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907 [q-bio.GN]
- 18. Cingolani P, Platts A, Wang LL, Coon M, Nguyen T, Wang L, Land SJ, Lu X, Ruden DM. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff. Fly (Austin) 2012;6:80–92. doi: 10.4161/fly.19695.
- 19. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 20. Balakrishnan R, Park J, Karra K, Hitz BC, Binkley G, Hong EL, Sullivan J, Micklem G, Michael Cherry J (2012) YeastMine—an integrated data warehouse for Saccharomyces cerevisiae data as a multipurpose tool-kit. Database. doi: 10.1093/database/bar062.
- 21. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc. 2009;4:44–57. doi: 10.1038/nprot.2008.211.
- 22. Cheong W-H, Tan Y-C, Yap S-J, Ng K-P. ClicO FS: an interactive web-based service of Circos. Bioinformatics. 2015;31:3685–3687. doi: 10.1093/bioinformatics/btv433.
- 23. Robinson JT, Thorvaldsdóttir H, Winckler W, Guttman M, Lander ES, Getz G, Mesirov JP. Integrative genomics viewer. Nat Biotechnol. 2011;29:24–26. doi: 10.1038/nbt.1754.
- 24. Kobayashi O, Suda H, Ohtani T, Sone H. Molecular cloning and analysis of the dominant flocculation gene FLO8 from Saccharomyces cerevisiae. MGG Mol Gen Genet. 1996;251:707–715. doi: 10.1007/BF02174120.
- 25. Guo B, Styles CA, Feng Q, Fink GR. A Saccharomyces gene family involved in invasive growth, cell-cell adhesion, and mating. Proc Natl Acad Sci. 2000;97:12158–12163. doi: 10.1073/pnas.220420397.
- 26. Delneri D, Gardner DC, Oliver SG. Analysis of the seven-member AAD gene set demonstrates that genetic redundancy in yeast may be more apparent than real. Genetics. 1999;153:1591–1600. doi: 10.1093/genetics/153.4.1591.
- 27. Gomes FCO, Silva CLC, Marini MM, Oliveira ES, Rosa CA. Use of selected indigenous Saccharomyces cerevisiae strains for the production of the traditional cachaça in Brazil. J Appl Microbiol. 2007;103:2438–2447. doi: 10.1111/j.1365-2672.2007.03486.x.
- 28. Li Q, Wang J, Liu C. Beers. In: Pandey A, Sanromán MÁ, Du G, Soccol CR, Dussap C-G, editors. Current developments in biotechnology and bioengineering. Amsterdam: Elsevier; 2017. pp. 305–351.
- 29. Eldarov MA, Beletsky AV, Tanashchuk TN, Kishkovskaya SA, Ravin NV, Mardanov AV. Whole-genome analysis of three yeast strains used for production of sherry-like wines revealed genetic traits specific to flor yeasts. Front Microbiol. 2018;9:9. doi: 10.3389/fmicb.2018.00965.
- 30. Babrzadeh F, Jalili R, Wang C, Shokralla S, Pierce S, Robinson-Mosher A, Nyren P, Shafer RW, Basso LC, de Amorim HV, de Oliveira AJ, Davis RW, Ronaghi M, Gharizadeh B, Stambuk BU. Whole-genome sequencing of the efficient industrial fuel-ethanol fermentative Saccharomyces cerevisiae strain CAT-1. Mol Gen Genomics. 2012;287:485–494. doi: 10.1007/s00438-012-0695-7.
- 31. Verstrepen KJ, Klis FM. Flocculation, adhesion and biofilm formation in yeasts. Mol Microbiol. 2006;60:5–15. doi: 10.1111/j.1365-2958.2006.05072.x.
- 32. Soares TL, Silva CF, Schwan RF. Acompanhamento do processo de fermentação para produção de cachaça através de métodos microbiológicos e físico-químicos com diferentes isolados de Saccharomyces cerevisiae. Ciência Tecnol Aliment. 2011;31:184–187. doi: 10.1590/S0101-20612011000100027.
- 33. Barbosa EA, Souza MT, Diniz RHS, Godoy-Santos F, Faria-Oliveira F, Correa LFM, Alvarez F, Coutrim MX, Afonso RJCF, Castro IM, Brandão RL. Quality improvement and geographical indication of cachaça (Brazilian spirit) by using locally selected yeast strains. J Appl Microbiol. 2016;121:1038–1051. doi: 10.1111/jam.13216.
- 34. Alvarez F, da Mata Correa LF, Macedo Araújo T, Fernandes Mota BE, Ribeiro da Conceição LEF, de Miranda CI, Lopes Brandão R. Variable flocculation profiles of yeast strains isolated from cachaça distilleries. Int J Food Microbiol. 2014;190:97–104. doi: 10.1016/j.ijfoodmicro.2014.08.024.
- 35. Araújo TM, Souza MT, Diniz RHS, Yamakawa CK, Soares LB, Lenczak JL, de Castro Oliveira JV, Goldman GH, Barbosa EA, Campos ACS, Castro IM, Brandão RL. Cachaça yeast strains: alternative starters to produce beer and bioethanol. Antonie Van Leeuwenhoek. 2018;111:1749–1766. doi: 10.1007/s10482-018-1063-3.
- 36. Brexó RP, Andrietta MGS, Sant’Ana AS. Artisanal cachaça and brewer’s spent grain as sources of yeasts with promising biotechnological properties. J Appl Microbiol. 2018;125:409–421. doi: 10.1111/jam.13778.
- 37. de Winde JH. Functional genetics of industrial yeasts. Berlin: Springer; 2003.
- 38. Naumov GI, Naumova ES, Michels CA. Genetic variation of the repeated MAL loci in natural populations of Saccharomyces cerevisiae and Saccharomyces paradoxus. Genetics. 1994;136:803–812. doi: 10.1093/genetics/136.3.803.
- 39. Trichez D, Knychala MM, Figueiredo CM, Alves SL, da Silva MA, Miletti LC, de Araujo PS, Stambuk BU. Key amino acid residues of the AGT1 permease required for maltotriose consumption and fermentation by Saccharomyces cerevisiae. J Appl Microbiol. 2019;126:580–594. doi: 10.1111/jam.14161.
- 40. Alves SL, Herberts RA, Hollatz C, Trichez D, Miletti LC, de Araujo PS, Stambuk BU. Molecular analysis of maltotriose active transport and fermentation by Saccharomyces cerevisiae reveals a determinant role for the AGT1 permease. Appl Environ Microbiol. 2008;74:1494–1501. doi: 10.1128/AEM.02570-07.
- 41. Duval EH, Alves SL Jr, Dunn B, Sherlock G, Stambuk BU (2010) Microarray karyotyping of maltose-fermenting Saccharomyces yeasts with differing maltotriose utilization profiles reveals copy number variation in genes involved in maltose and maltotriose utilization. J Appl Microbiol. doi: 10.1111/j.1365-2672.2009.04656.x.
- 42. Leh-Louis V, Wirth B, Potier S, Souciet J-L, Despons L. Expansion and contraction of the DUP240 multigene family in Saccharomyces cerevisiae populations. Genetics. 2004;167:1611–1619. doi: 10.1534/genetics.104.028076.
- 43. Legras J-L, Galeote V, Bigey F, Camarasa C, Marsit S, Nidelet T, Sanchez I, Couloux A, Guy J, Franco-Duarte R, Marcet-Houben M, Gabaldon T, Schuller D, Sampaio JP, Dequin S. Adaptation of S. cerevisiae to fermented food environments reveals remarkable genome plasticity and the footprints of domestication. Mol Biol Evol. 2018;35:1712–1727. doi: 10.1093/molbev/msy066.
- 44. Etaio I, Gil PF, Ojeda M, Albisu M, Salmerón J, Pérez Elortondo FJ. Improvement of sensory quality control in PDO products: an example with txakoli white wine from Bizkaia. Food Qual Prefer. 2012;23:138–147. doi: 10.1016/j.foodqual.2011.03.008.