Abstract
Teinturier grapevines, known for berries with pigmented flesh due to anthocyanin production, are valuable for enhancing the pigmentation of wine, for their potential health benefits, and for investigating anthocyanin production in plants. Here, we assembled and annotated the genomes of two teinturier varieties, Dakapo and Rubired. For Dakapo, we combined Nanopore sequencing, Illumina sequencing, and scaffolding to the existing grapevine assembly to generate a final assembly of 508.5 Mbp. Combining de novo annotation with annotations lifted over from the existing grapevine reference produced 36,940 gene annotations for Dakapo. For Rubired, PacBio HiFi reads were assembled, scaffolded, and phased to generate a diploid assembly with two haplotypes 474.7–476.0 Mbp long. De novo annotation of the diploid Rubired genome yielded annotations for 56,681 genes. Both genomes are highly contiguous and complete. The Dakapo and Rubired genome assemblies provide genetic resources for investigations into berry flesh pigmentation and other traits of interest in grapevine.
Context
Domesticated grapevine (Vitis vinifera) is the fifth most produced fruit globally [1], with 80.1 million tonnes produced in 2022 alone [2]. Since its domestication an estimated ∼11,000–15,000 years ago [3, 4], many grapevine varieties have been bred both as table grapes and for winemaking. This has resulted in the selection of numerous diverse phenotypes with significant variation in traits, including berry color and aromatic compounds, as well as more utilitarian traits such as yield or resistance to biotic and abiotic stress. Berry color is particularly important in wine grapes because it influences wine color and quality. Generally, both consumers and experts prefer red wines with darker coloration [5, 6], making strong berry pigmentation advantageous for wine producers. Pigmentation typically occurs only in the skin of ripened grapevine berries, with most grapevine varieties having white flesh. The pigmentation of the berry skin is due to the production of anthocyanins, which are colored flavonoids that also act as antioxidants [7]. As anthocyanins significantly influence both wine quality and health benefits, the genetic and molecular pathways involved in anthocyanin production in berry skin have been of high interest and are well characterized [8].
Teinturier (also known as “dyer”) varieties produce berries with pigmented skin and flesh, as well as pigmented leaves. They are highly favored in red wine blends because they provide a deeper color. They are also valuable resources for understanding the production of anthocyanins outside the berry skin. Dakapo and Rubired are two widely grown teinturier varieties, and Rubired was the eighth most crushed grapevine variety in California in 2022 [9]. Both varieties descend from Teinturier du Cher, a teinturier variety used in the 19th century to breed most teinturier varieties existing today, but they belong to different generations of its descendants. Dakapo was bred through a cross between Deckrot and Blauer Portugieser, with Deckrot being a direct descendant of Teinturier du Cher. Rubired is a hybrid variety bred through a cross between Tinto Cão and Alicante Ganzin, with Alicante Ganzin being a fourth-generation descendant of Teinturier du Cher. Although both Dakapo and Rubired descend from Teinturier du Cher, previous genetic work in teinturier grapes suggests that their ancestors were likely distinct clones of Teinturier du Cher [10] (Figure 1).
Figure 1.
The ancestry of Dakapo and Rubired, with berry skin and flesh color shown. Dakapo and Rubired are thought to have been bred from Teinturier du Cher clones with differing copy numbers of a 408 bp repeat within the promoter of VvMybA1, which is noted in the figure.
Teinturier varieties have substantially higher anthocyanin content in their berries than non-teinturier varieties because anthocyanins accumulate in their berry flesh, pigmenting it. Previous work showed that juice produced from Dakapo berries had 39–91 times more anthocyanin content than commercial red grape juice [11]. Teinturier varieties themselves vary in anthocyanin content and in the profiles of anthocyanins present in the berry flesh [10, 12]. Previous studies have made progress in investigating the genetic basis of berry flesh pigmentation and of variation in overall anthocyanin production in teinturier grapes. A previous study [10] demonstrated that increased copy number of a 408 bp repeat in the promoter of the gene VvMybA1, known as the grapevine color enhancer (GCE), is directly linked to increased anthocyanin production in teinturier berries. VvMybA1 plays a significant role in regulating anthocyanin production alongside VvMybA2, and many berry color mutants result from mutations affecting VvMybA1 [13–16]. A single copy of this 408 bp repeat is also present upstream of the VvMybA1 alleles of red- and white-skinned grapes with unpigmented flesh [10]; however, the allele responsible for white berry skin color also contains a Gret1 retrotransposon upstream of the coding sequence [13]. In this allele, the Gret1 retrotransposon lies between VvMybA1 and the GCE and is thought to block the expression of VvMybA1, causing a loss of pigmentation in the berry skin [13]. Previous work [10] demonstrated that the copy number of this 408 bp sequence varies between varieties and that varieties derived from Teinturier du Cher have two, three, or five copies of this sequence within the promoter region, with Rubired and Dakapo carrying alleles with two and three copies of the repeat, respectively.
More than one tandem copy of this repeat was associated with berry flesh pigmentation in teinturier grapes. Higher copy numbers were correlated with increased expression of VvMybA1 and increased anthocyanin content in berry skin, berry flesh, and leaves [10]. Additional work showed that the alleles enabling berry flesh pigmentation in teinturier varieties appear to be dominant [17]. While past work greatly illuminated the genetic basis of increased anthocyanin production in Teinturier du Cher descendants, it is still unclear why different teinturier grapes have distinct anthocyanin profiles in their berry flesh, independent of the number of copies of the 408 bp sequence they carry [12]. Although overall anthocyanin content correlates with the number of 408 bp repeats upstream of VvMybA1 [10], large differences in the concentrations of specific anthocyanins exist between teinturier varieties with the same number of copies of the repeat [12]. For example, the concentration of cyanidin-3-O-glucoside in the berry flesh ranges from 1.0 to 21.0 mg/L across teinturier varieties that all carry two copies of the 408 bp repeat [12]. The genome of the teinturier variety Yan73, which carries three copies of this 408 bp repeat, was recently assembled and annotated, providing additional insight into the regulation of anthocyanin accumulation in Yan73 berry flesh [18]. However, the lack of additional genomic resources for teinturier grapes has hindered further investigation into differences between teinturier varieties, and the genetic basis of these large differences in anthocyanin composition remains unclear.
Here, we sequenced, assembled, and annotated the Dakapo and Rubired genomes to provide additional resources for understanding teinturier varieties and to further enable their use in breeding programs. These genomes will facilitate future work on the regulation of anthocyanins in berry flesh. Beyond anthocyanin production, Dakapo and Rubired have also been used to study other traits in grapevine. A QTL mapping population derived from a Dakapo × Cabernet Sauvignon cross has been established and used to investigate Botrytis bunch rot in grapevine [19]. Additionally, Rubired is notable for being highly resistant to Xylella fastidiosa, which causes Pierce’s disease in grapevine [20–22]. As a result, we believe these high-quality reference genome assemblies and annotations will be a useful resource for the grapevine and plant science communities.
Methods
Plant material
Vitis vinifera plants of the Dakapo variety were planted in Madera, California, USA, in 2011. Young leaf tissue samples for Oxford Nanopore Technologies (ONT) long-read sequencing were collected in July 2021, frozen, and shipped overnight on dry ice. The same plant material was used in a previous study [23] as the “Dakapo WT” samples. For Rubired, young leaves were collected from the Rubired accession clone 02, maintained by Foundation Plant Services (FPS) at the University of California, Davis.
DNA extraction and sequencing of Dakapo tissue
High-molecular-weight DNA was extracted, and a sequencing library was prepared for ONT sequencing by the Genomics Core at Michigan State University as previously described [23], using the Oxford Nanopore Technologies Ligation Sequencing Kit (SQK-LSK109). The library was sequenced on a PromethION FLO-PRO002 flow cell (R9.4.1; Oxford Nanopore Technologies) on a PromethION24 (Oxford Nanopore Technologies) running MinKNOW Release 21.11.7 (Oxford Nanopore Technologies) [24], yielding 61.9 Gbp of sequence (∼123.8× coverage) with a read length N50 of 13.2 kbp. Base calling and demultiplexing were performed using Guppy v5.1.13 (RRID:SCR_023196, Oxford Nanopore Technologies) with the High Accuracy base calling model. An additional 9.5 Gbp (∼18.9× coverage) of ONT sequencing data and 25.6 Gbp (∼51.3× coverage) of Illumina paired-end sequencing data previously published from the same “Dakapo WT” sample [23] were also used in this study (available on the NCBI Sequence Read Archive under BioProject PRJNA1020818; complete sequencing statistics in Supplementary Table S1 on GigaDB [25]). The additional ONT data from previously published work [23] were generated from the same tissue using the methods described here.
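The coverage figures above follow from dividing sequencing yield by genome size. A minimal sketch, assuming the 500 Mbp haploid genome size used elsewhere in this paper; the function name `coverage` is ours and not part of any tool:

```python
def coverage(yield_gbp: float, genome_size_mbp: float = 500.0) -> float:
    """Approximate mean sequencing depth: bases sequenced divided by genome size."""
    bases = yield_gbp * 1e9            # Gbp -> bp
    genome_bp = genome_size_mbp * 1e6  # Mbp -> bp
    return bases / genome_bp


if __name__ == "__main__":
    # Yields reported in this paper, assuming a 500 Mbp genome.
    for label, gbp in [("Dakapo ONT (this study)", 61.9), ("Rubired HiFi", 31.3)]:
        print(f"{label}: {coverage(gbp):.1f}x")
```

This is a genome-wide mean; actual per-base depth varies along the assembly.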
Dakapo genome assembly
Raw ONT sequencing data from this study and from previous work with the same plant material [23] were combined. Adapters were trimmed using Porechop v0.2.4 [26] with the following settings: --min_trim_size 5, --extra_end_trim 2, --end_threshold 80, --middle_threshold 90, --extra_middle_trim_good_side 2, --extra_middle_trim_bad_side 50, and --min_split_read_size 300. Reads mapping to the lambda phage genome were removed using NanoLyse v1.2.0 (RRID:SCR_024125) [27]. NanoFilt v2.8.0 (RRID:SCR_016966), with the flags -q 0 and -l 300, was used to remove low-quality reads and reads shorter than 300 base pairs (bp) [27]. Read quality was analyzed using FastQC v0.11.9 (RRID:SCR_014583) [28], NanoStat v1.6.0 [27], and NanoPlot v1.38.0 (RRID:SCR_024128) [29]. ONT reads were then assembled using Flye v2.8.3-b1695 (RRID:SCR_017016) [30] with two iterations. One round of polishing was performed on the assembly using the ONT reads with Racon v1.4.20 (RRID:SCR_017642) [31] and the following settings: --include-unpolished, -m 8, -x -6, -g -8, and -w 500. The assembly was then scaffolded to the 12X.v2 grapevine genome assembly [32] using RagTag v2.0.1 “scaffold” [33] with the following settings: -f 1000, -d 100000, -i 0.2, -a 0.0, -s 0.0, -r, -g 100, and -m 100000.
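The NanoFilt step above (-q 0 -l 300) keeps reads by length and mean quality. The pure-Python sketch below illustrates that filter; it is not NanoFilt's code, and it assumes mean read quality is computed by averaging per-base error probabilities rather than raw Phred scores (our understanding of how NanoFilt behaves). Under that assumption a quality threshold of 0 never removes a read, so only the 300 bp length cutoff takes effect. The function names are hypothetical.

```python
import math
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from a well-formed, 4-line-per-record FASTQ stream."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield FastqRecord(header.strip()[1:], seq, qual)


def mean_read_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score, averaged in error-probability space rather than Phred space."""
    if not qual:
        return 0.0
    errors = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(errors) / len(errors))


def filter_reads(records: Iterable[FastqRecord], min_quality: float = 0.0,
                 min_length: int = 300) -> Iterator[FastqRecord]:
    """Drop reads shorter than min_length or with mean quality below min_quality."""
    for rec in records:
        if len(rec.seq) >= min_length and mean_read_quality(rec.qual) >= min_quality:
            yield rec
```

Averaging in probability space means a few very poor bases lower the mean more than they would if Phred scores were averaged directly.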
Paired-end Illumina reads were used for the final polishing of the scaffolded assembly. These were first trimmed using Trimmomatic v0.39 (RRID:SCR_011848) [34] to remove adapters and low-quality sequences, with the following settings: -phred33, ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:4:TRUE, LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, and MINLEN:30. The reads were then mapped to the draft genome assembly using BWA-MEM v0.7.17-r1188 (RRID:SCR_010910) [35] with the -M flag. PCR duplicate reads were removed using Picard MarkDuplicates v2.15.0 [36] with REMOVE_DUPLICATES set to TRUE. The deduplicated mapped reads were then used to polish the draft assembly using Pilon v1.24 (RRID:SCR_014731) [37] with the --fix all flag, to correct all identified errors, and the --diploid flag. Two iterations of Pilon polishing were performed.
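The quality-trimming steps listed above (LEADING:3, TRAILING:3, SLIDINGWINDOW:4:15, MINLEN:30) are applied in order to each read. The sketch below is a simplified illustration on a list of Phred scores: it cuts at the start of the first failing window, which approximates but may not exactly match Trimmomatic's own cut position, and it does not model ILLUMINACLIP adapter removal. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def trim_read(quals: List[int], leading: int = 3, trailing: int = 3,
              window: int = 4, window_q: float = 15.0,
              min_len: int = 30) -> Optional[Tuple[int, int]]:
    """Return the (start, end) slice kept after trimming, or None if the read is dropped."""
    start, end = 0, len(quals)
    # LEADING:3 - remove 5' bases with quality below 3.
    while start < end and quals[start] < leading:
        start += 1
    # TRAILING:3 - remove 3' bases with quality below 3.
    while end > start and quals[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW:4:15 (simplified) - scan 5'->3' and cut at the first
    # window whose mean quality drops below 15.
    for i in range(start, end - window + 1):
        if sum(quals[i:i + window]) / window < window_q:
            end = i
            break
    # MINLEN:30 - drop reads that are now too short.
    if end - start < min_len:
        return None
    return start, end
```

For a read with one poor 5' base, 40 high-quality bases, and a low-quality 3' tail, this keeps the 40 good bases and drops the rest.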
Following polishing, haplotigs were removed using Purge Haplotigs v1.1.2 [38]. To do so, all processed ONT reads were mapped to the Dakapo draft assembly using minimap2 v2.23-r1111 (RRID:SCR_018550) [39] with the flags -ax map-ont and -L, and purge_haplotigs hist was then run with default settings to generate a read-depth histogram of the mapped reads. Based on this histogram, purge_haplotigs cov was run on the resulting output file with the flags -low 15, -mid 88, and -high 195. Finally, purge_haplotigs purge was run with the resulting output file to purge haplotigs from the Dakapo draft assembly [38].
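purge_haplotigs cov summarizes, for each contig, how much of it has haploid-level versus diploid-level read depth. The sketch below is a simplified illustration of that classification using the Dakapo cutoffs (-low 15, -mid 88, -high 195). The 80% junk and suspect thresholds are our assumption about the tool's defaults, and the function name and labels are hypothetical. This is not the Purge Haplotigs code, which also aligns suspect contigs against the rest of the assembly in the purge step before removing anything.

```python
from typing import Sequence


def classify_contig(depths: Sequence[int], low: int = 15, mid: int = 88, high: int = 195,
                    junk_pct: float = 80.0, suspect_pct: float = 80.0) -> str:
    """Label a contig 'junk', 'suspect_haplotig', or 'keep' from its per-base read depth.

    Depth bands (approximate boundaries):
      below low or above high -> low/high coverage (junk signal)
      low .. mid              -> haploid-level coverage (typical of a haplotig)
      mid .. high             -> diploid-level coverage (collapsed, primary sequence)
    """
    n = len(depths)
    if n == 0:
        return "junk"
    off_range = sum(1 for d in depths if d < low or d > high)
    diploid = sum(1 for d in depths if mid <= d <= high)
    if 100 * off_range / n >= junk_pct:
        return "junk"
    if 100 * diploid / n <= suspect_pct:
        return "suspect_haplotig"
    return "keep"
```

A contig sitting almost entirely at half depth is flagged as a suspect haplotig, because only one haplotype's reads map to it.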
Before finalizing the Dakapo genome assembly, chr00 was manually split at gaps, because it is an artificial chromosome of unanchored contigs inherited from the 12X.v2 grapevine genome assembly [32] used for scaffolding. The assembly was also screened for microbial contamination using the gather-by-contig.py script adapted from [40], which uses sourmash (RRID:SCR_024347) and its prebuilt database “GTDB R06-RS202 genomic representatives” [41]. No contamination was found. Chromosome names were retained from the 12X.v2 grapevine genome assembly [32] used for scaffolding. All other contigs, including those split from chr00, were sorted by length and renamed accordingly using the custom script sort_rename_fasta.sh.
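The custom script sort_rename_fasta.sh is not reproduced here. As a hypothetical Python analogue, the sketch keeps a given set of chromosome names (in natural order, so chr2 precedes chr10) and renames every other sequence by decreasing length. The `contig_` prefix and the function names are illustrative assumptions, not the naming used in the released assembly.

```python
import re
from typing import Dict, Iterable, List, Tuple


def natural_key(name: str) -> list:
    """Split out digit runs so that 'chr2' sorts before 'chr10'."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


def sort_and_rename(records: Dict[str, str], keep: Iterable[str],
                    prefix: str = "contig_") -> List[Tuple[str, str]]:
    """Keep names in `keep`; rename all other sequences by decreasing length."""
    keep_set = set(keep)
    kept = sorted((name for name in records if name in keep_set), key=natural_key)
    others = sorted((name for name in records if name not in keep_set),
                    key=lambda name: (-len(records[name]), name))  # longest first, ties by name
    out = [(name, records[name]) for name in kept]
    out += [(f"{prefix}{i}", records[name]) for i, name in enumerate(others, start=1)]
    return out
```

Sorting by negative length and then by name makes the numbering deterministic when two contigs have the same length.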
To assess the quality of the Dakapo genome assembly, we used BUSCO v5.2.2 (RRID:SCR_015008) [42] to evaluate assembly completeness against the eudicots_odb10 dataset at each step of genome assembly and polishing. Assembly statistics were calculated using assembly-stats v1.0.1 (RRID:SCR_023963) [43]. Finally, the quality of repetitive sequences and intergenic space was assessed by calculating the long terminal repeat (LTR) Assembly Index (LAI) for the Dakapo assembly, using LTRs annotated by the Extensive de-novo transposable element (TE) Annotator (EDTA) (TE annotation methods are reported below).
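N50, one of the statistics reported by assembly-stats, is the length L such that sequences of length L or longer contain at least half of the assembled bases. A minimal sketch of that definition (not assembly-stats' own code; the function name is ours):

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Length L such that sequences of length >= L hold at least half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    return 0  # empty input
```

For lengths 5, 4, 3, 2, and 1 (15 bp in total), the two longest sequences already hold 9 bp, so the N50 is 4.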
Dakapo genome annotation
TEs and repeats in the Dakapo genome assembly were annotated using EDTA v1.9.4 (RRID:SCR_022063) [44] with the following flags: --species others, --step all, --overwrite 1, --sensitive 1, --anno 1, --evaluate 0, and --force 1.
MAKER (RRID:SCR_005309) was used for de novo annotation of genes in the Dakapo genome. Before running MAKER, RNA-seq reads from diverse grapevine tissues (including leaves, seeds, fruits, roots, and various floral organs) and protein sequences from related species were used to provide initial support for gene models. To do so, RNA-seq samples from previous studies [45–50] were downloaded from the NCBI Sequence Read Archive (SRA) using fasterq-dump v2.10.7 from the sra-toolkit (RRID:SCR_024350) [51] (Supplementary Table S2 on GigaDB [25] lists the SRA IDs of the specific files used). Trimmomatic v0.39 [34] was used to trim adapters from Illumina RNA-seq reads with the settings -phred33 and ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10. The trimmed reads were then mapped to the Dakapo genome assembly using HISAT2 v2.2.1 (RRID:SCR_015530) [52] with the --phred33 flag. PacBio RNA-seq reads were mapped to the Dakapo genome assembly using minimap2 v2.23-r1111 [39] with the flags -ax splice:hq and -uf. Transcripts were assembled from these mapped RNA-seq reads using StringTie v2.2.1 (RRID:SCR_016323) [53] with the following flags: -c 1, -f 0.01, -m 200, -a 10, -j 1, -M 1, -s 4.75 (for mapped Illumina reads) or 1.5 (for mapped PacBio reads), and -g 50 (for mapped Illumina reads) or 0 (for mapped PacBio reads). The output files for all RNA-seq samples were converted to gff3 files using gffread v0.12.7 (RRID:SCR_018965) [54], combined, and then sorted using gff3_sort v2.1.0 [55].
Before protein sequences were aligned to the Dakapo genome assembly, repeats in the assembly were masked using RepeatMasker v4.1.2-p1 (RRID:SCR_012954) [56] with the TE library generated by EDTA [44] and the following flags: -e rmblast, -s, -norna, -xsmall, -gff, -html, and -source. Protein sequences from the Arabidopsis (Arabidopsis thaliana) Araport11 annotation [57], the Oryza sativa Release 7 annotation [58], and the Viridiplantae UniProtKB/Swiss-Prot reviewed protein sequence dataset from UniProt release 2023_05 [59] were aligned to the masked Dakapo genome assembly using exonerate v2.4.0 (RRID:SCR_016088) [60] with the following flags: --model protein2genome, --bestn 5, --minintron 10, --maxintron 5000, --querychunktotal 5, --targetchunktotal 10, --showtargetgff yes, --showalignment no, --showvulgar no, and --ryo ">%qi length=%ql alnlen=%qal\n>%ti length=%tl alnlen=%tal\n". The outputs for each dataset were combined, reformatted using the custom script reformat_exonerate_protein_gff.pl, and sorted using gff3_sort v2.1.0 [55].
MAKER v3.01.04 [61] was initially run on the Dakapo genome assembly with the gff files generated through transcript assembly and protein sequence alignment. These initial MAKER annotations were then used to train SNAP (RRID:SCR_007936) and AUGUSTUS (RRID:SCR_008417). To train SNAP, maker2zff from MAKER was first used to convert genes to the ZFF format with the flag -x 0.1. This input was used with SNAP v2013_11_29 [62], first running the command fathom to categorize genes and produce reformatted files, followed by the command forge to estimate parameters. Hidden Markov Models (HMMs) were created using hmm-assembler.pl from SNAP [62]. To train AUGUSTUS, maker2zff from MAKER was first used to convert genes to the ZFF format with the following flags: -c 0.5, -e 0.5, -o 0.5, -a 0, -t 0, -l 200, and -x 0.2. The fathom command from SNAP [62] and then the custom script fathom_to_genbank.pl were run to reformat the files and keep only 600 randomly sampled annotations. FASTA files of the subset genes were then generated using the custom script get_subset_of_fastas.pl. These subset genes were split into training and test files, and autoAug.pl from AUGUSTUS v3.4.0 [63] was run to produce batch scripts, which were then executed. This step was repeated using the following flags with autoAug.pl: -useexisting and --index=1. The sensitivity and specificity of the AUGUSTUS HMMs were evaluated by running the augustus command. A second round of MAKER v3.01.04 [61] was then run using the HMMs from SNAP and AUGUSTUS to produce gene annotations.
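The subsampling step described above (600 randomly drawn gene models, split into training and test sets) can be sketched as follows. The text does not state the size of the test set or any random seed, so `n_test=100` and `seed=1` are hypothetical, and the custom Perl scripts named above are not reproduced.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def sample_and_split(genes: Sequence[T], n_sample: int = 600, n_test: int = 100,
                     seed: int = 1) -> Tuple[List[T], List[T]]:
    """Randomly sample up to n_sample gene models and split them into (train, test)."""
    if n_test > n_sample:
        raise ValueError("n_test cannot exceed n_sample")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    subset = rng.sample(list(genes), min(n_sample, len(genes)))
    return subset[n_test:], subset[:n_test]
```

Keeping the test genes out of training is what makes the later sensitivity and specificity evaluation meaningful.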
We then filtered the annotations and, using methods described previously [64], flagged genes that may actually be transposons misannotated as genes. To make the annotation as complete as possible, we used Liftoff v1.6.2 [65] to transfer annotations from the PN40024.v4 grapevine genome assembly [66] to the Dakapo genome. We then used the methods described previously [64] to label lifted genes as “pseudogene” or “gene” based on the confidence of each lifted gene model.
Finally, gene functions were assigned to each annotated gene by first using InterProScan v5.66-98.0 (RRID:SCR_005829) [67] to assign Pfam domains and corresponding gene ontology (GO) terms with the following flags: -appl pfam, -goterms, -pa, -dp, -iprlookup, -t p, and -f TSV. Then, Arabidopsis orthologs were identified by running DIAMOND v2.0.15.153 (RRID:SCR_009457) [68] with protein sequences from Dakapo and the TAIR10 Arabidopsis annotation [69] and the following flags: --evalue 1e-6, --max-hsps 1, --max-target-seqs 5, and --outfmt 0. The results from InterProScan and DIAMOND were combined with the gene functions and GO terms of the Arabidopsis orthologs [70] to generate a file with functional descriptions for each gene using the custom script create_functional_annotation_file.pl.
DNA extraction and sequencing of Rubired tissue
High-molecular-weight genomic DNA was extracted using a previously described method [71]. The PacBio highly accurate long-read (HiFi) library preparation and sequencing were performed as previously described [72]. The HiFi library fraction longer than 15 kbp was sequenced on two SMRT Cells on a PacBio Sequel IIe platform at the DNA Technology Core Facility, University of California, Davis. Sequencing generated 31.3 Gbp of sequence, corresponding to 62.6× coverage, with a read length N50 of 11.5 kbp.
Rubired genome assembly
The pseudomolecules of the Rubired genome were assembled, phased, and scaffolded using methods described previously [72]. Briefly, after testing multiple Hifiasm v.0.16.1-r374 (RRID:SCR_021069) [73] parameter settings, the best assembly, obtained with the configuration ‘-a 4 -k 41 -w 71 -f 25 -r 4 -s 0.7 -D 3 -N 100 -n 25 -z 20’, was selected; it consisted of 273 contigs with an N50 of 12.9 Mb. An integrated phasing and scaffolding procedure then produced chromosome-scale pseudomolecules using HaploSync [74] combined with a high-density consensus map [75]. Two runs of HaploSplit were performed, followed by two runs of the HaploFill module [74]. The quality and completeness of the Rubired assembly were assessed as described above for the Dakapo genome.
Rubired genome annotation
The structural and functional annotation of the Rubired genome followed the exhaustive annotation pipeline previously described [76]. Briefly, high-quality Iso-Seq data from V. vinifera Cabernet Sauvignon [47], quality-filtered RNA-Seq data from V. rupestris [77], and external databases were used to generate a collection of assemblies, alignments, ab initio predictions, and transcript/protein evidence. High-quality gene models were generated using PASA v2.3.3 (RRID:SCR_014656) [78] to train gene predictors, including Augustus v.3.0.3 [79], GeneMark v.3.47 (RRID:SCR_011930) [80], and SNAP v.2006-07-28 [62]. Ab initio predictions were produced using these tools and BUSCO v.3.0.2 [81]. In parallel, repeats were annotated using RepeatMasker v.open-4.0.6 [56]. Together with the transcript alignments from PASA and the protein alignments generated with Exonerate v.2.2.0 [60], all forms of evidence were merged into consensus gene models using EVidenceModeler v.1.1.1 (RRID:SCR_014659) [82]. Finally, functional annotations were assigned with Blast2GO v.4.1.9 (RRID:SCR_005828) [83], using results from DIAMOND blastp v.2.0.15.153 [68] against the RefSeq plant protein database [84] and InterProScan v.5.28-67.0 [67].
Investigating the VvMybA1 sequences in the Dakapo and Rubired genomes
Teinturier grape varieties carry tandem copies of a 408 bp sequence within the promoter region of VvMybA1, a key regulator of anthocyanin biosynthesis, and these tandem copies lead to increased anthocyanin accumulation in their berries. Dakapo and Rubired contain three and two copies of this repeat, respectively [10]. To investigate the sequence similarity of these repeats in our assembled genomes, we first used blastn (RRID:SCR_001598) from BLAST v2.10.0+ (RRID:SCR_004870) [85] to locate VvMybA1 in the Dakapo genome and in both Rubired haplotypes. We then extracted the sequences of VvMybA1 and the 10 kbp surrounding the gene using BEDTools v2.27.1 (RRID:SCR_006646) getfasta [86]. We searched these sequences for the previously identified 408 bp repeat [10], both manually and using blastn from BLAST v2.10.0+ [85].
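The repeat search above was done with blastn and by manual inspection. As an illustration only, the sketch below scans a forward-strand sequence for near-exact copies of a 408 bp unit (substitutions only) and reports the longest run of copies lying end to end. The mismatch tolerance of 5 and the function names are hypothetical, the reverse strand and indels are ignored, and the demo uses a synthetic random sequence rather than the actual GCE repeat.

```python
import random
from typing import List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_repeat_hits(seq: str, motif: str, max_mismatches: int = 5) -> List[int]:
    """0-based starts where motif matches seq with <= max_mismatches substitutions."""
    seq, motif = seq.upper(), motif.upper()
    m = len(motif)
    hits: List[int] = []
    i = 0
    while i + m <= len(seq):
        if hamming(seq[i:i + m], motif) <= max_mismatches:
            hits.append(i)
            i += m  # jump past this copy so it is not counted twice
        else:
            i += 1
    return hits


def tandem_copy_number(hits: List[int], motif_len: int, max_gap: int = 0) -> int:
    """Length of the longest run of hits lying end to end (gap <= max_gap bp)."""
    if not hits:
        return 0
    best = run = 1
    for prev, cur in zip(hits, hits[1:]):
        run = run + 1 if cur - (prev + motif_len) <= max_gap else 1
        best = max(best, run)
    return best


if __name__ == "__main__":
    # Synthetic example: three identical copies of a random 408 bp unit between random flanks.
    rng = random.Random(42)
    unit = "".join(rng.choice("ACGT") for _ in range(408))
    flank = "".join(rng.choice("ACGT") for _ in range(300))
    hits = find_repeat_hits(flank + unit * 3 + flank, unit)
    print(hits, tandem_copy_number(hits, len(unit)))
```

A brute-force scan is adequate for a ∼10 kbp window; a genome-wide search would need an indexed aligner such as blastn.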
Exploring synteny among various grapevine genomes
GENESPACE v1.3.1 [87] was used with MCScanX (RRID:SCR_022067) [88] and OrthoFinder v2.5.5 (RRID:SCR_017118) [89] to align and plot protein sequences from the following chromosome-scale grapevine genomes: Dakapo, Rubired, Cabernet Franc (FPS clone 04) [74], Cabernet Sauvignon (FPS clone 08) [74], Chardonnay (FPS clone 04) [90], and Pinot Noir (FPS clone 123) [91]. Individual chromosomes from the Dakapo and Rubired assemblies were aligned using MUMmer v4.0.0rc1 (RRID:SCR_018171) [92]. First, nucmer was run with default settings; delta-filter was then run with the flags -i 90 and -l 5000; finally, plots were generated using mummerplot.
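delta-filter -i 90 -l 5000 keeps alignments with at least 90% identity and a length of at least 5,000 bp. The sketch below is a simplified version of that filter on alignment records, plus a helper that flags reverse-strand hits, which appear as the inverted segments in mummerplot dotplots. The record layout, the use of reference coordinates to measure length, and the convention that a reverse-strand hit has query start > query end (as in show-coords output) are our assumptions. The real delta-filter reads .delta files and has further options.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    ref_start: int
    ref_end: int
    query_start: int
    query_end: int
    identity: float  # percent identity, 0-100


def filter_alignments(alns: Iterable[Alignment], min_identity: float = 90.0,
                      min_length: int = 5000) -> List[Alignment]:
    """Keep alignments with identity >= min_identity and reference length >= min_length."""
    kept = []
    for aln in alns:
        ref_len = abs(aln.ref_end - aln.ref_start) + 1
        if aln.identity >= min_identity and ref_len >= min_length:
            kept.append(aln)
    return kept


def is_inverted(aln: Alignment) -> bool:
    """Reverse-strand hits are reported with query_start > query_end."""
    return aln.query_start > aln.query_end
```

Dropping short, low-identity hits removes much of the repeat-driven noise, so a long inverted block stands out clearly in the dotplot.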
Data description and quality control
The Dakapo genome assembly and annotation
The Dakapo genome was assembled using ONT reads representing 142.4× coverage (based on a genome size of 500 Mbp). This draft assembly was then scaffolded to the 12X.v2 grapevine reference genome [32] and polished using both ONT reads (142.4× coverage) and Illumina reads (51.3× coverage) to produce the final genome assembly of 508.5 Mbp. The final assembly comprises 19 chromosomes and 542 unplaced contigs, with 96.3% of the sequence located on the chromosomes, and contains 2,644 gaps of unknown sequence. It is highly contiguous, with an N50 of 25.6 Mbp, slightly higher than that of the PN40024.v4 assembly [66] and similar to that of the most recent PN40024 telomere-to-telomere (PN_T2T) assembly [93]. The Dakapo assembly has a high BUSCO score of 97.7% complete BUSCOs (94.5% single-copy and 3.2% duplicated), similar to prior PN40024 reference assemblies (97.8% for 12X.v2 [32], 98.3% for PN40024.v4 [66], and 98.4% for PN_T2T [93]). In addition, the Dakapo genome received a raw LAI score of 12.22, indicating a reference-quality assembly of repetitive and intergenic sequences [94] (Table 1).
Table 1.
Assembly and annotation statistics of the Dakapo and Rubired genomes compared with previous grapevine reference genome assemblies (12X.v2 [32], PN40024.v4 [66], and PN_T2T [93]). The Rubired whole assembly contains both haplotypes and unplaced sequences.
| Statistic | Dakapo | Rubired whole assembly | Rubired haplotype-1 | Rubired haplotype-2 | 12X.v2 [32] | PN40024.v4 [66] | PN_T2T [93] |
|---|---|---|---|---|---|---|---|
| Assembly size (Mbp) | 508.5 | 983.8 | 476.0 | 474.7 | 486.2 | 475.6 | 494.9 |
| Number of contigs | 561 | 185 | 19 | 19 | 20 | 22 | 19 |
| N50 (Mbp) | 25.6 | 24.9 | 24.7 | 24.9 | 24.3 | 24.4 | 25.9 |
| Number of gaps | 2,644 | 97 | 38 | 59 | 15,325 | 4,019 | 0 |
| Total complete BUSCO | 97.7% | 98.7% | 98.3% | 97.3% | 97.8% | 98.3% | 98.4% |
| Raw LAI | 12.22 | N/A∗ | 15.22 | 15.62 | 9.4∗∗ | 13.97 | 14.29 |
| Genes annotated | 36,940 | 56,681 | 27,586 | 27,799 | 42,414 | 35,230 | 37,534 |
∗The raw LAI score was not calculated for the whole assembly due to high sequence similarity between haplotypes, which would prevent an accurate calculation. ∗∗Previously calculated [94].
The Dakapo genome was annotated by combining de novo annotations from MAKER [61] with annotations lifted over from PN40024.v4 [66] using Liftoff [65], yielding 36,940 annotated genes. TEs and other repeats comprised 45.38% of the genome, similar to previous reports in grapevine (41.4–51.1% [90, 95, 96]). LTRs make up the majority of the repetitive sequences annotated in the Dakapo genome and comprise 30.48% of the genome, with Gypsy LTRs being the most abundant type, comprising 12.88% of the Dakapo genome sequence (Supplementary Table S3 on GigaDB [25]).
The Rubired genome assembly and annotation
The Rubired genome was sequenced with highly accurate long reads, generating 62.6× HiFi coverage (based on a haploid genome size of 500 Mbp). Pseudomolecules were constructed by scaffolding and phasing the assembly using HaploSync [74], generating two haplotypes, each comprising 19 chromosomes and averaging ∼475 Mbp in total length. With complete BUSCO scores of 98.3% and 97.3% for haplotype-1 and haplotype-2, respectively (96.3% and 95.5% single-copy BUSCOs and 2.0% and 1.8% duplicated BUSCOs, respectively), and only 33 Mbp of unplaced sequences in the diploid assembly, the Rubired assembly is highly complete. Both Rubired haplotypes also have high raw LAI scores (15.22 for haplotype-1 and 15.62 for haplotype-2), indicating that the diploid Rubired genome contains a reference-quality assembly of repetitive and intergenic sequences, likely more complete than those of the 12X.v2 [32], PN40024.v4 [66], and PN_T2T [93] assemblies (Table 1). Gene annotation identified 56,681 genes in the whole assembly, 97.7% of which are anchored to chromosomes, further supporting the reference quality of the assembly. Similar numbers of genes were annotated on the two haplotypes (27,586 for haplotype-1 and 27,799 for haplotype-2), and very few were annotated on unplaced contigs (1,296). Overall, the genome is composed of 50.46% repetitive sequences, with a clear accumulation of repeats in the unplaced sequences, 74.34% of which were annotated as repeats. The repeat distribution was similar to that of the Dakapo genome, with Gypsy LTRs as the predominant repeat type, accounting for 13.91% of the genome sequence (Supplementary Table S4 on GigaDB [25]).
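The anchoring and unplaced-sequence figures above follow directly from Table 1. A short arithmetic check, assuming that every gene annotated in the whole assembly lies either on a haplotype-1 or haplotype-2 chromosome or on an unplaced contig; the function names are ours:

```python
def anchoring_summary(hap1_genes: int, hap2_genes: int, total_genes: int):
    """Return (genes on unplaced contigs, % of genes anchored to chromosomes)."""
    anchored = hap1_genes + hap2_genes
    return total_genes - anchored, 100 * anchored / total_genes


def unplaced_sequence(total_mbp: float, hap1_mbp: float, hap2_mbp: float) -> float:
    """Sequence (Mbp) in the whole assembly that is not in either haplotype."""
    return total_mbp - hap1_mbp - hap2_mbp


unplaced_genes, anchored_pct = anchoring_summary(27_586, 27_799, 56_681)
print(unplaced_genes, round(anchored_pct, 1))           # 1296 97.7
print(round(unplaced_sequence(983.8, 476.0, 474.7), 1))  # 33.1
```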
Re-use potential
Grapevine varieties have been bred to produce berries in a variety of colors, commonly divided into red-, black-, and white-skinned berries that typically have white flesh. However, several teinturier varieties, including the Dakapo and Rubired varieties sequenced here, produce berries with both pigmented skin and pigmented flesh. Here, we present high-quality genome assemblies and annotations for these two teinturier varieties. The two genomes were generated using distinct sequencing technologies, which required different assembly and annotation methods. As a result, the Rubired reference genome is haplotype-resolved and both more contiguous and more complete than the Dakapo reference genome. Nonetheless, both assemblies are highly contiguous and complete and will facilitate future research. These assemblies fully resolve VvMybA1 and the tandem repeat associated with anthocyanin content in teinturier grapes [10]. As expected, the Dakapo genome contains three tandem copies of this repeat within the promoter region of VvMybA1 (the VvMybA1t3 allele), as described previously [10], and all three copies share an identical 408 bp sequence (Figure 2A). The Rubired haplotype-2 assembly contains two tandem copies of this repeat at the expected location (the VvMybA1t2 allele) [10], with both copies identical to the 408 bp sequence in Dakapo (Figure 2B). The Rubired haplotype-1 assembly does not contain a teinturier-associated allele but instead contains the VvMybA1a allele responsible for white berry skin color [13], consistent with previous findings [10]. The VvMybA1a allele is distinct from teinturier alleles and other functional VvMybA1 alleles because of the Gret1 retrotransposon upstream of the coding sequence [13]. However, it does contain the 408 bp repeat upstream of the Gret1 retrotransposon [10]. This copy of the repeat is not identical to the repeat in Dakapo or Rubired haplotype-2 and instead differs by three single-nucleotide variants (Figure 2C).
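Once two copies of the repeat are aligned without gaps, the differences shown in Figure 2C are simply the positions where the sequences disagree. A minimal sketch, assuming equal-length, gap-free copies; the function name is hypothetical, and the test strings are toy sequences, not the actual repeats:

```python
from typing import List, Tuple


def snv_positions(ref: str, alt: str) -> List[Tuple[int, str, str]]:
    """1-based (position, ref_base, alt_base) where two gap-free sequences differ."""
    if len(ref) != len(alt):
        raise ValueError("sequences must be the same length (no indels)")
    return [(i, r, a)
            for i, (r, a) in enumerate(zip(ref.upper(), alt.upper()), start=1)
            if r != a]
```

Comparing in upper case avoids counting soft-masked (lower-case) bases as differences.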
Figure 2.
Diagrams of the VvMybA1 alleles in (A) the Dakapo assembly, (B) the Rubired haplotype-2 assembly, and (C) the Rubired haplotype-1 assembly. VvMybA1 is represented by the dark blue arrow, and the 408 bp repeats are shown in light blue boxes. Dakapo contains three tandem copies of the 408 bp repeat, while the Rubired haplotype-2 assembly contains two tandem copies. The Rubired haplotype-1 assembly contains the nonfunctional VvMybA1a allele, with the Gret1 retrotransposon shown upstream of VvMybA1 in a light green box, truncated to fit in the figure. The three single-nucleotide variants within the 408 bp repeat of the nonfunctional allele in Rubired haplotype-1 are indicated by arrows.
Beyond fully assembling the VvMybA1 alleles of Dakapo and Rubired, these two high-quality teinturier genomes will enable further insight into grapevine berry color in future studies. As previously mentioned, teinturier varieties differ in the composition of the anthocyanins they produce, and this variation does not appear to be driven by differences in VvMybA1 alleles [12]. These genomes provide resources for investigating the genetic mechanisms underlying this variation. In particular, focusing on the berry color locus on chromosome 2 [97–99] and the anthocyanin locus on chromosome 14 [100] may provide insight into how specific anthocyanin molecules are regulated within the flesh of teinturier berries.
Beyond berry flesh color, the Dakapo and Rubired genomes and annotations will also provide resources for investigating other grapevine traits. For example, Dakapo is both frost-susceptible [101] and Botrytis-susceptible [19], whereas Rubired is notably highly resistant to mildew [102]. A QTL-mapping population derived from a Dakapo × Cabernet Sauvignon cross has also been established [19], so the availability of this reference genome will greatly aid future studies of that population. Ultimately, these genomes provide new resources for investigating a variety of grapevine traits, supporting advances in grapevine breeding and agriculture and enabling comparisons among grapevine genomes (Figure 3).
Figure 3.
Synteny of several grapevine genomes with chromosome-scale assemblies, organized by berry and flesh color.
Initial comparisons of these assemblies with other grapevine genomes revealed a putative large (1.82 Mbp) inversion on chromosome 10 of Dakapo (Figure 4) that contains 274 genes (Supplementary Table S5 on GigaDB [25]). This inversion appears to be absent from the other chromosome-scale grapevine genome assemblies compared in Figure 3. To ensure that the putative inversion was not due to a scaffolding error, we split Dakapo chromosome 10 at the gaps introduced during scaffolding, verified that no gaps were near the predicted inversion breakpoints, and checked that the resulting contigs still showed evidence of being inverted relative to other genomes. No assembly gaps were near the inversion breakpoints; the closest gaps were ∼600 kbp and ∼1.3 Mbp away from them. Aligning the contigs from the split Dakapo chromosome 10 to chromosome 10 of the PN_T2T reference [93] with MUMmer v4.0.0rc1 [92], and producing dotplots as previously described, also showed linear but inverted alignments in the region of the putative inversion (Supplementary Figure 1 on GigaDB [25]). To further verify the inversion, we mapped the trimmed Dakapo Illumina paired-end reads used to assemble the genome back to the Dakapo reference genome using bwa-mem2 v2.2.1 (RRID:SCR_022192) [103], marked duplicates using Picard v2.15.0 [36], and filtered the mapped reads using SAMtools v1.17 (RRID:SCR_002105) [104]. The paired-end reads mapped to the Dakapo reference genome as expected, with nearly all reads having proper insert sizes and orientations, supporting the presence of the putative inversion (Supplementary Figures 2 and 3 on GigaDB [25]). Ultimately, the breakpoints of the putative inversion will need to be validated by PCR amplification and Sanger sequencing, which would also determine whether the inversion is heterozygous or homozygous in Dakapo.
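The gap check described above can be sketched in Python, assuming that scaffolding gaps are encoded as runs of N in the chromosome sequence and that the breakpoint positions are already known. The function names, toy scaffold, and positions below are hypothetical; in practice the sequence would be read from the Dakapo FASTA file and the breakpoints taken from the whole-genome alignment.

```python
from __future__ import annotations

import re


def gap_intervals(seq: str, min_gap: int = 1) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals of N runs at least `min_gap` long."""
    return [(m.start(), m.end()) for m in re.finditer(r"[Nn]+", seq)
            if m.end() - m.start() >= min_gap]


def split_at_gaps(seq: str, min_gap: int = 1) -> list[tuple[int, str]]:
    """Split a scaffold into (offset, contig_sequence) pieces at N runs."""
    pieces = []
    prev = 0
    for start, end in gap_intervals(seq, min_gap):
        if start > prev:  # skip empty pieces, e.g. when the scaffold starts with N
            pieces.append((prev, seq[prev:start]))
        prev = end
    if prev < len(seq):
        pieces.append((prev, seq[prev:]))
    return pieces


def nearest_gap_distance(pos: int, gaps: list[tuple[int, int]]) -> int | None:
    """Distance in bp from a 0-based position to the closest N (0 if inside a gap)."""
    if not gaps:
        return None
    best = None
    for start, end in gaps:
        if start <= pos < end:
            d = 0
        elif pos < start:
            d = start - pos
        else:
            d = pos - (end - 1)
        best = d if best is None else min(best, d)
    return best


if __name__ == "__main__":
    scaffold = "ACGT" * 5 + "N" * 10 + "GGCC" * 5  # toy 50 bp scaffold, one gap
    gaps = gap_intervals(scaffold)
    print(gaps)                              # [(20, 30)]
    print(nearest_gap_distance(5, gaps))     # 15
    print(nearest_gap_distance(45, gaps))    # 16
```

The contigs returned by `split_at_gaps` keep their offsets, so alignments of each piece can be mapped back to chromosome coordinates when checking whether they remain inverted.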
Figure 4.
Alignment of chromosome 10 in Dakapo versus chromosome 10 in (A) the Rubired haplotype-1 assembly and (B) the Rubired haplotype-2 assembly, showing the putative 1.82 Mbp inversion present in Dakapo. The dotplots show forward matches in red and reverse matches in blue.
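The orientation logic behind these dotplots can be sketched as follows. This minimal Python example assumes the alignments have already been parsed into (reference start, reference end, query start, query end) tuples following the MUMmer show-coords convention, in which a reverse-strand match is reported with the query start greater than the query end. The `Hit` type, the `looks_inverted` function, the 0.9 threshold, and the toy coordinates are hypothetical illustrations rather than the analysis used in the study.

```python
from __future__ import annotations

from typing import NamedTuple


class Hit(NamedTuple):
    """One alignment block in show-coords-style coordinates (1-based, inclusive)."""
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int


def is_reverse(hit: Hit) -> bool:
    # Assumed convention: reverse-strand matches have qry_start > qry_end.
    return hit.qry_start > hit.qry_end


def looks_inverted(hits: list[Hit], region: tuple[int, int],
                   min_frac: float = 0.9) -> bool:
    """Return True if hits overlapping `region` on the reference are mostly
    reverse-strand and their query midpoints strictly decrease as the reference
    position increases (a linear but inverted diagonal on a dotplot)."""
    lo, hi = region
    inside = sorted((h for h in hits if h.ref_start <= hi and h.ref_end >= lo),
                    key=lambda h: h.ref_start)
    if not inside:
        return False
    frac_rev = sum(is_reverse(h) for h in inside) / len(inside)
    mids = [(h.qry_start + h.qry_end) / 2 for h in inside]
    collinear = all(a > b for a, b in zip(mids, mids[1:]))
    return frac_rev >= min_frac and collinear


if __name__ == "__main__":
    hits = [
        Hit(1, 1000, 1, 1000),        # forward flank
        Hit(1001, 2000, 3000, 2001),  # inverted block
        Hit(2001, 3000, 2000, 1001),  # inverted block
        Hit(3001, 4000, 3001, 4000),  # forward flank
    ]
    print(looks_inverted(hits, (1001, 3000)))  # True
    print(looks_inverted(hits, (1, 1000)))     # False
```

Requiring both a high fraction of reverse hits and decreasing query positions separates a single contiguous inversion from scattered reverse matches, such as those produced by repeats.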
Inversions can alter gene expression depending on various genetic factors [64, 105, 106]; we were therefore interested in whether the putative inversion on Dakapo chromosome 10 could contribute to Dakapo’s increased cold susceptibility and/or increased pathogen susceptibility. Several genes within the inversion appear to be involved in cold- and/or pathogen-responsive pathways, including VvDak_v1.10g0003381, whose Arabidopsis ortholog (AT3G07650) regulates the expression of genes within the cold acclimation pathway [107], and VvDak_v1.10g0003951, whose Arabidopsis and rice orthologs (AT4G03960 and OsPFA-DSP2, respectively) negatively regulate pathogen response pathways [108]. The implications of this inversion remain unclear, but future research could reveal its potential phenotypic effects.
Grapevine is a useful model system because of its unique life and domestication history, and it is one of the few lianas (woody vines) with robust genomic resources. In addition, grapevine breeding and propagation have been ongoing for millennia, resulting in a fascinating array of phenotypes and an abundance of accumulated somatic variants. The Dakapo and Rubired assemblies and annotations add to a growing number of grapevine genomes that will provide valuable tools for both grapevine breeders and geneticists.
Availability of source code and requirements
Project name: Genome assembly and annotation for two teinturier grapevine varieties
Project home page: https://github.com/eleanore-ritter/teinturier-grapevine-genomes
Operating system: Platform-independent
Programming languages: bash, perl, python, and R
Other requirements: packages described in methods
License: Apache-2.0.
Acknowledgements
We are grateful to Dan Chitwood, Emily Josephs, and Robin Buell for helpful discussions on this work and feedback on this manuscript. We are grateful to the Genomics Core at Michigan State University, the Institute for Cyber-Enabled Research at Michigan State University, and the UC Davis DNA Technology Core for their services. We would like to thank Kevin Childs for providing guidance and custom scripts for genome annotation. We acknowledge Rosa Figueroa-Balderas for processing the samples, extracting the nucleic acids, and preparing the sequencing libraries for the Rubired genome. Lastly, we are grateful to the reviewers for their thorough review and suggestions, which have greatly strengthened the manuscript.
Funding Statement
The Dakapo genome work was supported by Michigan State University and the USDA National Institute of Food and Agriculture MICL02572. Oxford Nanopore Technologies provided sequencing for the Dakapo assembly. The Rubired genome work was supported by E. & J. Gallo Winery and NSF grant #1741627.
Data availability
The genomes and annotations for both Dakapo and Rubired are available on grapegenomics.com, Zenodo [109, 110], and GigaDB (which also includes snapshots of the code) [25]. Sequencing data from this study are provided on the NCBI Sequence Read Archive under BioProjects PRJNA1094988 and PRJNA1085245. Supplementary tables and figures are available on GigaDB [25].
List of abbreviations
EDTA, Extensive de-novo transposable element Annotator; FPS, Foundation Plant Services; GCE, grapevine color enhancer; GO, gene ontology; HiFi, highly accurate long reads; HMMs, Hidden Markov Models; LAI, LTR Assembly Index; LTR, long terminal repeat; ONT, Oxford Nanopore Technologies; PN_T2T, PN40024 telomere-to-telomere; SRA, Sequence Read Archive; TE, transposable element.
Declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
Peter Cousins is an employee of E. & J. Gallo Winery.
Authors’ contributions
CN envisioned the project, secured the funding, and supervised research on Dakapo. DC envisioned the project, secured the funding, and supervised research on Rubired. EJR assembled and annotated the Dakapo genome, designed and executed comparative analyses within the study, and wrote the first draft of the manuscript. NC and AM assembled and annotated the Rubired genome. PC led Dakapo vine cultivation and tissue sample collection and helped conceptualize comparative analyses within the study. All authors assisted with the final draft of the manuscript.
References
- 1. FAO. Agricultural production statistics 2000–2022. FAOSTAT Analytical Briefs, No. 79. 2023; doi: 10.4060/cc9205en.
- 2. Statistics Department of the International Organisation of Vine and Wine. Annual Assessment of the World Vine and Wine Sector in 2022. International Organisation of Vine and Wine. 2022; https://www.oiv.int/sites/default/files/documents/OIV_Annual_Assessment-2023.pdf.
- 3. Dong Y, Duan S, Xia Q et al. Dual domestications and origin of traits in grapevine evolution. Science, 2023; 379: 892–901. doi: 10.1126/science.add8655.
- 4. Xiao H, Liu Z, Wang N et al. Adaptive and maladaptive introgression in grapevine domestication. Proc. Natl. Acad. Sci. USA, 2023; 120: e2222041120. doi: 10.1073/pnas.2222041120.
- 5. Parpinello GP, Versari A, Chinnici F et al. Relationship among sensory descriptors, consumer preference and color parameters of Italian Novello red wines. Food Res. Int., 2009; 42: 1389–1395. doi: 10.1016/j.foodres.2009.07.005.
- 6. Sáenz-Navajas M-P, Echavarri F, Ferreira V et al. Pigment composition and color parameters of commercial Spanish red wine samples: linkage to quality perception. Eur. Food Res. Technol., 2011; 232: 877–887. doi: 10.1007/s00217-011-1456-2.
- 7. Flamini R, Mattivi F, De Rosso M et al. Advanced knowledge of three important classes of grape phenolics: anthocyanins, stilbenes and flavonols. Int. J. Mol. Sci., 2013; 14: 19651–19669. doi: 10.3390/ijms141019651.
- 8. He F, Mu L, Yan G-L et al. Biosynthesis of anthocyanins and their regulation in colored grapes. Molecules, 2010; 15: 9057–9091. doi: 10.3390/molecules15129057.
- 9. California Department of Food and Agriculture. California Grape Crush Final Report March 10, 2023. 2023; https://www.cdfa.ca.gov/mkt/grapecrush.html.
- 10. Röckel F, Moock C, Braun U et al. Color intensity of the red-fleshed berry phenotype of Vitis vinifera Teinturier grapes varies due to a 408 bp duplication in the promoter of VvmybA1. Genes, 2020; 11: 891. doi: 10.3390/genes11080891.
- 11. Fröhling B, Patz CD, Dietrich H et al. Anthocyanins, total phenolics and antioxidant capacities of commercial red grape juices, black currant and sour cherry nectars. Fruit Process., 2012; 3: 100–104. ISSN 0939-4435.
- 12. Kőrösi L, Molnár S, Teszlák P et al. Comparative study on grape berry anthocyanins of various Teinturier varieties. Foods, 2022; 11(22): 3668. doi: 10.3390/foods11223668.
- 13. Kobayashi S, Goto-Yamamoto N, Hirochika H. Retrotransposon-induced mutations in grape skin color. Science, 2004; 304: 982. doi: 10.1126/science.1095011.
- 14. Walker AR, Lee E, Robinson SP. Two new grape cultivars, bud sports of Cabernet Sauvignon bearing pale-coloured berries, are the result of deletion of two regulatory genes of the berry colour locus. Plant Mol. Biol., 2006; 62: 623–635. doi: 10.1007/s11103-006-9043-9.
- 15. Yakushiji H, Kobayashi S, Goto-Yamamoto N et al. A skin color mutation of grapevine, from black-skinned Pinot Noir to white-skinned Pinot Blanc, is caused by deletion of the functional VvmybA1 allele. Biosci. Biotechnol. Biochem., 2006; 70: 1506–1508. doi: 10.1271/bbb.50647.
- 16. Ferreira V, Pinto-Carnide O, Arroyo-García R et al. Berry color variation in grapevine as a source of diversity. Plant Physiol. Biochem., 2018; 132: 696–707. doi: 10.1016/j.plaphy.2018.08.021.
- 17. Guan L, Fan P, Li S-H et al. Inheritance patterns of anthocyanins in berry skin and flesh of the interspecific population derived from teinturier grape. Euphytica, 2019; 215: 1–14. doi: 10.1007/s10681-019-2342-4.
- 18. Zhang K, Du M, Zhang H et al. The haplotype-resolved T2T genome of teinturier cultivar Yan73 reveals the genetic basis of anthocyanin biosynthesis in grapes. Hortic. Res., 2023; 10: uhad205. doi: 10.1093/hr/uhad205.
- 19. Herzog K, Schwander F, Kassemeyer H-H et al. Towards sensor-based phenotyping of physical barriers of grapes to improve resilience to Botrytis Bunch Rot. Front. Plant Sci., 2021; 12: 808365. doi: 10.3389/fpls.2021.808365.
- 20. Rashed A, Daugherty MP, Almeida RPP. Grapevine genotype susceptibility to Xylella fastidiosa does not predict vector transmission success. Environ. Entomol., 2011; 40: 1192–1199. doi: 10.1603/EN11108.
- 21. Rashed A, Kwan J, Baraff B et al. Relative susceptibility of Vitis vinifera cultivars to vector-borne Xylella fastidiosa through time. PLoS One, 2013; 8: e55326. doi: 10.1371/journal.pone.0055326.
- 22. Wallis CM, Wallingford AK, Chen J. Effects of cultivar, phenology, and Xylella fastidiosa infection on grapevine xylem sap and tissue phenolic content. Physiol. Mol. Plant Pathol., 2013; 84: 28–35. doi: 10.1016/j.pmpp.2013.06.005.
- 23. Ritter EJ, Cousins P, Quigley M et al. From buds to shoots: insights into grapevine development from the Witch’s Broom bud sport. BMC Plant Biol., 2024; 24: 283. doi: 10.1186/s12870-024-04992-y.
- 24. MinKNOW Release 21.11.7. Oxford Nanopore Technologies. https://nanoporetech.com/document/experiment-companion-minknow. Accessed 2 May 2024.
- 25. Ritter EJ, Cochetel N, Minio A. Supporting data for “The assembly and annotation of two teinturier grapevine varieties, Dakapo and Rubired”. GigaScience Database, 2025; doi: 10.5524/102660.
- 26. Wick RR, Judd LM, Gorrie CL et al. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb. Genom., 2017; 3: e000132. doi: 10.1099/mgen.0.000132.
- 27. De Coster W, D’Hert S, Schultz DT et al. NanoPack: visualizing and processing long-read sequencing data. Bioinformatics, 2018; 34: 2666–2669. doi: 10.1093/bioinformatics/bty149.
- 28. Andrews S, et al. FastQC: A Quality Control Tool for High Throughput Sequence Data. Cambridge, UK: Babraham Bioinformatics, Babraham Institute, 2010; https://www.bioinformatics.babraham.ac.uk/projects/fastqc/. Accessed 2 May 2024.
- 29. De Coster W, Rademakers R. NanoPack2: population-scale evaluation of long-read sequencing data. Bioinformatics, 2023; 39(5): btad311. doi: 10.1093/bioinformatics/btad311.
- 30. Kolmogorov M, Yuan J, Lin Y et al. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol., 2019; 37: 540–546. doi: 10.1038/s41587-019-0072-8.
- 31. Vaser R, Sović I, Nagarajan N et al. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res., 2017; 27: 737–746. doi: 10.1101/gr.214270.116.
- 32. Canaguier A, Grimplet J, Di Gaspero G et al. A new version of the grapevine reference genome assembly (12X.v2) and of its annotation (VCost.v3). Genom. Data, 2017; 14: 56–62. doi: 10.1016/j.gdata.2017.09.002.
- 33. Alonge M, Lebeigle L, Kirsche M et al. Automated assembly scaffolding using RagTag elevates a new tomato system for high-throughput genome editing. Genome Biol., 2022; 23: 258. doi: 10.1186/s13059-022-02823-7.
- 34. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 2014; 30: 2114–2120. doi: 10.1093/bioinformatics/btu170.
- 35. Li H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. arXiv, 2013; doi: 10.48550/arXiv.1303.3997.
- 36. Broad Institute. Picard Toolkit (Version 2.15.0). 2017; https://github.com/broadinstitute/picard/releases/tag/2.15.0. Accessed 2 May 2024.
- 37. Walker BJ, Abeel T, Shea T et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One, 2014; 9: e112963. doi: 10.1371/journal.pone.0112963.
- 38. Roach MJ, Schmidt SA, Borneman AR. Purge Haplotigs: allelic contig reassignment for third-gen diploid genome assemblies. BMC Bioinform., 2018; 19: 460. doi: 10.1186/s12859-018-2485-7.
- 39. Li H. New strategies to improve minimap2 alignment accuracy. Bioinformatics, 2021; 37: 4572–4574. doi: 10.1093/bioinformatics/btab705.
- 40. Titus Brown C. Detecting microbial contamination in long-read assemblies (from known microbes). 2018; http://ivory.idyll.org/blog/2018-detecting-contamination-in-long-read-assemblies.html. Accessed 2 May 2024.
- 41. Titus Brown C, Irber L. sourmash: a library for MinHash sketching of DNA. J. Open Source Softw., 2016; 1: 27. doi: 10.21105/joss.00027.
- 42. Manni M, Berkeley MR, Seppey M et al. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Mol. Biol. Evol., 2021; 38: 4647–4654. doi: 10.1093/molbev/msab199.
- 43. Pathogen Informatics, Wellcome Sanger Institute. assembly-stats (Version 1.0.1). 2020; https://github.com/sanger-pathogens/assembly-stats/releases/tag/v1.0.1-docker1. Accessed 2 May 2024.
- 44. Ou S, Su W, Liao Y et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol., 2019; 20: 275. doi: 10.1186/s13059-019-1905-y.
- 45. Perazzolli M, Moretto M, Fontana P et al. Downy mildew resistance induced by Trichoderma harzianum T39 in susceptible grapevines partially mimics transcriptional changes of resistant genotypes. BMC Genom., 2012; 13: 660. doi: 10.1186/1471-2164-13-660.
- 46. Da Silva C, Zamperin G, Ferrarini A et al. The high polyphenol content of grapevine cultivar tannat berries is conferred primarily by genes that are not shared with the reference genome. Plant Cell, 2013; 25: 4777–4788. doi: 10.1105/tpc.113.118810.
- 47. Minio A, Massonnet M, Figueroa-Balderas R et al. Iso-seq allows genome-independent transcriptome profiling of grape berry development. G3, 2019; 9: 755–767. doi: 10.1534/g3.118.201008.
- 48. Vannozzi A, Palumbo F, Magon G et al. The grapevine (Vitis vinifera L.) floral transcriptome in Pinot noir variety: identification of tissue-related gene networks and whorl-specific markers in pre- and post-anthesis phases. Hortic. Res., 2021; 8: 200. doi: 10.1038/s41438-021-00635-7.
- 49. Daldoul S, Hanzouli F, Hamdi Z et al. The root transcriptome dynamics reveals new valuable insights in the salt-resilience mechanism of wild grapevine (Vitis vinifera subsp. sylvestris). Front. Plant Sci., 2022; 13: 1077710. doi: 10.3389/fpls.2022.1077710.
- 50. Ma S-H, He G-Q, Navarro-Payá D et al. Global analysis of alternative splicing events based on long- and short-read RNA sequencing during grape berry development. Gene, 2023; 852: 147056. doi: 10.1016/j.gene.2022.147056.
- 51. SRA Toolkit Development Team. sra-tools (Version 2.10.7). 2020; https://github.com/ncbi/sra-tools/releases/tag/2.10.7. Accessed 2 May 2024.
- 52. Kim D, Paggi JM, Park C et al. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol., 2019; 37: 907–915. doi: 10.1038/s41587-019-0201-4.
- 53. Shumate A, Wong B, Pertea G et al. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Comput. Biol., 2022; 18: e1009730. doi: 10.1371/journal.pcbi.1009730.
- 54. Pertea G, Pertea M. GFF utilities: GffRead and GffCompare. F1000Res., 2020; 9: 304. doi: 10.12688/f1000research.23297.2.
- 55. Chen M-JM, Lin H, Chiang L-M et al. The GFF3toolkit: QC and merge pipeline for genome annotation. Methods Mol. Biol., 2019; 1858: 75–87. doi: 10.1007/978-1-4939-8775-7_7.
- 56. Smit AFA, Hubley R, Green P. RepeatMasker Open-4.0. 2013–2015; https://www.repeatmasker.org/. Accessed 18 January 2023.
- 57. Cheng C-Y, Krishnakumar V, Chan AP et al. Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. Plant J., 2017; 89: 789–804. doi: 10.1111/tpj.13415.
- 58. Kawahara Y, de la Bastide M, Hamilton JP et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice, 2013; 6: 4. doi: 10.1186/1939-8433-6-4.
- 59. UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2023. Nucleic Acids Res., 2023; 51: D523–D531. doi: 10.1093/nar/gkac1052.
- 60. Slater GSC, Birney E. Automated generation of heuristics for biological sequence comparison. BMC Bioinform., 2005; 6: 31. doi: 10.1186/1471-2105-6-31.
- 61. Holt C, Yandell M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinform., 2011; 12: 491. doi: 10.1186/1471-2105-12-491.
- 62. Korf I. Gene finding in novel genomes. BMC Bioinform., 2004; 5: 59. doi: 10.1186/1471-2105-5-59.
- 63. Stanke M, Diekhans M, Baertsch R et al. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, 2008; 24: 637–644. doi: 10.1093/bioinformatics/btn013.
- 64. Kollar LM, Stanley LE, Raju SKK et al. The role of breakpoint mutations, supergene effects, and ancient nested rearrangements in the evolution of adaptive chromosome inversions in the yellow monkey flower, Mimulus guttatus. bioRxiv, 2023; doi: 10.1101/2023.12.06.570460.
- 65. Shumate A, Salzberg SL. Liftoff: accurate mapping of gene annotations. Bioinformatics, 2021; 37: 1639–1643. doi: 10.1093/bioinformatics/btaa1016.
- 66. Velt A, Frommer B, Blanc S et al. An improved reference of the grapevine genome reasserts the origin of the PN40024 highly homozygous genotype. G3, 2023; 13. doi: 10.1093/g3journal/jkad067.
- 67. Jones P, Binns D, Chang H-Y et al. InterProScan 5: genome-scale protein function classification. Bioinformatics, 2014; 30: 1236–1240. doi: 10.1093/bioinformatics/btu031.
- 68. Buchfink B, Xie C, Huson DH. Fast and sensitive protein alignment using DIAMOND. Nat. Methods, 2015; 12: 59–60. doi: 10.1038/nmeth.3176.
- 69. Lamesch P, Berardini TZ, Li D et al. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic Acids Res., 2012; 40: D1202–D1210. doi: 10.1093/nar/gkr1090.
- 70. Berardini TZ, Reiser L, Li D et al. The Arabidopsis information resource: making and mining the “gold standard” annotated reference plant genome. Genesis, 2015; 53: 474–485. doi: 10.1002/dvg.22877.
- 71. Chin C-S, Peluso P, Sedlazeck FJ et al. Phased diploid genome assembly with single-molecule real-time sequencing. Nat. Methods, 2016; 13: 1050–1054. doi: 10.1038/nmeth.4035.
- 72. Minio A, Cochetel N, Massonnet M et al. HiFi chromosome-scale diploid assemblies of the grape rootstocks 110R, Kober 5BB, and 101–14 Mgt. Sci. Data, 2022; 9: 1–8. doi: 10.1038/s41597-022-01753-0.
- 73. Cheng H, Concepcion GT, Feng X et al. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods, 2021; 18: 170–175. doi: 10.1038/s41592-020-01056-5.
- 74. Minio A, Cochetel N, Vondras AM et al. Assembly of complete diploid-phased chromosomes from draft genome sequences. G3, 2022; 12(8): jkac143. doi: 10.1093/g3journal/jkac143.
- 75. Zou C, Karn A, Reisch B et al. Haplotyping the Vitis collinear core genome with rhAmpSeq improves marker transferability in a diverse genus. Nat. Commun., 2020; 11: 413. doi: 10.1038/s41467-019-14280-1.
- 76. Cochetel N, Minio A, Massonnet M et al. Diploid chromosome-scale assembly of the Muscadinia rotundifolia genome supports chromosome fusion and disease resistance gene expansion during Vitis and Muscadinia divergence. G3, 2021; 11(4): jkab033. doi: 10.1093/g3journal/jkab033.
- 77. Cochetel N, Minio A, Guarracino A et al. A super-pangenome of the North American wild grape species. Genome Biol., 2023; 24: 290. doi: 10.1186/s13059-023-03133-2.
- 78. Haas BJ, Delcher AL, Mount SM et al. Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res., 2003; 31: 5654–5666. doi: 10.1093/nar/gkg770.
- 79. Stanke M, Keller O, Gunduz I et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res., 2006; 34: W435–W439. doi: 10.1093/nar/gkl200.
- 80. Lomsadze A, Ter-Hovhannisyan V, Chernoff YO et al. Gene identification in novel eukaryotic genomes by self-training algorithm. Nucleic Acids Res., 2005; 33: 6494–6506. doi: 10.1093/nar/gki937.
- 81. Waterhouse RM, Seppey M, Simão FA et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol. Biol. Evol., 2018; 35: 543–548. doi: 10.1093/molbev/msx319.
- 82.Haas BJ, Salzberg SL, Zhu W et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome Biol., 2008; 9: R7. doi: 10.1186/gb-2008-9-1-r7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 83.Götz S, García-Gómez JM, Terol J et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res., 2008; 36: 3420–3435. doi: 10.1093/nar/gkn176. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 84.O’Leary NA, Wright MW, Brister JR et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res., 2016; 44: D733–D745. doi: 10.1093/nar/gkv1189. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 85.Camacho C, Coulouris G, Avagyan V et al. BLAST+: architecture and applications. BMC Bioinform., 2009; 10: 421. doi: 10.1186/1471-2105-10-421. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 86.Quinlan AR, Hall IM. . BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 2010; 26: 841–842. doi: 10.1093/bioinformatics/btq033. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 87.Lovell JT, Sreedasyam A, Schranz ME et al. GENESPACE tracks regions of interest and gene copy number variation across multiple genomes. Elife, 2022; 11: e78526. doi: 10.7554/eLife.78526. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 88.Wang Y, Tang H, Debarry JD et al. MCScanX: a toolkit for detection and evolutionary analysis of gene synteny and collinearity. Nucleic Acids Res., 2012; 40: e49. doi: 10.1093/nar/gkr1293. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 89.Emms DM, Kelly S. . OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol., 2019; 20: 238. doi: 10.1186/s13059-019-1832-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 90.Zhou Y, Minio A, Massonnet M et al. The population genetics of structural variants in grapevine domestication. Nat. Plants, 2019; 5: 965–979. doi: 10.1038/s41477-019-0507-8. [DOI] [PubMed] [Google Scholar]
- 91.Cantu Lab . Vitis vinifera cv. Pinot Noir cl. FPS123. grapegenomics.com. 2024; https://www.grapegenomics.com/pages/VvPinNoir/VvPinNoir123/. Accessed 2 May 2024.
- 92.Marçais G, Delcher AL, Phillippy AM et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol., 2018; 14: e1005944. doi: 10.1371/journal.pcbi.1005944. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 93.Shi X, Cao S, Wang X et al. The complete reference genome for grapevine (Vitis vinifera L.) genetics and breeding. Hortic. Res., 2023; 10: uhad061. doi: 10.1093/hr/uhad061. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 94.Ou S, Chen J, Jiang N. . Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res., 2018; 46: e126. doi: 10.1093/nar/gky730. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 95.Minio A, Massonnet M, Figueroa-Balderas R et al. Diploid genome assembly of the wine grape Carménère. G3, 2019; 9: 1331–1337. doi: 10.1534/g3.119.400030. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 96.Jaillon O, Aury J-M, Noel B et al. The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature, 2007; 449: 463–467. doi: 10.1038/nature06148. [DOI] [PubMed] [Google Scholar]
- 97.Doligez A, Bouquet A, Danglot Y et al. Genetic mapping of grapevine (Vitis vinifera L.) applied to the detection of QTLs for seedlessness and berry weight. Theor. Appl. Genet., 2002; 105: 780–795. doi: 10.1007/s00122-002-0951-z. [DOI] [PubMed] [Google Scholar]
- 98.Walker AR, Lee E, Bogs J et al. White grapes arose through the mutation of two similar and adjacent regulatory genes. Plant J., 2007; 49: 772–785. doi: 10.1111/j.1365-313X.2006.02997.x. [DOI] [PubMed] [Google Scholar]
- 99.Azuma A, Kobayashi S, Goto-Yamamoto N et al. Color recovery in berries of grape (Vitis vinifera L.) ‘Benitaka’, a bud sport of ‘Italia’, is caused by a novel allele at the VvmybA1 locus. Plant Sci., 2009; 176: 470–478. doi: 10.1016/j.plantsci.2008.12.015. [DOI] [PubMed] [Google Scholar]
- 100.Matus JT, Cavallini E, Loyola R et al. A group of grapevine MYBA transcription factors located in chromosome 14 control anthocyanin synthesis in vegetative organs with different specificities compared with the berry color locus. Plant J., 2017; 91: 220–236. doi: 10.1111/tpj.13558. [DOI] [PubMed] [Google Scholar]
- 101.Lisek J. . Winter frost injury of buds on one-year-old grapevine shoots of cultivars and interspecific hybrids in Poland. Folia Hortic., 2012; 24: 97–103. doi: 10.2478/v10245-012-0010-4. [DOI] [Google Scholar]
- 102. Doster MA. Effects of leaf maturity and cultivar resistance on development of the powdery mildew fungus on grapevines. Phytopathology, 1985; 75: 318–321. doi: 10.1094/PHYTO-75-318.
- 103. Vasimuddin M, Misra S, Li H et al. Efficient architecture-aware acceleration of BWA-MEM for multicore systems. In: 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS). IEEE, 2019; pp. 314–324. doi: 10.1109/IPDPS.2019.00041.
- 104. Danecek P, Bonfield JK, Liddle J et al. Twelve years of SAMtools and BCFtools. Gigascience, 2021; 10(2): giab008. doi: 10.1093/gigascience/giab008.
- 105. Puig M, Casillas S, Villatoro S et al. Human inversions and their functional consequences. Brief. Funct. Genomics, 2015; 14: 369–379. doi: 10.1093/bfgp/elv020.
- 106. Loveland JL, Lank DB, Küpper C. Gene expression modification by an autosomal inversion associated with three male mating morphs. Front. Genet., 2021; 12: 641620. doi: 10.3389/fgene.2021.641620.
- 107. Li Y, Shi Y, Li M et al. The CRY2-COP1-HY5-BBX7/8 module regulates blue light-dependent cold acclimation in Arabidopsis. Plant Cell, 2021; 33: 3555–3573. doi: 10.1093/plcell/koab215.
- 108. He H, Su J, Shu S et al. Two homologous putative protein tyrosine phosphatases, OsPFA-DSP2 and AtPFA-DSP4, negatively regulate the pathogen response in transgenic plants. PLoS One, 2012; 7: e34995. doi: 10.1371/journal.pone.0034995.
- 109. Ritter E, Cousins P, Niederhuth C. Genome Release: Vitis vinifera cv. Dakapo v1 [Data set]. Zenodo, 2024. doi: 10.5281/zenodo.12728405.
- 110. Minio A, Cochetel N, Figueroa-Balderas R et al. Grapegenomics.com - Genome release: Vitis interspecific cross - Rubired cl. FPS02 - chromosome scale [Data set]. Zenodo, 2024. doi: 10.5281/zenodo.10543934.