De novo transcriptome assembly of cold stressed clones of the hexaploid Sequoia sempervirens (D. Don) Endl

Natalie Breidenbach; Vadim V Sharov; Oliver Gailing; Konstantin V Krutovsky

Sci Data. 2020 Jul 17;7:239. doi: 10.1038/s41597-020-00576-1


PMCID: PMC7367877 PMID: 32681057

Abstract

Coast redwood is an important endemic conifer timber species of southern Oregon and northern California, USA. Owing to its good wood properties and fast growth rate, it can also be considered a prospective timber species for other countries whose climate is similar, or is becoming similar under global warming, such as Germany. In general, however, it is frost sensitive and suffers from freezing temperatures. To study the genetic mechanisms of frost resistance in this species and to select the most frost-tolerant trees, we tested 17 clones in climate control chamber experiments and generated two de novo assemblies of the coast redwood transcriptome from a pooled RNA sample using Trinity and CLC Genomics Workbench software, respectively. The hexaploid nature of the coast redwood genome makes it very challenging to assemble and annotate the transcriptome. The de novo transcriptome assemblies generated by Trinity and CLC, considering only reads with a minimum length of 180 bp and contigs no less than 200 bp long, resulted in 634,772 and 788,464 unigenes (unique contigs), respectively.

Subject terms: Gene expression, Abiotic

Measurement(s)	RNA • transcriptome • sequence_assembly • sequence feature annotation
Technology Type(s)	RNA sequencing • sequence assembly process • sequence annotation
Factor Type(s)	temperature treatment
Sample Characteristic - Organism	Sequoia sempervirens


Machine-accessible metadata file describing the reported data: 10.6084/m9.figshare.12554549

Background & Summary

Coast redwood (Sequoia sempervirens (D. Don) Endl.) is an endemic forest conifer species occupying a narrow range along the Pacific Northwest coast in southern Oregon and northern California, USA. It is a valuable timber species characterized by a fast growth rate and good-quality wood¹. The species has been planted successfully in some other countries for commercial wood production², but in Germany it is currently used mainly as a decorative, exotic species because of its sensitivity to freezing temperatures. However, some coast redwood trees have survived freezing temperatures in Germany, demonstrating cold tolerance. Considering global warming and climate change, this species can be considered a prospective commercial timber species for future sustainable forestry in Germany. To study the genetic mechanisms of cold resistance and to select frost-resistant coast redwood trees, we tested replicates of 17 different coast redwood clones of diverse origin (Table 1) in a climate control chamber at freezing temperatures down to −10 °C. Samples included the ‘Filoli-phenotype’ clones and clones from two trees growing in Germany that are considered frost resistant. RNA isolated from 11 clones from different temperature treatments was used to generate two de novo assemblies of the coast redwood transcriptome using Trinity and CLC Genomics Workbench software, considering only reads with a minimum length of 180 bp and contigs no less than 200 bp long. Coast redwood is a hexaploid species, which makes it very difficult to study. Its genome has only recently been sequenced, and the genome assembly has been made publicly available (https://nealelab.ucdavis.edu/redwood-genome-project-rgp, accessed in May 2019), but it is still unpublished and not annotated. Published transcriptome data are also limited³. The two transcriptome assemblies obtained in our study provide additional genomic resources and can support further coast redwood genetic studies, including those on the response of this and other conifer species to frost or other environmental stresses. We also hope that our experience with de novo sequencing, assembling and annotating the transcriptome of this difficult non-model polyploid species can help other similar studies.

Table 1.

Coast redwood clones tested in climate control chamber at freezing temperature −10 °C.

Clone	Origin	Latitude	Longitude
ANG3*	Angwin, USA	38.534967	−122.429347
ANG4*	Angwin, USA	38.534967	−122.429347
B167	Freshwater Creek, USA	40.75	−124.05
BLU71*	Filoli Phenotype, USA	unknown	unknown
BLU94*	Filoli Phenotype, USA	unknown	unknown
L19	Patrick Creek, USA	41.816667	−123.933333
L20	Patrick Creek, USA	41.816667	−123.933333
NAV1*	Navarro, USA	39.151944	−123.541944
NAV3	Navarro, USA	39.151944	−123.541944
NO1*	Northern California, USA	unknown	unknown
NO3*	Northern California, USA	unknown	unknown
SA1*	Santa Cruz, USA	36.971944	−122.026389
SA2	Santa Cruz, USA	36.971944	−122.026389
SF1*	Sequoiafarm Kaldenkirchen, Germany	51.308117	6.171964
SF3*	Sequoiafarm Kaldenkirchen, Germany	51.308117	6.171964
WI3	Winchuk, USA	42.05	−124.215278
WI4*	Winchuk, USA	42.05	−124.215278


*These 11 clones with successful RNA extraction were included in the pooled sample used for the sequencing and de novo transcriptome assembly.

Methods

Plant material

In January 2018, 17 nine-month-old clones were tested in a climate chamber under controlled light and temperature conditions (Fig. 1). The frost experiment followed a modified version of the experimental design used by Arbaoui et al.⁴ It consisted of a hardening phase at 5 °C for 48 h and then at 0 °C for 72 h, with alternating 12 h of darkness and 12 h of low light intensity, followed by a freezing phase of 12 h at −10 °C, simulating a freezing winter night, and 12 h at 0 °C with lights on, simulating a winter day, repeated twice. The experiment started with lights off. For each temperature treatment (5 °C, 0 °C, and −10 °C), the positions of 2–4 ramets per clone were randomly rearranged within the climate control chamber to minimize possible effects of micro-spatial climatic differences in the chamber (Fig. 1). After each treatment, a single entire ramet of each clone was harvested and immediately frozen in liquid nitrogen. The samples were stored at −60 °C until RNA extraction.
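The temperature regime described above can be written down as a small schedule, which makes the total chamber time easy to check. This is only a sketch of one reading of the text: it assumes that "repeated twice" applies to the freeze/thaw cycle alone (two nights at −10 °C), and the names `HARDENING`, `FREEZE_CYCLE` and `N_CYCLES` are illustrative, not from the study.

```python
# Each phase is (label, temperature in °C, duration in hours).
HARDENING = [("hardening", 5, 48), ("hardening", 0, 72)]
FREEZE_CYCLE = [("winter night", -10, 12), ("winter day", 0, 12)]
N_CYCLES = 2  # assumption: only the freeze/thaw cycle is repeated twice


def build_schedule():
    """Return the full list of phases in chronological order."""
    return HARDENING + FREEZE_CYCLE * N_CYCLES


def total_hours(schedule):
    """Total duration of the experiment in hours."""
    return sum(hours for _, _, hours in schedule)


def hours_at(schedule, temperature):
    """Total time spent at a given temperature in hours."""
    return sum(hours for _, temp, hours in schedule if temp == temperature)


if __name__ == "__main__":
    schedule = build_schedule()
    print(total_hours(schedule), "h in total")
    print(hours_at(schedule, -10), "h at -10 °C")
```

Under this reading the experiment runs for one week, with 24 h of that time spent at −10 °C.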

RNA extraction

Needle tissue was ground in liquid nitrogen, and its RNA was extracted following the CTAB protocol of Chang et al.⁵ After extraction, each sample was treated with 1 µl DNase (Thermo Fisher, Waltham, MA, USA). RNA quality and integrity were assessed using the Fragment Analyzer System and the Standard Sensitivity RNA Analysis Kit DNF-471 (Agilent Technologies, Inc., Santa Clara, CA, USA). All samples selected for sequencing had an RNA integrity number over 8.

RNA sequencing

Only clones with RNA of sufficient quality at all three temperature treatments were used for sequencing. In total, ramets of 11 clones representing all three temperature treatments resulted in 93 samples that were equimolarly pooled into one sample and sequenced at the NGS Integrative Genomics Core Unit, University Medical Center, Göttingen (Fig. 2). A paired-end (PE) RNA-seq library was prepared from the pooled RNA sample using a non-stranded, massively parallel cDNA sequencing (mRNA-Seq) protocol with the TruSeq mRNA prep Kit (Cat. No. RS-122-2101) from Illumina, Inc. (San Diego, CA, USA). The ligation step in the protocol was optimized by diluting the adapter concentrations to increase ligation efficiency (>94%), and the number of PCR cycles was reduced to avoid PCR duplication artefacts as well as primer dimers in the final library. The fluorometric QuantiFluor™ dsDNA System (Promega GmbH, Mannheim, Germany) was used for accurate quantitation of the cDNA library. The size of the final cDNA library was determined using the dsDNA 905 Reagent Kit (Agilent Technologies, Inc., Santa Clara, CA, USA), with a sizing range of 35–500 bp and an average resolution of 3–5 bp at 300 bp. The PE library was sequenced in two flow cell lanes on the Illumina HiSeq 2500 in rapid mode with 2 × 250 cycles. Sequence images were translated to BCL files by the Illumina software BaseCaller and then demultiplexed to fastq files using bcl2fastq v2.17.1.14. In total, ~370.7 M paired-end reads were generated. After quality trimming, minimum length filtering at 180 bp, and contamination removal, ~95.9 M paired-end reads with an average length of 424 bp, ~40.7 Gb in total, were used for transcriptome assembly and submitted to the NCBI GenBank SRA public database⁶.
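The read statistics above can be cross-checked with simple arithmetic. The sketch below assumes that the reported average length of 424 bp refers to the combined length of both mates of a pair (a single 250-cycle read cannot exceed 250 bp); the constant and function names are illustrative.

```python
RAW_PAIRS = 370.7e6          # paired-end reads generated
KEPT_PAIRS = 95.9e6          # pairs left after trimming and filtering
MEAN_PAIR_LENGTH_BP = 424    # assumed to be the summed length of both mates


def total_gigabases(pairs, mean_pair_length_bp):
    """Total sequence yield in Gb (1 Gb = 1e9 bases)."""
    return pairs * mean_pair_length_bp / 1e9


def retained_fraction(kept, raw):
    """Fraction of read pairs that survived filtering."""
    return kept / raw


if __name__ == "__main__":
    gb = total_gigabases(KEPT_PAIRS, MEAN_PAIR_LENGTH_BP)
    kept = retained_fraction(KEPT_PAIRS, RAW_PAIRS)
    print(f"yield: {gb:.1f} Gb, retained: {kept:.1%} of pairs")
```

With these inputs the yield rounds to the ~40.7 Gb reported in the text, and roughly a quarter of the raw read pairs are retained.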

Fig. 2 — Overview of the experiment and analysis. 17 coast redwood clones were tested at three temperature levels. RNA was isolated in sufficient quality and quantity for sequencing from 93 samples representing 11 different clones and equimolarly pooled. In total, ~370.7 M paired-end reads were generated. De novo assembly was done with Trinity and CLC Genomics Workbench software using the ~95.9 M paired-end reads that remained after quality trimming and filtering by a minimum length of 180 bp. The Trinity de novo assembly based on the filtered reads was annotated using Blast2GO Pro.

Transcriptome de novo assembly

The reads were trimmed at a Phred quality score of 30, which corresponds to a base call accuracy of 99.9%. As a result, 1.35% of the reads were trimmed, which reduced the average read length by about 0.5 bp. Only reads with a minimum length of 180 bp were then used for transcriptome assembly. Based on these reads, two de novo transcriptome assemblies were generated using Trinity and CLC Genomics Workbench, respectively. The Trinity assembly was normalized to a maximum read coverage of 30X. This value is less than the Trinity default of 50X, but it agrees with the value of at least 30 recommended by the authors of this algorithm in the supplement (S4) to Haas et al.⁷ The CLC assembly was carried out with default settings and a minimum unigene (contig) length of 200 bp. Assemblies from both programs showed signs of inflation and overrepresentation (Table 2). The best results were obtained for the transcriptome assembled using Trinity. This assembly, based on the filtered reads, was annotated using Blast2GO Pro. The MIcroSAtellite (MISA) identification online tool⁸ was used with default parameters to identify microsatellite loci with di-, tri-, tetra-, penta- and hexanucleotide motifs in this Trinity assembly (Summary of the microsatellite loci identified by the MISA tool in the Trinity assembly based on the filtered reads⁹), and PCR primers were designed for these loci using the Primer3 tool (PCR primers designed for the microsatellite loci identified by the MISA tool in the Trinity assembly based on the filtered reads using Primer3 online tool⁹).
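The link between the Phred threshold and the base call accuracy quoted above follows from the standard definition Q = −10·log10(P), where P is the probability that a base call is wrong. A minimal sketch (the function names are illustrative):

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with Phred score q is wrong."""
    return 10 ** (-q / 10)


def phred_to_accuracy(q):
    """Base call accuracy implied by Phred score q."""
    return 1 - phred_to_error_probability(q)


def error_probability_to_phred(p):
    """Inverse mapping: Phred score for an error probability p."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30):
        print(f"Q{q}: accuracy {phred_to_accuracy(q):.1%}")
```

A score of 30 therefore means one expected error per 1,000 base calls, which is the 99.9% accuracy stated in the text.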

Table 2.

Summary statistics of the two coast redwood de novo transcriptome assemblies generated using Trinity and CLC Genomics Workbench software, respectively, considering only reads with a minimum length of 180 bp and contigs no less than 200 bp long.

Unigenes/contigs	Trinity	CLC
Total number	622955	773507
L50	89696	206876
Max length, bp	29218	21583
N50, bp	1391	419
N80, bp	457	240
Total length, Mbp	522.0	306.1
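For readers unfamiliar with the statistics in Table 2: N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length, and L50 is the number of contigs needed to get there. N80 uses 80% instead of 50%. The sketch below shows this standard definition on a toy list of lengths; it is not the code used to produce the table.

```python
def nx_lx(lengths, x=50):
    """Return (Nx, Lx) for a list of contig lengths.

    Nx is the length of the contig at which the running sum of lengths,
    taken from longest to shortest, first reaches x% of the total.
    Lx is the number of contigs included at that point.
    """
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100
    running = 0
    for rank, length in enumerate(ordered, start=1):
        running += length
        if running >= threshold:
            return length, rank


if __name__ == "__main__":
    toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]  # toy data, total length 32
    print("N50, L50:", nx_lx(toy_contigs, 50))
    print("N80, L80:", nx_lx(toy_contigs, 80))
```

A low N50 combined with a high L50, as for the CLC assembly in Table 2, indicates that the assembly consists mostly of short contigs.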


Transcriptome functional annotation

Homologs of the contigs of the Trinity assembly were identified in the GenBank nr database using a blastx search restricted to “Green plants” with the gilist taxid option. The blastx output was then processed in Blast2GO PRO using the “Gene Ontology Mapping” function.

Data Records

The filtered and cleaned original RNA sequencing data have been deposited at the NCBI Sequence Read Archive under the SRA study accession SRP227297 (https://identifiers.org/ncbi/insdc.sra:SRP227297). The contigs of the Trinity¹⁰ and CLC¹¹ transcriptome assemblies have been deposited as Transcriptome Shotgun Assembly (TSA) projects at DDBJ/EMBL/NCBI GenBank under the accession numbers GIBU00000000 (https://identifiers.org/ncbi/insdc:GIBU00000000) and GIDF00000000 (https://identifiers.org/ncbi/insdc:GIDF00000000), respectively. The functional annotation of the Trinity transcriptome assembly is available as a supplementary GFF file at figshare (Functional annotation of the Trinity transcriptome assembly⁹). A summary of the microsatellite loci identified by the MISA tool in the Trinity assembly based on the filtered reads is available as a supplementary Excel file at figshare (Summary of the microsatellite loci identified by the MISA tool in the Trinity assembly based on the filtered reads⁹). PCR primers designed for these microsatellite loci using the Primer3 online tool are available as a supplementary Excel file at figshare (PCR primers designed for the microsatellite loci identified by the MISA tool in the Trinity assembly based on the filtered reads using Primer3 online tool⁹).

Technical Validation

Quality control

The quality check was done using FastQC¹² v. 0.11.5. Using blastx and Blast2GO, 418,576 (67%) out of a total of 622,955 unigenes (contigs) were mapped and 176,683 (28%) were annotated; 130,013 (21%) had no blast hits, and 316,259 (51%) had hits but were not annotated (Fig. 3). The largest number of blast hits represented Picea sitchensis, followed by the alga Coccomyxa subellipsoidea C-169 and Quercus suber (Fig. 4).
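The percentages above can be reproduced directly from the counts. The short check below also shows that the annotated, hit-but-not-annotated, and no-hit categories add up to the total number of unigenes; the variable names are illustrative.

```python
TOTAL_UNIGENES = 622_955

# Counts reported for the Trinity assembly.
COUNTS = {
    "mapped": 418_576,
    "annotated": 176_683,
    "no blast hits": 130_013,
    "hits, not annotated": 316_259,
}


def percent(count, total=TOTAL_UNIGENES):
    """Share of the total, in percent."""
    return 100 * count / total


if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label:>20}: {count:>7,} ({percent(count):.0f}%)")
    partition = ("annotated", "no blast hits", "hits, not annotated")
    print("partition sums to total:",
          sum(COUNTS[key] for key in partition) == TOTAL_UNIGENES)
```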

Fig. 3 — Annotation results for the Trinity assembly based on the filtered reads.

Fig. 4 — Number of top-hit coast redwood sequences matching other species sequences based on blastx.

Gene Ontology analysis

The distribution of Gene Ontology (GO, http://geneontology.org) terms showed that, among biological processes, metabolic and cellular processes were the most frequent, each represented by more than 90,000 unigene (contig) sequences. Response to stimulus was the third most common process, represented by approximately 20,000 unigene (contig) sequences (Fig. 5). Among molecular functions, more than 100,000 unigene (contig) sequences were associated with catalytic activity. Among cellular components, unigene (contig) sequences associated with cell parts and cell membrane were the most common. The KEGG (Kyoto Encyclopedia of Genes and Genomes; https://www.kegg.jp) annotation revealed that the annotated sequences mostly represented carbohydrate, amino acid, cofactor, and vitamin related metabolism (Fig. 6). Many of them could potentially be involved in the response to frost.

Fig. 5 — The Gene Ontology (GO) term distributions for biological processes (green), molecular functions (blue) and cellular components (yellow).

Fig. 6 — Number of enzymes (upper chart) and unigenes (lower chart) partitioned into 17 classes according to the KEGG (Kyoto Encyclopedia of Genes and Genomes) biological pathways.

BUSCO analysis

To test the transcriptome assemblies for completeness, both were searched for conserved orthologous genes using the Benchmarking Universal Single-Copy Orthologs (BUSCO) program¹³. The plant databases viridiplantae_odb10 (“green plants”, creation date: 2019-11-20, number of species: 57, number of BUSCOs: 425) and embryophyta_odb10 (“land plants”, creation date: 2019-11-20, number of species: 50, number of BUSCOs: 1614) were used as lineage-specific datasets. The results are presented in Table 3 and demonstrate that both assemblies are rather complete, but the Trinity assembly is roughly twice as complete as the CLC one and includes 1409 (87.3%) or 397 (93.4%) complete BUSCOs for the Embryophyta or Viridiplantae dataset, respectively.

Table 3.

Summary statistics of the BUSCO analysis of two coast redwood transcriptome de novo assemblies generated using Trinity and CLC Genomics Workbench software.

Transcriptome assembly	Trinity	CLC
Viridiplantae dataset
Complete BUSCOs	397 (93.4%)	205 (48.2%)
Complete and single copy BUSCOs	136 (32.0%)	176 (41.4%)
Complete and duplicated BUSCOs	261 (61.4%)	29 (6.8%)
Fragmented BUSCOs	25 (5.9%)	186 (43.8%)
Missing BUSCOs	3 (0.7%)	34 (8.0%)
Total BUSCO groups searched	425	425
Embryophyta dataset
Complete BUSCOs	1409 (87.3%)	570 (35.3%)
Complete and single copy BUSCOs	517 (32.0%)	509 (31.5%)
Complete and duplicated BUSCOs	892 (55.3%)	61 (3.8%)
Fragmented BUSCOs	87 (5.4%)	491 (30.4%)
Missing BUSCOs	118 (7.3%)	553 (34.3%)
Total BUSCO groups searched	1614	1614
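The rows of Table 3 are related by two identities: complete BUSCOs are the sum of single-copy and duplicated ones, and complete, fragmented and missing BUSCOs together make up all groups searched. The sketch below encodes the table and checks these identities and the reported percentages; the class and variable names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BuscoSummary:
    single: int
    duplicated: int
    fragmented: int
    missing: int

    @property
    def complete(self):
        # Complete = single-copy + duplicated.
        return self.single + self.duplicated

    @property
    def total(self):
        # All BUSCO groups searched.
        return self.complete + self.fragmented + self.missing

    def pct(self, count):
        """Share of all searched groups, in percent, to one decimal."""
        return round(100 * count / self.total, 1)


# Values copied from Table 3.
TABLE3 = {
    ("Viridiplantae", "Trinity"): BuscoSummary(136, 261, 25, 3),
    ("Viridiplantae", "CLC"): BuscoSummary(176, 29, 186, 34),
    ("Embryophyta", "Trinity"): BuscoSummary(517, 892, 87, 118),
    ("Embryophyta", "CLC"): BuscoSummary(509, 61, 491, 553),
}

if __name__ == "__main__":
    for (dataset, assembler), s in TABLE3.items():
        print(f"{dataset:>13} {assembler:>7}: "
              f"C={s.pct(s.complete)}% [S={s.pct(s.single)}%, "
              f"D={s.pct(s.duplicated)}%] F={s.pct(s.fragmented)}% "
              f"M={s.pct(s.missing)}% n={s.total}")
```

The high share of duplicated BUSCOs in the Trinity assembly is what one would expect from a redundant assembly of a hexaploid species, whereas the CLC assembly has far more fragmented BUSCOs.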


Mapping transcripts to the reference coast redwood genome assembly

Transcripts from both transcriptome assemblies were mapped to the reference coast redwood genome assembly (NCBI Assembly accession number GCA_007258455.1) using magic-blast (https://ncbi.github.io/magicblast/). In total, 96.9% and 98.5% of transcripts in the CLC and Trinity assemblies, respectively, were mapped to the genome. We also tried a few other programs, such as STAR, HISAT2, exonerate, and nucmer, but they could not handle mapping the large transcriptome assemblies to the large genome, and some of them (for example, STAR) were designed to map short reads rather than relatively long transcripts. It is hard to predict how many genes can be expected in the coast redwood genome considering its hexaploid nature. The coast redwood draft genome assembly is neither annotated nor published yet. There are also only a few conifer species (all diploid) with annotated genomes (see Table 3 in Mosca et al.¹⁴ for a review). Based on these data, the number of annotated genes ranges from 47,602 in Pinus taeda to 102,915 in Picea glauca. Therefore, we can easily expect as many as 600,000 genes in coast redwood. However, we have to emphasize that the presented assemblies are raw de novo ones and are likely highly redundant.

Microsatellite discovery and testing

The MISA search of the 622,955 unigene (contig) sequences found 37,164 microsatellite loci in 31,968 sequences. Among them, 19,048 loci had mononucleotide motifs, 9,795 dinucleotide, 7,346 trinucleotide, 669 tetranucleotide, 132 pentanucleotide, and 174 hexanucleotide motifs (Summary of the microsatellite loci identified by the MISA tool in the Trinity assembly based on the filtered reads⁹). Using the online software Primer3, PCR primer pairs were successfully designed for 28,285 microsatellites: 14,806 with mononucleotide motifs, 6,226 with dinucleotide, 5,601 with trinucleotide, 432 with tetranucleotide, 77 with pentanucleotide, 95 with hexanucleotide, and 1,048 with compound or complex motifs (PCR primers designed for the microsatellite loci identified by the MISA tool in the Trinity assembly based on the filtered reads using Primer3 online tool⁹). Twenty PCR primer pairs with unique single-copy annealing sites in both the Trinity transcriptome and the reference genome assemblies and with similar melting temperatures for multiplexing were selected, and the respective oligos were synthesized with forward primers containing the M13 tail (5′-CACGACGTTGTAAACGAC-3′) and reverse primers containing the pig-tail (5′-GTTTCTT-3′). The M13 primers were labelled with either 6-FAM or HEX (Sigma Aldrich Inc., St. Louis, MO). The same touch-down PCR program was used for all 20 PCR primer pairs following the protocol described in Breidenbach et al.¹⁵ The PCR products were separated and visualized using the ABI Genetic Analyser 3130xl with GENSCAN ROX 500 as an internal size standard. The primers were tested in a population sample of eight trees. Their DNA was isolated from needles or cambium using the DNeasy Plant Kit (Qiagen, Hilden, Germany) following the manufacturer’s instructions. The isolated DNA was diluted 1:10 in ddH₂O for PCR amplification and stored at −20 °C. All primer pairs amplified alleles of the expected size, and 14 markers were polymorphic (20 PCR primer pairs tested⁹) and can be used in different applications.
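To illustrate what a microsatellite search of this kind does, the sketch below finds perfect tandem repeats with regular expressions. It is a simplified stand-in, not the MISA implementation: it ignores compound loci, and the minimum repeat numbers in `THRESHOLDS` are an assumption (10, 6, 5, 5, 5 and 5 repeats for mono- to hexanucleotide motifs, values commonly used with MISA) rather than settings confirmed by the text.

```python
import re

# Assumed thresholds: motif length -> minimum number of repeats.
THRESHOLDS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_redundant(motif):
    """True if the motif is itself a repeat of a shorter unit, e.g. 'ATAT'."""
    return re.fullmatch(r"([ACGT]+?)\1+", motif) is not None


def find_ssrs(sequence, thresholds=THRESHOLDS):
    """Return sorted (start, end, motif, repeats) tuples, 1-based inclusive."""
    sequence = sequence.upper()
    hits = []
    for size, min_repeats in thresholds.items():
        # One motif of the given size followed by at least
        # (min_repeats - 1) further copies of the same motif.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_repeats - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            if is_redundant(motif):
                continue  # e.g. 'AA' x 6 is reported as a mononucleotide run
            repeats = len(match.group(0)) // size
            hits.append((match.start() + 1, match.end(), motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    # Toy sequence with three perfect microsatellites.
    toy = "GGC" + "AT" * 7 + "CCGT" + "A" * 12 + "TTC" + "CAG" * 5 + "T"
    for start, end, motif, repeats in find_ssrs(toy):
        print(f"{start}-{end}: ({motif}){repeats}")
```

Each reported locus then provides the flanking sequence against which a primer design tool such as Primer3 can place a forward and a reverse primer.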

Acknowledgements

This project was funded by the “Fachagentur Nachwachsende Rohstoffe (FNR) des Bundesministeriums für Ernährung und Landwirtschaft (BMEL)” (Agency for Renewable Resources of the Federal Ministry of Food and Agriculture, Germany), grant number FKZ 22018614. We thank Andreas Bürkert and Rainer Braukmann from the Department of Organic Plant Production and Agroecosystems Research in the Tropics and Subtropics (Kassel University, Witzenhausen, Germany) for their support and providing the climate chamber. We also thank Barbara Vornam, Gerold Dinkel, Alexandra Dolynska, and other colleagues at the Department of Forest Genetics and Forest Tree Breeding for their support during lab work and help with climate chamber experiments. We thank Gabriela Salinas, Susanne Luthin, and Fabian Ludewig from the NGS Integrative Genomics Core Unit, University Medical Center, Göttingen for the RNA sequencing service and Christof Niehues and Dennis Gockel from Allerweltsgrün (Köln, Germany) for providing us with plant material. We also acknowledge support by the German Research Foundation (DFG) and the Open Access Publication Funds of the University of Göttingen.

Author contributions

K.V.K. and N.B. conceived and designed the experiments and the analytical strategy. N.B. prepared plant samples, conducted experiments and performed laboratory work. N.B., V.S. and K.V.K. performed data analysis. K.V.K. and O.G. advised and supervised the project. N.B. and K.V.K. drafted the manuscript. All authors provided feedback on the draft manuscript and approved the final manuscript for submission.

Code availability

Blast2GO PRO: https://www.blast2go.com/blast2go-pro

BUSCO v4.0.5: https://busco.ezlab.org

FastQC v0.11.5: https://www.bioinformatics.babraham.ac.uk/projects/fastqc

Magic-BLAST v1.5.0: https://ncbi.github.io/magicblast

MISA: http://pgrc.ipk-gatersleben.de/misa/misa.html

PRIMER3: https://github.com/primer3-org/primer3

Trimmomatic v.0.35: http://www.usadellab.org/cms/?page=trimmomatic

Trinity v2.8.4: https://github.com/trinityrnaseq/trinityrnaseq/wiki

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

1. Roy, D. F. Silvical characteristics of redwood (Sequoia sempervirens [D. Don] Endl.). U.S. Forest Service Research Paper PSW-28 (1966).
2. Ahuja, M. R. (Ed.) Woody Plant Biotechnology (Plenum Press, New York and London, 1991).
3. Scott AD, Stenz NWM, Ingvarsson PK, Baum DA. Whole genome duplication in coast redwood (Sequoia sempervirens) and its implications for explaining the rarity of polyploidy in conifers. New Phytol. 2016;211:186–193. doi: 10.1111/nph.13930.
4. Arbaoui M, Balko C, Link W. Study of faba bean (Vicia faba L.) winter-hardiness and development of screening methods. Field Crop Res. 2008;106:60–67. doi: 10.1016/j.fcr.2007.10.015.
5. Chang S, Puryear J, Cairney J. A simple and efficient method for isolating RNA from pine trees. Plant Mol. Biol. Rep. 1993;11:113–116. doi: 10.1007/BF02670468.
6. 2020. NCBI Sequence Read Archive. SRP227297
7. Haas BJ, et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 2013;8:1494–1512. doi: 10.1038/nprot.2013.084.
8. Beier S, Thiel T, Münch T, Scholz U, Mascher M. MISA-web: a web server for microsatellite prediction. Bioinformatics. 2017;33:2583–2585. doi: 10.1093/bioinformatics/btx198.
9. Breidenbach N, Sharov VV, Gailing O, Krutovsky KV. 2020. Transcriptome de novo assembly based on cold stressed clones of the hexaploid coast redwood (Sequoia sempervirens (D. Don) Endl.). figshare.
10. Breidenbach N, Sharov VV, Gailing O, Krutovsky KV. 2019. TSA: Sequoia sempervirens, transcriptome shotgun assembly. GenBank. GIBU00000000
11. Breidenbach N, Sharov VV, Gailing O, Krutovsky KV. 2019. TSA: Sequoia sempervirens, transcriptome shotgun assembly. GenBank. GIDF00000000
12. Andrews, S. FastQC a quality-control tool for high-throughput sequence data, (http://www.bioinformatics.babraham.ac.uk/projects/fastqc).
13. Seppey, M., Manni, M., Zdobnov, E.M. BUSCO: Assessing Genome Assembly and Annotation Completeness. In Gene Prediction. Methods in Molecular Biology (ed. Kollmar, M.), vol. 1962, pp. 227–245 (Humana, New York, NY. 2019).
14. Mosca E, et al. A reference genome sequence for the European silver fir (Abies alba Mill.): a community-generated genomic resource. G3-Genes Genomes Genetics. 2019;9(7):g3.400083.2019. doi: 10.1534/g3.119.400083.
15. Breidenbach N, Gailing O, Krutovsky KV. Development of novel polymorphic nuclear and chloroplast microsatellite markers in coast redwood (Sequoia sempervirens). Plant Genetic Resources: Characterization and Utilization. 2019;17(3):293–297. doi: 10.1017/S147926211800045X.

