2022 Dec 15;13:1030890. doi: 10.3389/fpls.2022.1030890

Table 2.

List of software used in the de novo assembly pipeline, with their functions.

Software	Pipeline step	Function	Output / notes
Sequencer (Leng et al., 2012)	RNA-Seq Raw Reads	Short reads: obtained from common NGS platforms, including Illumina, SOLiD, and 454; typically very short (35-500 bp). Long reads: Oxford Nanopore and PacBio sequencers can produce reads of 5 to 100 kb.	Massively parallel sequencing of millions to billions of reads, offering high throughput, scalability, and shorter run times.
FastQC (Andrews, 2010)	Quality Check (QC)	Quality assessment: evaluates raw read quality, identifies adapter contamination, and flags low-quality samples.	Good/Bad, according to the Phred quality score (Q-value).
Trim Galore	Read clean-up (if contaminated)	Trimming: removes bad bases (adapter sequences and low-quality bases) at the start and end of the reads. Filtering: removes contaminants, low-complexity reads (repeats), and reads shorter than 20 bases.	K-mer: a subsequence of k nucleotides, shorter than the read length. De Bruijn graph: used by several transcriptome assembly programs; every path in the graph denotes a potential transcript.
De novo assembly (Zhao et al., 2011)	Trinity (Henschel et al., 2012)	Phred quality score (Q-score): predicts the probability (P) of an error in base calling. Q = -10 log₁₀(P), or equivalently P = 10^(-Q/10).	De novo assembler: a method for the efficient and robust de novo reconstruction of transcriptomes. Software modules: Inchworm, Chrysalis, and Butterfly.
RSEM (Li and Dewey, 2011)	Transcript Abundance Estimation	Assembly Statistics	N50 length is the length of the shortest contig among the longest contigs that together cover at least 50% of the total assembled transcriptome length.
CD-HIT (Suzek et al., 2007)	Transcript clustering	Generates unigenes	Groups of similar transcript sequences
edgeR (Robinson et al., 2010)	Differential Expression Analysis	edgeR can be applied to differential expression at the gene, exon, transcript, or tag level; in fact, read counts can be computed for any genomic feature. There are two testing methods: likelihood ratio tests and quasi-likelihood F-tests.	The package user's guide outlines its key capabilities and gives several fully worked case studies, from counts to lists of genes.
TransDecoder (Wang et al., 2009)	Coding DNA Sequence (CDS) Prediction	CDS prediction from unigenes	Segments of a gene's mRNA that code for protein.
Blast2GO (Martin et al., 2004)	Gene Ontology (GO)	Mapping and annotation	Describes a gene or gene product in detail across three main categories: molecular function (MF), biological process (BP), and cellular component (CC).
Trinotate (Wang et al., 2009)	Functional Annotation	Clusters of Orthologous Groups (COG; to predict the function of individual proteins), Venn diagram (to identify genes common to all software), Pfam domains (to identify protein families), volcano plot (gene expression), scatter plot (for normalization of the obtained values), heatmap (for highly significant differentially expressed genes)	Portion identification and gene prediction (the process of collecting information about and describing a gene)
KAAS (Moriya et al., 2007)	Pathway Prediction	Pathway analysis against KEGG databases	Identification of biological functions
DESeq2 (Love et al., 2017)	Differential Expression Analysis	Normalization, differential analysis, and visualization of high-dimensional count data	Count matrices can be collapsed using collapseReplicates, which combines counts from technical replicates into single columns.
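
The Phred relation quoted in Table 2, Q = -10 log₁₀(P), can be illustrated with a minimal Python sketch. The function names are hypothetical, and the ASCII offset of 33 (the common Sanger/Illumina 1.8+ FASTQ encoding) is an assumption that is not stated in the table.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Convert a Phred quality score Q to a base-calling error probability P."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a Phred quality score Q."""
    if not 0 < p <= 1:
        raise ValueError("error probability must be in the interval (0, 1]")
    return -10 * math.log10(p)


def decode_fastq_quality(quality_string: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string into per-base Phred scores.

    The default offset of 33 is an assumption (Phred+33 encoding).
    """
    return [ord(char) - offset for char in quality_string]
```

Under these assumptions, a score of Q = 20 corresponds to an error probability of 1 in 100, and Q = 30 to 1 in 1000.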
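
The N50 statistic listed under assembly statistics in Table 2 can be computed directly from a list of contig (transcript) lengths. The sketch below uses the usual definition: sort lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The function name `n50` is illustrative and does not come from any of the listed tools.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig lengths.

    N50 is the length of the shortest contig in the smallest set of
    longest contigs whose combined length is at least half of the total.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")
```

For example, with contig lengths 2 through 10 (total 54), the three longest contigs (10, 9, 8) sum to 27, so the N50 is 8.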
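
The k-mer and De Bruijn graph concepts mentioned in Table 2 can be sketched in a few lines. This is a toy illustration only, not how Trinity or any other assembler listed above is implemented: reads are split into overlapping k-mers, and each k-mer becomes an edge from its (k-1)-base prefix to its (k-1)-base suffix. The function names are hypothetical.

```python
from collections import defaultdict


def kmers(read: str, k: int) -> list[str]:
    """Return all overlapping substrings of length k from a read."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads: list[str], k: int) -> dict[str, set[str]]:
    """Build a simple De Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer adds an edge from its prefix
    to its suffix. Edge multiplicities are ignored in this sketch.
    """
    graph: dict[str, set[str]] = defaultdict(set)
    for read in reads:
        for kmer in kmers(read, k):
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)
```

A path through such a graph spells out a candidate sequence, which is the sense in which every path denotes a potential transcript.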