2020 May 28;21(11):3828. doi: 10.3390/ijms21113828

Table 1.

Bioinformatics pipelines implemented on CSI NGS Portal.

Bioinformatics Pipeline	Analysis Steps	Tools and Packages	Sequencing Types	Normal/Control/Reference Samples	Replicate Samples ^a	Overall Runtime
1. DNA-Seq	Genome alignment	BWA (mem) [6]	Single/Paired end	Optional ^b	NA	~1 day
	Mutation calling	GATK4 Mutect2 [7,8]
	Mutation annotation	ANNOVAR [9]
2. RNA-Seq	Genome alignment	STAR [10]	Single/Paired end	NA	NA	~2 h
	Gene expression	HTSeq-count [11]
	Isoform expression	Salmon [12]
	Alternative splicing	in-house Perl
3. Diff-Exp	Genes table	Bioconductor DESeq2 [5]	Single/Paired end ^c	Required	Required (min 2 samples)	~10 min
	Genes report	Bioconductor regionReport [13]
	Heatmap	Superheat [14]
	Volcano	ggplot2 (Wickham 2016)
	Pathway enrichment	Bioconductor ReactomePA [15]
	Gene set enrichment analysis	GSEA [16]
	Isoforms report	Bioconductor DEXSeq [17]
4. Pathway-Enrichment	Enrichment plots	Bioconductor ReactomePA [15], enrichplot [18]	NA	NA	NA	~1 min
5. RNA-Editing	Genome alignment	BWA (mem) [6]	Single/Paired end	NA	NA	~7 h
	Variant calling	Samtools mpileup [19]
	Candidates selection	adapted from [20]
	AEI calculation	RNAEditingIndexer [21]
	UCSC track hub	in-house Bash
6. smallRNA	Genome alignment	NovoAlign	Single/Paired end	NA	NA	~1 h
	smallRNA expression	in-house Perl
7. 4C-Seq	Genome alignment	BWA (mem) [6]	Single/Paired end	Optional	Optional (2 samples)	~10 min
	Interactions	Bioconductor r3Cseq [22]
	Report	Bioconductor r3Cseq [22]
8. ChIP-Seq	Genome alignment	Bowtie2 [23]	Single/Paired end	Required	NA	~2 h
	Peak calling	MACS2 [24]
	Motif enrichment	HOMER [25]
	UCSC track hub	in-house Bash
9. RIP-Seq	Genome alignment	STAR [10]	Paired end	Required	Optional (2–10 samples)	~8 h
	Peak calling	in-house Bash
	UCSC track hub	in-house Bash
10. SHAPE-Seq	Transcriptome alignment	Bowtie2 [23]	Single/Paired end	Required	NA	~10 h
	Reactivity calculation	icSHAPE [26]
	Structure prediction	RNAfold [27,28]
11. rMATS	Genome alignment	STAR [10]	Single/Paired end	Required	Required (2–10 samples)	~2 h
	Alternative splicing	rMATS [29]
12. circRNA	Genome alignment	STAR [10]	Single/Paired end	NA	NA	~1 h
	circRNA expression	in-house Perl
13. eCLIP-Seq	Demultiplexing	eclipdemux [30,31]	Single/Paired end	Required	NA	~1 day
	Mapping	STAR [10]
	Peak calling	clipper [32]
	Peak normalisation	eCLIP [30,31]
	Peak annotation	HOMER [25]
	Motif enrichment	HOMER [25]
	UCSC track hub	in-house Bash
14. Bisulfite-Seq	Genome alignment	Bowtie2 [23]	Single/Paired end	NA	NA	~3 days
	Methylation calling	Bismark [33]
	UCSC track hub	in-house Bash
	DMRs	metilene [34]
15. scRNA-Seq	Genome alignment	STAR [10]	Paired end	NA	NA	~4 h
	Single cell analysis	Cell Ranger (10× Genomics)
16. ngsplot-deepTools	Genome alignment	STAR [10], Bowtie2 [23]	Single/Paired end	NA	NA	~4 h
	Plots	ngsplot [35]
	Plots	deepTools [36]

^a Ideally technical replicates rather than biological replicates. Numbers in parentheses denote the total number of samples.

^b For somatic mutation calling, a matched normal DNA sample is highly recommended. The “tumor-only mode” is useful only for specific purposes.

^c Not directly applicable to the “Diff-Exp” pipeline; it instead refers to the samples from the target jobs from which this pipeline starts.

Seq: Sequencing, BWA: Burrows-Wheeler Aligner, GATK: Genome Analysis Toolkit, ANNOVAR: Annotate Variation, STAR: Spliced Transcripts Alignment to a Reference, GSEA: Gene Set Enrichment Analysis, AEI: Alu Editing Index, 4C: Chromosome Conformation Capture-on-Chip, ChIP: Chromatin Immunoprecipitation, RIP: RNA Immunoprecipitation, MACS: Model-based Analysis of ChIP-Seq, icSHAPE: in vivo click Selective 2′-Hydroxyl Acylation and Profiling Experiment, MATS: Multivariate Analysis of Transcript Splicing, eCLIP: enhanced Crosslinking and Immunoprecipitation, DMR: Differentially Methylated Region, NA: Not Applicable.

Usage of the tools and packages in CSI NGS Portal and website links to their original sources are given in Supplementary Table S1. The detailed descriptions, expected input and output of the pipelines are given in Supplementary Data and on the website Docs page.

Overall runtime is the approximate time elapsed for one sample to finish all the analysis steps once the job starts running, and may vary depending on the data size, pipeline parameters and server load. However, the runtime for additional samples under the same job does not multiply proportionally, owing to parallelisation. When a job contains multiple samples, each sample starts running as soon as resources are available on the server, and the samples run in parallel until all have finished. This uses system resources efficiently while returning results to the user as quickly as possible.
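The runtime note can be illustrated with a small scheduling sketch. It is not the portal's actual scheduler, which is not described here; the function name `estimate_job_runtime`, the `slots` parameter (the number of samples the server can run at once) and the rule that each sample takes the earliest free slot are all assumptions made for illustration. The sketch only shows why wall-clock time for a multi-sample job can stay well below the per-sample runtime multiplied by the number of samples.

```python
import heapq


def estimate_job_runtime(sample_runtimes, slots):
    """Estimate wall-clock time for one job with several samples.

    Hypothetical model: `slots` samples can run concurrently, and each
    waiting sample starts as soon as any slot becomes free.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")

    # Each heap entry is the time at which one slot becomes free.
    free_at = [0.0] * slots
    heapq.heapify(free_at)

    finish = 0.0
    for runtime in sample_runtimes:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + runtime
        heapq.heappush(free_at, end)
        finish = max(finish, end)
    return finish


if __name__ == "__main__":
    # Four samples at ~2 h each, the approximate RNA-Seq runtime in Table 1.
    rna_seq_samples = [2.0] * 4
    for slots in (1, 2, 4):
        hours = estimate_job_runtime(rna_seq_samples, slots)
        print(f"{slots} slot(s): ~{hours:g} h")
```

Under this model, four ~2 h samples take about 8 h when run one after another, but only about 2 h when four slots are free, which matches the table's point that the listed runtime is per sample and does not scale proportionally with sample count.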