2020 May 28;21(11):3828. doi: 10.3390/ijms21113828

Table 1.

Bioinformatics pipelines implemented on CSI NGS Portal.

| Bioinformatics Pipeline | Analysis Steps | Tools and Packages | Sequencing Types | Normal/Control/Reference Samples | Replicate Samples a | Overall Runtime |
|---|---|---|---|---|---|---|
| 1. DNA-Seq | Genome alignment | BWA (mem) [6] | Single/Paired end | Optional b | NA | ~1 day |
| | Mutation calling | GATK4 Mutect2 [7,8] | | | | |
| | Mutation annotation | ANNOVAR [9] | | | | |
| 2. RNA-Seq | Genome alignment | STAR [10] | Single/Paired end | NA | NA | ~2 h |
| | Gene expression | HTSeq-count [11] | | | | |
| | Isoform expression | Salmon [12] | | | | |
| | Alternative splicing | in-house Perl | | | | |
| 3. Diff-Exp | Genes table | Bioconductor DESeq2 [5] | Single/Paired end c | Required | Required (min 2 samples) | ~10 min |
| | Genes report | Bioconductor regionReport [13] | | | | |
| | Heatmap | Superheat [14] | | | | |
| | Volcano | ggplot2 (Wickham 2016) | | | | |
| | Pathway enrichment | Bioconductor ReactomePA [15] | | | | |
| | Gene set enrichment analysis | GSEA [16] | | | | |
| | Isoforms report | Bioconductor DEXSeq [17] | | | | |
| 4. Pathway-Enrichment | Enrichment plots | Bioconductor ReactomePA [15], enrichplot [18] | NA | NA | NA | ~1 min |
| 5. RNA-Editing | Genome alignment | BWA (mem) [6] | Single/Paired end | NA | NA | ~7 h |
| | Variant calling | Samtools mpileup [19] | | | | |
| | Candidates selection | adapted from [20] | | | | |
| | AEI calculation | RNAEditingIndexer [21] | | | | |
| | UCSC track hub | in-house Bash | | | | |
| 6. smallRNA | Genome alignment | NovoAlign | Single/Paired end | NA | NA | ~1 h |
| | smallRNA expression | in-house Perl | | | | |
| 7. 4C-Seq | Genome alignment | BWA (mem) [6] | Single/Paired end | Optional | Optional (2 samples) | ~10 min |
| | Interactions | Bioconductor r3Cseq [22] | | | | |
| | Report | Bioconductor r3Cseq [22] | | | | |
| 8. ChIP-Seq | Genome alignment | Bowtie2 [23] | Single/Paired end | Required | NA | ~2 h |
| | Peak calling | MACS2 [24] | | | | |
| | Motif enrichment | HOMER [25] | | | | |
| | UCSC track hub | in-house Bash | | | | |
| 9. RIP-Seq | Genome alignment | STAR [10] | Paired end | Required | Optional (2–10 samples) | ~8 h |
| | Peak calling | in-house Bash | | | | |
| | UCSC track hub | in-house Bash | | | | |
| 10. SHAPE-Seq | Transcriptome alignment | Bowtie2 [23] | Single/Paired end | Required | NA | ~10 h |
| | Reactivity calculation | icSHAPE [26] | | | | |
| | Structure prediction | RNAfold [27,28] | | | | |
| 11. rMATS | Genome alignment | STAR [10] | Single/Paired end | Required | Required (2–10 samples) | ~2 h |
| | Alternative splicing | rMATS [29] | | | | |
| 12. circRNA | Genome alignment | STAR [10] | Single/Paired end | NA | NA | ~1 h |
| | circRNA expression | in-house Perl | | | | |
| 13. eCLIP-Seq | Demultiplexing | eclipdemux [30,31] | Single/Paired end | Required | NA | ~1 day |
| | Mapping | STAR [10] | | | | |
| | Peak calling | clipper [32] | | | | |
| | Peak normalisation | eCLIP [30,31] | | | | |
| | Peak annotation | HOMER [25] | | | | |
| | Motif enrichment | HOMER [25] | | | | |
| | UCSC track hub | in-house Bash | | | | |
| 14. Bisulfite-Seq | Genome alignment | bowtie2 [23] | Single/Paired end | NA | NA | ~3 days |
| | Methylation calling | Bismark [33] | | | | |
| | UCSC track hub | in-house Bash | | | | |
| | DMRs | metilene [34] | | | | |
| 15. scRNA-Seq | Genome alignment | STAR [10] | Paired end | NA | NA | ~4 h |
| | Single cell analysis | Cell Ranger (10× Genomics) | | | | |
| 16. ngsplot-deepTools | Genome alignment | STAR [10], Bowtie2 [23] | Single/Paired end | NA | NA | ~4 h |
| | Plots | ngsplot [35] | | | | |
| | Plots | deepTools [36] | | | | |

a Ideally biological replicates rather than technical replicates. Numbers in parentheses denote the samples in total.

b For somatic mutation calling, a matched normal DNA sample is highly recommended. “Tumor-only mode” is useful only for specific purposes.

c Not directly applicable to the “Diff-Exp” pipeline; it instead refers to the samples from the target jobs that this pipeline starts from.

Seq: Sequencing, BWA: Burrows-Wheeler Aligner, GATK: Genome Analysis Toolkit, ANNOVAR: Annotate Variation, STAR: Spliced Transcripts Alignment to a Reference, GSEA: Gene Set Enrichment Analysis, AEI: Alu Editing Index, 4C: Chromosome Conformation Capture-on-Chip, ChIP: Chromatin Immunoprecipitation, RIP: RNA Immunoprecipitation, MACS: Model-based Analysis of ChIP-Seq, icSHAPE: in vivo click Selective 2′-Hydroxyl Acylation and Profiling Experiment, MATS: Multivariate Analysis of Transcript Splicing, eCLIP: enhanced Crosslinking and Immunoprecipitation, DMR: Differentially Methylated Regions, NA: Not Applicable.

Usage of the tools and packages in CSI NGS Portal and website links to their original sources are given in Supplementary Table S1. The detailed descriptions, expected input and output of the pipelines are given in Supplementary Data and on the website Docs page.

Overall runtime is the approximate time elapsed for one sample to finish all the analysis steps once the job starts running, and may vary depending on the data size, pipeline parameters and server load. However, the runtime for additional samples under the same job does not multiply proportionally, owing to parallelisation. With multiple samples, each sample starts running as soon as resources are available on the server, and the samples run in parallel until all have finished. This uses system resources efficiently while returning results to the user as quickly as possible.
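The runtime note above can be illustrated with a minimal sketch. It is not the portal's scheduler, which the table does not describe. It assumes a fixed number of server slots, that each sample occupies exactly one slot, and that a waiting sample starts as soon as a slot frees up. The function name `makespan` and the `slots` parameter are hypothetical, and the 2 h figure is the approximate RNA-Seq runtime from Table 1.

```python
import heapq


def makespan(sample_runtimes_h, slots):
    """Wall-clock hours to finish all samples when up to `slots` run at once.

    Assumption (not from the source): every sample takes one slot, and a
    waiting sample starts the moment any slot becomes free.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    # Each heap entry is the time at which one server slot becomes free.
    free_at = [0.0] * min(slots, max(len(sample_runtimes_h), 1))
    heapq.heapify(free_at)
    finish = 0.0
    for runtime in sample_runtimes_h:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + runtime
        finish = max(finish, end)
        heapq.heappush(free_at, end)
    return finish


# Four RNA-Seq samples at roughly 2 h each (Table 1).
runtimes = [2.0, 2.0, 2.0, 2.0]
serial = sum(runtimes)                      # one after another
parallel_all = makespan(runtimes, slots=4)  # enough resources for all
parallel_two = makespan(runtimes, slots=2)  # only two can run at once
print(serial, parallel_all, parallel_two)
```

Under these assumptions, four samples finish in about the time of one when enough slots are free, and in about twice that when only two slots are available. This is the sense in which runtime does not multiply proportionally with the number of samples.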