Table 1.
| Bioinformatics Pipeline | Analysis Steps | Tools and Packages | Sequencing Types | Normal/Control/Reference Samples | Replicate Samples ᵃ | Overall Runtime |
|---|---|---|---|---|---|---|
| 1. DNA-Seq | Genome alignment | BWA (mem) [6] | Single/Paired end | Optional ᵇ | NA | ~1 day |
| | Mutation calling | GATK4 Mutect2 [7,8] | | | | |
| | Mutation annotation | ANNOVAR [9] | | | | |
| 2. RNA-Seq | Genome alignment | STAR [10] | Single/Paired end | NA | NA | ~2 h |
| | Gene expression | HTSeq-count [11] | | | | |
| | Isoform expression | Salmon [12] | | | | |
| | Alternative splicing | in-house Perl | | | | |
| 3. Diff-Exp | Genes table | Bioconductor DESeq2 [5] | Single/Paired end ᶜ | Required | Required (min. 2 samples) | ~10 min |
| | Genes report | Bioconductor regionReport [13] | | | | |
| | Heatmap | Superheat [14] | | | | |
| | Volcano plot | ggplot2 (Wickham 2016) | | | | |
| | Pathway enrichment | Bioconductor ReactomePA [15] | | | | |
| | Gene set enrichment analysis | GSEA [16] | | | | |
| | Isoforms report | Bioconductor DEXSeq [17] | | | | |
| 4. Pathway-Enrichment | Enrichment plots | Bioconductor ReactomePA [15], enrichplot [18] | NA | NA | NA | ~1 min |
| 5. RNA-Editing | Genome alignment | BWA (mem) [6] | Single/Paired end | NA | NA | ~7 h |
| | Variant calling | Samtools mpileup [19] | | | | |
| | Candidate selection | adapted from [20] | | | | |
| | AEI calculation | RNAEditingIndexer [21] | | | | |
| | UCSC track hub | in-house Bash | | | | |
| 6. smallRNA | Genome alignment | NovoAlign | Single/Paired end | NA | NA | ~1 h |
| | smallRNA expression | in-house Perl | | | | |
| 7. 4C-Seq | Genome alignment | BWA (mem) [6] | Single/Paired end | Optional | Optional (2 samples) | ~10 min |
| | Interactions | Bioconductor r3Cseq [22] | | | | |
| | Report | Bioconductor r3Cseq [22] | | | | |
| 8. ChIP-Seq | Genome alignment | Bowtie2 [23] | Single/Paired end | Required | NA | ~2 h |
| | Peak calling | MACS2 [24] | | | | |
| | Motif enrichment | HOMER [25] | | | | |
| | UCSC track hub | in-house Bash | | | | |
| 9. RIP-Seq | Genome alignment | STAR [10] | Paired end | Required | Optional (2–10 samples) | ~8 h |
| | Peak calling | in-house Bash | | | | |
| | UCSC track hub | in-house Bash | | | | |
| 10. SHAPE-Seq | Transcriptome alignment | Bowtie2 [23] | Single/Paired end | Required | NA | ~10 h |
| | Reactivity calculation | icSHAPE [26] | | | | |
| | Structure prediction | RNAfold [27,28] | | | | |
| 11. rMATS | Genome alignment | STAR [10] | Single/Paired end | Required | Required (2–10 samples) | ~2 h |
| | Alternative splicing | rMATS [29] | | | | |
| 12. circRNA | Genome alignment | STAR [10] | Single/Paired end | NA | NA | ~1 h |
| | circRNA expression | in-house Perl | | | | |
| 13. eCLIP-Seq | Demultiplexing | eclipdemux [30,31] | Single/Paired end | Required | NA | ~1 day |
| | Mapping | STAR [10] | | | | |
| | Peak calling | clipper [32] | | | | |
| | Peak normalisation | eCLIP [30,31] | | | | |
| | Peak annotation | HOMER [25] | | | | |
| | Motif enrichment | HOMER [25] | | | | |
| | UCSC track hub | in-house Bash | | | | |
| 14. Bisulfite-Seq | Genome alignment | Bowtie2 [23] | Single/Paired end | NA | NA | ~3 days |
| | Methylation calling | Bismark [33] | | | | |
| | UCSC track hub | in-house Bash | | | | |
| | DMRs | metilene [34] | | | | |
| 15. scRNA-Seq | Genome alignment | STAR [10] | Paired end | NA | NA | ~4 h |
| | Single-cell analysis | Cell Ranger (10x Genomics) | | | | |
| 16. ngsplot-deepTools | Genome alignment | STAR [10], Bowtie2 [23] | Single/Paired end | NA | NA | ~4 h |
| | Plots | ngsplot [35] | | | | |
| | Plots | deepTools [36] | | | | |
ᵃ Ideally technical replicates rather than biological replicates. Numbers in parentheses denote the total number of samples. ᵇ For somatic mutation calling, a matched normal DNA sample is highly recommended; “tumor-only mode” is useful only for specific purposes. ᶜ Not directly applicable to the “Diff-Exp” pipeline; instead, it refers to the samples of the target jobs from which this pipeline starts.

Seq: Sequencing, BWA: Burrows-Wheeler Aligner, GATK: Genome Analysis Toolkit, ANNOVAR: Annotate Variation, STAR: Spliced Transcripts Alignment to a Reference, GSEA: Gene Set Enrichment Analysis, AEI: Alu Editing Index, 4C: Chromosome Conformation Capture-on-Chip, ChIP: Chromatin Immunoprecipitation, RIP: RNA Immunoprecipitation, MACS: Model-based Analysis of ChIP-Seq, icSHAPE: in vivo click Selective 2′-Hydroxyl Acylation and Profiling Experiment, MATS: Multivariate Analysis of Transcript Splicing, eCLIP: enhanced Crosslinking and Immunoprecipitation, DMR: Differentially Methylated Region, NA: Not Applicable.

Usage of the tools and packages in CSI NGS Portal and links to their original sources are given in Supplementary Table S1. Detailed descriptions and the expected input and output of each pipeline are given in the Supplementary Data and on the website's Docs page.

Overall runtime is the approximate time for one sample to finish all analysis steps once the job starts running; it may vary with data size, pipeline parameters and server load. Runtime does not, however, grow proportionally with each additional sample in the same job, because samples are processed in parallel: every sample starts as soon as resources become available on the server, and all samples keep running in parallel until they finish. This uses system resources efficiently while returning results to the user as quickly as possible.
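To illustrate why additional samples need not multiply the runtime, the following is a minimal Python sketch of the "start each sample as soon as resources are free" behaviour described above. It is a simplified model, not the portal's scheduler: it assumes, hypothetically, a fixed number of identical `slots` on the server, that each sample occupies one slot for its whole runtime, and that samples start in submission order. The function name `job_makespan` is illustrative only.

```python
import heapq
from typing import List


def job_makespan(sample_runtimes: List[float], slots: int) -> float:
    """Return the wall-clock time for a job whose samples each start
    as soon as a slot becomes free (hypothetical list-scheduling model).

    sample_runtimes: per-sample runtime, e.g. in hours, in submission order.
    slots: hypothetical number of samples the server can run at once.
    """
    if slots < 1:
        raise ValueError("slots must be >= 1")
    if not sample_runtimes:
        return 0.0

    # Min-heap holding the time at which each slot next becomes free.
    free_at = [0.0] * min(slots, len(sample_runtimes))
    heapq.heapify(free_at)

    finish = 0.0
    for runtime in sample_runtimes:
        start = heapq.heappop(free_at)  # earliest available slot
        end = start + runtime           # sample runs to completion there
        finish = max(finish, end)       # job ends when its last sample ends
        heapq.heappush(free_at, end)
    return finish


if __name__ == "__main__":
    four_samples = [2.0, 2.0, 2.0, 2.0]  # e.g. ~2 h per sample
    for n_slots in (4, 2, 1):
        print(n_slots, "slots ->", job_makespan(four_samples, n_slots), "h")
```

For instance, four samples of about 2 h each finish in about 2 h when four slots are free and about 4 h with two, compared with 8 h if processed one after another. Real runtimes also depend on data size, pipeline parameters and server load, none of which this model captures.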