Table 2.
Alignment and Assembly |
Preprocessing step: Extracting a sub-sequence of the genomic sequence. This step is optional, but may be useful for preliminary tests and protocol validation, because it restricts the size of the sequences and expedites the computation |
Input: read files output by the Illumina sequencing pipeline (sequence.txt files) |
Tool: LONI Sub-Sequence extractor |
Server Location:/projects1/idinov/projects/scripts/extract_lines_from_Textfile.sh |
Output: Shorter sequence.fastq file |
Data conversion: Conversion of Solexa FASTQ to Sanger FASTQ format |
Input: read files output by the Illumina sequencing pipeline (sequence.txt files) |
Tool: MAQ (sol2sanger option): Mapping and Assembly with Quality |
Server Location:/applications/maq |
Output: sequence.fastq file |
Binary conversion: Conversion of the FASTQ file to a binary FASTQ (bfq) file |
Input: sequence.fastq file |
Tool: MAQ (fastq2bfq option) |
Server Location:/applications/maq |
Output: sequence.bfq file |
Reference conversion: Conversion of the reference genome (FASTA format) to binary FASTA (bfa) |
Input: reference.fasta file (to perform the alignment) |
Tool: MAQ (fasta2bfa option) |
Server Location:/applications/maq |
Output: reference.bfa file |
Sequence alignment: Alignment of the sequence reads to the reference genome |
Using MAQ: |
Input: sequence.bfq, reference.bfa |
Tool: MAQ (map option) |
Server Location:/applications/maq |
Output: alignment.map file |
Using Bowtie: |
Input: reference.fai, sequence.bfq |
Tool: Bowtie (map option) |
Server Location:/applications/bowtie |
Output: alignment.sam file |
Indexing: Indexing the reference genome |
Input: reference.fa |
Tool: samtools (faidx option) |
Server Location:/applications/samtools-0.1.7_x86_64-linux |
Output: reference.fai |
Mapping conversion: |
MAQ2SAM: |
Input: alignment.map file |
Tool: samtools (maq2sam-long option) |
Server Location:/applications/samtools-0.1.7_x86_64-linux |
Output: alignment.sam file |
SAM to full BAM: |
Input: alignment.sam, reference.fai file |
Tool: samtools (view -bt option) |
Server Location:/applications/samtools-0.1.7_x86_64-linux |
Output: alignment.bam file |
Removal of duplicated reads: |
Input: alignment.bam file |
Tool: samtools (rmdup) |
Server Location:/applications/samtools-0.1.7_x86_64-linux |
Output: alignment.rmdup.bam file |
Sorting: |
Input: alignment.rmdup.bam file |
Tool: samtools (sort option) |
Server Location:/applications/samtools-0.1.7_x86_64-linux |
Output: alignment.rmdup.sorted.bam file |
MD tagging: |
Input: alignment.rmdup.sorted.bam file and reference REF.fasta file |
Tool: samtools (calmd option) |
Server Location:/applications/samtools-0.1.7_x86_64-linux |
Output: alignment.rmdup.sorted.calmd.bam file |
Indexing: |
Input: alignment.rmdup.sorted.calmd.bam file |
Tool: samtools (index option) |
Server Location:/applications/samtools-0.1.7_x86_64-linux |
Output: alignment.rmdup.sorted.calmd.bam.bai file |
This protocol is implemented as a Pipeline graphical workflow, shown in Figure 3, and is demonstrated in the Results section.
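
For readers who want to trace the MAQ branch of Table 2 outside the Pipeline environment, the steps can be sketched as an ordered list of command lines. The Python sketch below only builds and prints the commands; it does not run them. The function names `build_commands` and `render` are hypothetical, the file names are the generic ones from the table, and a single reference file name is assumed throughout. The argument order and flags follow the MAQ and samtools 0.1.x usage messages as commonly documented and should be checked against the installed versions; in particular, `maq2sam-long` is assumed to be a standalone converter that writes SAM to standard output, and `samtools faidx` is assumed to write its index next to the reference as `<reference>.fai`.

```python
import shlex


def build_commands(reads="sequence.txt", ref="reference.fasta", prefix="alignment"):
    """Return the MAQ branch of Table 2 as (description, argv, stdout_file) tuples.

    All file names are illustrative; the order mirrors the rows of the table.
    """
    fastq, bfq, bfa = "sequence.fastq", "sequence.bfq", "reference.bfa"
    fai = ref + ".fai"  # assumption: faidx writes the index beside the reference
    maq_map = prefix + ".map"
    sam = prefix + ".sam"
    bam = prefix + ".bam"
    rmdup = prefix + ".rmdup.bam"
    sorted_prefix = prefix + ".rmdup.sorted"  # 0.1.x sort takes an output prefix
    sorted_bam = sorted_prefix + ".bam"
    calmd = prefix + ".rmdup.sorted.calmd.bam"
    return [
        ("Solexa to Sanger FASTQ", ["maq", "sol2sanger", reads, fastq], None),
        ("FASTQ to binary FASTQ", ["maq", "fastq2bfq", fastq, bfq], None),
        ("FASTA to binary FASTA", ["maq", "fasta2bfa", ref, bfa], None),
        ("Alignment", ["maq", "map", maq_map, bfa, bfq], None),
        ("Reference indexing", ["samtools", "faidx", ref], None),
        ("MAQ map to SAM", ["maq2sam-long", maq_map], sam),
        ("SAM to BAM", ["samtools", "view", "-bt", fai, "-o", bam, sam], None),
        ("Duplicate removal", ["samtools", "rmdup", bam, rmdup], None),
        ("Sorting", ["samtools", "sort", rmdup, sorted_prefix], None),
        ("MD tagging", ["samtools", "calmd", "-b", sorted_bam, ref], calmd),
        ("BAM indexing", ["samtools", "index", calmd], None),
    ]


def render(step):
    """Format one step as a shell command line, with a redirect if needed."""
    _, argv, stdout_file = step
    line = " ".join(shlex.quote(arg) for arg in argv)
    if stdout_file:
        line += " > " + shlex.quote(stdout_file)
    return line


# Print the protocol as a numbered, human-readable command list.
for number, step in enumerate(build_commands(), start=1):
    print(f"{number:2d}. {step[0]}: {render(step)}")
```

Keeping each step as an argument list plus an optional output file makes the chain of intermediate files explicit, which is the same input-to-output dependency that the graphical workflow encodes between modules.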