2011 Jul 26;12:304. doi: 10.1186/1471-2105-12-304

Table 2.

An example of a hierarchical alignment and assembly protocol specification

Alignment and Assembly
Preprocessing (optional): Extraction of a sub-sequence of the genomic sequence. This step is not required, but it may be useful for preliminary tests and protocol validation, because it restricts the size of the sequences and expedites the computation.
  Input: read files output by the Illumina sequencing pipeline (sequence.txt files)
  Tool: LONI Sub-Sequence extractor
  Server Location:/projects1/idinov/projects/scripts/extract_lines_from_Textfile.sh
  Output: Shorter sequence.fastq file
Data conversion: Conversion of Solexa FASTQ files to Sanger FASTQ format
  Input: read files output by the Illumina sequencing pipeline (sequence.txt files)
  Tool: MAQ (sol2sanger option): Mapping and Assembly with Quality
  Server Location:/applications/maq
  Output: sequence.fastq file
Binary conversion: Conversion of the FASTQ file to a binary FASTQ file (bfq)
  Input: sequence.fastq file
  Tool: MAQ (fastq2bfq option)
  Server Location:/applications/maq
  Output: sequence.bfq file
Reference conversion: Conversion of the reference genome (FASTA format) to binary FASTA (bfa)
  Input: reference.fasta file (to perform the alignment)
  Tool: MAQ (fasta2bfa option)
  Server Location:/applications/maq
  Output: reference.bfa file
Sequence alignment: Alignment of the sequence data to the reference genome
  Using MAQ:
   Input: sequence.bfq, reference.bfa
   Tool: MAQ (map option)
   Server Location:/applications/maq
   Output: alignment.map file
  Using Bowtie:
   Input: reference.fai, sequence.bfq
   Tool: Bowtie (map option)
   Server Location:/applications/bowtie
   Output: alignment.sam file
Indexing: Indexing the reference genome
  Input: reference.fa
  Tool: samtools (faidx option)
  Server Location:/applications/samtools-0.1.7_x86_64-linux
  Output: reference.fai
Mapping conversion:
  MAQ2SAM:
   Input: alignment.map file
   Tool: samtools (maq2sam-long option)
   Server Location:/applications/samtools-0.1.7_x86_64-linux
   Output: alignment.sam file
  SAM to full BAM:
   Input: alignment.sam, reference.fai file
   Tool: samtools (view -bt option)
   Server Location:/applications/samtools-0.1.7_x86_64-linux
   Output: alignment.bam file
Removal of duplicated reads:
  Input: alignment.bam file
  Tool: samtools (rmdup)
  Server Location:/applications/samtools-0.1.7_x86_64-linux
  Output: alignment.rmdup.bam file
Sorting:
  Input: alignment.rmdup.bam file
  Tool: samtools (sort option)
  Server Location:/applications/samtools-0.1.7_x86_64-linux
  Output: alignment.rmdup.sorted.bam file
MD tagging:
  Input: alignment.rmdup.sorted.bam file and reference REF.fasta file
  Tool: samtools (calmd option)
  Server Location:/applications/samtools-0.1.7_x86_64-linux
  Output: alignment.rmdup.sorted.calmd.bam file
Indexing:
  Input: alignment.rmdup.sorted.calmd.bam file
  Tool: samtools (index option)
  Server Location:/applications/samtools-0.1.7_x86_64-linux
  Output: alignment.rmdup.sorted.calmd.bam.bai file

This protocol is implemented as a Pipeline graphical workflow, shown in Figure 3, and is demonstrated in the Results section.
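
The file dependencies in Table 2 can be made explicit with a small sketch. The Python below is only an illustration and is not part of the Pipeline implementation: it transcribes the MAQ branch of the table (the optional preprocessing step and the Bowtie alternative are left out) and checks that every step reads only files that are either external inputs or outputs of an earlier step. The `Step` class, the `check_order` function, and the `EXTERNAL` set are hypothetical names introduced here. The sketch also assumes that reference.fasta and REF.fasta in the table name the same reference file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str      # step label from Table 2
    tool: str      # tool and option as listed in Table 2 (a label, not a command line)
    inputs: tuple  # file names the step reads
    output: str    # file name the step writes


# Files supplied by the user rather than produced by a step.
# Assumption: reference.fasta and reference.fa are both available up front.
EXTERNAL = {"sequence.txt", "reference.fasta", "reference.fa"}

# MAQ branch of the protocol, in the order given in Table 2.
STEPS = [
    Step("Data conversion", "MAQ (sol2sanger)", ("sequence.txt",), "sequence.fastq"),
    Step("Binary conversion", "MAQ (fastq2bfq)", ("sequence.fastq",), "sequence.bfq"),
    Step("Reference conversion", "MAQ (fasta2bfa)", ("reference.fasta",), "reference.bfa"),
    Step("Sequence alignment", "MAQ (map)", ("sequence.bfq", "reference.bfa"), "alignment.map"),
    Step("Reference indexing", "samtools (faidx)", ("reference.fa",), "reference.fai"),
    Step("MAQ2SAM", "samtools (maq2sam-long)", ("alignment.map",), "alignment.sam"),
    Step("SAM to BAM", "samtools (view -bt)", ("alignment.sam", "reference.fai"), "alignment.bam"),
    Step("Duplicate removal", "samtools (rmdup)", ("alignment.bam",), "alignment.rmdup.bam"),
    Step("Sorting", "samtools (sort)", ("alignment.rmdup.bam",), "alignment.rmdup.sorted.bam"),
    # Table 2 writes the reference here as REF.fasta; treated as reference.fasta (assumption).
    Step("MD tagging", "samtools (calmd)",
         ("alignment.rmdup.sorted.bam", "reference.fasta"),
         "alignment.rmdup.sorted.calmd.bam"),
    Step("Alignment indexing", "samtools (index)",
         ("alignment.rmdup.sorted.calmd.bam",),
         "alignment.rmdup.sorted.calmd.bam.bai"),
]


def check_order(steps, external):
    """Return (step name, missing input) pairs; an empty list means the order is valid."""
    available = set(external)
    problems = []
    for step in steps:
        for item in step.inputs:
            if item not in available:
                problems.append((step.name, item))
        available.add(step.output)
    return problems


if __name__ == "__main__":
    for name, missing in check_order(STEPS, EXTERNAL):
        print(f"{name}: input {missing} is not produced by any earlier step")
```

Run on the steps in table order, `check_order` reports no problems. Reordering the list (for example, placing the binary conversion before the data conversion) makes it report the step whose input is not yet available, which is the kind of consistency a workflow environment has to enforce between connected modules.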