2012 May 25;7(5):e37565. doi: 10.1371/journal.pone.0037565

Figure 2. Bioinformatics analysis workflow for SBG.

The Illumina data are first processed to remove low-quality reads. Reference sequences are then generated by clustering the unique reads present within the dataset. The reads are subsequently aligned to these reference sequences, and variation is called using the GATK Unified Genotyper. Lastly, the final set of SNPs and genotypes is generated by removing SNPs that do not meet the thresholds for percentage of missing data and for expected genotypic frequencies.
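
The final filtering step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the genotype labels (`"AA"`, `"BB"`), the threshold values, and the simple tolerance check against expected frequencies are all assumptions chosen for illustration, since the caption does not state the actual thresholds or the statistical test used.

```python
from collections import Counter

MISSING = None  # assumed placeholder for a sample with no genotype call


def missing_fraction(genotypes):
    """Fraction of samples without a genotype call at one SNP."""
    if not genotypes:
        return 1.0
    return sum(g is MISSING for g in genotypes) / len(genotypes)


def genotype_frequencies(genotypes):
    """Observed frequency of each genotype class among called samples."""
    called = [g for g in genotypes if g is not MISSING]
    if not called:
        return {}
    counts = Counter(called)
    return {g: n / len(called) for g, n in counts.items()}


def passes_filters(genotypes, max_missing=0.2, expected=None, tolerance=0.15):
    """Return True if a SNP meets both (hypothetical) thresholds.

    max_missing: maximum allowed fraction of missing calls.
    expected:    dict of genotype class -> expected frequency, or None
                 to skip the frequency check.
    tolerance:   allowed absolute deviation from each expected frequency.
                 Only classes listed in `expected` are checked.
    """
    if missing_fraction(genotypes) > max_missing:
        return False
    if expected is None:
        return True
    observed = genotype_frequencies(genotypes)
    if not observed:
        return False
    return all(
        abs(observed.get(g, 0.0) - freq) <= tolerance
        for g, freq in expected.items()
    )


def filter_snps(snp_table, **kwargs):
    """Keep only the SNPs (name -> list of genotypes) that pass both filters."""
    return {
        name: gts for name, gts in snp_table.items()
        if passes_filters(gts, **kwargs)
    }


# Toy data: three SNPs genotyped in ten samples (invented for illustration).
snps = {
    "snp1": ["AA", "BB", "AA", "BB", "AA", "BB", "AA", "BB", None, "AA"],
    "snp2": ["AA", None, None, None, None, "BB", None, "AA", "BB", None],
    "snp3": ["AA"] * 9 + ["BB"],
}
kept = filter_snps(snps, max_missing=0.2,
                   expected={"AA": 0.5, "BB": 0.5}, tolerance=0.15)
print(sorted(kept))
```

In this toy example `snp2` is dropped for excessive missing data and `snp3` for deviating from the assumed 1:1 homozygote ratio. A real pipeline would more likely read genotypes from the VCF produced by the variant caller and apply a formal test (for instance a chi-square test) rather than a fixed tolerance.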