Author manuscript; available in PMC: 2016 Jul 28.
Published in final edited form as: Br J Haematol. 2014 Apr 18;166(4):566–570. doi: 10.1111/bjh.12898

Table I.

Whole exome sequencing of donor and recipient DNA, and sequence comparison to generate alloreactivity potential.

TruSeq exome-enriched libraries prepared from de-identified donor-recipient pair DNA samples (Illumina protocol)
1 DNA fragmentation, adapter ligation and amplification performed
2 Libraries validated on BioAnalyser, quantified using real time (quantitative) polymerase chain reaction (qPCR) and pooled
3 Exome enrichment. Two hybridizations performed using target-specific biotinylated oligos followed by binding to magnetic streptavidin beads and three washes. PCR amplification of enriched product performed. Validation and sequencing on Illumina HiSeq 2000 with 4–8 samples per lane
4 The ~100 bp paired-end FASTQ reads generated by the sequencer were run through the Next-generation Sequencing Quality Control (NGS QC) Toolkit (Patel & Jain, 2011) to select high quality (HQ) reads, i.e., reads where at least 70% of the bases had a quality score of ≥25. On average, 20% of reads were excluded by this HQ filtering
5 HQ reads aligned to the Human Genome (hg18) using CLC Bio Assembly Cell version 3.22. >91% of the HQ reads aligned with at least 95% of the bases matching over 95% of the read length. The alignments were converted to the industry-standard Binary Alignment/Map (BAM) format.
6 Sequence Alignment/Map tools (SAMtools) (Li et al, 2009) used to remove PCR duplicates from the BAM files, as these may bias subsequent single nucleotide polymorphism (SNP) calling. All samples had at least 28× average coverage of the entire human exome, supporting credible and accurate SNP calling
7 SNP calling performed with preprocessed BAM files using the Broad Institute’s Genome Analysis Toolkit (McKenna et al, 2010) (GATKv1.6). The GATK SNP calling (DePristo et al, 2011) involved three steps: (1) DNA insertion-deletion (INDEL) realignment; (2) quality score recalibration; (3) SNP discovery and genotyping. The SNP caller generates a multi-sample variant call format (VCF) file
8 The multi-sample VCF file filtered to remove chromosomal positions with coverage below 10× or above 500×. Insertion/deletion variants removed using VCFtools software (v.0.1.9.0) (Danecek et al, 2011)
9 Each sample was separated from the multi-sample VCF file into individual files and positions containing missing genotype data removed. Given that the original VCF file contained multiple samples, every alternate allele that occurred in any of the samples was represented
10 To annotate the SNPs, the alternate allele and genotype data were transformed into an ANNOVAR-acceptable format, primarily consisting of a single alternate allele and a genotype containing only combinations of zero and one
11 Transformed data samples underwent independent comparison and annotation. (i) Recipient samples were compared to the actual donor and to every other donor sample to generate actual matches (recipient with its human leucocyte antigen [HLA]-matched donor) and simulated donor-recipient matches (recipient with other, HLA-unmatched donor). (ii) For annotation, files were first filtered to remove any positions where the genotype was the same as the reference and then annotated using ANNOVAR (v.2012 Mar 08) (Wang et al, 2010)
12 The sample-pair comparison files were then combined with the annotation files by comparing the variant alleles in sample 1 and sample 2, then annotating the variant position based on which sample contains the SNP
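
The HQ read criterion in step 4 (at least 70% of bases with a quality score of ≥25) can be illustrated with a minimal Python sketch. This is not the NGS QC Toolkit's code; the function name `is_high_quality` and the Phred+33 quality encoding are assumptions made for illustration (some Illumina pipelines of that period used a +64 offset, so the offset is exposed as a parameter).

```python
def is_high_quality(quality_string, min_quality=25, min_fraction=0.70,
                    phred_offset=33):
    """Return True if at least `min_fraction` of the bases in a FASTQ
    quality string have a Phred score >= `min_quality`.

    Assumption: qualities are ASCII-encoded with `phred_offset`
    (33 here; the encoding is not stated in the table).
    """
    if not quality_string:
        return False  # an empty read cannot satisfy the criterion
    passing = sum(
        1 for ch in quality_string
        if ord(ch) - phred_offset >= min_quality
    )
    return passing / len(quality_string) >= min_fraction


# 'I' encodes Q40 and '#' encodes Q2 under the assumed Phred+33 encoding.
print(is_high_quality("IIIIIII###"))  # 7 of 10 bases pass
print(is_high_quality("IIIIII####"))  # 6 of 10 bases pass
```

The sketch scores a single read; how the toolkit treats the two mates of a paired-end read is not described in the table and is left out here.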
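
The pairwise comparison in steps 11 and 12 can be sketched along the following lines. This is an illustration under stated assumptions, not the authors' implementation: the data layout (a dictionary keyed by chromosome, position and alternate allele, with genotype strings built only from 0 and 1, as in step 10), the function names `compare_pair` and `compare_all`, the labels, and the treatment of a position absent from one sample as reference (`0/0`) are all hypothetical.

```python
def compare_pair(recipient, donor):
    """Label each variant by which sample carries the alternate allele.

    `recipient` and `donor` map (chrom, pos, alt) -> genotype string
    such as "0/1" or "1/1". Assumption: a position missing from a
    sample's dictionary is treated as homozygous reference.
    """
    labels = {}
    for key in sorted(set(recipient) | set(donor)):
        in_recipient = "1" in recipient.get(key, "0/0")
        in_donor = "1" in donor.get(key, "0/0")
        if in_recipient and in_donor:
            labels[key] = "both"
        elif in_recipient:
            labels[key] = "recipient_only"
        elif in_donor:
            labels[key] = "donor_only"
    return labels


def compare_all(recipients, donors, actual_donor_of):
    """Compare every recipient with every donor, tagging each pair as
    'actual' (the recipient's own donor) or 'simulated' (any other)."""
    results = {}
    for r_id, r_calls in recipients.items():
        for d_id, d_calls in donors.items():
            kind = "actual" if actual_donor_of.get(r_id) == d_id else "simulated"
            results[(r_id, d_id)] = (kind, compare_pair(r_calls, d_calls))
    return results


# Toy data with made-up sample identifiers and positions.
recipients = {"R1": {("chr1", 100, "A"): "0/1", ("chr2", 50, "T"): "1/1"}}
donors = {
    "D1": {("chr1", 100, "A"): "0/1"},
    "D2": {("chr3", 7, "G"): "0/1"},
}
results = compare_all(recipients, donors, {"R1": "D1"})
for pair, (kind, labels) in results.items():
    print(pair, kind, labels)
```

Counting the `recipient_only` and `donor_only` labels for each pair gives a simple per-pair tally that can be set beside the actual and simulated pairings; which direction of difference feeds the alloreactivity potential is not specified in the table.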