Author manuscript; available in PMC: 2020 Dec 7.
Published in final edited form as: Hum Immunol. 2020 Jun 13;81(8):423–429. doi: 10.1016/j.humimm.2020.06.002

Concordance between predicted HLA type using next generation sequencing data generated for non-HLA purposes and clinical HLA type

Ann M Moyer a, Brian Dukek a, Patti Duellman a, Brittany Schneider a, Laurie Wakefield a, Jennifer M Skierka a, Rajeswari Avula a, Aditya V Bhagwate b, Krishna R Kalari b, Justin D Kreuter a, Matthew P Goetz c, Judy C Boughey d, John L Black III a, Manish J Gandhi a,*
PMCID: PMC7721171  NIHMSID: NIHMS1645538  PMID: 32546429

Abstract

We explored the feasibility of obtaining accurate HLA type using pre-existing NGS data not generated for HLA purposes. 83 exomes and 500 targeted NGS pharmacogenomic panels were analyzed using Omixon HLA Explore, OptiType, and/or HLA-Genotyper software. Results were compared against clinical HLA genotyping. Of 812 germline allele calls across class I/II loci, 765 (94.2%) by Omixon and 769 (94.7%) by HLA-Genotyper were concordant to the second field (e.g. HLA-A*02:01), as were 402 (99.5%) of 404 OptiType class I calls. An additional 19 (2.3%) Omixon, 39 (4.8%) HLA-Genotyper, and 2 (0.5%) OptiType allele calls were first field concordant (e.g. HLA-A*02). Using Omixon, 4 alleles (0.4%) were discordant and 24 (3.0%) failed to call, while 4 alleles (0.4%) were discordant using HLA-Genotyper. Tumor exomes were also evaluated and were 85.4%, 91.6%, and 100% concordant (Omixon and HLA-Genotyper with 96 alleles tested, and OptiType with 48 class I alleles, respectively). The 15 exomes and 500 pharmacogenomic panels were 100% concordant for each pharmacogenomic allele tested. This work has broad implications spanning future clinical care (pharmacogenomics, tumor response to immunotherapy, autoimmunity, etc.) and research applications.

Keywords: Pharmacogenetics, Pharmacogenomics, HLA, Human leukocyte antigen, MHC, Exome, Next generation sequencing

1. Introduction

Pharmacogenetic testing has historically been performed reactively at the time the patient will be prescribed a medication or, in some cases, to explain a toxicity that has already developed. There is increasing interest in preemptive testing of multiple pharmacogenes and storing that data in the patient’s electronic health record (EHR) so that the information can be readily available and used immediately to prescribe medications when needed without a delay for testing. At the same time, the cost of sequencing has dramatically decreased in recent years [1], such that exome sequencing (ES) and even genome sequencing (GS) are increasingly used in the diagnosis of hereditary disorders. In addition, patients are increasingly interested in obtaining genetic information, including pharmacogenomics. As a result, “healthy” exomes and genomes are beginning to be offered [2]. Therefore, exome data is available for rising numbers of patients, many of whom would opt for pharmacogenomic interpretation if available [3–5].

Several human leukocyte antigen (HLA) loci are currently known to be associated with increased risk for severe medication reactions and are the subject of guidelines [6–9]. Therefore, when interpreting ES data for pharmacogenomic purposes, ideally, HLA alleles should also be included. However, genes encoding the HLA class I and class II molecules are the most polymorphic in the human genome, with 19,031 class I and 7,183 class II alleles documented as of December 2019 [10]. The HLA genes are difficult to genotype, particularly with short-read next generation sequencing (NGS), due to the high degree of polymorphism as well as structural variation. Using traditional techniques, HLA typing can be performed at different resolutions depending on which technique is utilized [11–13]. For example, sequence-specific oligonucleotide (SSO) typing can be used to generate results to the first field (e.g. HLA-A*02), which corresponds to allele groups by serological activity, and semi-accurately to the second field (e.g. HLA-A*02:01), which corresponds to protein level resolution. Sequence-specific primer (SSP) amplification and sequence based typing (SBT) produce results accurate to the second field. These techniques are labor-intensive and are being replaced with NGS techniques [14,15]. Recently, the use of targeted sequencing and/or long-range PCR methods coupled with NGS with read depths of > 50–100 has been shown to produce accurate results and to allow for typing to the third and fourth fields (e.g. HLA-A*02:01:01 and HLA-A*02:01:01:01), which correspond to synonymous polymorphisms and non-coding variants, respectively [16–18]. The use of ES, in the absence of target enrichment and specific PCR amplification steps, for HLA typing has been less explored to date.
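The field structure described above can be made concrete with a short parsing sketch. This is an illustration only, not part of the study's pipeline: the names `HlaAllele`, `parse_allele`, and `truncate` are hypothetical, and the sketch ignores expression suffixes (such as a trailing N or L) that some allele names carry.

```python
from typing import NamedTuple, Tuple


class HlaAllele(NamedTuple):
    gene: str                # e.g. "A", "B", "DRB1"
    fields: Tuple[str, ...]  # e.g. ("02", "01", "01", "01")


def parse_allele(name: str) -> HlaAllele:
    """Split a name such as 'HLA-A*02:01:01:01' into gene and colon-separated fields."""
    text = name.strip()
    if text.upper().startswith("HLA-"):
        text = text[4:]
    gene, sep, rest = text.partition("*")
    if not sep or not gene or not rest:
        raise ValueError(f"not an HLA allele name: {name!r}")
    return HlaAllele(gene, tuple(rest.split(":")))


def truncate(allele: HlaAllele, n_fields: int) -> str:
    """Render the allele at reduced resolution (1 = first field, 2 = second field)."""
    return f"HLA-{allele.gene}*" + ":".join(allele.fields[:n_fields])


full = parse_allele("HLA-A*02:01:01:01")
print(truncate(full, 1))  # first field: allele group
print(truncate(full, 2))  # second field: protein-level resolution
```

Truncating both a predicted and a reference allele to the same number of fields is what makes a comparison "to the first field" or "to the second field" well defined.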

In addition to accurate HLA typing being critical for matching of transplant donors and recipients and for pharmacogenomics, it is also important in the diagnosis of autoimmune diseases, and a role in predicting the response of solid tumors to immune checkpoint blockade is emerging [19–23]. Therefore, the ability to generate accurate HLA typing from clinical ES and from existing research data sets would be valuable for both clinical and research purposes. Currently available targeted HLA sequencing by NGS typically requires amplification of individual loci by either long-range PCR or by multiplex PCR targeting exons related to the antigen recognition site, with read depths of at least 50 [24]. In addition, medium read length (150–1000 bp) instruments are typically used. Therefore, we explored the feasibility of accurate HLA typing by filtering to include only reads from the MHC region from existing ES data or a large NGS pharmacogenomic panel that was generated by sequence capture and short (101 bp) NGS reads for clinical diagnostic purposes or as part of a research study.

2. Subjects, materials, and methods

2.1. Clinical data sets

Three separate data sets were used for this study to reflect the possibility of HLA typing for pharmacogenomics and/or other applications from NGS data not originally generated for the specific purpose of HLA typing. These studies were reviewed and approved by the Mayo Clinic Institutional Review Board.

Our first data set consisted of 68 patients enrolled in BEAUTY (NCT02022202) at Mayo Clinic, which included adult women with newly diagnosed breast cancer [25]. Genomic DNA was extracted from peripheral blood from each of the 68 participants and from frozen tumor specimens from 5 of the 68 patients obtained prior to neoadjuvant chemotherapy (NAC). In addition, for 1 of those 5 patients, tumor specimens were also available partway through NAC, after completion of NAC, and at the time of tumor recurrence. Exome sequencing was performed using an Illumina HiSeq2000 after targeted exon capture using the Agilent SureSelect Human All Exon + UTRs 71 MB v4 kit. The TruSeq SBS sequencing kit version 3 was used to generate 2x101 paired-end reads. Reads were mapped using the Novoalign module of Mayo Clinic’s GenomeGPS DNA sequencing analysis pipeline. Each sample had > 50X coverage over 90% of the total capture region.

Our second data set consisted of 15 exomes (5 child/parent trios) run using the standard clinical processes in a Clinical Laboratory Improvement Act-approved and College of American Pathologists-accredited clinical laboratory in the Department of Laboratory Medicine and Pathology at Mayo Clinic for the purpose of validation of the clinical exome sequencing test [26]. This exome sequencing test was designed to include a pharmacogenomic interpretation alongside exome sequencing results generated during testing for diagnosis of hereditary disorders. This test reported positive or negative status for HLA-A*31:01 and HLA-B*15:02 (for carbamazepine), HLA-B*57:01 (for abacavir), and HLA-B*58:01 (for allopurinol), in addition to providing genotype and phenotype data for CYP1A2, CYP2C19, CYP2C9, CYP2D6, CYP3A4, CYP3A5, SLCO1B1, UGT1A1, and VKORC1. Exome capture was performed using a custom reagent developed by the Mayo Clinic and Agilent Technologies, followed by sequencing using an Illumina HiSeq2500 in the rapid run mode with 200 cycles generating 101-base pair paired-end reads followed by alignment using Novoalign.

The third data set included 500 targeted pharmacogenomic panels that were run as part of clinical validation of PGRN-Seq reagent for the RIGHT Protocol at Mayo Clinic [27,28]. The PGRN-Seq capture reagent, which includes 84 genes associated with pharmacogenomic phenotypes and covers 968 kb of sequence, was used followed by sequencing on an Illumina HiSeq2500 in the rapid run mode using the TruSeq Rapid SBS Kit with 200 cycles and 101-base pair paired-end reads. Results of a subset of the 84 genes, including positive/negative status for HLA-A*31:01, HLA-B*15:02, HLA-B*57:01, and HLA-B*58:01 were placed in the electronic health record for each participant.

2.2. HLA genotyping from NGS data

BAM files from each data set were filtered to include only the HLA gene regions, using a command line tool provided by Omixon (Budapest, Hungary). The filtering step removes reads that do not map to the HLA region, resulting in a smaller file limited to reads that uniquely map to the HLA region. The two FASTQ files generated from each BAM file were then loaded into the Omixon HLA Explore software v1.4, which aligns the reads remaining after filtering to the specific HLA loci interrogated. The software was set to produce the best call(s) for each locus; however, a full data set of possible calls was also available. This software can produce calls up to the fourth field for some alleles, while for others calls are only made to the second field. OptiType software was also run on BAM files from the first data set (BEAUTY participants) after using RazerS3 to exclude non-HLA reads [29]. This software provides the best matched alleles at each locus and outputs to the second field. Fig. 1 displays NGS data aligning to several potential HLA-B reference alleles in the Omixon software (A) and to the two alleles called by the OptiType software (B). Finally, HLA-Genotyper v0.4.2b1 was run, which also provides the best matched alleles at each locus with output to the second field [30]. HLA genotypes for class I and class II loci were generated for each sample by the Omixon HLA Explore or HLA-Genotyper software, while HLA genotypes for class I loci were generated by the OptiType software.
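The idea behind the region-filtering step can be sketched in a few lines. The following is not the Omixon tool or RazerS3, whose interfaces are not described here; it is a generic illustration that works on SAM-format text lines (BAM is the binary form of the same records). The function name `filter_sam_to_region` is hypothetical, and the chromosome 6 window is a rough, illustrative assumption; exact MHC coordinates depend on the reference build.

```python
from typing import Iterable, Iterator

# Assumption: an approximate window around the MHC on chromosome 6.
# These bounds are illustrative only and are not taken from the study.
MHC_CHROMS = frozenset({"6", "chr6"})
MHC_START = 28_000_000
MHC_END = 34_000_000


def filter_sam_to_region(
    lines: Iterable[str],
    chroms: frozenset = MHC_CHROMS,
    start: int = MHC_START,
    end: int = MHC_END,
) -> Iterator[str]:
    """Yield header lines plus alignments whose start position lies in the window."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep SAM header lines unchanged
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 11:
            continue  # not a complete alignment record
        rname, pos = cols[2], int(cols[3])  # RNAME and 1-based POS columns
        if rname in chroms and start <= pos <= end:
            yield line


demo = [
    "@HD\tVN:1.6\tSO:coordinate\n",
    "r1\t99\t6\t29910500\t60\t101M\t=\t29910700\t301\t*\t*\n",
    "r2\t99\t1\t1000\t60\t101M\t=\t1200\t301\t*\t*\n",
]
kept = list(filter_sam_to_region(demo))
print(len(kept))
```

A production filter would also need to keep read pairs together and decide how to treat unmapped or multi-mapped reads, which this sketch deliberately leaves out.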

Fig. 1.


(A) HLA-B allele alignment of next generation sequencing data for one sample in HLA Explore software (exons 1–4). This sample resulted in a failure or “no call” for the HLA-B locus; however, by actual HLA typing it was found to be HLA-B*44:02:01/*56:01:01. The top row shows the correct alignment of reads (represented by stacked grey bars) to the HLA-B*56:01:01 allele from exon 1 (left side) to exon 4 (right side of diagram), while the middle row shows reads unevenly and incorrectly mapping to the HLA-B*83:01 allele, and the bottom row shows a lack of reads mapping to the correct HLA-B*44:02:01 allele. (B) In contrast, the OptiType plots for the same sample are shown. This software produced a call concordant to the second field for each allele.

2.3. HLA genotyping

Due to the limited availability of DNA corresponding to the first data set, low to medium resolution SSO typing for HLA-A, B, C, DRB1, DQA1, and DQB1 was performed on DNA extracted from blood or tumor using the LABType method (One Lambda, Canoga Park, CA) in the clinical Mayo Clinic Tissue Typing Laboratory. Based on this information, HLA results at the second field were inferred using Haplostats from the National Marrow Donor Program [31]. The allele calls at each HLA locus were compared with those generated by the HLA Explore software and the OptiType software (for class I alleles) from the whole exome sequencing data.

High resolution SSP amplification (Olerup, Wien, Austria) and/or an allele-specific pharmacogenomic assay was performed to genotype the HLA-A and HLA-B loci of DNA samples corresponding to the second data set. Results from SSP typing were compared to the output from the Omixon HLA Explore software. For this data set that focused on pharmacogenomics, the HLA-A locus was evaluated for concordance of presence or absence of the HLA-A*31:01 allele; the HLA-B locus was evaluated for concordance of the presence or absence of the HLA-B*15:02, *57:01, and *58:01 alleles.

Finally, HLA Explore results from the third data set were evaluated for presence or absence of each allele of pharmacogenomic significance. All positive results and a subset of negative results from the NGS-based assay were compared to results of clinical assays performed in the Personalized Genomics Laboratory at Mayo Clinic designed to detect the HLA-A*31:01, HLA-B*15:02, HLA-B*57:01, and HLA-B*58:01 alleles. Each of these assays detects the presence or absence of the specific allele queried by use of real time PCR with SYBR Green chemistry and laboratory developed reagents (HLA-A*31:01 and HLA-B*57:01) or kits purchased from Pharmigene (Taipei City, Taiwan; HLA-B*15:02 and HLA-B*58:01). Each assay only detects presence or absence of the allele queried and does not provide information related to the alleles that are present at that locus if the assay is negative for the specific allele queried. These assays were validated against SSP genotyping and SSP typing was performed for several samples.
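The presence/absence evaluation used for the pharmacogenomic alleles can be illustrated with a small sketch. It assumes a genotype is given as a list of allele names at any resolution and reduces each to the second field before checking against the four alleles named above. The names `PGX_ALLELES`, `to_second_field`, and `pgx_status` are hypothetical, and the example genotype is invented for illustration.

```python
from typing import Dict, Iterable

# The four pharmacogenomically relevant alleles named in the text.
PGX_ALLELES = ("A*31:01", "B*15:02", "B*57:01", "B*58:01")


def to_second_field(name: str) -> str:
    """Reduce e.g. 'HLA-B*57:01:01' to 'B*57:01'."""
    text = name.strip()
    if text.upper().startswith("HLA-"):
        text = text[4:]
    gene, _, rest = text.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:2])


def pgx_status(genotype: Iterable[str]) -> Dict[str, bool]:
    """Return positive (True) or negative (False) status for each target allele."""
    called = {to_second_field(allele) for allele in genotype}
    return {target: target in called for target in PGX_ALLELES}


# Invented example genotype, not a study sample.
status = pgx_status(
    ["HLA-A*02:01:01", "HLA-A*31:01:02", "HLA-B*57:01:01", "HLA-B*44:02:01:01"]
)
for allele, positive in status.items():
    print(allele, "positive" if positive else "negative")
```

Like the allele-specific assays described above, this check reports only whether a queried allele is present; it says nothing about which other alleles occupy the locus.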

2.4. Data analysis

Results generated by the analysis of next-generation sequencing data using HLA Explore, HLA-Genotyper, or OptiType software were compared to the actual genotype as determined by SSO or SSP typing, or to the results of a pharmacogenomic assay designed to detect a specific allele. Settings for quality metrics were set to minimum levels to allow for analysis of data with short read length, smaller fragment size, and lower read count. Results were directly evaluated for concordance using Microsoft Excel.
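The study performed this comparison in Microsoft Excel. As a sketch of the same per-allele logic, the hypothetical function `concordance_level` below assigns one predicted allele to the categories used in the Results (second field, first field, discordant, or no call). It does not cover genotype-level effects such as a heterozygous sample being called homozygous, which requires pairing both alleles at a locus.

```python
from typing import List, Optional, Tuple


def split_allele(name: str) -> Tuple[str, List[str]]:
    """'HLA-B*44:02:01' -> ('B', ['44', '02', '01'])."""
    text = name.strip()
    if text.upper().startswith("HLA-"):
        text = text[4:]
    gene, _, rest = text.partition("*")
    return gene, rest.split(":")


def concordance_level(predicted: Optional[str], reference: str) -> str:
    """Classify a predicted allele against the reference typing for the same locus."""
    if not predicted:
        return "no call"
    p_gene, p_fields = split_allele(predicted)
    r_gene, r_fields = split_allele(reference)
    if p_gene != r_gene:
        raise ValueError("alleles come from different loci")
    if len(p_fields) >= 2 and len(r_fields) >= 2 and p_fields[:2] == r_fields[:2]:
        return "second field"
    if p_fields[0] == r_fields[0]:
        return "first field"
    return "discordant"


# Allele pairs taken from the comparisons reported in the Results.
print(concordance_level("HLA-A*02:186", "HLA-A*02:02"))
print(concordance_level("HLA-B*51:42", "HLA-B*44:03:01"))
print(concordance_level(None, "HLA-B*44:02:01"))
```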

3. Results

3.1. Data set 1: Germline whole exome sequencing of patients with breast cancer

The first data set was interrogated in detail at each of the HLA-A, B, C, DRB1, DQA1, and DQB1 loci. The command line filtering step recommended by the Omixon software developer was deemed to be necessary; when this step was omitted, the software was unable to process the data. Similarly, running RazerS3 was recommended prior to using OptiType because it significantly reduces the run time of OptiType. The average coverage of each locus is shown in Fig. 2. Class I genes generally had higher coverage than class II genes. HLA-A had the highest average coverage (175.5 reads), while HLA-DRB1 showed the lowest coverage (41.1 reads). Coverage differences may in part be due to interference of homologous sequence resulting in filtering and exclusion of reads with low mapping quality.

Fig. 2.


Average coverage (number of reads) mapping to each HLA gene in exome sequencing data obtained for breast cancer research and not enriched for the HLA region.

Of the 68 germline samples, one sample (1.5%) had insufficient DNA that precluded SSO typing at the HLA-A and HLA-B loci, leaving a total of 812 class I and II alleles available for analysis with Omixon and HLA-Genotyper software and 404 class I alleles for the OptiType software (see Supplementary Table 1 for details). Using Omixon, 765 (94.2%) of 812 alleles were concordant to the second field. At the HLA-A locus, 133 (99.3%) of 134 alleles were concordant to the second field, while 1 (0.7%) allele was concordant to the first field (HLA-A*02:186 by NGS vs. HLA-A*02:02 by SSO). At the HLA-B locus, 121 (90.3%) of 134 alleles were concordant to the second field, with an additional 3 (2.3%) alleles concordant to the first field, which were all HLA-B*44 alleles. One individual (0.7%) was homozygous for the HLA-B*14:02:01 allele by exome but heterozygous HLA-B*14:02:01/*44:02:01:01 by SSO HLA typing, one allele (0.7%) was a true mismatch (HLA-B*51:42 by NGS, but *44:03:01 by SSO typing), and 8 (6.0%) alleles involving 4 samples (each involving an HLA-B*44 allele) failed to produce a call for either allele from exome data, but a perfect or first field match was included in the full list of potential alleles (Fig. 1). At the HLA-C locus, 100% of the 136 alleles tested were concordant to the second field.

Using the OptiType software, 402 (99.5%) of 404 class I alleles were concordant with SSO to the second field. The only two alleles that were not concordant were an HLA-C*07:18 call by both SSO and Omixon that was called *07:01 by OptiType, and a separate sample with an HLA-C*17:03 allele call by both SSO and Omixon that was called *17:01 by OptiType.

The HLA-Genotyper software produced 769 (94.7%) results concordant with SSO typing to the second field of the 812 alleles tested. At the HLA-A locus, one sample had 1 discordant allele (A*26:09 vs. A*34:01 by SSO and the other software packages) and 1 allele concordant only to the first field (A*02:05 vs. A*02:02 by SSO and the other software). At the HLA-B locus, two alleles were concordant to only the first field (B*42:02 vs. B*41:01 by SSO and B*15:15 vs. B*15:21). At the HLA-C locus, one discordant allele call was identified (C*02:02, which was called C*04:03 by SSO), along with 9 alleles only concordant to the first field (four miscalls of C*03:04, which was called C*03:03 by SSO and other software; two miscalls of C*12:02 vs. C*12:03 by SSO and other software; C*02:02 vs. C*02:10 by SSO and other software; C*07:01 vs. C*07:18 and C*17:01 vs. C*17:03, both of which were miscalls by both OptiType and HLA-Genotyper).

Using the Omixon software, at the HLA-DRB1 locus, 126 (92.6%) of the 136 alleles were concordant to the second field, while 10 (7.4%) alleles (five of which were HLA-DRB1*07:01 alleles and three of which were *15:01 alleles) involving 7 samples failed by exome. The HLA-DQA1 locus showed the lowest concordance with 119 (87.5%) of 136 alleles concordant to the second field. An additional 15 (11.0%) alleles (involving HLA-DQA1*01, *03, and *05 alleles) were concordant to the first field. One sample was homozygous HLA-DQA1*01:01:01 by NGS, but HLA-DQA1*01:01:01/*03:01:01 by actual HLA typing. One HLA-DQA1*03:02 allele was miscalled HLA-DQA1*05:05:01 by NGS. Finally, 130 (95.6%) of the 136 HLA-DQB1 alleles were concordant, while 6 (4.4%) HLA-DQB1*02:02:01 alleles involving 6 separate samples that also contained an HLA-DQB1*06 allele failed to call by exome (in these cases, the software was able to call the HLA-DQB1*06 alleles for each sample, but produced no result for the *02 allele for each sample).

Using the HLA-Genotyper software, at the HLA-DRB1 locus, 130 (95.6%) of the 136 alleles were concordant to the second field, while 5 (3.7%) were concordant to the first field (four calls of DRB1*14:01 instead of *14:54 and one call of homozygous *11:04 vs. *11:01/*11:04), and 1 allele was discordant (sample called homozygous *13:01 vs. *07:01/*13:01 by SSO and Omixon). At the HLA-DQA1 locus, HLA-Genotyper produced 122 (89.7%) concordant calls, with an additional 14 (10.3%) concordant to the first field (including six calls of DQA1*01:04 vs. *01:01, three calls of *03:03 vs. *03:01, and two calls of *03:03 vs. *03:02). HLA-Genotyper calls were 93.4% (127 alleles) concordant with SSO to the second field at the HLA-DQB1 locus, with 8 (5.9%) of allele calls concordant to the first field (seven calls of DQB1*02:01 vs. *02:02 by SSO and one call of *03:01 vs. *03:19 by SSO) and one sample that was homozygous DQB1*06:03 by HLA-Genotyper and DQB1*02:02/*06:03 by SSO and Omixon.

These data are summarized by locus in Table 1 and the full results are in Supplementary Table 1. A list of unique alleles identified in this cohort is available in Supplementary Table 2. All of the loci that are currently tested at our institution for pharmacogenomic purposes were 100% concordant to at least the second field between SSO, Omixon, HLA-Genotyper, and OptiType (HLA-A*31:01, 1 positive sample, 67 negative; HLA-B*15:02, 68 negative samples; HLA-B*57:01, 7 positive samples, 61 negative samples; and HLA-B*58:01, 1 positive sample, 67 negative samples).

Table 1.

Summary of concordance between actual HLA typing by SSO and predicted typing by next generation sequencing by locus. Number (percentage) of alleles (for 68 samples from Data Set 1: Germline Whole Exome Sequencing of Patients with Breast Cancer) is provided. A) Using Omixon software; B) Using OptiType software; C) Using HLA-Genotyper software. Full results are available in the Supplementary materials.

| A. Omixon | HLA-A | HLA-B | HLA-C | HLA-DRB1 | HLA-DQA1 | HLA-DQB1 | Total |
|---|---|---|---|---|---|---|---|
| Concordant to Second Field | 133 (99.3) | 121 (90.3) | 136 (100) | 126 (92.6) | 119 (87.5) | 130 (95.6) | 765 (94.2) |
| Concordant to First Field | 1 (0.7) | 3 (2.3) | 0 | 0 | 15 (11.0) | 0 | 19 (2.3) |
| Homozygous by Exome | 0 | 1 (0.7) | 0 | 0 | 1 (0.7) | 0 | 2 (0.2) |
| Discordant | 0 | 1 (0.7) | 0 | 0 | 1 (0.7) | 0 | 2 (0.2) |
| Failed by Exome | 0 | 8 (6.0) | 0 | 10 (7.4) | 0 | 6 (4.4) | 24 (3.0) |
| Number of Alleles | 134 | 134 | 136 | 136 | 136 | 136 | 812 |

| B. OptiType | HLA-A | HLA-B | HLA-C | Total |
|---|---|---|---|---|
| Concordant to Second Field | 134 (100) | 134 (100) | 134 (98.5) | 402 (99.5) |
| Concordant to First Field | 0 | 0 | 2 (1.5) | 2 (0.5) |
| Number of Alleles | 134 | 134 | 136 | 404 |

| C. HLA-Genotyper | HLA-A | HLA-B | HLA-C | HLA-DRB1 | HLA-DQA1 | HLA-DQB1 | Total |
|---|---|---|---|---|---|---|---|
| Concordant to Second Field | 132 (98.5) | 132 (98.5) | 126 (92.6) | 130 (95.6) | 122 (89.7) | 127 (93.4) | 769 (94.7) |
| Concordant to First Field | 1 (0.7) | 2 (1.5) | 9 (6.6) | 5 (3.7) | 14 (10.3) | 8 (5.9) | 39 (4.8) |
| Homozygous by Exome | 0 | 0 | 0 | 1 (0.7) | 0 | 1 (0.7) | 2 (0.2) |
| Discordant | 1 (0.7) | 0 | 1 (0.7) | 0 | 0 | 0 | 2 (0.2) |
| Number of Alleles | 134 | 134 | 136 | 136 | 136 | 136 | 812 |

3.2. Data sets 2 & 3: Whole exome sequencing for clinical purposes and targeted NGS-based pharmacogenomic panels

The second and third data sets were only interrogated for concordance of pharmacogenomically-relevant alleles, specifically HLA-A*31:01, HLA-B*15:02, HLA-B*57:01, and HLA-B*58:01 (see Supplementary Table 3). For these data sets, all positive HLA calls made by the Omixon software on NGS data from either a whole exome or a targeted pharmacogenomic panel were confirmed by SSP and/or a laboratory developed test, along with confirmation of a subset of negative calls. There was 100% concordance for the positive/negative status of each pharmacogenomically-relevant allele between the Omixon/NGS call and the confirmatory methods. Specifically, this included 28 positive and 71 negative HLA-A*31:01 calls, 2 positive and 76 negative HLA-B*15:02 calls, 34 positive and 74 negative HLA-B*57:01 calls, and 9 positive and 73 negative HLA-B*58:01 calls. Together, there was 100% concordance for each locus when combining the 68 samples evaluated at all loci with the samples tested for pharmacogenomic purposes, including: 138 negative and 29 positive samples for HLA-A*31:01, 144 negative and 2 positive samples for HLA-B*15:02, 135 negative and 41 positive samples for HLA-B*57:01, and 140 negative and 10 positive samples for HLA-B*58:01.
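The combined totals in the last sentence follow from adding the data set 1 counts (68 germline samples, reported in Section 3.1) to the counts confirmed for data sets 2 and 3. A minimal arithmetic check, with hypothetical variable names and each entry stored as (positive, negative):

```python
# (positive, negative) counts per allele, transcribed from the text.
dataset_1 = {
    "HLA-A*31:01": (1, 67),
    "HLA-B*15:02": (0, 68),
    "HLA-B*57:01": (7, 61),
    "HLA-B*58:01": (1, 67),
}
datasets_2_and_3 = {
    "HLA-A*31:01": (28, 71),
    "HLA-B*15:02": (2, 76),
    "HLA-B*57:01": (34, 74),
    "HLA-B*58:01": (9, 73),
}

# Add positives to positives and negatives to negatives for each allele.
combined = {
    allele: (
        dataset_1[allele][0] + datasets_2_and_3[allele][0],
        dataset_1[allele][1] + datasets_2_and_3[allele][1],
    )
    for allele in dataset_1
}

for allele, (positive, negative) in combined.items():
    print(f"{allele}: {positive} positive, {negative} negative")
```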

3.3. HLA results from tumor specimens vs. germline

By SSO, all tumor samples were concordant with the corresponding germline sample, including those after NAC. Tumor samples showed 85.4% (82 out of 96 alleles tested) class I and class II concordance when typed by Omixon, 91.6% (88 of 96 alleles) class I and II concordance with HLA-Genotyper, and 100% (48 out of 48 alleles) class I concordance by OptiType as compared to SSO typing (summary in Table 2, full results in Supplementary Table 4). NGS was 100% concordant with actual typing for all HLA-A and HLA-C alleles to the 2nd field. HLA-B performed similarly to the germline samples, with 1 of the 5 pre-NAC tumor samples failing by Omixon (HLA-B*07:02:01/*44:02:01:01 by SSO); the OptiType and HLA-Genotyper calls for this sample were concordant with SSO. The HLA-B*07:02:01 allele was successfully called by Omixon from 3 exomes from this same tumor during and after NAC, but Omixon miscalled the B*44:02:01:01 allele as B*83:01 for 2 of those 3 exomes, while the calls by HLA-Genotyper and OptiType were concordant with SSO. In addition, a discordance was present by Omixon, but not OptiType or HLA-Genotyper, for 1 HLA-B allele in another of the 5 pre-NAC exomes (HLA-B*44:02:01:01 by SSO vs. HLA-B*83:01 by NGS), and 1 of the 5 pre-NAC exomes was concordant to only the first field for one HLA-B allele (HLA-B*44:02:01:01 by SSO vs. HLA-B*44:06 by NGS) by Omixon. HLA-DRB1 calls were concordant to the second field, except for 1 allele (HLA-DRB1*07:01:01:01), which failed to call in one pre-NAC exome and in an exome obtained from the same tumor at the time of recurrence; in a post-treatment sample from the same tumor, the Omixon software called the sample homozygous DRB1*13:01:01 rather than heterozygous *07:01:01:01/*13:01:01, and the HLA-Genotyper software similarly miscalled this sample homozygous *13:01.
HLA-DQA1 calls were concordant to the second field, with the exception of 1 allele in each of two pre-NAC specimens that was concordant to the first field, similar to the corresponding germline exome (DQA1*01:05:02 by Omixon and *01:04 by HLA-Genotyper vs. *01:01:01 by SSO; DQA1*01:01 by HLA-Genotyper vs. *01:05:01 by SSO and Omixon). Finally, calls for HLA-DQB1 were concordant to the second field, with the exception of one tumor that failed to call the DQB1*02:02:01 allele by Omixon for the pre-NAC specimen and the tumor at recurrence, while the post-NAC specimen was miscalled as homozygous *06:03 by both Omixon and HLA-Genotyper, also missing the *02:02:01 allele. The *02:02:01 allele was also miscalled as *02:01 for all other specimens from this tumor by the HLA-Genotyper software. In addition, a different pre-NAC specimen that included a DQB1*02:02:01 allele was also miscalled as *02:01 by HLA-Genotyper.

Table 2.

Summary of concordance of HLA calls generated by actual HLA typing (SSO) and predicted typing by next generation sequencing by locus for tumor specimens of each of the 5 subjects. The tumor from subject A was typed at 4 different time points. Full data is available in the Supplementary materials.

| Subject | Specimen | HLA-A | HLA-B | HLA-C | HLA-DRB1 | HLA-DQA1 | HLA-DQB1 |
|---|---|---|---|---|---|---|---|
| A | Pre-NAC | Concordant | Omixon: No Call (both alleles); OptiType & HLA-Genotyper: Concordant | Concordant | Omixon: No Call (one allele); HLA-Genotyper: Concordant | Concordant | Omixon: No Call (one allele); HLA-Genotyper: Concordant |
| A | Mid-NAC | Concordant | Omixon: Discordant (one allele); OptiType & HLA-Genotyper: Concordant | Concordant | Concordant | Concordant | Concordant |
| A | Post-NAC | Concordant | Concordant | Concordant | Omixon: Homozygous Discordant; HLA-Genotyper: Homozygous Discordant | Concordant | Omixon: Homozygous Discordant; HLA-Genotyper: Homozygous Discordant |
| A | Recurrence | Concordant | Omixon: Discordant (one allele); OptiType & HLA-Genotyper: Concordant | Concordant | Omixon: No Call (one allele); HLA-Genotyper: Concordant | Concordant | Omixon: No Call (one allele); HLA-Genotyper: Concordant |
| B | Pre-NAC | Concordant | Omixon: Discordant (one allele); OptiType & HLA-Genotyper: Concordant | Concordant | Concordant | Concordant | Concordant |
| C | Pre-NAC | Concordant | Concordant | Concordant | Concordant | Omixon: Concordant to 1st field (one allele); HLA-Genotyper: Concordant | Concordant |
| D | Pre-NAC | Concordant | Concordant | Concordant | Concordant | Omixon: Concordant to 1st field (one allele); HLA-Genotyper: Concordant | Omixon: Concordant; HLA-Genotyper: Concordant to 1st field (one allele) |
| E | Pre-NAC | Concordant | Omixon: Concordant to 1st field (one allele); OptiType & HLA-Genotyper: Concordant | Concordant | Concordant | Concordant | Concordant |

4. Discussion

The use of next-generation sequencing, including large panels, exome sequencing, and genome sequencing, is increasing for both clinical diagnostic and research purposes. At the same time, interest in pre-emptive pharmacogenomic testing is also increasing. Therefore, it is important to understand the feasibility of making pharmacogenomic calls from this type of data for future clinical and research use. Several large studies have explored some pharmacogenes from NGS data [27,32]; however, the HLA region is particularly complex and has not been the focus of prior studies.

In our study, we found that although the datasets were not specifically enriched for the HLA region, there was sufficient coverage to call HLA alleles, and in most cases the calls from each of the three software packages were concordant with gold standard methods. We found that using both the Omixon and HLA-Genotyper software, HLA class I genes generally had higher concordance (96.5% for both Omixon and HLA-Genotyper) than class II (91.9% for Omixon and 92.9% for HLA-Genotyper). Certain HLA-B alleles proved more difficult for the Omixon software, particularly HLA-B*44 alleles. The HLA class II alleles were slightly more likely to fail to call by the Omixon software (3.9% vs. 2.0% for class I), mostly for the DRB1 and DQB1 loci or to be concordant to only the first field (HLA-DQA1). Similarly, while HLA-Genotyper always provided a call, class II alleles were more likely to be concordant to only the first field than class I alleles (6.6% vs. 3.0%). This may in part be due to the GC-rich nature of the HLA-DQ regions as well as homology with other genes. In most cases where Omixon and clinical typing were not concordant, other potential HLA alleles were predicted by the software and this list included the allele called by clinical typing. All of the additional alleles predicted by the Omixon software are considered rare alleles while the alleles that may be identified by clinical typing are common well-defined alleles as described by the catalogue of common and well-documented HLA alleles [33]. Therefore, in this setting, having HLA expertise may allow for selection of the most likely call based on known HLA linkage disequilibrium and the alleles present at other loci. In contrast to the Omixon and HLA-Genotyper software, OptiType only calls class I alleles. 
For both OptiType and HLA-Genotyper, output is to the second field, and by default the two most likely predicted alleles at each locus are reported; however, settings can be changed to allow the software to show other, less likely, potential alleles. The Omixon software has a user-friendly interface and provides results beyond the second field in some cases, whereas both HLA-Genotyper and OptiType have command-line interfaces. While the software performed well, HLA expertise is essential for the selection of samples that require confirmation by an alternate technique (e.g. due to an unlikely haplotype) and for the use of an alternate technique to fill in gaps in data when necessary.
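The class I and class II second-field concordance figures quoted in this discussion can be reproduced from the per-locus counts in Table 1. The sketch below is a worked check only; the dictionary and function names are hypothetical, and the counts are transcribed from Table 1 (data set 1, germline exomes).

```python
# Alleles tested per locus and second-field concordant counts, from Table 1.
N_ALLELES = {
    "HLA-A": 134, "HLA-B": 134, "HLA-C": 136,
    "HLA-DRB1": 136, "HLA-DQA1": 136, "HLA-DQB1": 136,
}
SECOND_FIELD = {
    "Omixon": {
        "HLA-A": 133, "HLA-B": 121, "HLA-C": 136,
        "HLA-DRB1": 126, "HLA-DQA1": 119, "HLA-DQB1": 130,
    },
    "HLA-Genotyper": {
        "HLA-A": 132, "HLA-B": 132, "HLA-C": 126,
        "HLA-DRB1": 130, "HLA-DQA1": 122, "HLA-DQB1": 127,
    },
}
CLASS_I = ("HLA-A", "HLA-B", "HLA-C")
CLASS_II = ("HLA-DRB1", "HLA-DQA1", "HLA-DQB1")


def percent_concordant(software: str, loci: tuple) -> float:
    """Second-field concordance (%) pooled over the given loci, to one decimal place."""
    hits = sum(SECOND_FIELD[software][locus] for locus in loci)
    total = sum(N_ALLELES[locus] for locus in loci)
    return round(100.0 * hits / total, 1)


for software in SECOND_FIELD:
    print(
        software,
        "class I:", percent_concordant(software, CLASS_I),
        "class II:", percent_concordant(software, CLASS_II),
    )
```

Pooling alleles across loci, rather than averaging the per-locus percentages, matches the way the overall totals in Table 1 (765 and 769 of 812) are computed.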

For the alleles that are currently considered to have strong pharmacogenomic associations (i.e. HLA-A*31:01, HLA-B*15:02, HLA-B*57:01, HLA-B*58:01), allele calling from NGS, including exomes, was robust using Omixon, HLA-Genotyper, or OptiType software in the determination of presence or absence of specific HLA alleles with 100% concordance with methodology currently in use clinically.

Malignant cells are known to acquire HLA mutations and/or down-regulate HLA expression [34–36], which may impact response to therapy; therefore, we also evaluated the possibility of determining HLA genotype in tumor tissue. While none of the tumor tissue selected differed from the germline genotype by SSO typing, in our small sample we noted slightly more failures in genotyping, which may be explained by the lower quality of DNA isolated from formalin-fixed paraffin-embedded tissue and/or small frozen tissue samples than for DNA isolated from blood. In addition, 3 of the 5 tumors had an HLA-B*44 allele, which was also difficult to call using the Omixon software from NGS from germline exomes. Although in our small sample we obtained HLA results from tumor tissue consistent with those obtained from blood specimens, results may not match germline due to loss of heterozygosity and/or acquisition of somatic mutations in the tumor.

Our study was limited by use of only three platforms for HLA typing. However, we were able to demonstrate the feasibility of HLA typing from germline ES samples. Therefore, other platforms may also be able to successfully determine HLA type from similar data sets. In addition, our population was predominantly Caucasian. Some software packages may use linkage as part of the algorithm to predict genotypes. Therefore, it is unknown whether similar results would be obtained using exome data from other racial groups.

Overall, this study demonstrated that reasonably concordant HLA typing results can be obtained from whole exome data, even when the data were not specifically generated for the purpose of HLA typing and have suboptimal read depth and read length. Use of these data required filtering to exclude reads not mapping to the HLA region. Depending on the software, several specific alleles presented a greater challenge and may require additional confirmation. Performing this type of analysis in a laboratory with HLA expertise would be beneficial for selection of the most likely allele when multiple options are presented by the software and for additional testing for confirmation or to fill in gaps. HLA expertise is particularly necessary if a definitive allele is to be called using a software package, rather than positive/negative status for one or more specific alleles.
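The read-filtering step mentioned above amounts to an interval-overlap test, sketched below under stated assumptions. The `Read` record, the function names, and especially the `HLA_REGION` coordinates are hypothetical: the interval is a deliberately rough placeholder on chromosome 6 and should be replaced with coordinates appropriate to the reference build in use. In practice this filtering would typically be done on alignment files with standard tools rather than on in-memory records.

```python
from typing import NamedTuple


class Read(NamedTuple):
    """Minimal stand-in for an aligned read (hypothetical record type)."""
    name: str
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


# Placeholder interval only; substitute build-specific coordinates.
HLA_REGION = ("chr6", 28_000_000, 34_000_000)


def overlaps(read, region):
    """Return True if the read overlaps the half-open region on the same chromosome."""
    chrom, start, end = region
    return read.chrom == chrom and read.start < end and read.end > start


def filter_to_region(reads, region=HLA_REGION):
    """Keep only reads that map to the region, dropping all others."""
    return [read for read in reads if overlaps(read, region)]
```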

This work has broad implications spanning future clinical care (pharmacogenomics, response of tumors to immunotherapy, autoimmunity, etc.) and research applications, in part because this approach to generate accurate HLA typing results can be applied with little added cost (the cost of the software itself if using a commercial package and personnel to evaluate results) to existing data sets. On the research side, this may allow for additional studies to uncover novel associations between HLA alleles and adverse medication reactions, human disease, response to immunotherapies, and other phenotypes. Clinically, this work demonstrates that HLA typing is feasible from clinical exomes for diverse purposes from disease associations to pharmacogenomics.

Supplementary Material

1
2
3
4

Acknowledgements

We would like to thank Laura Train, Kate Kotzer, Michelle Kluge, Susan Lagerstedt, Mary Beth Karow, Kimberley Harris, Amy Barthel, Charles Kremer, Brenda Moore, Sandra Peterson, Linnea Baudhuin, and the RIGHT study team for their contributions to the pharmacogenomics data. We would like to thank the BEAUTY study team for their contribution to the whole exome sequencing data.

Funding

The BEAUTY study is funded in part by the Mayo Clinic Center for Individualized Medicine; Nadia’s Gift Foundation; John P. Guider; the Eveleigh Family; George M. Eisenberg Foundation for Charities; generous support from Afaf Al-Bahar; and the Pharmacogenomics Research Network (PGRN). Other contributing groups include the Mayo Clinic Cancer Center (MPG) and the Mayo Clinic Breast Specialized Program of Research Excellence (SPORE) (MPG and KK).

Abbreviations:

EHR, electronic health record
ES, exome sequencing
GS, genome sequencing
HLA, human leukocyte antigen
NGS, next generation sequencing
SSO, sequence-specific oligonucleotide
SSP, sequence-specific primer
SBT, sequence based typing
NAC, neoadjuvant chemotherapy

Footnotes

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Appendix A. Supplementary data

Supplementary data to this article can be found online at https://doi.org/10.1016/j.humimm.2020.06.002.

References

  • [1]. Wetterstrand KA, Data from the NHGRI Genome Sequencing Program (GSP). Available at: http://www.genome.gov/sequencingcostsdata. Accessed 4/11/2020.
  • [2]. Robbins R, Top U.S. Medical Centers Roll Out DNA Sequencing Clinics for Healthy Clients: Patients can pay hundreds to thousands of dollars to screen for genetic health risks. STAT https://www.statnews.com/2019/08/16/top-u-s-medical-centers-roll-out-dna-sequencing-clinics-for-healthy-and-often-wealthy-clients/, 8/16/2019. Accessed 4/11/2020.
  • [3]. Haga SB, Mills R, Moaddeb J, Allen Lapointe N, Cho A, Ginsburg GS, Patient experiences with pharmacogenetic testing in a primary care setting, Pharmacogenomics 17 (2016) 1629.
  • [4]. Mukherjee C, Sweet KM, Luzum JA, Abdel-Rasoul M, Christman MF, Kitzmiller JP, Clinical pharmacogenomics: patient perspectives of pharmacogenomic testing and the incidence of actionable test results in a chronic disease cohort, Per. Med 14 (2017) 383.
  • [5]. Lemke AA, Hulick PJ, Wake DT, Wang C, Sereika AW, Yu KD, et al., Patient perspectives following pharmacogenomics results disclosure in an integrated health system, Pharmacogenomics 19 (2018) 321.
  • [6]. Hershfield MS, Callaghan JT, Tassaneeyakul W, Mushiroda T, Thorn CF, Klein TE, et al., Clinical Pharmacogenetics Implementation Consortium guidelines for human leukocyte antigen-B genotype and allopurinol dosing, Clin. Pharmacol. Ther 93 (2013) 153.
  • [7]. Martin MA, Klein TE, Dong BJ, Pirmohamed M, Haas DW, Kroetz DL, et al., Clinical pharmacogenetics implementation consortium guidelines for HLA-B genotype and abacavir dosing, Clin. Pharmacol. Ther 91 (2012) 734.
  • [8]. Phillips EJ, Sukasem C, Whirl-Carrillo M, Muller DJ, Dunnenberger HM, Chantratita W, et al., Clinical pharmacogenetics implementation consortium guideline for HLA genotype and use of carbamazepine and oxcarbazepine: 2017 update, Clin. Pharmacol. Ther 103 (2018) 574.
  • [9]. Karnes JH, Miller MA, White KD, Konvinse KC, Pavlos RK, Redwood AJ, et al., Applications of immunopharmacogenomics: predicting, preventing, and understanding immune-mediated adverse drug reactions, Annu. Rev. Pharmacol. Toxicol (2018).
  • [10]. Robinson J, Halliwell JA, Hayhurst JD, Flicek P, Parham P, Marsh SG, The IPD and IMGT/HLA database: allele variant databases, Nucl. Acids Res 43 (2015) D423.
  • [11]. Holdsworth R, Hurley CK, Marsh SG, Lau M, Noreen HJ, Kempenich JH, et al., The HLA dictionary 2008: a summary of HLA-A, -B, -C, -DRB1/3/4/5, and -DQB1 alleles and their association with serologically defined HLA-A, -B, -C, -DR, and -DQ antigens, Tissue Antigens 73 (2009) 95.
  • [12]. Adams SD, Barracchini KC, Simonis TB, Stroncek D, Marincola FM, High throughput HLA sequence-based typing (SBT) utilizing the ABI Prism 3700 DNA Analyzer, Tumori 87 (2001) S40.
  • [13]. Itoh Y, Mizuki N, Shimada T, Azuma F, Itakura M, Kashiwase K, et al., High-throughput DNA typing of HLA-A, -B, -C, and -DRB1 loci by a PCR-SSOP-Luminex method in the Japanese population, Immunogenetics 57 (2005) 717.
  • [14]. Hurley CK, Spellman S, Dehn J, Barker JN, Devine S, Fernandez-Vina M, et al., Regarding “Recipients receiving better HLA-matched hematopoietic cell transplantation grafts, uncovered by a novel HLA typing method, have superior survival: a retrospective study”, Biol Blood Marrow Transplant 25 (2019) e268.
  • [15]. Schofl G, Lang K, Quenzel P, Bohme I, Sauter J, Hofmann JA, et al., 2.7 million samples genotyped for HLA by next generation sequencing: lessons learned, BMC Genomics 18 (2017) 161.
  • [16]. Bentley G, Higuchi R, Hoglund B, Goodridge D, Sayer D, Trachtenberg EA, et al., High-resolution, high-throughput HLA genotyping by next-generation sequencing, Tissue Antigens 74 (2009) 393.
  • [17]. Nelson WC, Pyo CW, Vogan D, Wang R, Pyon YS, Hennessey C, et al., An integrated genotyping approach for HLA and other complex genetic systems, Hum. Immunol 76 (2015) 928.
  • [18]. Shiina T, Suzuki S, Ozaki Y, Taira H, Kikkawa E, Shigenari A, et al., Super high resolution for single molecule-sequence-based typing of classical HLA loci at the 8-digit level using next generation sequencers, Tissue Antigens 80 (2012) 305.
  • [19]. Lee SJ, Klein J, Haagenson M, Baxter-Lowe LA, Confer DL, Eapen M, et al., High-resolution donor-recipient HLA matching contributes to the success of unrelated donor marrow transplantation, Blood 110 (2007) 4576.
  • [20]. Holoshitz J, The quest for better understanding of HLA-disease association: scenes from a road less travelled by, Discov. Med 16 (2013) 93.
  • [21]. Pavlos R, Mallal S, Phillips E, HLA and pharmacogenetics of drug hypersensitivity, Pharmacogenomics 13 (2012) 1285.
  • [22]. Sanchez-Mazas A, Meyer D, The relevance of HLA sequencing in population genetics studies, J. Immunol. Res 2014 (2014) 971818.
  • [23]. Chowell D, Morris LGT, Grigg CM, Weber JK, Samstein RM, Makarov V, et al., Patient HLA class I genotype influences cancer response to checkpoint blockade immunotherapy, Science 359 (2018) 582.
  • [24]. Gandhi MJ, Ferriola D, Huang Y, Duke JL, Monos D, Targeted next-generation sequencing for human leukocyte antigen typing in a clinical laboratory: metrics of relevance and considerations for its successful implementation, Arch. Pathol. Lab. Med 141 (2017) 806.
  • [25]. Goetz MP, Kalari KR, Suman VJ, Moyer AM, Yu J, Visscher DW, et al., Tumor sequencing and patient-derived xenografts in the neoadjuvant treatment of breast cancer, J. Natl Cancer Inst 109 (2017).
  • [26]. Lindor NM, Schahl KA, Johnson KJ, Hunt KS, Mensink KA, Wieben ED, et al., Whole-Exome Sequencing of 10 Scientists: Evaluation of the Process and Outcomes, Mayo Clin. Proc 90 (2015) 1327.
  • [27]. Ji Y, Skierka JM, Blommel JH, Moore BE, VanCuyk DL, Bruflat JK, et al., Preemptive pharmacogenomic testing for precision medicine: a comprehensive analysis of five actionable pharmacogenomic genes using next-generation DNA sequencing and a customized CYP2D6 genotyping cascade, J. Mol. Diagn 18 (2016) 438.
  • [28]. Bielinski SJ, St Sauver JL, Olson JE, Larson NB, Black JL, Scherer SE, et al., Cohort profile: the right drug, right dose, right time: using genomic data to individualize treatment protocol (RIGHT Protocol), Int. J. Epidemiol (2019).
  • [29]. Szolek A, Schubert B, Mohr C, Sturm M, Feldhahn M, Kohlbacher O, OptiType: precision HLA typing from next-generation sequencing data, Bioinformatics 30 (2014) 3310.
  • [30]. Farrell JJ, Jun G, Farrer LA, DeStefano A, Sebastiani P, HLA-Genotyper Prediction of HLA Genotypes from Next Generation Sequencing Data. 64th Annual Meeting of the American Society of Human Genetics San Diego, CA, 2014.
  • [31]. HaploStats http://www.haplostats.org. Accessed 4/11/2020.
  • [32]. Bush WS, Crosslin DR, Owusu-Obeng A, Wallace J, Almoguera B, Basford MA, et al., Genetic variation among 82 pharmacogenes: the PGRNseq data from the eMERGE network, Clin. Pharmacol. Ther 100 (2016) 160.
  • [33]. Mack SJ, Cano P, Hollenbach JA, He J, Hurley CK, Middleton D, et al., Common and well-documented HLA alleles: 2012 update to the CWD catalogue, Tissue Antigens 81 (2013) 194.
  • [34]. Shukla SA, Rooney MS, Rajasagi M, Tiao G, Dixon PM, Lawrence MS, et al., Comprehensive analysis of cancer-associated somatic mutations in class I HLA genes, Nat. Biotechnol 33 (2015) 1152.
  • [35]. Hicklin DJ, Marincola FM, Ferrone S, HLA class I antigen downregulation in human cancers: T-cell immunotherapy revives an old story, Mol. Med. Today 5 (1999) 178.
  • [36]. Garrido F, Ruiz-Cabello F, Aptsiauri N, Rejection versus escape: the tumor MHC dilemma, Cancer Immunol. Immunother 66 (2017) 259.
