PLOS ONE. 2023 Feb 24;18(2):e0274339. doi: 10.1371/journal.pone.0274339

Pharmacogenetic allele variant frequencies: An analysis of the VA’s Million Veteran Program (MVP) as a representation of the diversity in US population

Kyriacos Markianos 1,#, Frederic Dong 1,#, Bryan Gorman 1, Yunling Shi 1, Daniel Dochtermann 1, Uma Saxena 1, Poornima Devineni 1, Jennifer Moser 2, Sumitra Muralidhar 2, Rachel Ramoni 2, Philip Tsao 3, Saiju Pyarajan 1,*, Ronald Przygodzki 2,*; for the Million Veteran Program
Editor: Hoh Boon-Peng 4
PMCID: PMC9956596  PMID: 36827430

Abstract

We present allele frequencies of pharmacogenomics-relevant variants across multiple ancestries in a sample representative of the US population. We analyzed 658,582 individuals with genotype data and extracted pharmacogenomics-relevant single nucleotide variant (SNV) alleles, human leukocyte antigen (HLA) 4-digit alleles, and an important copy number variant (CNV), the full deletion/duplication of CYP2D6. We compiled distinct allele frequency tables for individuals of European, African American, Hispanic, and Asian ancestry. In addition, we compiled allele frequencies based on local ancestry reconstruction in the African American (2-way deconvolution) and Hispanic (3-way deconvolution) cohorts.

Introduction

Genetic polymorphisms in metabolic pathway and cytochrome P450 (CYP) genes are associated with altered pharmacokinetics and pharmacodynamics, affecting the absorption, distribution, metabolism and excretion (ADME) of drugs and toxic compounds (xenobiotics). A better understanding of interindividual variation in this genetic makeup is necessary to understand how efficiently a xenobiotic is metabolized. In general, heritable selective pressure is a major determinant of variant frequency among different ethnic populations, and two or more variants have typically been identified in most metabolic pathway genes. Common star allelic variants (referred to herein as “variant”) confer a “normal” metabolic cycle, while others convey a heightened or depressed metabolic cycle. Much of this is well catalogued in a variety of collections, including PharmGKB (https://www.pharmgkb.org/), with clinically actionable variant-drug combinations presented by the Clinical Pharmacogenetics Implementation Consortium (CPIC, https://cpicpgx.org/) and the Dutch Pharmacogenetics Working Group (DPWG, http://upgx.eu/guidelines). Although these variants are catalogued in many databases, it is important to recognize that many of them were identified using data derived from specific ethnic populations. Ethnic population data are typically derived from a limited collection of self-identified subjects and the variants associated with that ethnicity. Moreover, certain variants designated as normal are unique to a select ethnicity and not represented among others, as is known for CYP2D6 and codeine, or CYP3A5 and tacrolimus [1]. Lastly, while certain populations are considered relatively homogeneous over several generations as dictated by culture, not all data reflect this consideration, which further contributes to the diversity of drug responses.

The distribution of inherited xenobiotic-metabolizing alleles differs considerably between populations [2, 3] and appears to be rigid in frequency among ethnically stable populations. While several large-scale data sets provide variant frequencies of pharmacogenomic genes for researchers and clinicians, most of these data are not representative of the “melting pot” of genetic ancestries present within the United States (US). While one could rely on self-reported ethnicity to improve variant frequency estimates for specific populations, such data are imperfect [4]. Further complicating variant prediction, nearly everyone has at least one pharmacogenomic variant allele, with as many as 3% carrying 5 allelic variants [5]. These findings limit the general use of ethnicity-based variant frequencies in diverse populations such as that of the US, because the available data are limited to a specific self-reported ethnicity and/or do not consider variants that could be present among other ethnicities. This is a particularly important consideration for research on personalized drug therapy, and it could change the healthcare guidelines provided by groups such as CPIC and DPWG, which select important alleles for clinical genotyping based in part on population prevalence.

To address these limitations and to further explore variant markers with possible pharmacogenetic associations, we used the Million Veteran Program (MVP) [6], with >800,000 participants, to generate a coherent representation of the allelic frequencies present within a US population. The MVP cohort is mostly male but very diverse, and its ancestry composition is broadly representative of the US population. The genotype data were imputed using the African Genome Resources (AGR) and 1000 Genomes imputation reference panels.

Results

Our analysis is based on Release 4 of the MVP data, with 658,582 individuals genotyped on the MVP-1 Axiom array [7]. Participants were assigned ancestry based on the HARE algorithm (Harmonized Ancestry and Race/Ethnicity) [8]. The MVP cohort is diverse, with ~30% of the cohort assigned as non-European (EUR 467k, AFR 125k, HIS 52k, ASN 8k). A small fraction of the cohort (<2%) was highly admixed, was not assigned to any of the four major ancestries, and is not included in this analysis.

Our aim is to provide an ancestry-specific variant frequency catalog for a significant fraction of pharmacogenomics-relevant variants in a large cohort representative of the US population. We examined Single Nucleotide Variants (SNV), an important pharmacogenetics-relevant Copy Number Variant (CNV), and Human Leukocyte Antigen (HLA) 4-digit alleles. We defined our pharmacogenomics gene set by combining information from two publicly available databases, PharmGKB and PharmVar (Methods). Overall, we were able to determine SNV frequencies for 273/1339 targeted SNVs, in 148/152 targeted genes. Details on variant selection can be found in Methods. S1 File provides a comprehensive table of all allele frequencies. As expected, SNV allele frequencies vary substantially among HARE groups (Fig 1).

Fig 1. (a) Allele frequency distributions in different ethnic groups of 282 variants in 153 genes; (b) differences in allele frequencies relative to EUR samples.

In addition to HARE allele frequencies, we used Local Ancestry Inference (LAI) to identify the ancestral origin of individual chromosomal segments and to compute allele frequencies based on local ancestry. We “painted” the African American samples (125k individuals) using two-way deconvolution, extracting allele frequencies for the AFR and EUR tracks. For the Hispanic individuals (52k) we used three-way deconvolution to compute allele frequencies for the EUR, AFR and Native American (AMR) tracks. Details on the LAI will be presented elsewhere. In Fig 2 we present allele frequencies derived from HARE groups and LAI, as well as from two publicly available databases, the 1000 Genomes Project and gnomAD (Methods). The most striking differences are observed for Hispanics, a group that is extremely heterogeneous and not well defined in the genetics literature.

Fig 2. CYP2D6 allele frequencies for three MVP HARE groups, MVP Local Ancestry Inference (LAI) and two publicly available data sets: the 1000 Genomes Project and gnomAD.

Only consensus Tier 1 alleles [9] are shown. We note that we did not perform LAI in HARE EUR samples. Thus LAI frequencies are identical by default to HARE EUR frequencies. For HARE HIS, LAI corresponds to the AMR track allele frequencies (3-way deconvolution). We estimate imputation quality per site and ethnicity. We do not present allele frequencies for poorly imputed sites.

Allele frequencies for the three major MVP HARE groups (EUR, AFR, HIS) are in good agreement with gnomAD derived estimates. However, comparison of gnomAD HIS frequencies with the LAI AMR track of the MVP HIS population reveals significant differences (S1 Fig). Here we note that the LAI AMR track provides much better allele frequency ascertainment than the 60 AMR genomes that were used to anchor the local ancestry deconvolution. In the MVP HARE HIS population (52k individuals) the AMR track contributes ~30% of the genome resulting in an effective population size of ~15k individuals. Furthermore, while the 60 AMR genomes we used to anchor ancestry deconvolution provide sufficient multi-locus information to resolve local ancestry, the AMR track is a much better sampling of the AMR genome as it exists today in the US population.
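The effective sample size quoted above follows from simple arithmetic, sketched below. The function name is illustrative, and treating the ~30% AMR share as a uniform multiplier on the number of individuals is a simplifying assumption of this sketch; in practice the share varies by locus and individual.

```python
# Back-of-envelope effective sample size of one local-ancestry track.
# Assumption (not from the paper's pipeline): the track's genome share is
# applied as a single multiplier to the number of individuals.

def effective_sample_size(n_individuals: int, track_fraction: float) -> float:
    """Approximate number of genome-equivalents contributed by one track."""
    if not 0.0 <= track_fraction <= 1.0:
        raise ValueError("track_fraction must be between 0 and 1")
    return n_individuals * track_fraction


n_his = 52_423        # HARE HIS individuals (Methods)
amr_fraction = 0.30   # approximate AMR share of the genome quoted in the text
n_reference = 60      # AMR reference genomes quoted in the text

n_eff = effective_sample_size(n_his, amr_fraction)
fold = n_eff / n_reference
print(f"effective AMR genomes ~ {n_eff:,.0f}; {fold:.0f}x the reference panel")
```

The result is on the order of 15k genome-equivalents, a few hundred times the size of the 60-genome reference panel.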

Sirolimus is a widely used immunosuppressant and the variant controlling its metabolism, rs2242480 (allele CYP3A4*1G) [10], varies among populations. We observe widely different allele frequencies in the three major groups (EUR, AFR, HIS) for gnomAD (0.09, 0.74, 0.37) and MVP (0.09, 0.73, 0.35). However, the local ancestry derived AMR allele frequency (0.67) is almost twice as high as the HIS allele frequency. The same observation applies to tacrolimus, another significant immunosuppressant. The controlling variant, rs776746 (CYP3A5*3), shows large variation in major MVP groups (0.07, 0.70, 0.21) and there is a significant difference between HIS and local ancestry derived AMR allele frequency (0.31). Thus, recent demographic history of individuals, and the fraction of inheritance derived from different major population groups, has a large impact on the allele frequency distribution of pharmacogenomics relevant variants.

CYP2D6 is an important component of cytochrome P450 and is involved in the metabolism of many commonly prescribed medications, including antidepressants, antipsychotics, beta-blockers, opioids, antiemetics, atomoxetine, and tamoxifen [9, 11]. In addition to SNV frequencies for the most significant variants [12], Fig 2 presents allele frequencies for an important copy number variant, the whole gene deletion designated as CYP2D6*5 in the pharmacogenomics literature (Figs 2 and 3). We called the CYP2D6*5 CNV using UMAP, a machine learning algorithm [13]. Assignments are clearly separated for copy gain and copy loss. Furthermore, we can clearly separate single and double copy loss (Fig 3). The UMAP approach offers a clear advantage over classification based on Principal Components Analysis (PCA, S2 Fig). We note that UMAP does not represent a general approach to copy number variation detection. Hyper-parameters for the model are tuned for the specific, relatively common CNVs. Furthermore, we achieve optimal performance only when we tune the model separately for individual HARE ancestries.

Fig 3. Copy number variation in CYP2D6.

Results are shown just for the HARE AFR cohort; clusters were derived using UMAP [13].

Table 1A presents CYP2D6*5 allele frequencies for the three major HARE groups. The major survey of CYP2D6*5 [14] finds slightly different allele frequencies, e.g., 89% in Beoris et al. vs 78% in MVP for copy number 2 in EUR. Our findings are closer to the frequencies reported by gnomAD (80%; MCNV_22_1026, gnomAD SVs v2.1, broadinstitute.org). The differences might be due to different assays (single-site PCR vs SNP genotyping in MVP or sequencing in gnomAD) or to differences in ascertainment of ethnic background. We are not able to run the UMAP algorithm on phased chromosomes, but we can use ancestry deconvolution and test CNV status in individuals who are ancestry-homozygous at CYP2D6, e.g., AMR/AMR individuals in the HARE HIS group. Results are shown in Table 1B. The ancestral AMR genome harbors fewer single-copy samples, while the EUR tracks are in close agreement with observations in the EUR HARE cohort.

Table 1. Allele frequencies for allele CYP2D6*5 (whole gene deletion) in different HARE groups.

In addition, we show allele frequencies in the three components of the HARE HIS group (EUR, AFR, AMR) calculated using individuals ancestry-homozygous at CYP2D6.

Values are percentages.

          (A) HARE group              (B) HARE HIS, ancestry-homozygous segments
Copies    EUR      AFR      HIS       EUR      AFR      AMR
0         0.14     0.37     0.14      0.16     0.31     0.03
1         6.25     10.70    5.71      5.93     9.87     4.47
2         78.40    76.90    82.50     77.01    79.78    93.05
3+        14.80    11.96    11.58     16.89    10.03    2.45
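Table 1 reports the share of samples in each copy-number class. As a rough illustration of how a per-chromosome deletion frequency could be read off the zero-copy row, the sketch below takes the square root of the zero-copy fraction. This rests on assumptions that are not part of the paper's analysis: Hardy-Weinberg proportions, and every zero-copy sample being homozygous for the whole-gene deletion. Because zero-copy samples are rare, the estimate is noisy.

```python
import math

# Percent of samples with zero CYP2D6 copies, copied from Table 1.
zero_copy_percent = {
    "HARE EUR": 0.14,
    "HARE AFR": 0.37,
    "HARE HIS": 0.14,
    "HIS, EUR track": 0.16,
    "HIS, AFR track": 0.31,
    "HIS, AMR track": 0.03,
}


def deletion_allele_frequency(zero_copy_pct: float) -> float:
    """Rough deletion-allele frequency q = sqrt(P(copy number 0)).

    Assumes Hardy-Weinberg proportions (an assumption of this sketch,
    not a step taken in the paper).
    """
    if not 0.0 <= zero_copy_pct <= 100.0:
        raise ValueError("zero_copy_pct must be a percentage")
    return math.sqrt(zero_copy_pct / 100.0)


estimates = {g: deletion_allele_frequency(p) for g, p in zero_copy_percent.items()}
for group, q in estimates.items():
    print(f"{group:15s} q ~ {q:.3f}")
```

Under these assumptions the estimate for the AMR track is less than half that for HARE EUR, in line with the observation that the ancestral AMR genome harbors fewer copy-loss samples.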

In addition to SNVs and specific CNVs, we derived HLA alleles from SNP genotypes using the HIBAG algorithm [15] (HLA Imputation using attribute BAGging). Although HLA status does not modify pharmacokinetics, there are well-established adverse drug reactions in the presence of specific HLA alleles. For example, abacavir, a common anti-retroviral, causes abacavir hypersensitivity syndrome in the presence of HLA-B*57:01. Allopurinol is typically a safe drug for the treatment of gout, but in the presence of HLA-B*58:01 it is associated with an increased risk of allopurinol-induced severe cutaneous adverse drug reaction (SCAR), with the most serious cases developing Stevens–Johnson syndrome and toxic epidermal necrolysis (SJS/TEN) [16]. The HLA 4-digit Class I and Class II allele distribution for the four HARE groups is shown in Fig 4. As expected, allele frequencies are highly variable among the four groups, including the three alleles most relevant for pharmacogenomics: HLA-A*31:01, HLA-B*57:01 and HLA-B*58:01 (Table 2). Details of HLA allele imputation will be presented elsewhere. Here, we note that HLA imputation precision was >90% for the HARE EUR, AFR and HIS groups. However, we currently observe lower precision for the ASN predictions due to the lack of an appropriate training set.

Fig 4. HLA allele distribution in 4 ethnic groups.

Allele imputation was performed through HIBAG using the Axiom UK Biobank model and Axiom MVP genotypes.

Table 2. Allele frequencies for HLA alleles with known large effects in pharmacogenomics.

Allele         EUR     AFR     HIS     ASN
HLA-A*31:01    2.75    0.82    5.08    2.21
HLA-B*57:01    3.73    0.79    1.41    0.75
HLA-B*58:01    0.68    4.07    1.23    4.45

Discussion

We present a survey of pharmacogenetics relevant variants in the MVP, a sample representative of the US population. Using the MVP-1 Axiom array we can resolve a large fraction of known pharmacogenomics alleles, either as direct or as imputed genotypes. In addition, we use the genotypes to derive population allele frequencies for an important common CNV, whole gene deletion/duplication of CYP2D6, as well as population distribution of HLA alleles, including HLA alleles important for drug delivery decisions.

As expected, there is substantial variation in allele frequencies between ancestry groups for a subset of the examined variants. In addition to allele frequencies of individual ancestry groups (HARE EUR, AFR, HIS, ASN) we use an innovative approach, LAI, to derive allele frequencies of ancestral genomes present in recently admixed US populations; 2-way deconvolution for HARE AFR (EUR, AFR) and 3-way deconvolution for HARE HIS (EUR, AFR, AMR).

LAI conveys important information on allele frequency distribution in under-represented populations. Allele frequency is an important consideration in the formulation of clinical genotyping guidelines provided by groups such as CPIC and DPWG. The AMR track is a much better sampling of the AMR genome as it exists today in the US population than the small, and not necessarily representative, number of samples from AMR populations (60 vs 15,000 effective genomes). Sites with significantly different allele frequencies in the AMR and EUR/AFR tracks are sites where self-identification as HIS provides limited power to predict the likelihood of drug sensitivity. Thus, they are sites where groups such as CPIC and DPWG should rely on LAI minimum allele frequency rather than ethnic group allele frequency for recommendations. The large sample size we use for our analysis is particularly important for low-frequency variants. For example, single- and double-copy deletions of CYP2D6 are relatively rare. Therefore, it is inappropriate to derive frequencies from a reference panel, even under the assumption that the reference panel is representative of the general US population.

There are limitations in our derived population allele frequencies. While SNVs are phased, neither CNV nor HLA calls are phased genotypes. Furthermore, successful phasing in the overall genome does not guarantee successful phasing in complex genomic regions such as CYP2D6. For CYP2D6 in particular, we have been able to resolve whole gene deletions and duplications, but we are certain that there is additional small-scale copy number variation that cannot be resolved by our UMAP machine learning approach, for example small deletions and complex rearrangements involving the proximal CYP2D7 and CYP2D8 pseudogenes. It is likely that such complex variation makes a minor contribution to the population distribution of CYP2D6 pharmacogenetics. However, resolving the population-level frequency of such variants will require specialized assays such as long-range sequencing. Improving phasing and imputation will aid the eventual derivation of star alleles in these regions.

We think that this comprehensive allele frequency report, in a population representative of US genome diversity, will become a useful reference for future guidelines on the relative importance of alleles worth ascertaining in pharmacogenetics screens. The high variance of allele frequency, not only among ethnic groups but most importantly among the ancestral genomes contributing to mixed-ancestry individuals in the US population, further underscores the need for individual typing rather than reliance on self-reported ethnicity for drug delivery decisions in clinical practice. We hope this manuscript promotes the adoption of personalized medicine in under-represented populations.

Methods

Ethics statement

The Veterans Affairs (VA) central institutional review board (cIRB) and site-specific IRBs approved the Million Veteran Program study.

MVP genotype data

The MVP Release 4 dataset includes 658,582 individuals and consists of a hard-called dataset of 667,955 variants prepared as described in Hunter-Zinck et al. 2020 [7], as well as an imputed dataset. Genotype calls passing initial quality control were further prepared for phasing and imputation by removing markers with high missingness (>20%), monomorphic markers, and markers significantly out of Hardy-Weinberg equilibrium (p < 1e-6 adjusted for ancestry). Haplotypes were then statistically phased using SHAPEIT v4.1.3 (https://odelaneau.github.io/shapeit4/) and imputed into the African Genome Resources and 1000 Genomes imputation panels using Minimac4 (https://genome.sph.umich.edu/wiki/Minimac4). Each individual in the cohort was assigned a HARE group (EUR, AFR, HIS, or ASN), a surrogate variable for ancestry and race/ethnicity (Fang et al. 2019). The MVP Release 4 cohort consists of 467,162 EUR, 124,756 AFR, 52,423 HIS, 8,364 ASN, and 5,877 unassigned individuals. All analysis was performed in GRCh37.
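As a quick consistency check, the HARE group counts above can be summed and converted to cohort shares. The snippet is only an arithmetic sketch using the numbers quoted in this paragraph; the variable names are illustrative.

```python
# HARE group sizes for MVP Release 4, as quoted in the text.
hare_counts = {
    "EUR": 467_162,
    "AFR": 124_756,
    "HIS": 52_423,
    "ASN": 8_364,
    "unassigned": 5_877,
}

total = sum(hare_counts.values())
shares = {group: n / total for group, n in hare_counts.items()}
non_eur_share = 1.0 - shares["EUR"]

print(f"total individuals: {total:,}")
for group, share in shares.items():
    print(f"{group:10s} {100 * share:5.2f}%")
print(f"non-EUR share: {100 * non_eur_share:.1f}%")
```

The counts add up to the 658,582 genotyped individuals, the non-EUR share is close to the ~30% quoted in Results, and the unassigned fraction is below 2%.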

Identification of known pharmacogenetics variants

We curated a catalogue of known or high-confidence pharmacogenetics variants by rsID from the PharmGKB and PharmVar databases. From PharmGKB, we downloaded variant summary data (https://api.pharmgkb.org/v1/download/file/data/variants.zip) and kept only variants with at least one Level 1 or 2 PharmGKB clinical annotation. From PharmVar, we downloaded the complete database (version 4.2.4) and kept all variants. In total, we identify 1,339 unique variants from 152 genes.

Identification of pharmacogenetics variants in the MVP genotype dataset

Genotyped dataset

We selected the intersection of known pharmacogenetics variants with the catalog of SNPs in the MVP array. We identified pharmacogenetics variants by chromosome location and rsID [7].

Imputed dataset

Imputation was performed using MINIMAC. We kept only variants with imputation R2 > 0.9 within the ethnic group. We assigned rsIDs to imputed variants by intersecting variant genomic position with rsID genomic position in NCBI dbSNP (v154) using bedtools. We then identified pharmacogenetics variants by overlapping imputed variant rsIDs.

In total, we find 193 pharmacogenetics variants from 136 genes in the genotyped data set. Including the imputed variants, we expand the set to 273 variants in 148 genes. If we relax the selection criteria to include all imputed variants that satisfy imputation R2 > 0.9 in any one of the 4 HARE groups (EUR, AFR, HIS, ASN) we expand the set to 408 variants.
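The two selection rules can be sketched as follows. This is one reading of the criteria, not the authors' code: the per-group rule keeps a variant for a given HARE group only if its imputation R2 in that group exceeds 0.9, while the relaxed rule keeps a variant if any group passes. The variant names and R2 values are hypothetical.

```python
HARE_GROUPS = ("EUR", "AFR", "HIS", "ASN")
R2_THRESHOLD = 0.9  # imputation quality cutoff quoted in the text


def passes_in_group(r2_by_group: dict, group: str,
                    threshold: float = R2_THRESHOLD) -> bool:
    """True if the variant is well imputed in the given HARE group."""
    return r2_by_group.get(group, 0.0) > threshold


def passes_in_any_group(r2_by_group: dict,
                        threshold: float = R2_THRESHOLD) -> bool:
    """Relaxed rule: well imputed in at least one HARE group."""
    return any(passes_in_group(r2_by_group, g, threshold) for g in HARE_GROUPS)


# Hypothetical per-group imputation R2 values for three made-up variants.
example_r2 = {
    "variant_A": {"EUR": 0.98, "AFR": 0.95, "HIS": 0.97, "ASN": 0.93},
    "variant_B": {"EUR": 0.96, "AFR": 0.71, "HIS": 0.88, "ASN": 0.64},
    "variant_C": {"EUR": 0.42, "AFR": 0.55, "HIS": 0.47, "ASN": 0.39},
}

relaxed_set = [v for v, r2 in example_r2.items() if passes_in_any_group(r2)]
afr_set = [v for v, r2 in example_r2.items() if passes_in_group(r2, "AFR")]
print("relaxed:", relaxed_set)
print("AFR only:", afr_set)
```

In this toy input, variant_B is retained by the relaxed rule because it is well imputed in EUR, but its frequency would not be reported for AFR.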

Allele frequency analyses

Calculation of minor allele frequencies

HARE group-specific minor allele frequencies (MAFs) were calculated for the MVP hard-called and imputed datasets using PLINK2.
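For hard-called biallelic genotypes, the quantity computed here reduces to a simple count, sketched below. This is an illustration of the definition rather than of PLINK2 itself, the genotype counts are hypothetical, and imputed data would use dosages instead of integer genotype counts.

```python
def minor_allele_frequency(n_hom_ref: int, n_het: int, n_hom_alt: int) -> float:
    """Minor allele frequency from biallelic genotype counts.

    Each individual carries two alleles; the minor allele is whichever of
    the reference and alternate alleles is less common in the sample.
    """
    n_alleles = 2 * (n_hom_ref + n_het + n_hom_alt)
    if n_alleles == 0:
        raise ValueError("no genotypes supplied")
    alt_frequency = (n_het + 2 * n_hom_alt) / n_alleles
    return min(alt_frequency, 1.0 - alt_frequency)


# Hypothetical genotype counts for one variant in one HARE group.
maf = minor_allele_frequency(n_hom_ref=810, n_het=180, n_hom_alt=10)
print(f"MAF = {maf:.3f}")
```

Running the same function separately on each HARE group's genotype counts gives the group-specific MAFs.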

Local Ancestry Inference (LAI) based allele frequencies

Briefly, we performed LAI using rfmix2. We used 3,942 reference samples of EUR, AFR and Native American (AMR) ancestry collected by the 1000 Genomes Project and the Human Genome Diversity Project (HGDP). The reference VCF files were curated by the gnomAD team (https://gnomad.broadinstitute.org/downloads#v3-hgdp-1kg). We used the local ancestry output to create separate, ancestry-specific VCF output files: two files for the HARE AFR sample (EUR-AFR) and three files for the HARE Hispanic sample (EUR-AFR-NAT). The allele frequency extraction procedure was the same for LAI and gnomAD samples and is described below.

GnomAD allele frequencies

Population-specific frequencies were extracted as follows. LAI and gnomAD (v2.1.1, genomes only, not exomes) frequencies were stored in the INFO fields of VCF files. AFR and HIS LAI frequencies were stored in separate files. gnomAD frequencies were stored in population-specific INFO fields (AF_nfe, AF_afr, AF_amr for non-Finnish Europeans, African/African Americans, and Latino/Admixed Americans, respectively). Using bcftools 1.10, VCF files were first filtered to the relevant SNPs (bcftools view --include 'ID=@<file of rsIDs>' <VCF file>), and frequencies were then extracted from the relevant INFO fields (e.g., bcftools query -f '%ID\t%INFO/AF_nfe\n').
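For readers without bcftools at hand, the same filter-then-extract logic can be sketched in plain Python. This is an illustration, not the authors' pipeline: the function names are hypothetical, the example VCF line is made up, and the sketch assumes biallelic records with a single value per INFO key.

```python
def parse_info(info_field: str) -> dict:
    """Parse a VCF INFO column ('K1=V1;K2=V2;FLAG') into a dict."""
    info = {}
    for item in info_field.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        info[key] = value if sep else True  # bare keys are boolean flags
    return info


def extract_frequency(vcf_line: str, rsids: set, info_key: str):
    """Return (rsID, frequency) if the record's ID is wanted, else None."""
    if vcf_line.startswith("#"):
        return None  # header line
    fields = vcf_line.rstrip("\n").split("\t")
    variant_id, info = fields[2], parse_info(fields[7])
    if variant_id not in rsids or info_key not in info:
        return None
    return variant_id, float(info[info_key])


# Made-up record; the ID and frequencies are placeholders, not real data.
example_line = "1\t12345\trs0000001\tA\tG\t.\tPASS\tAF_nfe=0.25;AF_afr=0.5;DB\n"
wanted = {"rs0000001"}
result = extract_frequency(example_line, wanted, "AF_nfe")
print(result)
```

Changing `info_key` to `AF_afr` or `AF_amr` would pull the other population-specific gnomAD fields named above.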

1000 genomes allele frequencies

1000 Genomes population-specific MAFs were extracted from 1000 Genomes Phase 3 VCFs.

Analysis and visualization

Visualization of MAFs, and calculation and visualization of MAF differences between MVP and 1000 Genomes, was performed using R.

HLA type predictions

4-digit HLA type predictions were generated for HLA-A and HLA-B from hard-called genotype data using HIBAG [15]. We chose the pre-fit Affymetrix Axiom UK Biobank Array 4-digit resolution model (https://hibag.s3.amazonaws.com/hlares_index.html), as the MVP genotyping array covers > 95% of this model’s training variants for both loci. We used the European model for individuals assigned to HARE group EUR and the multi-ethnic model for individuals assigned to HARE groups AFR, HIS, and ASN. Predictions were generated by calling the predict() function from HIBAG. Frequencies were calculated for each 4-digit allele by HARE group.
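The final tabulation step can be sketched as a simple count of predicted alleles per HARE group. The input format (one tuple per individual with a group label and two predicted alleles) and the example predictions are hypothetical; this is not HIBAG output parsing, and frequencies are expressed in percent.

```python
from collections import Counter, defaultdict


def hla_allele_frequencies(predictions) -> dict:
    """Per-group allele frequencies (in percent) from predicted allele pairs.

    predictions: iterable of (hare_group, allele_1, allele_2) tuples,
    one per individual and locus.
    """
    counts = defaultdict(Counter)
    for group, allele_1, allele_2 in predictions:
        counts[group].update([allele_1, allele_2])  # two alleles per person

    frequencies = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        frequencies[group] = {a: 100.0 * n / total for a, n in counter.items()}
    return frequencies


# Hypothetical HLA-B predictions for three individuals.
example_predictions = [
    ("EUR", "57:01", "07:02"),
    ("EUR", "07:02", "07:02"),
    ("AFR", "58:01", "53:01"),
]
freqs = hla_allele_frequencies(example_predictions)
for group, table in freqs.items():
    print(group, table)
```

Homozygous predictions contribute two counts of the same allele, so each group's frequencies sum to 100%.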

Supporting information

S1 Fig. Allele frequency comparisons between gnomAD and MVP HARE groups for three groups (EUR, AFR, HIS).

For all three, correlation with gnomAD allele frequencies is high (R2>0.99). In the lower right we compare allele frequencies for gnomAD HIS and Local Ancestry Inference (LAI) derived allele frequencies for the AMR track of the HARE HIS group (R2 = 0.91). We use three-way local ancestry deconvolution (EUR, AFR, AMR).

(TIF)

S2 Fig. Copy number variation in CYP2D6 using two computational approaches.

Results are shown just for the HARE AFR cohort; clusters were derived using (a) Principal Components Analysis (PCA) and (b) UMAP [13]. UMAP significantly reduces assignment ambiguity.

(TIF)

S1 File. Allele frequency table for HARE groups and LAI tracks.

In addition to allele frequencies, we provide imputation quality information (imputation R2) per site and ethnicity/LAI track. In the same table we attach the PharmGKB annotation per site, where available. The table includes all sites with imputation R2 > 0.9 in any of the four HARE groups (EUR, AFR, HIS, ASN), for a total of 408 sites. Four sites are multiallelic, thus the table has 412 rows.

(CSV)

Acknowledgments

This research is based on data from the Million Veteran Program, Office of Research and Development, Veterans Health Administration. This publication does not represent the views of the Department of Veterans Affairs, the US Food and Drug Administration, or the US Government.

Million Veteran Program Full Acknowledgments

MVP Executive Leadership

  • * Sumitra Muralidhar, Ph.D. (Director, MVP)
    US Department of Veterans Affairs, 810 Vermont Avenue NW, Washington, DC 20420
  • J. Michael Gaziano, M.D., M.P.H.
    VA Boston Healthcare System, 150 S. Huntington Avenue, Boston, MA 02130
  • Philip S. Tsao, Ph.D.
    VA Palo Alto Health Care System, 3801 Miranda Avenue, Palo Alto, CA 94304

MVP Program Office

  • Program Director: Sumitra Muralidhar, Ph.D.
    US Department of Veterans Affairs, 810 Vermont Avenue NW, Washington, DC 20420
  • Associate Director, Scientific Programs: Jennifer Moser, Ph.D.
    US Department of Veterans Affairs, 810 Vermont Avenue NW, Washington, DC 20420
  • Associate Director, Cohort Management & Public Relations: Jennifer E. Deen, B.S.
    US Department of Veterans Affairs, 810 Vermont Avenue NW, Washington, DC 20420

MVP Operations

  • Director of Regulatory Affairs: Lori Churby, B.S.
    VA Palo Alto Health Care System, 3801 Miranda Avenue, Palo Alto, CA 94304
  • MVP Cohort Management Director: Stacey B. Whitbourne, Ph.D.
    VA Boston Healthcare System, 150 S. Huntington Avenue, Boston, MA 02130
  • MVP Recruitment/Enrollment Director: Jessica V. Brewer, M.P.H.
    VA Boston Healthcare System, 150 S. Huntington Avenue, Boston, MA 02130
  • Director, VA Central Biorepository, Boston: Mary T. Brophy, M.D., M.P.H.
    VA Boston Healthcare System, 150 S. Huntington Avenue, Boston, MA 02130
  • MVP Informatics, Boston: Shahpoor (Alex) Shayan, M.S.
    VA Boston Healthcare System, 150 S. Huntington Avenue, Boston, MA 02130
  • Director, Center for Computational and Data Sciences (C-DACS) & Genomics Core: Saiju Pyarajan, Ph.D.
    VA Boston Healthcare System, 150 S. Huntington Avenue, Boston, MA 02130
  • Director, Phenomics Data Core: Kelly Cho, M.P.H., Ph.D.
    VA Boston Healthcare System, 150 S. Huntington Avenue, Boston, MA 02130

Current MVP Local Site Investigators

  • • Atlanta VA Medical Center (Peter Wilson, M.D.)

1670 Clairmont Road, Decatur, GA 30033

  • • Bay Pines VA Healthcare System (Rachel McArdle, Ph.D.)

10,000 Bay Pines Blvd Bay Pines, FL 33744

  • • Birmingham VA Medical Center (Louis Dellitalia, M.D.)

700 S. 19th Street, Birmingham AL 35233

  • • Central Western Massachusetts Healthcare System (Kristin Mattocks, Ph.D., M.P.H.)

421 North Main Street, Leeds, MA 01053

  • • Cincinnati VA Medical Center (John Harley, M.D., Ph.D.)

3200 Vine Street, Cincinnati, OH 45220

  • • Clement J. Zablocki VA Medical Center (Jeffrey Whittle, M.D., M.P.H.)

5000 West National Avenue, Milwaukee, WI 53295

  • • VA Northeast Ohio Healthcare System (Frank Jacono, M.D.)

10701 East Boulevard, Cleveland, OH 44106

  • • Durham VA Medical Center (Jean Beckham, Ph.D.)

508 Fulton Street, Durham, NC 27705

  • • Edith Nourse Rogers Memorial Veterans Hospital (John Wells., Ph.D.)

200 Springs Road, Bedford, MA 01730

  • • Edward Hines, Jr. VA Medical Center (Salvador Gutierrez, M.D.)

5000 South 5th Avenue, Hines, IL 60141

  • • Veterans Health Care System of the Ozarks (Kathrina Alexander, M.D.)

1100 North College Avenue, Fayetteville, AR 72703

  • • Fargo VA Health Care System (Kimberly Hammer, Ph.D.)

2101 N. Elm, Fargo, ND 58102

  • • VA Health Care Upstate New York (James Norton, Ph.D.)

113 Holland Avenue, Albany, NY 12208

  • • New Mexico VA Health Care System (Gerardo Villareal, M.D.)

1501 San Pedro Drive, S.E. Albuquerque, NM 87108

  • • VA Boston Healthcare System (Scott Kinlay, M.B.B.S., Ph.D.)

150 S. Huntington Avenue, Boston, MA 02130

  • • VA Western New York Healthcare System (Junzhe Xu, M.D.)

3495 Bailey Avenue, Buffalo, NY 14215–1199

  • • Ralph H. Johnson VA Medical Center (Mark Hamner, M.D.)

109 Bee Street, Mental Health Research, Charleston, SC 29401

  • • Columbia VA Health Care System (Roy Mathew, M.D.)

6439 Garners Ferry Road, Columbia, SC 29209

  • • VA North Texas Health Care System (Sujata Bhushan, M.D.)

4500 S. Lancaster Road, Dallas, TX 75216

  • • Hampton VA Medical Center (Pran Iruvanti, D.O., Ph.D.)

100 Emancipation Drive, Hampton, VA 23667

  • • Richmond VA Medical Center (Michael Godschalk, M.D.)

1201 Broad Rock Blvd., Richmond, VA 23249

  • • Iowa City VA Health Care System (Zuhair Ballas, M.D.)

601 Highway 6 West, Iowa City, IA 52246–2208

  • • Eastern Oklahoma VA Health Care System (River Smith, Ph.D.)

1011 Honor Heights Drive, Muskogee, OK 74401

  • • James A. Haley Veterans’ Hospital (Stephen Mastorides, M.D.)

13000 Bruce B. Downs Blvd, Tampa, FL 33612

  • • James H. Quillen VA Medical Center (Jonathan Moorman, M.D., Ph.D.)

Corner of Lamont & Veterans Way, Mountain Home, TN 37684

  • • John D. Dingell VA Medical Center (Saib Gappy, M.D.)

4646 John R Street, Detroit, MI 48201

  • • Louisville VA Medical Center (Jon Klein, M.D., Ph.D.)

800 Zorn Avenue, Louisville, KY 40206

  • • Manchester VA Medical Center (Nora Ratcliffe, M.D.)

718 Smyth Road, Manchester, NH 03104

  • • Miami VA Health Care System (Ana Palacio, M.D., M.P.H.)

1201 NW 16th Street, 11 GRC, Miami FL 33125

  • • Michael E. DeBakey VA Medical Center (Olaoluwa Okusaga, M.D.)

2002 Holcombe Blvd, Houston, TX 77030

  • • Minneapolis VA Health Care System (Maureen Murdoch, M.D., M.P.H.)

One Veterans Drive, Minneapolis, MN 55417

  • • N. FL/S. GA Veterans Health System (Peruvemba Sriram, M.D.)

1601 SW Archer Road, Gainesville, FL 32608

  • • Northport VA Medical Center (Shing Shing Yeh, Ph.D., M.D.)

79 Middleville Road, Northport, NY 11768

  • • Overton Brooks VA Medical Center (Neeraj Tandon, M.D.)

510 East Stoner Ave, Shreveport, LA 71101

  • • Philadelphia VA Medical Center (Darshana Jhala, M.D.)

3900 Woodland Avenue, Philadelphia, PA 19104

  • • Phoenix VA Health Care System (Samuel Aguayo, M.D.)

650 E. Indian School Road, Phoenix, AZ 85012

  • • Portland VA Medical Center (David Cohen, M.D.)

3710 SW U.S. Veterans Hospital Road, Portland, OR 97239

  • • Providence VA Medical Center (Satish Sharma, M.D.)

830 Chalkstone Avenue, Providence, RI 02908

  • • Richard Roudebush VA Medical Center (Suthat Liangpunsakul, M.D., M.P.H.)

1481 West 10th Street, Indianapolis, IN 46202

  • • Salem VA Medical Center (Kris Ann Oursler, M.D.)

1970 Roanoke Blvd, Salem, VA 24153

  • • San Francisco VA Health Care System (Mary Whooley, M.D.)

4150 Clement Street, San Francisco, CA 94121

  • • South Texas Veterans Health Care System (Sunil Ahuja, M.D.)

7400 Merton Minter Boulevard, San Antonio, TX 78229

  • • Southeast Louisiana Veterans Health Care System (Joseph Constans, Ph.D.)

2400 Canal Street, New Orleans, LA 70119

  • • Southern Arizona VA Health Care System (Paul Meyer, M.D., Ph.D.)

3601 S 6th Avenue, Tucson, AZ 85723

  • • Sioux Falls VA Health Care System (Jennifer Greco, M.D.)

2501 W 22nd Street, Sioux Falls, SD 57105

  • • St. Louis VA Health Care System (Michael Rauchman, M.D.)

915 North Grand Blvd, St. Louis, MO 63106

  • • Syracuse VA Medical Center (Richard Servatius, Ph.D.)

800 Irving Avenue, Syracuse, NY 13210

  • • VA Eastern Kansas Health Care System (Melinda Gaddy, Ph.D.)

4101 S 4th Street Trafficway, Leavenworth, KS 66048

  • • VA Greater Los Angeles Health Care System (Agnes Wallbom, M.D., M.S.)

11301 Wilshire Blvd, Los Angeles, CA 90073

  • • VA Long Beach Healthcare System (Timothy Morgan, M.D.)

5901 East 7th Street Long Beach, CA 90822

  • • VA Maine Healthcare System (Todd Stapley, D.O.)

1 VA Center, Augusta, ME 04330

  • • VA New York Harbor Healthcare System (Peter Liang, M.D., M.P.H.)

423 East 23rd Street, New York, NY 10010

  • • VA Pacific Islands Health Care System (Daryl Fujii, Ph.D.)

459 Patterson Rd, Honolulu, HI 96819

  • • VA Palo Alto Health Care System (Philip Tsao, Ph.D.)

3801 Miranda Avenue, Palo Alto, CA 94304–1290

  • • VA Pittsburgh Health Care System (Patrick Strollo, Jr., M.D.)

University Drive, Pittsburgh, PA 15240

  • • VA Puget Sound Health Care System (Edward Boyko, M.D.)

1660 S. Columbian Way, Seattle, WA 98108–1597

  • • VA Salt Lake City Health Care System (Jessica Walsh, M.D.)

500 Foothill Drive, Salt Lake City, UT 84148

  • • VA San Diego Healthcare System (Samir Gupta, M.D., M.S.C.S.)

3350 La Jolla Village Drive, San Diego, CA 92161

  • • VA Sierra Nevada Health Care System (Mostaqul Huq, Pharm.D., Ph.D.)

975 Kirman Avenue, Reno, NV 89502

  • • VA Southern Nevada Healthcare System (Joseph Fayad, M.D.)

6900 North Pecos Road, North Las Vegas, NV 89086

  • • VA Tennessee Valley Healthcare System (Adriana Hung, M.D., M.P.H.)

1310 24th Avenue South, Nashville, TN 37212

  • • Washington DC VA Medical Center (Jack Lichy, M.D., Ph.D.)

50 Irving St, Washington, DC 20422

  • • W.G. (Bill) Hefner VA Medical Center (Robin Hurley, M.D.)

1601 Brenner Ave, Salisbury, NC 28144

  • • White River Junction VA Medical Center (Brooks Robey, M.D.)

163 Veterans Drive, White River Junction, VT 05009

  • • William S. Middleton Memorial Veterans Hospital (Prakash Balasubramanian, M.D.)

2500 Overlook Terrace, Madison, WI 53705

Data Availability

All relevant data are within the paper and its Supporting Information files.

Funding Statement

This work was supported by grant #MVP000 to SP from the Million Veteran Program (MVP) from the Veterans Affairs (VA) Office of Research and Development (ORD) (www.research.va.gov). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

References

  • 1. Sanghavi K, Brundage RC, Miller MB, Schladt DP, Israni AK, Guan W, et al. Genotype-guided tacrolimus dosing in African-American kidney transplant recipients. Pharmacogenomics J. 2017;17(1):61–8. doi: 10.1038/tpj.2015.87
  • 2. Zhou Y, Ingelman-Sundberg M, Lauschke VM. Worldwide Distribution of Cytochrome P450 Alleles: A Meta-analysis of Population-scale Sequencing Projects. Clin Pharmacol Ther. 2017;102(4):688–700. doi: 10.1002/cpt.690
  • 3. Lauschke VM, Zhou Y, Ingelman-Sundberg M. Novel genetic and epigenetic factors of importance for inter-individual differences in drug disposition, response and toxicity. Pharmacol Ther. 2019;197:122–52. doi: 10.1016/j.pharmthera.2019.01.002
  • 4. Kaseniit KE, Haque IS, Goldberg JD, Shulman LP, Muzzey D. Genetic ancestry analysis on >93,000 individuals undergoing expanded carrier screening reveals limitations of ethnicity-based medical guidelines. Genet Med. 2020;22(10):1694–702.
  • 5. Ji Y, Skierka JM, Blommel JH, Moore BE, VanCuyk DL, Bruflat JK, et al. Preemptive Pharmacogenomic Testing for Precision Medicine: A Comprehensive Analysis of Five Actionable Pharmacogenomic Genes Using Next-Generation DNA Sequencing and a Customized CYP2D6 Genotyping Cascade. J Mol Diagn. 2016;18(3):438–45. doi: 10.1016/j.jmoldx.2016.01.003
  • 6. Gaziano JM, Concato J, Brophy M, Fiore L, Pyarajan S, Breeling J, et al. Million Veteran Program: A mega-biobank to study genetic influences on health and disease. J Clin Epidemiol. 2016;70:214–23. doi: 10.1016/j.jclinepi.2015.09.016
  • 7. Hunter-Zinck H, Shi Y, Li M, Gorman BR, Ji SG, Sun N, et al. Genotyping Array Design and Data Quality Control in the Million Veteran Program. Am J Hum Genet. 2020;106(4):535–48. doi: 10.1016/j.ajhg.2020.03.004
  • 8. Fang H, Hui Q, Lynch J, Honerlaw J, Assimes TL, Huang J, et al. Harmonizing Genetic Ancestry and Self-identified Race/Ethnicity in Genome-wide Association Studies. Am J Hum Genet. 2019;105(4):763–72. doi: 10.1016/j.ajhg.2019.08.012
  • 9. Taylor C, Crosby I, Yip V, Maguire P, Pirmohamed M, Turner RM. A Review of the Important Role of CYP2D6 in Pharmacogenomics. Genes (Basel). 2020;11(11). doi: 10.3390/genes11111295
  • 10. Lolita L, Zheng M, Zhang X, Han Z, Tao J, Fei S, et al. The Genetic Polymorphism of CYP3A4 rs2242480 is Associated with Sirolimus Trough Concentrations Among Adult Renal Transplant Recipients. Curr Drug Metab. 2020;21(13):1052–9. doi: 10.2174/1389200221999201027203401
  • 11. Brunton LL, Hilal-Dandan R, Knollmann BC. Goodman & Gilman’s: The Pharmacological Basis of Therapeutics. 13th ed. New York; 2018.
  • 12. Pratt VM, Cavallari LH, Del Tredici AL, Gaedigk A, Hachad H, Ji Y, et al. Recommendations for Clinical CYP2D6 Genotyping Allele Selection: A Joint Consensus Recommendation of the Association for Molecular Pathology, College of American Pathologists, Dutch Pharmacogenetics Working Group of the Royal Dutch Pharmacists Association, and the European Society for Pharmacogenomics and Personalized Therapy. J Mol Diagn. 2021;23(9):1047–64. doi: 10.1016/j.jmoldx.2021.05.013
  • 13. McInnes L, Healy J, Melville J. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv preprint arXiv:1802.03426. 2018.
  • 14. Beoris M, Amos Wilson J, Garces JA, Lukowiak AA. CYP2D6 copy number distribution in the US population. Pharmacogenet Genomics. 2016;26(2):96–9. doi: 10.1097/FPC.0000000000000188
  • 15. Zheng X, Shen J, Cox C, Wakefield JC, Ehm MG, Nelson MR, et al. HIBAG—HLA genotype imputation with attribute bagging. Pharmacogenomics J. 2014;14(2):192–200. doi: 10.1038/tpj.2013.18
  • 16. Manson LEN, Swen JJ, Guchelaar HJ. Diagnostic Test Criteria for HLA Genotyping to Prevent Drug Hypersensitivity Reactions: A Systematic Review of Actionable HLA Recommendations in CPIC and DPWG Guidelines. Front Pharmacol. 2020;11:567048. doi: 10.3389/fphar.2020.567048

Decision Letter 0

Hoh Boon-Peng

31 Oct 2022

PONE-D-22-23467

Pharmacogenetic allele variant frequencies: An analysis of the VA’s Million Veteran Program (MVP) as a representation of the diversity in US population.

PLOS ONE

Dear Dr. Markianos,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The reviewers are in favour of this work. However, they raised several comments that should be taken into consideration before it can be accepted for publication. Specifically, the underlying message of the LAI analysis was not clearly presented.

Please submit your revised manuscript by Dec 15 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,

Hoh Boon-Peng, PhD

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and

https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Thank you for stating the following in the Acknowledgments Section of your manuscript:

“This research is based on data from the Million Veteran Program, Office of Research and Development, Veterans Health Administration and was supported by award no. MVP000.”

We note that you have provided additional information within the Acknowledgements Section that is not currently declared in your Funding Statement. Please note that funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

“This work was supported by grant #MVP000 to SP from the Million Veteran Program (MVP) from the Veterans Affairs (VA) Office of Research and Development (ORD) (www.research.va.gov). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.”

Please include your amended statements within your cover letter; we will change the online submission form on your behalf.

3. We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.

4. We note that you have stated that you will provide repository information for your data at acceptance. Should your manuscript be accepted for publication, we will hold it until you provide the relevant accession numbers or DOIs necessary to access your data. If you wish to make changes to your Data Availability statement, please describe these changes in your cover letter and we will update your Data Availability statement to reflect the information you provide.

5. One of the noted authors is a group or consortium Million Veteran Program. In addition to naming the author group, please list the individual authors and affiliations within this group in the acknowledgments section of your manuscript. Please also indicate clearly a lead author for this group along with a contact email address.

6. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: No

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors analyzed the frequencies of pharmacogenetics-relevant variants in a large US population. It is not clear whether the frequencies of variants not mentioned in the manuscript are publicly available.

Other issues:

What message are the authors trying to convey with the LAI analysis?

The presentation of Fig. 1 is neither clear nor informative; the authors should try presenting it another way or highlight only some of the genes.

Why is the ‘LAI’ column for ‘EUR’ in the table of Fig. 2 identical to ‘MVP HARE’, and why do some of the SNVs have no LAI value? I suggest the authors switch the X and Y axes of Fig. 2.

In line 156, the authors refer to Fig. 2. Does this mean that the SNVs in Fig. 2 are among those in the CYP2D6*5 CNV region?

In line 182, it should be ‘fewer 2-copy samples’ rather than ‘fewer diploid samples’.

In Supplementary Fig. 1, it would be better to show the correlation coefficient and the P-value.

Reviewer #2: Overall, this is a well written manuscript and will add important information about pharmacogene allele frequencies in US populations.

Mostly, I have minor comments.

Genes should be in italics in accordance with HGVS nomenclature.

Please make sure that all abbreviations/acronyms are written out for clarity of the reader. Many are documented below.

line 32 - I prefer the term variants over polymorphisms unless all variants being discussed have a frequency greater than 1%.

Line 88- first use of HARE, please write out acronym.

line 139 - while the use of the rsID is acceptable, for PGx readers, please also include *allele.

line 143 - please confirm whether the drug name needs to be capitalized.

line 144 - while the use of the rsID is acceptable, for PGx readers, please also include *allele.

line 153 - b-blockers, either write out beta or use Greek symbol as appropriate for publisher.

line 160 - first use of PCA, please write out acronym.

lines 181-182 - remove word clearly (that is up to the reader to determine) and "a lot"; make sure sentence reads well after revision.

lines 193, 393 - prefer the term variants to SNPs

Line 194 - first use of HIBAG, please write out acronym.

Line 199 - first use of SCAR, please write out acronym.

Line 199 - first use of SJS/TEN, please write out acronyms.

lines 206-207 - Please delete "something we hope to address in the future". If needed this would be for the discussion as a future research.

line 370 - QC'd is lab lingo, please write out

Line 374 - SHAPEIT appears to be some program, is this an acronym? Is there a source that needs to be cited?

lines 375, 397 - Minimac4 appears to be some program, is this an acronym? Is there a source that needs to be cited?

Lines 416-420 - check font. I am not sure that matters for formatting for publication.

**********

6. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

**********

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

Decision Letter 1

Hoh Boon-Peng

6 Feb 2023

Pharmacogenetic allele variant frequencies: An analysis of the VA’s Million Veteran Program (MVP) as a representation of the diversity in US population.

PONE-D-22-23467R1

Dear Dr. Markianos,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Hoh Boon-Peng, PhD

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The authors have provided explanations to my previous questions, I don't have any further questions at this round.

**********

7. PLOS authors have the option to publish the peer review history of their article. If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

**********

Acceptance letter

Hoh Boon-Peng

15 Feb 2023

PONE-D-22-23467R1

Pharmacogenetic allele variant frequencies: An analysis of the VA’s Million Veteran Program (MVP) as a representation of the diversity in US population.

Dear Dr. Markianos:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Professor Dr Hoh Boon-Peng

Academic Editor

PLOS ONE

Associated Data


    Supplementary Materials

    S1 Fig. Allele frequency comparisons between gnomAD and three MVP HARE groups (EUR, AFR, HIS).

    For all three groups, correlation with gnomAD allele frequencies is high (R² > 0.99). The lower-right panel compares gnomAD HIS allele frequencies with Local Ancestry Inference (LAI)-derived allele frequencies for the AMR track of the HARE HIS group (R² = 0.91), using three-way local ancestry deconvolution (EUR, AFR, AMR).

    (TIF)
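
The agreement statistic quoted in the S1 Fig caption can be illustrated with a minimal sketch. It assumes that R² here means the squared Pearson correlation between two lists of per-site allele frequencies; the caption does not say how the value was computed. The function name `r_squared` and the frequency values are hypothetical and are not taken from the study.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length frequency lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations around the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant list")
    return (sxy * sxy) / (sxx * syy)


# Made-up allele frequencies for illustration only (not values from the study).
reference_af = [0.05, 0.12, 0.30, 0.47, 0.81]
cohort_af = [0.06, 0.11, 0.31, 0.45, 0.80]
print(round(r_squared(reference_af, cohort_af), 3))
```

A value close to 1 means the two frequency sets are nearly linearly related. On its own it does not show that they are equal, so a scatter plot against the identity line remains useful.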

    S2 Fig. Copy number variation in CYP2D6 using two computational approaches.

    Results are shown for the HARE AFR cohort only; clusters were derived using (a) Principal Components Analysis (PCA) and (b) UMAP [13]. UMAP substantially reduces assignment ambiguity.

    (TIF)

    S1 File. Allele frequency table for HARE groups and LAI tracks.

    In addition to allele frequencies, we provide per-site imputation quality (imputation R²) for each HARE group and LAI track. The same table includes PharmGKB annotation per site, where available. We include all sites with imputation R² > 0.9 in any of the four HARE groups (EUR, AFR, HIS, ASN), for a total of 408 sites. Four sites are multiallelic, so the table has 412 rows.

    (CSV)
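
The inclusion rule and row count described for S1 File can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the helper names (`passes_any_group`, `count_rows`), the site labels, and the R² values are hypothetical. It also assumes that each of the four multiallelic sites contributes exactly one extra row, which is consistent with 408 sites giving 412 rows but is not stated in the caption.

```python
GROUPS = ("EUR", "AFR", "HIS", "ASN")  # the four HARE groups named in the caption


def passes_any_group(r2_by_group, threshold=0.9):
    """True if imputation R2 exceeds the threshold in at least one HARE group."""
    return any(
        r2_by_group.get(group) is not None and r2_by_group[group] > threshold
        for group in GROUPS
    )


def count_rows(alt_alleles_per_site):
    """Assume one table row per alternate allele at each retained site."""
    return sum(alt_alleles_per_site)


# Hypothetical per-site imputation R2 values (None marks a missing value).
sites = {
    "site_A": {"EUR": 0.98, "AFR": 0.95, "HIS": 0.97, "ASN": 0.96},
    "site_B": {"EUR": 0.42, "AFR": 0.93, "HIS": 0.71, "ASN": None},
    "site_C": {"EUR": 0.88, "AFR": 0.61, "HIS": 0.90, "ASN": 0.75},
}
kept = [name for name, r2 in sites.items() if passes_any_group(r2)]
print(kept)

# Assumed breakdown: 404 biallelic sites plus 4 sites with two alternate alleles.
total_rows = count_rows([1] * 404 + [2] * 4)
print(total_rows)
```

Because the rule is "any group", a site that imputes poorly in one ancestry can still appear in the table, so the per-group R² column should be checked before using a frequency from a specific group.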

    Attachment

    Submitted filename: Response to reviewers 2022-12-12.docx



    Articles from PLOS ONE are provided here courtesy of PLOS
