Abstract
Motivation: Genetic variants in drug targets and metabolizing enzymes often have important functional implications, including altering the efficacy and toxicity of drugs. Identifying single nucleotide variants (SNVs) that contribute to differences in drug response and understanding their underlying mechanisms are fundamental to successful implementation of the precision medicine model. This work reports an effort to collect, classify and analyze SNVs that may affect the optimal response to currently approved drugs.
Results: An integrated approach was taken involving data mining across multiple information resources, including databases of drugs, drug targets, chemical structures, protein–ligand structure complexes and genetic and clinical variations, as well as protein sequence alignment tools. We obtained 2640 SNVs of interest, most of which occur rarely in populations (minor allele frequency < 0.01). The clinical significance of only 9.56% of the SNVs is known in ClinVar, although 79.02% are predicted to be deleterious. The examples presented here demonstrate that, even when SNVs predicted to be deleterious do not cause significant structural modifications, they can plausibly modify protein–drug interactions, affecting selectivity and drug-binding affinity. Our analysis identifies potentially deleterious SNVs on drug-binding residues that are relevant for further studies in the context of precision medicine.
Availability and Implementation: Data are available in the Supplementary information file.
Contact: yanli.wang@nih.gov
Supplementary information: Supplementary Tables S1–S5 are available at Bioinformatics online.
1 Introduction
Rapid advances in next-generation sequencing techniques, along with decreasing costs, have expedited the large-scale discovery of single nucleotide variants (SNVs). dbSNP human Build 146 contains >140 million common and rare SNVs that are assigned unique dbSNP reference SNV (rs) accessions, 4 128 355 of which are missense and stop-gain mutations (Sherry et al., 2001). Yet functional annotation for these SNVs is largely lacking: only about 1% of non-synonymous SNVs in dbSNP are linked to phenotypes. Hence, substantial effort is being dedicated to understanding the function and clinical significance of the remaining 99% of non-synonymous SNVs that may have biological impacts, especially in the light of pharmacogenomics and individual drug response.
Optimal target–drug binding is essential for achieving the desired therapeutic effects of drugs and for minimizing unwanted side effects and toxicity. Protein–drug interactions are governed by the local biochemistry and structure of both the drug molecules and the drug-binding cavities of target proteins. Key amino acid residues maintain the binding cavity structure and contribute to the formation of non-covalent bonds with drug molecules. Therefore, non-synonymous SNVs that mutate these key drug-binding residues can affect protein–drug interactions, leading to changes in drug response. Understanding how inter-individual genetic variability leads to variation in the therapeutic effects of drugs is fundamental to the development of precision medicine, which integrates basic research and clinical practice to design novel approaches for disease treatment and prevention based on individual variability in genes, environment and lifestyle.
There are several well-established examples in which mutations in drug-binding sites have resulted in altered therapeutic effects or toxicity. For instance, the dosing of the commonly prescribed anticoagulant warfarin is guided by variations in the vitamin K epoxide reductase complex 1 (VKORC1) and cytochrome P450 2C9 (CYP2C9) genes. The mutations Arg144Cys (rs1799853) and Ile359Leu (rs1057910) in CYP2C9 decrease enzymatic activity, resulting in low clearance of warfarin. Individuals carrying these SNVs therefore require a lower starting dose of the drug (Aithal et al., 1999; Aquilante et al., 2006; Lindh et al., 2009) and are at higher risk of bleeding during therapy. Allelic forms of CYPs are responsible for differences in the metabolism of several drugs and are important markers for determining dosage and dose-related toxicity. Another well-known example is the variant rs121434569 (Thr790Met) in the epidermal growth factor receptor (EGFR). EGFR is a target for non-small cell lung cancer therapy, and residue Thr790 is an important determinant of inhibitor specificity. The Thr790Met mutation causes steric interference with inhibitor binding and leads to resistance to drugs such as gefitinib and erlotinib (Denis et al., 2015; Kobayashi et al., 2005). Although Thr790Met is mostly a secondary somatic mutation, germline mutations have also been observed at low frequency (Bell et al., 2005). In addition to analyzing the effects of genetic variants on already approved therapies, recent efforts have also focused on developing new paradigms for genomic variation-driven clinical trials, especially in oncology (Heckman-Stoddard and Smith, 2014; Simon, 2016). Furthermore, understanding the effects of disease-associated mutations in target proteins on their drug-binding properties will help facilitate the design of small-molecule drugs as alternative therapeutic options for genetic diseases (Sun et al., 2014).
SNVs do not always result in complete loss of function; they may instead alter molecular function more subtly, leading to changes in binding affinity and selectivity. Therefore, even if an SNV is not clinically significant in terms of pathogenicity, it may be important in the context of drug response. Prioritizing SNVs that potentially interfere with the intended therapeutic effects of clinically approved drugs by changing protein–drug interaction patterns will have significant applications in biomedicine. Of particular relevance are SNVs that affect the function of proteins interacting with multiple drugs. Also of clinical interest are variants whose effects on therapy can be circumvented either by changing the drug dosage or by switching to an available therapeutic alternative.
Linking drugs to target protein structures and SNVs requires the integration of several information resources, which is greatly facilitated by open access to human variation data and chemical biology information resources, as well as by the structural genomics initiative. This work focuses on the survey and analysis of SNVs present in the drug-binding cavities of validated drug targets and metabolizing enzymes via data mining across genetic, structural biology, chemical biology, cheminformatics and drug information resources. Furthermore, the data mining and integration undertaken in this work contributes to the Precision Medicine Initiative® Cohort Program at the National Institutes of Health (NIH), which is founded on developing a framework that integrates multiple research disciplines and information sources for efficient data sharing, ultimately bringing precision medicine into clinical practice.
2 Materials and methods
Mapping SNVs onto protein drug-binding cavities to understand their possible functional implications requires integrating data from various sources. Information resources including DrugBank (http://www.drugbank.ca/) (Law et al., 2014), HGNC (http://www.genenames.org/) (Gray et al., 2015), PubChem Compound (https://www.ncbi.nlm.nih.gov/pccompound) (Kim et al., 2016), NCBI Gene (http://www.ncbi.nlm.nih.gov/gene/), NCBI Protein (http://www.ncbi.nlm.nih.gov/protein), MMDB (http://www.ncbi.nlm.nih.gov/structure/) (Madej et al., 2012), PDB (http://www.rcsb.org/pdb/home/home.do), dbSNP (http://www.ncbi.nlm.nih.gov/snp) (Sherry et al., 2001), ClinVar (http://www.ncbi.nlm.nih.gov/clinvar/) (Landrum et al., 2016) and dbNSFP (https://sites.google.com/site/jpopgen/dbNSFP) (Liu et al., 2016) were utilized for this analysis. Data from DrugBank and HGNC were downloaded via their web interfaces. The interconnected databases maintained at the National Center for Biotechnology Information (NCBI) were accessed through E-utilities (https://www.ncbi.nlm.nih.gov/books/NBK25501/), which provide a public API to the NCBI Entrez system. The companion Entrez Direct package, a collection of argument-driven functions, was used to call E-utilities directly from the command line. SNVs and related information were extracted by querying a local SQL copy of the dbSNP database. Figure 1 presents the schematic of the data mining and integration process performed in this work. The integrated data used in this work are available in the supplementary information (Supplementary Tables S1a, S1b, S2a and S2b).
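E-utilities requests are plain HTTP GET calls with query parameters. The following minimal Python sketch, which makes no network call, builds an ESearch URL and parses its JSON reply; the helper names and the sample response are illustrative assumptions, not the pipeline used in this work.

```python
import json
from urllib.parse import urlencode

# Base address of the NCBI E-utilities service.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch request for an Entrez database such as 'pccompound', 'gene' or 'snp'."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def parse_esearch_ids(response_text: str) -> list[str]:
    """Extract the list of matching UIDs from an ESearch JSON response body."""
    result = json.loads(response_text).get("esearchresult", {})
    return list(result.get("idlist", []))


if __name__ == "__main__":
    print(esearch_url("pccompound", "warfarin"))
    # Hypothetical response body; a real one would come from an HTTP GET on the URL above.
    sample = '{"esearchresult": {"count": "1", "idlist": ["12345"]}}'
    print(parse_esearch_ids(sample))
```

In practice NCBI limits clients to about three requests per second without an API key, and Entrez Direct exposes the same operations as command-line tools (esearch, efetch, elink).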
Fig. 1.
The schematic for mining and integration of various data sources to map SNVs on protein–drug binding residues
2.1 Datasets
Lists of clinically approved small molecule drugs, their protein targets and metabolizing enzymes were downloaded from DrugBank database (version 4.3, accessed on December 7, 2015). Compounds and proteins in the lists were then represented by their respective PubChem Compound Identifier (CID) and NCBI Gene ID to allow easy data retrieval and cross-referencing across different databases.
2.1.1 Small molecule drugs
There are 2035 clinically approved small molecule drugs in DrugBank, represented by their unique DrugBank IDs. Of these compounds, 1181 have PubChem CIDs listed in DrugBank. Using the compound name, DrugBank ID, CAS Registry Number or InChIKey as a query to the PubChem database via the Entrez system, CIDs were extracted for 574 of the remaining 854 compounds. The final set of clinically approved small molecule drugs consists of 1743 unique PubChem compounds that represent 1755 drugs in DrugBank. Some of the approved drugs listed in DrugBank are chemical substances that are commonly found in complex with protein structures, for example adenosine triphosphate and pyridoxamine-5ʹ-phosphate. These common chemical substances, as well as nutraceuticals such as vitamins, were excluded from this study.
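The fallback lookup described above can be sketched as a cascade over identifier fields. This is a minimal illustration under stated assumptions: the record layout, the function name `resolve_cid`, the `search` callable standing in for a PubChem query, and the rule of accepting only unambiguous single hits are all hypothetical.

```python
from typing import Callable, Optional

# Identifier fields tried in order when DrugBank lists no CID (order as in the text above).
FALLBACK_KEYS = ("name", "drugbank_id", "cas", "inchikey")


def resolve_cid(drug: dict, search: Callable[[str], list[int]]) -> Optional[int]:
    """Return a PubChem CID for a DrugBank record, or None if no unambiguous match exists.

    `drug` is a hypothetical record with optional keys 'cid', 'name', 'drugbank_id',
    'cas' and 'inchikey'; `search` stands in for a PubChem query returning matching CIDs.
    """
    if drug.get("cid") is not None:
        return drug["cid"]  # CID already listed in DrugBank
    for key in FALLBACK_KEYS:
        query = drug.get(key)
        if not query:
            continue
        hits = search(query)
        if len(hits) == 1:  # accepting only single hits is an assumption
            return hits[0]
    return None


if __name__ == "__main__":
    # Toy in-memory index standing in for PubChem; every value is made up.
    index = {"examplol": [111], "000-00-0": [222]}

    def lookup(query: str) -> list[int]:
        return index.get(query.lower(), [])

    drugs = [
        {"drugbank_id": "DB_A", "cid": 999},
        {"drugbank_id": "DB_B", "name": "Examplol"},
        {"drugbank_id": "DB_C", "name": "unknownium", "cas": "000-00-0"},
        {"drugbank_id": "DB_D", "name": "nothing"},
    ]
    print({d["drugbank_id"]: resolve_cid(d, lookup) for d in drugs})
```

Because several DrugBank entries can resolve to the same CID (1755 drugs map to 1743 compounds here), the resolved CIDs would be deduplicated afterwards.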
2.1.2 Protein targets and drug metabolizing enzymes
DrugBank contains 1430 approved drug target proteins and 214 approved drug metabolizing enzymes, represented by their UniProt Protein IDs. For most of these proteins (1247 targets and 189 enzymes), HUGO Gene Nomenclature Committee (HGNC) IDs are also listed. HGNC assigns standardized nomenclature to human genes. Using HGNC IDs or gene symbols as queries to the HGNC database, the corresponding NCBI Gene IDs were extracted. The final target and enzyme datasets consist of 1418 and 209 unique genes, respectively. Different protein isoforms encoded by a single gene are analyzed together under that gene.
Even though 96 proteins overlap between the target and enzyme sets, we treated approved drug targets and metabolizing enzymes as two separate datasets. This is because the overlapping proteins mostly interact with different drugs depending on whether they act as drug targets or as drug metabolizing enzymes. Consequently, the same protein might have different protein–drug interactions and might be affected differently by genetic variations.
2.1.3 Protein–drug complexes
The objective of this work is to analyze the SNVs present in the drug-binding cavities of proteins. Therefore, to determine the drug-binding residues accurately, we considered only protein–drug pairs with available 3D structures of the complexes. The 3D structures were obtained from the NCBI Structure database MMDB. In the target set, 248 structures are available for 122 unique protein–drug complexes, comprising 80 target proteins and 103 interacting drugs. In the enzyme set, only 20 structures are available, covering 17 unique protein–drug pairs of 10 enzymes and 16 drugs. Protein residues within 4 Å of the drug molecules were defined as the drug-binding sites.
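The 4 Å definition of a binding site can be expressed as a simple distance filter over atom coordinates. The sketch below is a minimal brute-force version under assumptions not stated in the text: coordinates (in Å) are already parsed from the structure file, each protein atom is reduced to a hypothetical `(chain, residue number, xyz)` tuple, and the cutoff is inclusive. For large structures a spatial index such as a k-d tree would be preferable.

```python
import math
from typing import Iterable

Coord = tuple[float, float, float]
# A protein atom as (chain ID, residue number, coordinates); a simplified, hypothetical record.
ProteinAtom = tuple[str, int, Coord]


def binding_site_residues(
    protein_atoms: Iterable[ProteinAtom],
    ligand_coords: list[Coord],
    cutoff: float = 4.0,
) -> set[tuple[str, int]]:
    """Return (chain, residue) pairs with at least one atom within `cutoff` Å of any ligand atom."""
    site: set[tuple[str, int]] = set()
    for chain, resnum, xyz in protein_atoms:
        if (chain, resnum) in site:
            continue  # residue already assigned; skip its remaining atoms
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_coords):
            site.add((chain, resnum))
    return site


if __name__ == "__main__":
    ligand = [(0.0, 0.0, 0.0)]
    atoms = [("A", 10, (0.0, 0.0, 3.9)), ("A", 11, (0.0, 0.0, 4.1))]
    print(binding_site_residues(atoms, ligand))
```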
2.2 Mapping SNVs on protein structures
SNV data were obtained from dbSNP build 146. Only non-synonymous SNVs (missense, stop-gain and stop-loss) were taken into account. In total, 18 476 unique non-synonymous reference SNVs were mapped onto the proteins in the datasets.
dbSNP provides the positions of SNVs on reference protein sequences. However, protein sequences obtained from PDB structures are not always identical to the reference sequences. Therefore, the protein sequences from the structures were first aligned with the reference sequences using MUSCLE (version 3.8.31) (Edgar, 2004). All SNVs present on the reference sequences were then mapped onto the equivalent residues in the protein structures. In addition to the drug-binding residues, the presence of SNVs on residues in their close proximity was also analyzed. For this purpose, both sequence proximity (five residues upstream and downstream of a binding residue) and structural proximity (within 4 Å of the binding site) were considered.
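Transferring reference positions onto a structure sequence through an alignment, and then applying the ±5-residue sequence window, can be sketched as follows. This minimal Python example assumes two gapped strings of equal length (for instance, the two rows of an aligned FASTA file written by MUSCLE) with '-' as the gap character; positions are 1-based sequence indices rather than PDB author residue numbers, and the function names are hypothetical. Structural proximity would additionally require 3D coordinates and is not covered here.

```python
def map_positions(aligned_ref: str, aligned_struct: str) -> dict[int, int]:
    """Map 1-based reference positions to 1-based structure positions over aligned columns."""
    if len(aligned_ref) != len(aligned_struct):
        raise ValueError("aligned sequences must have equal length")
    mapping: dict[int, int] = {}
    ref_pos = struct_pos = 0
    for r, s in zip(aligned_ref, aligned_struct):
        if r != "-":
            ref_pos += 1
        if s != "-":
            struct_pos += 1
        if r != "-" and s != "-":  # only columns with residues in both sequences map
            mapping[ref_pos] = struct_pos
    return mapping


def classify_snv(ref_pos: int, mapping: dict[int, int],
                 binding_positions: set[int], window: int = 5) -> str:
    """Label an SNV by its structure position relative to binding residues."""
    pos = mapping.get(ref_pos)
    if pos is None:
        return "unmapped"  # residue absent from the structure (gap or unresolved)
    if pos in binding_positions:
        return "binding"
    if any(abs(pos - b) <= window for b in binding_positions):
        return "sequence-proximal"
    return "other"


if __name__ == "__main__":
    mapping = map_positions("MKT-LLV", "-KTALLV")
    print(mapping)
    print([classify_snv(p, mapping, {4}) for p in (1, 2, 4)])
```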
2.3 SNV annotations
Frequency and clinical significance data for the mapped SNVs, where available, were extracted from dbSNP. dbSNP integrates allele frequency data from the 1000 Genomes Project (http://www.1000genomes.org/) (1000 Genomes Project Consortium, 2015), ExAC (http://exac.broadinstitute.org) and GO-ESP (https://esp.gs.washington.edu/drupal/), as well as from other population datasets. The clinical significance data in dbSNP are obtained from ClinVar, which records evidence from both experimental and clinical tests. In addition, to analyze the level of conservation and deleteriousness, Genomic Evolutionary Rate Profiling (GERP ++) scores (Davydov et al., 2010) and Combined Annotation Dependent Depletion (CADD) scores (Kircher et al., 2014) were extracted from the dbNSFP database (version 31a). GERP ++ uses a maximum likelihood estimation procedure to quantify the evolutionary rate, or constraint intensity, of a chromosomal position in terms of rejected substitutions, i.e. the difference between the neutral and observed rates of substitution (Davydov et al., 2010). Consequently, a positive GERP ++ score represents a lower-than-neutral rate of substitution and indicates that the nucleotide position is under functional constraint. The CADD score measures deleteriousness, a property of SNVs that correlates with molecular functionality and pathogenicity (Kircher et al., 2014). The scoring method integrates many diverse annotations into a single score for each variant. The scaled CADD score used in this work is derived from the raw score and approximates the rank of an SNV among all possible SNVs in the reference genome. A scaled score of 20, for example, corresponds to the top 1% of all reference genome SNVs. Amino acid variations were also scored according to the PAM10 substitution matrix, which represents the likelihood of one amino acid being replaced by another during a specified evolutionary interval. PAM10 assumes 10 accepted point mutations per 100 amino acids and is suited to very similar sequences, as in this case, where mutated residues within a single sequence are scored.
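The relationship between a scaled CADD score and its rank can be made concrete. CADD describes its scaled scores as PHRED-like, i.e. -10·log10(rank/total) with rank 1 being the most deleterious variant; the minimal sketch below assumes exactly that formula, and the function names are illustrative.

```python
import math


def cadd_phred(rank: int, total: int) -> float:
    """PHRED-like scaling of a variant's rank (1 = most deleterious) among `total` variants."""
    if not 1 <= rank <= total:
        raise ValueError("rank must lie in 1..total")
    return -10.0 * math.log10(rank / total)


def top_fraction(scaled_score: float) -> float:
    """Inverse mapping: fraction of all variants scoring at least `scaled_score`."""
    return 10.0 ** (-scaled_score / 10.0)


if __name__ == "__main__":
    for score in (10, 15, 20):
        print(f"scaled {score}: top {100 * top_fraction(score):.2f}%")
```

Run as a script, it reports 10.00%, 3.16% and 1.00% for scaled scores of 10, 15 and 20, so each 10-point step corresponds to a tenfold narrowing of the ranked fraction.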
3 Results
3.1 SNVs on drug-binding sites
3.1.1 Total number of SNVs mapped
In total, 10 321 unique SNVs are mapped onto the protein–ligand structure complexes analyzed in this work, 2664 (25.81%) of which are present on drug-binding sites and proximal residues. The numbers of unique SNVs mapped onto the 80 drug targets and 10 drug metabolizing enzymes in the datasets are 9057 and 2011, respectively. Because four proteins act as both targets and enzymes, 747 SNVs overlap between the two sets. The target proteins contain 441 SNVs on drug-binding residues (Supplementary Table S3a) and 1948 SNVs on neighboring residues, totaling 2389 SNVs of interest. The drug–enzyme complexes contain 412 unique SNVs in the catalytic cavities (Supplementary Table S3b), 83 of which are mapped directly onto binding residues and 329 of which are in the immediate neighborhood. Among the 2664 SNVs of interest, 24 are flagged in dbSNP as potential false positives and were therefore excluded from the analysis. Finally, 2640 SNVs were analyzed (Supplementary Table S4), of which 2541 are missense and the remaining 99 are stop-gain SNVs.
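The counts in this section can be cross-checked with inclusion-exclusion arithmetic. The sketch uses only numbers reported above; the overlap it derives between the target and enzyme SNVs of interest is an inferred quantity, assuming 2664 is the union of those two sets, and is not stated in the text.

```python
# Counts reported in this section.
TARGET_SNVS, ENZYME_SNVS, SHARED_SNVS = 9057, 2011, 747
TARGET_OF_INTEREST = 441 + 1948   # binding + neighboring residues, target set
ENZYME_OF_INTEREST = 83 + 329     # binding + neighboring residues, enzyme set
UNIQUE_OF_INTEREST = 2664
FLAGGED = 24                      # potential false positives removed


def union_size(a: int, b: int, both: int) -> int:
    """Inclusion-exclusion: size of the union = |A| + |B| - |A and B|."""
    return a + b - both


total = union_size(TARGET_SNVS, ENZYME_SNVS, SHARED_SNVS)
# Overlap implied between the two "of interest" sets if 2664 is their union (an inference).
implied_overlap = TARGET_OF_INTEREST + ENZYME_OF_INTEREST - UNIQUE_OF_INTEREST
analyzed = UNIQUE_OF_INTEREST - FLAGGED

if __name__ == "__main__":
    print(total, round(100 * UNIQUE_OF_INTEREST / total, 2), implied_overlap, analyzed)
```

It prints `10321 25.81 137 2640`, consistent with the totals and percentage given above.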
3.1.2 Percentages of drug-binding residues containing SNVs
The distribution of drug-binding residues carrying mapped SNVs is shown in Figure 2. Very few drug targets considered in this work have SNVs mapped on >50% of their binding sites. Overall, 78.3% of the target proteins have SNVs mapped on 35% or fewer of their binding residues (Fig. 2A), and the most populated bin, in which 15–20% of the binding residues carry SNVs, contains 16.6% of the target protein sequences in the dataset. In contrast, proteins in the enzyme set have higher percentages of binding residues with mapped SNVs: 53.84% of the drug metabolizing enzymes have SNVs on 35–45% of their binding residues (Fig. 2B). A similar trend is observed for residues in the sequence and structural neighborhoods of the binding sites, as well as for complete protein sequences.
Fig. 2.

Percentages of residues with SNVs mapped on them in the (A) drug target and (B) drug metabolizing enzyme sets. SNVs mapped onto drug-binding residues (blue) are analyzed in comparison with proximal structure (orange) and proximal sequence (green) residues and complete protein sequences (grey)
We also analyzed whether the number of SNVs mapped on drug-binding residues differs significantly from that on complete protein sequences. To control the familywise error rate across multiple comparisons, a Bonferroni correction was applied, setting the significance threshold for each individual protein at P = 0.05/86 ≈ 5.81e-4 (familywise error rate = 0.05; number of proteins = 86). No significant difference is observed except for serine/threonine-protein kinase B-raf (BRAF), in which SNVs are over-represented in drug-binding residues.
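The Bonferroni threshold above can be computed directly. The text does not name the per-protein test, so the sketch below pairs the threshold with a one-sided hypergeometric test purely as an assumption: it asks how likely it is that at least k of the binding residues carry SNVs if binding residues were drawn at random from the protein. The function names and the example numbers are hypothetical.

```python
from math import comb


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold that keeps the familywise error rate at `alpha`."""
    return alpha / n_tests


def hypergeom_upper_tail(k: int, n_residues: int, n_snv_residues: int, n_binding: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(n_residues, n_snv_residues, n_binding).

    Interpretation (an assumption): the n_binding binding residues are drawn at random
    from n_residues residues, of which n_snv_residues carry SNVs.
    """
    upper = min(n_snv_residues, n_binding)
    favorable = sum(
        comb(n_snv_residues, i) * comb(n_residues - n_snv_residues, n_binding - i)
        for i in range(k, upper + 1)
    )
    return favorable / comb(n_residues, n_binding)


if __name__ == "__main__":
    threshold = bonferroni_threshold(0.05, 86)
    # Hypothetical protein: 500 residues, 120 carry SNVs, 20 binding residues, 12 with SNVs.
    p = hypergeom_upper_tail(12, 500, 120, 20)
    print(f"threshold={threshold:.3e}  p={p:.3e}  significant={p < threshold}")
```

An exact test such as this avoids large-sample approximations, which matters when a protein has only a handful of binding residues.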
The net loss or gain of amino acids in drug-binding cavities upon mutation affects the cavity microenvironment and, thereby, ligand binding. Overall, Arg is the most frequently lost amino acid upon mutation. This can be attributed to the fact that Arg has six codons, four of which contain a CpG dinucleotide, which mutates at a higher rate (Walser and Furano, 2010). The same pattern is also observed for disease-associated variants (de Beer et al., 2013). Tyr is significantly over-represented in binding residues compared with complete protein sequences and hence shows a high frequency of mutation. Aromatic Phe also shows a high frequency of mutation. In addition to stop codons introduced by stop-gain mutations, the most frequently gained amino acids are Cys, Ser and Gln.
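The CpG argument for Arg can be checked against the standard genetic code. This minimal sketch counts, for each amino acid, how many of its codons contain a CpG dinucleotide; it considers only CpGs inside a codon and ignores those spanning two adjacent codons, which depend on sequence context.

```python
from collections import defaultdict
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def cpg_codon_counts() -> dict[str, tuple[int, int]]:
    """Map each amino acid (or '*' for stop) to (codons containing 'CG', total codons)."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for codon, aa in CODON_TABLE.items():
        counts[aa][1] += 1
        if "CG" in codon:  # within-codon CpG only
            counts[aa][0] += 1
    return {aa: (with_cpg, total) for aa, (with_cpg, total) in counts.items()}


if __name__ == "__main__":
    for aa, (with_cpg, total) in sorted(cpg_codon_counts().items()):
        if with_cpg:
            print(aa, f"{with_cpg}/{total}")
```

Run as a script, it lists Ala, Pro, Ser and Thr with one CpG-containing codon each and Arg with four of six, which is why Arg stands out.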
3.2 SNV annotations
3.2.1 Allele frequency
Global minor allele frequency (MAF) data, extracted from dbSNP, are available for 2089 (79.13%) of the mapped SNVs (blue bubbles in Fig. 3). The population-specific allele frequencies of the SNVs are provided in the supplementary information (Supplementary Table S5). Of these, 2077 SNVs (78.67% of all mapped SNVs) are rare variants (MAF < 0.01), suggesting that non-synonymous SNVs in and around drug-binding sites are under strong purifying selection. In addition, 2296 (86.97%) SNVs map to sites under evolutionary constraint, i.e. the affected nucleotide positions have positive GERP ++ scores (Fig. 4) (Davydov et al., 2010). Among the SNVs with known frequency, only three are common in populations (MAF > 0.05): rs1695, rs1138272 and rs1058172. SNV rs1695 is located on a drug-binding residue of glutathione S-transferase P1 (GSTP1), whereas rs1138272, also in GSTP1, and rs1058172, in cytochrome P450 2D6 (CYP2D6), lie in close proximity to binding sites.
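The frequency categories used here can be expressed as a small binning function. The cut-offs MAF < 0.01 (rare) and MAF > 0.05 (common) come from the text; the intermediate "low-frequency" label and the sample values are assumptions for illustration.

```python
from collections import Counter
from typing import Optional


def maf_class(maf: Optional[float]) -> str:
    """Bin a global minor allele frequency using the cut-offs quoted in this section."""
    if maf is None:
        return "unknown"        # no frequency data in dbSNP
    if maf < 0.01:
        return "rare"
    if maf <= 0.05:
        return "low-frequency"  # intermediate label is an assumption
    return "common"


if __name__ == "__main__":
    # Hypothetical MAF values, not taken from the dataset.
    mafs = [None, 0.0002, 0.004, 0.03, 0.12, None]
    print(Counter(maf_class(m) for m in mafs))
```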
Fig. 3.

Available global minor allele frequency (MAF) and ClinVar clinical significance data for SNVs mapped on (A) drug-binding residues and (B) proximal residues of drug targets and drug metabolizing enzymes. SNVs with no available frequency data are marked as zero for LOG10(MAF). The size of each bubble indicates the relative number of SNVs in each category for MAF and ClinVar clinical significance (Pathogenic, Drug response, Benign and Other)
Fig. 4.

Scaled GERP ++ scores of SNVs mapped on (A) drug-binding residues and (B) proximal residues. SNVs with positive GERP ++ scores are at chromosomal positions that are under evolutionary constraints
3.2.2 Clinical significance and functional implications
Experimental evidence shows that 198 of the mapped SNVs are pathogenic (red bubbles) and 10 SNVs affect drug response (yellow bubbles) (Fig. 3). SNVs with undetermined MAF are over-represented among those annotated as pathogenic or drug-response variants (Fig. 3). This might be because most such SNVs are associated with complex diseases such as cancer and hence are rare in populations. It is also possible that these drug-response and pathogenic mutations underlie rare Mendelian diseases with very high penetrance. Such SNVs would then be found only in affected individuals rather than in the general population (Saint Pierre and Génin, 2014) and would be underrepresented in the 1000 Genomes and ExAC projects (Saint Pierre and Génin, 2014; Yali et al., 2012). Altogether, the clinical significance of only 9.56% of the SNVs is known in ClinVar (Landrum et al., 2016).
To estimate the deleteriousness of the SNVs, CADD scores were considered (Kircher et al., 2014). In total, 79.02% of the SNVs are predicted to be deleterious (CADD score > 15) (Fig. 5), and 73.26% fall within the top 1% of deleterious SNVs in the human reference genome (CADD score > 20). In general, the higher the CADD score, the more deleterious an SNV is. Even though the majority of the SNVs have comparatively high PAM10 scores, indicating that the substitutions are relatively likely to be tolerated during evolution, most of these SNVs are predicted to be deleterious. No correlation is observed between the likelihood of an amino acid substitution (PAM10 score) and its deleteriousness (CADD score). In total, 1770 (67.05%) SNVs with unknown ClinVar clinical significance are predicted to be deleterious and are mapped on residues under evolutionary constraint. About 58.1% of the mapped SNVs are at highly conserved chromosomal positions (GERP ++ score > 4) and are among the 1% most deleterious SNVs in the genome (CADD score > 20).
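The combined criteria in this paragraph amount to simple filters over annotated SNVs. The sketch below assumes a hypothetical `AnnotatedSNV` record (field names are illustrative, not dbNSFP column names) and uses the thresholds quoted in the text: CADD > 15 for predicted deleterious, GERP ++ > 0 for evolutionary constraint, and the stricter pair CADD > 20 with GERP ++ > 4.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AnnotatedSNV:
    rsid: str
    cadd_phred: Optional[float]  # scaled CADD score
    gerp_rs: Optional[float]     # GERP++ rejected-substitution score
    clinvar: Optional[str]       # ClinVar clinical significance; None if unknown


def predicted_deleterious(snv: AnnotatedSNV, cutoff: float = 15.0) -> bool:
    return snv.cadd_phred is not None and snv.cadd_phred > cutoff


def constrained(snv: AnnotatedSNV, cutoff: float = 0.0) -> bool:
    return snv.gerp_rs is not None and snv.gerp_rs > cutoff


def prioritize(snvs: list[AnnotatedSNV]) -> list[str]:
    """Unknown clinical significance, predicted deleterious and evolutionarily constrained."""
    return [s.rsid for s in snvs
            if s.clinvar is None and predicted_deleterious(s) and constrained(s)]


def strict_subset(snvs: list[AnnotatedSNV]) -> list[str]:
    """Highly conserved (GERP++ > 4) and in the top 1% most deleterious (CADD > 20)."""
    return [s.rsid for s in snvs if predicted_deleterious(s, 20.0) and constrained(s, 4.0)]


if __name__ == "__main__":
    # Hypothetical records, not taken from the dataset.
    demo = [AnnotatedSNV("rsA", 25.0, 5.1, None), AnnotatedSNV("rsB", 16.0, 3.0, None)]
    print(prioritize(demo), strict_subset(demo))
```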
Fig. 5.

Scaled CADD scores of SNVs mapped on (A) drug-binding residues and (B) proximal residues. CADD score gives a measure of deleteriousness. SNVs with CADD score > 15 are predicted as deleterious
3.3 Mapped SNVs identified in genome-wide association studies (GWAS)
GWAS reveal relationships between variants and complex traits, information that is fundamental to developing therapeutic strategies tailored to the genetic composition of a specific population or individual. Therefore, identifying GWAS variants present on the binding sites of validated drug targets can plausibly facilitate the development of precision medicine.
Only two of the analyzed SNVs, rs671 and rs1058172, are present in the NHGRI GWAS Catalog (Welter et al., 2014). The mutation rs671 (Glu504Lys) is located on a drug-binding residue of mitochondrial aldehyde dehydrogenase 2 (ALDH2), the second enzyme of the major oxidative pathway of alcohol metabolism. Interestingly, the drug-binding cavity of ALDH2 otherwise has an extremely low number of mapped SNVs. Despite being rare in other populations, rs671 has an MAF of 0.266 in the East Asian population. It is associated with phenotypic loss of ALDH2 function in both heterozygous and homozygous individuals, resulting in an adverse response to alcohol consumption. The risk allele A is associated with susceptibility to alcohol-related esophageal cancer (Cui et al., 2009) and coronary heart disease (Takeuchi et al., 2012). The SNV is further associated with response to alcohol consumption, renal function-related traits, various hematological and biochemical traits and body mass index (Kamatani et al., 2010; Okada et al., 2012; Quillen et al., 2014; Wen et al., 2014). The other SNV, rs1058172 in cytochrome P450 2D6 (CYP2D6), is located five residues upstream of a drug-binding residue. It is a triallelic SNV that changes Arg365 (allele G) to either His (allele A) or Pro (allele C). Allele A occurs frequently in all subpopulations except East Asian and African populations (global MAF 0.12). GWAS show that the SNV is associated with response to selective serotonin reuptake inhibitors in major depressive disorder (Ji et al., 2014), where it alters plasma drug levels by modifying the biotransformation rate. Both ALDH2 and CYP2D6 are phase I drug metabolizing enzymes.
3.4 Case studies—SNVs mapped on the drug-binding residues of example proteins
The functional implications of SNVs on drug-binding residues are analyzed for three selected proteins: angiotensin II receptor type 1 (AGTR1), butyrylcholinesterase (BCHE) and GSTP1. This analysis attempts to provide insights into the effects of SNVs on drug–target interactions, as well as other changes upon mutation, including altered drug response. In particular, we focused on specific residues and structural components of these proteins that are essential for optimal ligand selectivity and binding.
3.4.1 AGTR1
AGTR1 is a class A G-protein-coupled receptor that mediates the major cardiovascular effects of angiotensin II and is an important effector controlling blood pressure and cardiovascular volume. It is also associated with renal tubular dysgenesis. AGTR1 has recently been found to be over-expressed in certain breast cancers and has been proposed as a therapeutic target for ER-positive, ERBB2-negative breast cancers (Ateeq et al., 2009) in the realm of personalized medicine. Several approved AGTR1 antagonists are used to treat hypertension, diabetic nephropathy and congestive heart failure. Here we analyzed the binding of AGTR1 to the small molecule antagonist olmesartan (CID: 158781) (PDB ID: 4ZUD) (Zhang et al., 2015). Twelve unique SNVs are mapped directly onto 69% of the olmesartan-binding residues in AGTR1.
Nine rare and deleterious SNVs are present on AGTR1 transmembrane domains 2 (TM2) and 7 (TM7): rs748117430 (Asp74Ala), rs747975618 (Phe77Cys), rs398122935 (Trp84Ter), rs774646145 (Ala85Thr), rs745868057 (Ala85Asp), rs750724789 (Thr287Ile), rs747780318 (Cys289Tyr), rs749234826 (Tyr292Cys) and rs374541305 (Leu297Pro). The interaction between these two domains is responsible for receptor functional selectivity and activation (Balakumar et al., 2014). In particular, the Phe77Cys and Tyr292Cys mutations affect residues that are not only critical for the transmembrane domain interaction but also involved in binding olmesartan (Fig. 6). These mutations disrupt hydrophobic interactions with the ligand as well as with residues in other transmembrane domains. The SNVs might also produce secondary effects on evolutionarily coupled residues, such as Glu81, Tyr82 and Pro85 in TM2 and Leu112 in TM3, thereby affecting protein stability. The stop-gain SNV mapped on Trp84 is pathogenic (ClinVar Variant ID: 50206) and was first identified, in association with renal tubular dysgenesis, in a stillborn North African girl born to consanguineous parents (Gribouval et al., 2012).
Fig. 6.

AGTR1 in complex with inhibitor olmesartan. The residues involved in forming the catalytic channel are colored green. Phe77 and Tyr292 (orange) are critical for the interaction between TM2 and TM7. The key residues, Trp84 and Arg167, involved in ligand binding are highlighted in red. The hydrogen bonds between Arg167 and olmesartan are also shown (cyan)
Two SNVs, rs768866306 (Arg167Ter) and rs200184769 (Arg167Gln), are mapped on residue Arg167, an important binding determinant that forms three hydrogen bonds with olmesartan (Fig. 6). The stop-gain mutation is evidently deleterious. The Arg167Gln mutation disrupts all hydrogen bonds with olmesartan and results in loss of AGTR1 binding activity. Indeed, Arg167 mutants other than Arg167Lys do not display any binding activity (Yan et al., 2010).
The other SNVs present on drug-binding residues of AGTR1 are rs775810028 (Tyr87Cys), rs373362261 (Val108Ile), rs753570924 (Ser109Arg), rs201151143 (Leu112Arg), rs758763207 (Ile288Val) and rs780860717 (Ile288Met). All are predicted to be deleterious except rs373362261, whose Val108Ile mutation results in additional hydrophobic interactions with Phe77 yet maintains the binding affinity of the receptor by retaining the original interactions with olmesartan. Similarly, Ile288Met maintains the native hydrophobic interactions with the ligand, whereas the Ile288Val mutation disrupts these interactions and destabilizes ligand binding.
3.4.2 BCHE
BCHE is a non-specific cholinesterase that hydrolyzes different esters of choline and is currently believed to be involved in the development of the nervous system (Darvesh et al., 2003). BCHE expression and biochemical properties have also been found to be altered in neurodegenerative diseases such as Alzheimer’s disease (Darvesh et al., 2003), and some BCHE inhibitors are currently used as therapeutic agents for its treatment. We analyzed the SNVs present on the choline (CID: 305) binding residues of BCHE (PDB ID: 1P0M) (Nicolet et al., 2003). Nine unique SNVs are mapped on about 86% of the choline-binding residues. All SNVs mapped on the BCHE catalytic cavity are rare and predicted to be deleterious.
Two critical SNVs in the BCHE active site are rs370077923 (Ser226Gly) and rs775935293 (His466Arg), which affect two residues of the highly conserved and evolutionarily coupled catalytic triad Ser226, His466 and Glu353 (Darvesh et al., 2003) (Fig. 7). These SNVs disrupt the hydrogen bond Ser226-OH…NE-His466 that is essential for the catalytic function of BCHE. They also alter the ability of the catalytic triad residues to participate in hydrogen bonds and hydrophobic interactions with substrates, thereby affecting the enzymatic activity and substrate-binding affinity of BCHE. The Ser226Gly mutation results in the silent phenotype of BCHE, which is marked by complete loss of activity (Primo-Parmo et al., 1996).
Fig. 7.

BCHE in complex with inhibitor choline. Residues in the catalytic gorge are colored green. The catalytic triad Ser226, His466 and Glu353 and the cation site Trp110 are highlighted in red. Hydrogen bonds are shown as cyan lines
Other SNVs mapped on BCHE residues that are critical for either catalysis or maintenance of the active-site structure are rs764097445 (Trp110Arg), rs201820739 (Gly143Asp) and rs121918558 (Tyr156Cys). Mutation of the cationic site Trp110 to Arg results in loss of the cation–π interaction that is crucial for substrate binding and activation. The aromatic residue Tyr156 and Glu225 are also part of this well-defined cation site, along with His466 (one of the key residues in the catalytic triad). Molecular dynamics studies have shown that stable water bridges formed between residues Gly143…Tyr156, Gly143…Glu225 and Glu225…His466 have critical structural roles in the BCHE active site (Suárez and Field, 2005). The Gly143Asp mutation disrupts these water bridges and is highly deleterious (CADD score 28.1). Similarly, loss of the aromatic side chain due to the Tyr156Cys mutation disrupts binding and is known to be clinically pathogenic (ClinVar Variant ID: 13227) (Hidaka et al., 1997). Although no SNV is mapped on Glu225, the residue is evolutionarily coupled with Gly143, Tyr156 and Gly467, all of which have SNVs mapped on them.
The SNVs rs1799807 (Asp98Gly) and rs201120931 (Tyr360Asp) mutate residues located at the rim of the catalytic gorge that help guide positively charged substrates to the active site (Fig. 7). These mutations might therefore affect binding affinity for such substrates. The Asp98Gly mutation gives rise to the atypical, and perhaps deleterious, variant that shows a 100-fold decrease in the binding affinity of BCHE for succinylcholine (McGuire et al., 1989).
3.4.3 GSTP1
GSTP1 is an enzyme that catalyzes phase II drug metabolism. It plays an important role in detoxification by catalyzing the conjugation of reduced glutathione to hydrophobic electrophilic drugs, yielding less toxic, more water-soluble conjugates that can be easily excreted. The protein is over-expressed in a number of cancers, and its polymorphisms are associated with varying disease susceptibilities (Beer et al., 2002; Laborde, 2010; Lee et al., 2005; Welfare et al., 1999). It is also an important factor in the development of drug resistance and therapy-related leukemia in cancer patients (Townsend and Tew, 2003). GSTP1 has three variants arising from the common mutations rs1695 (Ile104Val) and rs1138272 (Ala113Val) in the electrophile-binding site (Ali-Osman et al., 1997). Here we analyzed SNVs mapped on residues binding unconjugated chlorambucil (CID: 2708) (PDB ID: 3CSJ) (Parker et al., 2008) (Fig. 8). Four unique SNVs are mapped on 44% of the chlorambucil-binding residues.
Fig. 8.

GSTP1 in complex with chlorambucil. Residues in the catalytic cavity are colored green. Tyr108 (red) is critical for the catalytic mechanism and lies in the hydrophobic groove formed by Ile104 and Ile107 (pink). Ser65 (yellow) is part of the glutathione-binding site. Hydrogen bonds are shown as cyan lines.
Two rare and deleterious SNVs, rs772897689 (Tyr7Cys) and rs560001291 (Arg13Cys), are mapped on the evolutionarily constrained drug-binding cavity of GSTP1. Mutation of Tyr7 to Cys disrupts the hydrogen bond with chlorambucil, as well as the π–π interactions with two other drug-binding residues, Tyr108 and Phe8. The Arg13Cys mutation disrupts not only the hydrophobic interactions with unconjugated chlorambucil but also the hydrogen bond and hydrophobic interaction with Ser65, which binds the glutathione moiety. Arg13 is evolutionarily coupled with both Tyr7 and Ser65.
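This section does not state how evolutionary coupling was computed. Purely to illustrate the idea, one simple and crude coupling measure is the mutual information (MI) between two columns of a multiple sequence alignment: columns whose residues co-vary across homologues score high. The sketch below is a stand-in, not the method used in this study; dedicated coupling methods also correct for phylogenetic bias and indirect correlations, and the toy alignment is hypothetical.

```python
import math
from collections import Counter


def column(alignment, i):
    """Residues at alignment column i (0-based) across all sequences."""
    return [seq[i] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in nats) between alignment columns i and j.

    Sequences with a gap ('-') in either column are skipped.
    """
    pairs = [(a, b) for a, b in zip(column(alignment, i), column(alignment, j))
             if a != "-" and b != "-"]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi


if __name__ == "__main__":
    # Hypothetical toy alignment: columns 0 and 2 co-vary perfectly,
    # column 1 is invariant and therefore carries no coupling signal.
    toy = ["YAR", "YAR", "FAK", "FAK"]
    print(round(mutual_information(toy, 0, 2), 3))  # 0.693 (= ln 2)
    print(mutual_information(toy, 0, 1))            # 0.0
```

Raw MI is biased toward variable columns, so in practice it is usually corrected (for example by average-product correction) before pairs are ranked.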
Tyr108 is critical for the catalytic mechanism and plays a multifunctional role that depends on the substrate (Lo Bello et al., 1997) (Fig. 8). Although no SNV is mapped on Tyr108 itself, it lies in close proximity to several other SNVs, including rs1695, which gives rise to the GSTP1*B variant. In this context, the deleterious SNV rs199833944 (Ile107Phe) is noteworthy, as it might affect the auxiliary role of Ile107 in substrate binding.
4 Discussion
This work presents a survey and analysis of SNVs mapped on the drug-binding cavities of 86 unique proteins that are in complex with clinically approved small-molecule drugs. In total, we identified 2640 relevant SNVs. In addition to SNVs mapped on protein residues directly involved in ligand binding, we also considered SNVs mutating residues in close proximity to substrate-binding cavities, which might have ancillary structural and functional roles.
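A minimal sketch of how drug-binding residues and their SNV coverage (such as the 44% reported for GSTP1 and chlorambucil) might be computed. It assumes that protein and ligand atom coordinates have already been parsed from the structure complex, that a residue counts as drug-binding if any of its atoms lies within a hypothetical 4.0 Å cutoff of any ligand atom, and that SNVs have already been mapped to structure residue numbers. The cutoff, function names and toy coordinates are illustrative assumptions, not the criteria used in this study.

```python
import math


def binding_residues(protein_atoms, ligand_atoms, cutoff=4.0):
    """Residue numbers with at least one atom within `cutoff` Å of any ligand atom.

    protein_atoms: iterable of (residue_number, (x, y, z)) pairs, one per atom
    ligand_atoms:  iterable of (x, y, z) ligand atom coordinates
    """
    ligand = list(ligand_atoms)
    hits = set()
    for resnum, xyz in protein_atoms:
        if resnum in hits:
            continue  # residue already counted via another atom
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand):
            hits.add(resnum)
    return hits


def snv_coverage(site_residues, snv_residues):
    """Fraction of binding-site residues carrying at least one mapped SNV."""
    if not site_residues:
        return 0.0
    return len(site_residues & set(snv_residues)) / len(site_residues)
```

For example, with a single ligand atom at the origin and residues whose atoms lie 3.0, 3.9, 10.0 and about 1.7 Å away, only the first, second and fourth are returned; if SNVs hit one of those three, the coverage is 1/3. Re-running `binding_residues` with a larger (equally hypothetical) cutoff and removing the directly binding set would yield a candidate list of the proximal residues discussed above.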
With the exception of BRAF, the drug-binding sites of the proteins analyzed in this work do not differ significantly in the number of mapped SNVs. Although comparatively more SNVs are mapped on the metabolizing enzymes, this apparent difference may reflect the fact that the enzyme set contains fewer proteins than the target set. Moreover, six of the 10 proteins in the enzyme set are cytochrome P450s, a protein family known to be highly polymorphic.
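Because the enzyme and target sets differ in size, raw SNV totals are not directly comparable. One way to make such a comparison fairer, sketched here under stated assumptions, is to normalize per protein (SNVs per binding-site residue) and then compare the two groups. The input format is hypothetical, and the permutation test on mean density is chosen for illustration; it is not necessarily the test used in this study.

```python
import random
from statistics import mean


def densities(proteins):
    """SNVs per binding-site residue for each protein.

    proteins: list of (n_snvs, n_site_residues) tuples; proteins with an
    empty binding site are skipped.
    """
    return [snvs / size for snvs, size in proteins if size > 0]


def permutation_pvalue(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    k = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Small tolerance so floating-point reordering does not hide ties.
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)
```

Normalizing by binding-site size also removes the effect of some proteins simply having larger cavities, which would otherwise inflate their SNV counts.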
Although SNVs are not significantly under-represented on the drug-binding residues of proteins, the mapped SNVs are under strong purifying selection, as shown by the mostly positive GERP ++ scores of their chromosomal positions, which indicate evolutionary constraint. SNVs in and around drug-binding cavities are mainly rare in populations and are mostly predicted to be deleterious (CADD score > 15). Moreover, even when a substitution is likely to be tolerated during evolution (higher PAM10 score), the SNV can still have a deleterious effect on the protein (CADD score > 15), depending on its position in the binding cavity.
The GERP ++ and CADD scores, which are used in this work to prioritize SNVs that plausibly affect protein–drug interactions, do not directly measure pathogenicity. The GERP ++ score identifies SNVs mutating residues that are under strong evolutionary constraint. However, a drawback of GERP ++ is that over-estimating the neutral substitution rate, which depends on sequence alignment quality and genomic region, may lead to over-prediction of evolutionary constraint (Davydov et al., 2010). As a result, clinically benign SNVs might have positive GERP ++ scores; in our study, three clinically benign SNVs with positive GERP ++ scores are identified. The CADD score, on the other hand, is a metric of deleteriousness that can be measured systematically across the genome assembly and correlates with allelic diversity, functionality, pathogenicity, disease severity, etc. (Kircher et al., 2014). Moreover, unlike metrics of pathogenicity, the CADD score is not subject to major ascertainment biases and is not limited to small sets of genetically well-characterized mutations (Kircher et al., 2014). It must be noted that a deleterious variant is not necessarily clinically pathogenic, whereas a pathogenic variant is expected to be deleterious. Consistent with this, of the 198 SNVs classified as pathogenic in ClinVar, only two are predicted as non-deleterious, and even these have CADD scores > 11. These results therefore provide further validation for CADD as a tool for scoring the deleteriousness of SNVs. In the absence of functional annotation data for most SNVs, such tools are useful for identifying SNVs that might have clinical effects.
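The thresholds used in this Discussion (CADD > 15 for "deleterious", and GERP ++ > 4 for the strongly constrained, high-scoring subset) can be applied as a simple prioritization filter and cross-tabulated against ClinVar. The sketch assumes SNV records whose PHRED-scaled CADD score, GERP ++ score and ClinVar class have already been retrieved (for example from dbSNP or dbNSFP); the record layout and field names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

CADD_DELETERIOUS = 15.0  # CADD threshold for "deleterious" used in the text
GERP_CONSTRAINED = 4.0   # GERP ++ threshold for the high-scoring subset


@dataclass
class SNV:
    snv_id: str
    cadd: float             # CADD PHRED-scaled score (assumed)
    gerp: float             # GERP ++ score at the variant position
    clinvar: Optional[str]  # ClinVar class, or None if not in ClinVar


def is_deleterious(snv: SNV) -> bool:
    """Predicted deleterious under the CADD > 15 rule."""
    return snv.cadd > CADD_DELETERIOUS


def is_high_priority(snv: SNV) -> bool:
    """Deleterious and at a strongly constrained position (GERP ++ > 4)."""
    return is_deleterious(snv) and snv.gerp > GERP_CONSTRAINED


def crosstab(snvs):
    """Count SNVs by (ClinVar class, predicted deleterious) combination."""
    return Counter((s.clinvar or "not in ClinVar", is_deleterious(s)) for s in snvs)
```

In such a table, a "pathogenic, not deleterious" cell that stays nearly empty is the pattern described above, while "not in ClinVar, deleterious" entries are natural candidates for follow-up.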
Only two of the SNVs relevant to drug binding have been identified in GWAS. This negligible overlap between GWAS variants and variants mapped on drug-binding pockets is not unexpected: GWAS chips are mostly designed to analyze the disease association of common SNVs (typically MAF > 1%), whereas most variants mapped on drug-binding cavities are rare in populations. Moreover, only a very small fraction of drug targets has been directly detected in GWAS so far (Cao and Moult, 2014).
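The GWAS mismatch follows directly from allele frequency. A minimal sketch of the rare/common split, using the MAF < 0.01 definition of "rare" from the Abstract and assuming per-SNV MAF values are already available (for example from the population frequency data compiled in dbSNP); the function names are hypothetical, and treating MAF exactly 0.01 as common is an assumed boundary convention.

```python
RARE_MAF = 0.01  # rare: MAF < 0.01, as defined in the Abstract


def frequency_class(maf):
    """Classify a minor allele frequency as 'rare', 'common' or 'unknown'."""
    if maf is None:
        return "unknown"  # no frequency data for this SNV
    if not 0.0 <= maf <= 0.5:
        raise ValueError(f"a minor allele frequency must lie in [0, 0.5], got {maf}")
    return "rare" if maf < RARE_MAF else "common"


def split_by_frequency(maf_by_snv):
    """Group SNV identifiers by frequency class."""
    groups = {"rare": [], "common": [], "unknown": []}
    for snv, maf in maf_by_snv.items():
        groups[frequency_class(maf)].append(snv)
    return groups
```

Only the "common" group would be expected to overlap with variants typically genotyped on GWAS arrays.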
Interestingly, the pattern of amino acid change differs between the complete set of SNVs mapped on binding cavities and its subset of SNVs with high CADD (> 15) and GERP ++ (> 4) scores. For example, contrary to the general trend, a net gain of Pro and Phe is observed among SNVs predicted as deleterious. Both Pro and Phe might alter non-covalent binding with ligands, and Pro can additionally induce local structural changes and alter the flexibility of the binding cavity. This difference in the net loss/gain pattern can also be used to flag SNVs of interest.
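The net loss/gain pattern can be tallied directly from the list of missense substitutions: each SNV removes one reference residue and introduces one alternate residue, so the net change for an amino acid is (times gained) minus (times lost). The sketch assumes substitutions written in the three-letter `Xxx123Yyy` form used in this article; the parser and function name are illustrative, and the example uses substitutions discussed above.

```python
import re
from collections import Counter

# Matches e.g. "Tyr7Cys": reference residue, position, alternate residue.
SUBSTITUTION = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def net_change(substitutions):
    """Net gain (+) or loss (-) per amino acid over a list of substitutions."""
    net = Counter()
    for sub in substitutions:
        match = SUBSTITUTION.match(sub)
        if match is None:
            raise ValueError(f"unrecognized substitution: {sub}")
        ref, _, alt = match.groups()
        net[ref] -= 1  # reference residue lost
        net[alt] += 1  # alternate residue gained
    return net


if __name__ == "__main__":
    result = net_change(["Tyr7Cys", "Arg13Cys", "Ile107Phe", "Gly143Asp"])
    print(result["Cys"], result["Phe"], result["Tyr"])  # 2 1 -1
```

Comparing `net_change` for the full set with `net_change` for the high-CADD/high-GERP ++ subset, after dividing each by its set size, gives the kind of contrast described above.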
Although the significance of some SNVs mapped on drug-binding residues is well studied, the functional consequences of most remain unknown. This is particularly true for SNVs mapped on proximal residues that might have auxiliary roles. Most SNVs do not cause significant changes in protein structure; instead, possible changes in binding affinity arise predominantly from deviations from the native interaction patterns of the mutated residues with ligands and with neighboring residues. Nevertheless, not all SNVs mapped on drug-binding residues alter clinical responses. Besides conservative substitutions, minor structural rearrangements within a protein that accommodate a mutation, or unidentified compensatory mutations elsewhere, may maintain native protein interactions. Further, the effects of SNVs are drug specific: the functional influence of any SNV depends on the role of the particular residue in the selectivity and binding of a particular drug.
Most SNVs present in drug-binding cavities occur rarely in populations. This makes them difficult to use in a precision medicine context, because population-specific data sufficient for reliable statistical analysis are sparse. Analyzing the co-occurrence of rare SNVs might yield more information. Although only three common SNVs were identified in this study, they might have more widespread effects. Two of these common SNVs map to the important drug-metabolizing enzyme CYP2D6, and their functional consequences have mostly not been studied in detail. As CYP2D6 is highly polymorphic, a study like this helps prioritize its SNVs for further analysis.
Although only a very small fraction of the human variants in dbSNP is included in this study, it contributes to the annotation of dbSNP with information specific to drug–target binding, thereby providing insight for personalized medicine. Additional annotations may be obtained by considering functional sites, protein–protein and protein–nucleic acid interaction sites, and structural features of proteins (such as disordered regions). Incorporating such data may help prioritize the SNVs that are most likely to generate clinically relevant phenotypes. Likewise, these results may help select SNVs classified as unknown or other in ClinVar for further analysis, detailed annotation and reinterpretation of their clinical significance; 43 such SNVs mapped on drug-binding cavities are predicted as deleterious by CADD. Further studies can be designed to better understand the functional consequences and underlying mechanisms of such SNVs and, where applicable, to bring them into clinical practice. The drug target proteins presented here have several SNVs mapped on functionally critical residues, yet the clinical significance of these SNVs remains largely unknown to date. Early knowledge of SNVs that are likely to alter drug response can be critical for designing drug-development strategies and successful clinical trials.
One key element of this work was data mining and proper cross-referencing among various data resources spanning several scientific disciplines, each with its own complexity and distinct domain knowledge. For instance, drug molecules have to be retrieved from chemical biology resources and then mapped unambiguously to the ligands in protein structure complexes by chemical structure search. As another example, residue numbering in protein structures often differs from that of the reference sequences, so careful sequence alignment has to be performed to map SNVs onto the sequences of drug targets. To a large extent, the cross-linked and interconnected resources maintained at NCBI enabled this initiative. The Entrez system at NCBI was used for data access and retrieval and greatly facilitated the integration of information from different scientific disciplines. Particularly crucial for this work was the dbSNP database, which, along with other annotations for variations, provides pre-compiled clinical significance and frequency data from several sources, thereby minimizing data mining and cross-indexing efforts. This work highlights the importance of integrative data mining from multiple resources to derive biologically and clinically relevant conclusions from big data, which can then be applied in the precision medicine context. It also contributes to the development of a platform that incorporates multiple research disciplines and information sources for efficient data sharing and mining.
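Residue-numbering mismatches between structures and reference sequences are resolved by alignment. The reference list includes MUSCLE (Edgar, 2004); as a self-contained stand-in, the sketch below uses a basic Needleman–Wunsch global alignment with hypothetical scoring (match +1, mismatch -1, gap -2) to map structure residue numbers onto 1-based reference-sequence positions. Function names, scores and toy sequences are assumptions; a production pipeline would use a substitution matrix and affine gap penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + pair(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def map_numbering(structure_seq, structure_numbers, reference_seq):
    """Map each structure residue number to a 1-based reference position.

    Structure residues aligned to a gap in the reference map to None.
    """
    aln_s, aln_r = global_align(structure_seq, reference_seq)
    mapping = {}
    s_idx = r_idx = 0
    for cs, cr in zip(aln_s, aln_r):
        if cs != "-":
            mapping[structure_numbers[s_idx]] = r_idx + 1 if cr != "-" else None
            s_idx += 1
        if cr != "-":
            r_idx += 1
    return mapping
```

For example, a structure whose observed chain "AYIAKQR" is numbered 1 to 7, but whose reference sequence "MKTAYIAKQR" carries three extra N-terminal residues, maps structure residue 1 to reference position 4, and so on up to 7 to 10.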
By incorporating drug-binding and population-specific SNV data, this work attempts to provide insight into drug efficacy and patient response. In the long run, it will support the Precision Medicine Initiative® Cohort Program, which aims to develop ways to assess disease risk, identify new therapeutic targets and advance pharmacogenomics, ultimately laying the scientific foundation for bringing precision medicine into clinical practice for many diseases.
Funding
This work has been supported by the Intramural Research Program of the National Library of Medicine, National Institutes of Health.
Conflict of Interest: none declared.
Supplementary Material
References
- 1000 Genomes Project Consortium (2015) A global reference for human genetic variation. Nature, 526, 68–74.
- Aithal G.P. et al. (1999) Association of polymorphisms in the cytochrome P450 CYP2C9 with warfarin dose requirement and risk of bleeding complications. Lancet, 353, 717–719.
- Ali-Osman F. et al. (1997) Molecular cloning, characterization, and expression in Escherichia coli of full-length cDNAs of three human glutathione S-transferase Pi gene variants. Evidence for differential catalytic activity of the encoded proteins. J. Biol. Chem., 272, 10004–10012.
- Aquilante C.L. et al. (2006) Influence of coagulation factor, vitamin K epoxide reductase complex subunit 1, and cytochrome P450 2C9 gene polymorphisms on warfarin dose requirements. Clin. Pharmacol. Ther., 79, 291–302.
- Ateeq B. et al. (2009) AGTR1 as a therapeutic target in ER-positive and ERBB2-negative breast cancer cases. Cell Cycle, 8, 3794–3795.
- Balakumar P., Jagadeesh G. (2014) Structural determinants for binding, activation, and functional selectivity of the angiotensin AT1 receptor. J. Mol. Endocrinol., 53, R71–R92.
- Beer T.M. et al. (2002) Polymorphisms of GSTP1 and related genes and prostate cancer risk. Prostate Cancer Prostatic Dis., 5, 22–27.
- Bell D.W. et al. (2005) Inherited susceptibility to lung cancer may be associated with the T790M drug resistance mutation in EGFR. Nat. Genet., 37, 1315–1316.
- Cao C., Moult J. (2014) GWAS and drug targets. BMC Genomics, 15, S5.
- Cui R. et al. (2009) Functional variants in ADH1B and ALDH2 coupled with alcohol and smoking synergistically enhance esophageal cancer risk. Gastroenterology, 137, 1768–1775.
- Darvesh S. et al. (2003) Neurobiology of butyrylcholinesterase. Nat. Rev. Neurosci., 4, 131–138.
- Davydov E.V. et al. (2010) Identifying a high fraction of the human genome to be under selective constraint using GERP ++. PLoS Comput. Biol., 6, e1001025.
- de Beer T.A. et al. (2013) Amino acid changes in disease-associated variants differ radically from variants observed in the 1000 genomes project dataset. PLoS Comput. Biol., 9, e1003382.
- Denis M.G. et al. (2015) EGFR T790M resistance mutation in non small-cell lung carcinoma. Clin. Chim. Acta, 444, 81–85.
- Edgar R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792–1797.
- Gray K.A. et al. (2015) Genenames.org: the HGNC resources in 2015. Nucleic Acids Res., 43, D1079–D1085.
- Gribouval O. et al. (2012) Spectrum of mutations in the renin–angiotensin system genes in autosomal recessive renal tubular dysgenesis. Hum. Mutat., 33, 316–326.
- Heckman-Stoddard B.M., Smith J.J. (2014) Precision medicine clinical trials: defining new treatment strategies. Semin. Oncol. Nurs., 30, 109–116.
- Hidaka K. et al. (1997) Genetic analysis of a Japanese patient with butyrylcholinesterase deficiency. Ann. Hum. Genet., 61, 491–496.
- Ji Y. et al. (2014) Citalopram and escitalopram plasma drug and metabolite concentrations: genome-wide associations. Br. J. Clin. Pharmacol., 78, 373–383.
- Kamatani Y. et al. (2010) Genome-wide association study of hematological and biochemical traits in a Japanese population. Nat. Genet., 42, 210–215.
- Kim S. et al. (2016) PubChem substance and compound databases. Nucleic Acids Res., 44, D1202–D1213.
- Kircher M. et al. (2014) A general framework for estimating the relative pathogenicity of human genetic variants. Nat. Genet., 46, 310–315.
- Kobayashi S. et al. (2005) EGFR mutation and resistance of non-small-cell lung cancer to gefitinib. N. Engl. J. Med., 352, 786–792.
- Laborde E. (2010) Glutathione transferases as mediators of signaling pathways involved in cell proliferation and cell death. Cell Death Differ., 17, 1373–1380.
- Landrum M.J. et al. (2016) ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res., 44, D862–D868.
- Law V. et al. (2014) DrugBank 4.0: shedding new light on drug metabolism. Nucleic Acids Res., 42, D1091–D1097.
- Lee J.M. et al. (2005) Association of GSTP1 polymorphism and survival for esophageal cancer. Clin. Cancer Res., 11, 4749–4753.
- Lindh J.D. et al. (2009) Influence of CYP2C9 genotype on warfarin dose requirements—a systematic review and meta-analysis. Eur. J. Clin. Pharmacol., 65, 365–375.
- Liu X. et al. (2016) dbNSFP v3.0: a one-stop database of functional predictions and annotations for human nonsynonymous and splice-site SNVs. Hum. Mutat., 37, 235–241.
- Lo Bello M. et al. (1997) Multifunctional role of Tyr 108 in the catalytic mechanism of human glutathione transferase P1-1. Crystallographic and kinetic studies on the Y108F mutant enzyme. Biochemistry, 36, 6207–6217.
- Madej T. et al. (2012) MMDB: 3D structures and macromolecular interactions. Nucleic Acids Res., 40, D461–D464.
- McGuire M.C. et al. (1989) Identification of the structural mutation responsible for the dibucaine-resistant (atypical) variant form of human serum cholinesterase. Proc. Natl. Acad. Sci. USA, 86, 953–957.
- Nicolet Y. et al. (2003) Crystal structure of human butyrylcholinesterase and of its complexes with substrate and products. J. Biol. Chem., 278, 41140–41147.
- Okada Y. et al. (2012) Meta-analysis identifies multiple loci associated with kidney function-related traits in east Asian populations. Nat. Genet., 44, 904–909.
- Parker L.J. et al. (2008) The anti-cancer drug chlorambucil as a substrate for the human polymorphic enzyme glutathione transferase P1-1: kinetic properties and crystallographic characterisation of allelic variants. J. Mol. Biol., 380, 131–144.
- Primo-Parmo S.L. et al. (1996) Characterization of 12 silent alleles of the human butyrylcholinesterase (BCHE) gene. Am. J. Hum. Genet., 58, 52–64.
- Quillen E.E. et al. (2014) ALDH2 is associated to alcohol dependence and is the major genetic determinant of “daily maximum drinks” in a GWAS study of an isolated rural Chinese sample. Am. J. Med. Genet. B Neuropsychiatr. Genet., 165B, 103–110.
- Saint Pierre A., Génin E. (2014) How important are rare variants in common disease? Brief. Funct. Genomics, 13, 353–361.
- Sherry S.T. et al. (2001) dbSNP: the NCBI database of genetic variation. Nucleic Acids Res., 29, 308–311.
- Simon R. (2016) Genomic alteration-driven clinical trial designs in oncology. Ann. Intern. Med., 165, 270–278.
- Suárez D., Field M.J. (2005) Molecular dynamics simulations of human butyrylcholinesterase. Proteins, 59, 104–117.
- Sun H.Y. et al. (2014) Finding chemical drugs for genetic diseases. Drug Discov. Today, 19, 1836–1840.
- Takeuchi F. et al. (2012) Genome-wide association study of coronary artery disease in the Japanese. Eur. J. Hum. Genet., 20, 333–340.
- Townsend D.M., Tew K.D. (2003) The role of glutathione-S-transferase in anti-cancer drug resistance. Oncogene, 22, 7369–7375.
- Walser J.C., Furano A.V. (2010) The mutational spectrum of non-CpG DNA varies with CpG content. Genome Res., 20, 875–882.
- Welfare M. et al. (1999) Polymorphisms in GSTP1, GSTM1, and GSTT1 and susceptibility to colorectal cancer. Cancer Epidemiol. Biomarkers Prev., 8, 289–292.
- Welter D. et al. (2014) The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res., 42, D1001–D1006.
- Wen W. et al. (2014) Meta-analysis of genome-wide association studies in East Asian-ancestry populations identifies four new loci for body mass index. Hum. Mol. Genet., 23, 5492–5504.
- Yali X. et al. (2012) Deleterious- and disease-allele prevalence in healthy individuals: insights from current predictions, mutation databases, and population-scale resequencing. Am. J. Hum. Genet., 91, 1022–1032.
- Yan L. et al. (2010) Analysis of transmembrane domains 1 and 4 of the human angiotensin II AT1 receptor by cysteine-scanning mutagenesis. J. Biol. Chem., 285, 2284–2293.
- Zhang H. et al. (2015) Structural basis for ligand recognition and functional selectivity at angiotensin receptor. J. Biol. Chem., 290, 29127–29139.