Abstract
Although reported gene variants in the RET oncogene have been directly associated with multiple endocrine neoplasia type 2 and hereditary medullary thyroid carcinoma, other mutations are classified as variants of uncertain significance (VUS) until the associated clinical phenotype is made clear. Currently, some 46 non-synonymous VUS entries exist in curated archives. In the absence of a gold standard method for predicting phenotype outcomes, this follow up study applies feature selected amino acid physical and chemical properties feeding a Bayes classifier to predict disease association of uncertain gene variants into categories of benign and pathogenic. Algorithm performance and VUS predictions were compared to established phylogenetic based mutation prediction algorithms. Curated outcomes and unpublished RET gene variants with known disease association were used to benchmark predictor performance. Reliable classification of RET uncertain gene variants will augment current clinical information of RET mutations and assist in improving prediction algorithms as knowledge increases.
Introduction
Medical genetics involves diagnosis, management, and determining risk of hereditary disorders [1], [2]. The genotype:phenotype correlation of gene variants in disease is a major component of medical genetics. In monogenic diseases, gene mutations are typically curated as either pathogenic or benign. However, many gene variants must be classified as “unknown” or “uncertain” significance because they have not been clearly associated with a clinical phenotype.
The outlay of time and labor to validate the disease association concerning a variant of uncertain significance (VUS) within the coding portion of a gene can be daunting and cost prohibitive [3], [4]. This is in large part, due to the communication between clinicians and laboratory geneticists needed to resolve these variants [5], [6]. To help bridge this genotype:phenotype gap, the use of machine learning classification algorithms to narrow the uncertain “grey area” between pathogenic and benign sequence variants warrants careful evaluation [7], [8], [9], [10]. Reliable machine learning based classification may augment costly patient recruitment, family histories, and biochemical confirmation of a gene variant with no associated disease correlation [11], [12], [13].
There are established methods for predicting mutation severity based on amino acid substitution penalties, structural disruption, sequence homology (ortholog conservation) or neural nets, such as PolyPhen [13], SIFT [14], MutPred [9] and PMut [15]. However, prediction algorithms are not always in agreement with curated data or each other [16], [17], [18]. Thus, there are opportunities to explore the use of other informatics approaches to this problem. Machine learning methods that can be trained on data available in well-curated gene variant collections may be promising tools to improve the predictive capabilities available to the research community.
The human RET gene (REarranged during Transfection) is located on chromosome 10q.11 codes for 20 exons. The transcript length is 5,659 bps and translates to the 1,114 amino acid residue protein (UniProt RET_HUMAN, #P07949) as shown in Figure 1. The gene belongs to the cadherin superfamily and encodes a receptor tyrosine kinase which functions in signaling pathways for cell growth and differentiation. RET plays a critical role in neural crest development. It can also undergo oncogenic activation in vivo and in vitro by cytogenetic rearrangement. It can be further classified by Gene Ontology (GO) categories (www.geneontology.org) of biological process of homophilic cell adhesion, posterior midgut development, and protein amino acid phosphorylation. Its GO annotated cellular location is component integral to membrane and the GO category of molecular functions lists ATP binding, calcium ion binding and transmembrane receptor protein tyrosine kinase activity. Functional domains of the RET protein are also summarized in Figure 1.
RET is essential for the development of the sympathetic, parasympathetic and enteric nervous systems. Disruption of function by germline mutations in RET have been associated with several diseases in humans including three related inherited cancers: multiple endocrine neoplasia type IIA (MEN2A), multiple endocrine neoplasia type IIB (MEN2B), and familial medullary thyroid carcinoma (FMTC). [19], [20] RET has also been implicated in congenital aganglionosis (absence of enteric nerve cells) in the gastrointestinal tract (Hirschsprung's disease) lack of the neuroenteric plexi impairs smooth muscle activity of the intestines (particularly the colon) resulting in refractory constipation. [21]
Although well understood codon changes often guide patient therapy or surgical options [22], RET gene variants may vary in functional severity, where some are reported as benign, some pathogenic, and some of uncertain significance. Curated RET oncogene mutations have been recently reported by Margraf et al. [23] The disease classification of RET gene variants has been curated as benign (6%), pathogenic (52%) and VUS (42%), meaning unknown or uncertain association with disease or phenotype outcome. This archive currently hosts 146 RET variants, including 62 VUS entries that can be accessed at http://www.arup.utah.edu/database/.
Accurate prediction of disease association for novel mutations and uncertain gene variants is of great importance to medicine and biology. Informatics tools for predicting disease severity of uncertain gene variants will aid in the improvement of genetically-informed patient care. With a rapidly growing number of on-line resources for gene variants collections, the opportunity to apply machine learning algorithms to well curated disease causing gene sets becomes increasingly desirable.
The absence of any gold standard for predicting phenotype severity in uncertain gene variants prompts two questions. Are algorithms trained specific to a gene/disease setting more appropriate to use than generalized on-line prediction tools? Does agreement between several and varying algorithms influence clinician decision-making? This study expands a recently reported algorithm, we here term Primary Sequence Amino Acid Properties (PSAAP), which uses feature selected amino acid physicochemical properties of primary amino acid sequence [24]. This previous work detailed algorithm performance using only gene variants with known disease association, while here we report applying the PSAAP algorithm classification for pathogenicity of novel and uncertain gene variants found in the RET proto-oncogene into categories of benign or pathogenic. The PSAAP algorithm performance has also been compared to four well-established prediction tools available on-line and agreement between algorithms summarized.
Results
The independent test set of RET curated mutations was used to evaluate performance of different categories of classifier algorithms. The best performing algorithm (using Weka) was Naïve Bayes. Algorithm metrics for this novel Bayes classifier of RET disease outcome were calculated using the above test set data. Evaluation of the classifier yielded a sensitivity of 0.938, specificity of 0.867 and positive predictive value (precision) of 0.883. Performance for our Primary Sequence Amino Acid Properties (PSAAP) classifier is summarized in Table 1. A benchmark of prediction performance for the established algorithms (MutPred, PolyPhen, PMut and SIFT) was also performed using curated RET gene variants with known disease outcomes. Following the 88% of the PSAAP classifier, MutPred was next closest to predicting the correct disease outcomes for the known RET variants with 84% precision. PolyPhen yielded the highest specificity for RET variant disease association of 92%, yet had the lowest precision at 54%. PMut correctly predicted gene variant disease outcomes with 72% precision but had the lowest specificity at 59%. Table 1 also summarizes performance metrics (sensitivity, specificity, precision) for curated RET mutations using the four established prediction algorithms.
Table 1. PSAAP algorithm performance of predicted phenotypes using curated RET mutations.
PSAAP Prediction a | MutPred Predictionb | PolyPhen Predictionc | PMut Predictiond | SIFT Predictione | |
Sensitivity | 0.938 | 0.767 | 0.597 | 0.783 | 0.816 |
Specificity | 0.867 | 0.823 | 0.920 | 0.591 | 0.821 |
Precision | 0.883 | 0.843 | 0.541 | 0.723 | 0.779 |
Primary Sequence Amino Acid Properties (PSAAP) algorithm.
Analyzed with default settings at http://mutdb.org/mutpred.
Analyzed with default settings at http://genetics.bwh.harvard.edu/pph.
Analyzed with default settings at http://mmb.pcb.ub.es/PMut.
Analyzed with default settings at http://sift.jcvi.org.
Next, evaluation of RET non-synonymous VUS mutations (n = 46) was performed using our recently reported algorithm [24]. The PSAAP algorithm classified 22 of the uncertain variants as pathogenic, while the remaining 24 fell within the benign grouping. For those variants classified as predicted pathogenic, the PSAAP algorithm estimated confidence remained above 90%. The classifier predicted disease outcome using our algorithm is listed in Table 2.
Table 2. Algorithm agreement for RET uncertain gene variants and predicted pathogenicity.
RET uncertain gene variant | PSAAP Predictiona | MutPred Predictionb | PolyPhe Predictionc | SIFT Predictiond | PMut Predictione |
5/5 agreement | |||||
A510VR600QK603QE632KA640GV648IY791NE843DR844LR844WR886WR912Q | BenignBenignBenignBenignBenignBenignpathogenicbenignpathogenic pathogenic pathogenic pathogenic | BenignBenignBenignBenignBenignBenigndisruptedBenigndisrupted disrupted disrupted disrupted | BenignBenignBenignBenignBenignBenignprobably damagingbenignprobably damaging probably damaging probably damaging probably damaging | Toleratedtoleratedtoleratedtoleratedtoleratedtoleratedaffects functiontoleratedaffects function affects function affects function affects function | Neutralneutralneutralneutralneutralneutralpathologicalneutralpathological pathological pathological pathological |
4/5 agreement | |||||
C611SD631GE805KS819IR833CS904CS904F | pathogenic pathogenicbenignpathogenic pathogenic pathogenic pathogenic | disruptedbenigndisrupted disruptedbenignbenignbenign | Probably damaging probably damaging probably damaging probably damaging probably damaging probably damaging probably damaging | Affects function affects function affects function affects function affects function affects function affects function | neutralpathological pathologicalneutralpathological pathological pathological |
3/5 agreement | |||||
Y606CC531RG533SD631AD631VR635GP841LL881VK907M | pathogenic pathogenic pathogenic pathogenic pathogenic pathogenic pathogenicbenignpathogenic | benignbenignbenignbenignbenignbenignbenigndisruptedbenign | probably damaging probablydamaging probably damaging probably damaging probably damaging probably damaging probably damaging probably damaging probably damaging | toleratedtoleratedaffects function affects function affects functiontoleratedtoleratedaffects function affects function | pathological pathologicalneutralneutralneutral pathological pathologicalneutralneutral |
2/5 agreement | |||||
C630SD631ES649LH665QR844QM848TI852MK907E | pathogenicbenignpathogenicbenignbenignbenignbenignbenign | benignbenignbenignbenignbenignbenignbenignbenign | probably damaging probably damaging probably damaging probably damaging probably damaging probably damaging probably damaging probably damaging | toleratedtoleratedtoleratedtoleratedtoleratedtoleratedaffects function affects function | Neutralpathologicalneutralpathological pathological pathologicalneutralneutral |
1/5 agreement | |||||
G321RE511KD631NA641SK666NR770QN777SV778IE818K | benignbenignbenignbenignbenignbenignbenignbenignbenign | benignbenignbenignbenignbenignbenignbenignbenignbenign | benignbenignbenignpossibly damaging probably damaging probably damaging possibly damagingbenignpossibly damaging | toleratedtoleratedtoleratedtoleratedtoleratedtoleratedtoleratedaffects functiontolerated | pathological pathological pathologicalneutralneutralneutralneutralneutralneutral |
Primary Sequence Amino Acid Properties (PSAAP) algorithm.
Analyzed with default settings at http://mutdb.org/mutpred.
Analyzed with default settings at http://genetics.bwh.harvard.edu/pph.
Analyzed with default settings at http://sift.jcvi.org.
Analyzed with default settings at http://mmb.pcb.ub.es/PMut.
Results from analysis of the RET uncertain gene variants (VUS) using the established on-line prediction tools are also summarized in Table 2, with predicted pathogenic variants bolded and ranked by agreement. The MutPred tool calculates the probability of a deleterious mutation and corresponding hypothesis of disrupted molecular mechanism. We used MutPred's default probability cutoff of 0.75 for differentiating between benign and disrupted/pathogenic mutations. Our PSAAP algorithm agreed with MutPred in 16 benign and 8 pathogenic predictions for 52% agreement (24 out of 46). PolyPhen has outcomes of “benign”, “possibly damaging” and “probably damaging”. The PSAAP classifier agreed with PolyPhen in 13 benign and 22 pathogenic predictions for 76% agreement (35 out of 46). PMut yields outcomes of “pathological” or “neutral” and a corresponding reliability metric (lower is better). Our PSAAP trained algorithm was in concordance with PMut in 13 benign and 14 pathogenic predictions for 58% agreement (27 out of 46). The SIFT algorithm gives outcomes of “tolerated” and “affects protein function”. Our algorithm agreed with SIFT in 19 benign and 16 pathogenic predictions for 76% agreement (35 out of 46).
Of special interest, for predicted RET benign variants, 7 of 24 agreed across all algorithms, while only 6 of 22 predicted pathogenic RET variants showed agreement across the different methods. Although only 13 out of 46 (28%) were concordant, these variants may count as having a higher degree of confidence in prediction due to the varied methodologies and basis of classification. Importantly, the focus of molecular research and clinical efforts could therefore be directed to this prioritized listing of RET uncertain variants. Curated variants are shown mapped across the length of the protein in Figure 2A. This graphing visually highlights the cysteine rich region just prior to the transmembrane domain, and the transmembrane domain itself which contain the majority of pathogenic variants. Our predictions for the uncertain RET variants (VUS) are also mapped by location across the length of the protein as added into Figure 2B.
Finally, several unpublished RET gene variants with known pathological (MEN2) outcomes (n = 5) were identified during routine genetic testing at ARUP Laboratories. To further benchmark a gold standard of truth for RET mutation prediction, all five algorithms were used to classify this set of not yet seen variants. Our novel Bayes trained PSAAP classifier correctly identified all five variants as pathogenic. PMut called 3 disease causing variants correctly, but classified two others as “neutral” mutations, when in fact these changes were known to be associated with disease. PolyPhen also correctly identified 3 as probably damaging (pathogenic), but missed classified the same 2 variants as PMut. SIFT predicted 4 of these variants would affect function (pathogenic), but called one of the same variants “tolerated.” MutPred correctly predicted all 5 as pathogenic.
Discussion
Mutations in the RET proto-oncogene have been directly associated with MEN2 and hereditary medullary thyroid carcinoma, and provide guidance for patient care. Accurate classification of phenotype severity for novel mutations and uncertain variants as relating to disease is of great importance to proper patient care. Although correlation of genotype-phenotype offers therapy options that would otherwise remain hidden and may lead to disease specific mutation-guided management strategies, appropriate caution is justified when clinicians are asked to trust computational outcomes for determining patient care [6].
On-line mutation prediction tools have been available for many years. Prediction tools such as PolyPhen [13] and SIFT [14] are primarily based on multiple alignment and amino acid substitution penalties. More recently, MutPred [9] which calculates probability of deleterious mutations by disrupted molecular mechanism. Additionally, PMut [15] is neural net based and trained on human mutations. We recently reported classification of curated RET gene variants using primary amino acid sequence properties and Naïve Bayes [24]. A key feature to highlight is the fact that the PSAAP algorithm relies on Bayes probability trained on gene-specific and clinically curated disease outcomes. Comparison of this recent PSAAP algorithm with established on-line prediction tools may improve our understanding of predicting mutation status in the RET proto-oncogene.
Sorting Intolerant From Tolerant (SIFT) was first published in 2003 by Ng and Heinikoff from work done at the Fred Hutchinson Cancer Research Center in Seattle [14]. The algorithm predicts whether an amino acid substitution will affect the function of a protein based on both sequence homology to various orthologs and physical properties of amino acids. SIFT is a multistep procedure that (1) searches for and chooses similar sequences (2), makes an alignment of these sequences, and (3) calculates scores based on the amino acids appearing at each position in the alignment. It was initially developed and trained on nsSNP data sets from LacI, Lysozyme, and HIV protease [25]. This algorithm works especially well when adequate numbers of sequence homologs are available for multiple alignment. Conversely, poor performance is seen when multiple alignment in not reliable or completely unavailable.
Polymorphism Phenotyping (PolyPhen) is an EMBL based tool from 2002 from Ramensky et al. [13] It was developed to predict the possible impact of an amino acid substitution on the structure and function of a human protein using physical and comparative considerations. It was originally developed from a set of disease-causing mutations in human proteins with known structures extracted from the SWISS-PROT database, and correlated to the Online Mendelian Inheritance in Man (OMIM) database [26]. Since the algorithm relies on predicted structural disruption, it works especially well where protein structure is known and less reliable when a solved protein structure is not available.
MutPred is a recently developed prediction algorithm by Li, Mooney and Radivojac [9]. It builds on the established SIFT method but offers improved classification accuracy based upon protein sequence, and models changes of structural features and functional sites between wild-type and mutant sequences with output of probabilities of gain or loss of structure and function. It was trained on a set of disease SNPs from cancer and the OMIM disease archive. This predicted disruption of molecular function again work especially well for well studied proteins, where homolog and solved structure is available.
PMut was first published in 2005 by the Molecular Modeling Unit at the Institut de Recerca Biomédica, Parc Científic de Barcelona, Spain [15]. It is based on a two layer neural network and was trained using human mutational data. It allows for either prediction of single point amino acidic mutations or scanning of mutational hot spots. Results are obtained by alanine scanning, identifying massive mutations and genetically accessible mutations. A graphical interface for Protein Data Bank (PDB) structures, when available, and a database containing hot spot profiles for all non-redundant PDB structures are also accessible from the PMut server.
Benchmarking the established prediction algorithms with curated RET variants and associated MEN2 disease demonstrates our PSAAP classifier model compares very well to other established prediction tools. A distinguishing feature of the PSAAP model herein reported is the algorithm was trained specifically to curated RET disease outcomes, as summarized in Figure 3 . This is in contrast to the less robust curated collections of mutations such as OMIM or dbSNP. Further, no homolog alignment or solved protein structure is necessary. Rather, it relies on primary sequence information only - with calculated delta matrices of substituted amino acid properties , and is therefore not limited to scenarios where SIFT or PolyPhen (and others) have traditional been used. These facts may explain the improved performance when classifying RET variants as compared to generalized prediction tools available on-line.
Ranking agreement of predicted phenotype severity across several complimentary algorithms may provide an additional level of clinical confidence in computational classifiers. At a minimum, these five all-in-agreement “predicted pathogenic” RET variants warrant closer investigation by traditional and molecular techniques. Furthermore, algorithm agreement in a clinical setting may be just as important for “benign” as it might be for “pathogenic.”
Personalized treatment in genomic medicine cannot advance until questions such as what was found, what does it mean and what to do about it can be answered for each individual patient and genetic test result. Among the key features critical for a decision support framework in clinical genetic testing is a reliable phenotype classification tool and scoring metric to predict consequences of a variation that alters protein structure. For these uncertain gene variants, the in-house algorithm trained specifically on available RET curated outcomes seems to outperform well-established and generalized prediction tools available on-line. More importantly, agreement between several predictors may provide research priority for novel and uncertain gene variants.
The use of machine learning algorithms to classify uncertain gene variants in disease is a promising tool to strengthen our underlying knowledge of disease pathogenesis. Software algorithms to better classify gene variants of uncertain significance are necessary to move translational research forward. This follow up study used the PSAAP algorithm to “reclassify” 46 variants of uncertain significance within the RET proto-oncogene into categories of benign or pathogenic. This novel application of classification algorithms for computational prediction of phenotype severity in uncertain gene variants could be generally applied to any gene-disease setting where a corpus of curated gene variants are trusted and where reported mutations impact clinical care.
Methods
Non-synonymous RET variants were characterized by physicochemical differences in primary amino acid sequence resulting from the mutation. Attributes of mutation status were characterized using values of 544 physical, chemical, conformational, or energetic properties (AAindex v9.4) [27]. AAindex is a database of numerical indices representing various physicochemical and biochemical properties of amino acids and pairs of amino acids. For each RET variant, matrices of delta values for each biochemical property of the substituted amino acid were calculated by Python scripting and the resulting mutation described by an array of variables archived using SQL - where each matrix corresponds to the absolute value of the difference between the value of the property in the amino acid present in the wild type and the one in the mutant.
As previously described, representative algorithms from different categories of classification (such as nearest neighbor, bayes, regression, rule-based and support vector machine) were evaluated for their ability to correctly predict mutation status in the training set [24]. Briefly, a clinically curated set (n = 84) of non-synonymous RET mutations with known pathogenicity was used to train and test machine learning classification algorithms. Although training and test sets included different disease subtypes such as MEN2A (n = 40), MEN2B (n = 3), FMTC (n = 5), MEN2A and FMTC (n = 36) - class labels of “pathogenic” and “benign” were used to describe all curated disease association. Random selection was used to build a 2/3 training set (n = 56) and 1/3 test set (n = 28). Attribute selection (feature selection) was performed during classification training/testing. Machine classification algorithms were implemented using the Weka software package (v3.6) [28]. When a given classification algorithm produced posterior probabilities of mutation status, we assigned each variant's mutation status according to the higher posterior probability (Weka's default behavior).
The PSAAP algorithm performance was evaluated using the test set, with sensitivity (true positive rate), specificity (true negative rate), and positive predictive value (precision) calculated. A data set of non-synonymous RET uncertain variants (n = 46) was then analyzed using our PSAAP (Naïve Bayes, gene-specific trained) classification algorithm. The workflow of our PSAAP algorithm is summarized in Figure 3.
Next, both curated RET mutations (known disease association) and RET uncertain variants (VUS data) were analyzed and compared using four existing mutation prediction algorithms. These established prediction tools are mainly based on phylogenetic properties such as sequence homology, amino acid substitution penalties or structural disruption. MutPred (mutdb.org/mutpred) [9], PolyPhen (genetics.bwh.harvard.edu/pph) [13], SIFT (sift.jcvi.org) [14], and PMut (mmb2.pcb.ub.es:8080/PMut) [15] were accessed during July/August 2010. Both curated RET variants and RET VUS entries were evaluated using respective default settings.
Finally, several unpublished RET disease variants (n = 5) with known pathogenic outcomes (by surgical pathology, molecular testing and family history) were identified during routine genetic testing at ARUP Laboratories. This nascent set of RET variants was also analyzed and compared by all prediction algorithms to further benchmark some standard of performance and precision. Data and methods used for this study were approved by the Institutional Review Board of the University of Utah.
Acknowledgments
The authors gratefully acknowledge Dr. Margraf for curation of the RET proto-oncogene disease variants.
Footnotes
Competing Interests: The authors have declared that no competing interests exist.
Funding: This work was supported by the University of Utah School of Medicine, Department of Pathology and ARUP Institute for Clinical and Experimental Pathology, NLM training grant LM007124 and NCRR translational science award 1KL2RR025763 01. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
- 1.Weinstein ND. Evaluating risk comprehension. J Natl Cancer Inst Monogr; 1999. What does it mean to understand a risk? pp. 15–20. [DOI] [PubMed] [Google Scholar]
- 2.Ensenauer RE, Michels VV, Reinke SS. Genetic testing: practical, ethical, and counseling considerations. Mayo Clin Proc. 2005;80:63–73. doi: 10.1016/S0025-6196(11)62960-1. [DOI] [PubMed] [Google Scholar]
- 3.Nowak R. Genetic testing set for takeoff. Science. 1994;265:464–467. doi: 10.1126/science.8036486. [DOI] [PubMed] [Google Scholar]
- 4.Machens A, Gimm O, Hinze R, Hoppner W, Boehm BO, et al. Genotype-phenotype correlations in hereditary medullary thyroid carcinoma: oncological features and biochemical properties. J Clin Endocrinol Metab. 2001;86:1104–1109. doi: 10.1210/jcem.86.3.7290. [DOI] [PubMed] [Google Scholar]
- 5.Ashley EA, Butte AJ, Wheeler MT, Chen R, Klein TE, et al. Clinical assessment incorporating a personal genome. Lancet. 2010;375:1525–1535. doi: 10.1016/S0140-6736(10)60452-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6.Tchernitchko D, Goossens M, Wajcman H. In silico prediction of the deleterious effect of a mutation: proceed with caution in clinical genetics. Clin Chem. 2004;50:1974–1978. doi: 10.1373/clinchem.2004.036053. [DOI] [PubMed] [Google Scholar]
- 7.Wei Q, Wang L, Wang Q, Kruger WD, Dunbrack RL., Jr Testing computational prediction of missense mutation phenotypes: functional characterization of 204 mutations of human cystathionine beta synthase. Proteins. 2010;78:2058–2074. doi: 10.1002/prot.22722. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4:1073–1081. doi: 10.1038/nprot.2009.86. [DOI] [PubMed] [Google Scholar]
- 9.Li B, Krishnan VG, Mort ME, Xin F, Kamati KK, et al. Automated inference of molecular mechanisms of disease from amino acid substitutions. Bioinformatics. 2009;25:2744–2750. doi: 10.1093/bioinformatics/btp528. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Dorfman R, Nalpathamkalam T, Taylor C, Gonska T, Keenan K, et al. Do common in silico tools predict the clinical consequences of amino-acid substitutions in the CFTR gene? Clin Genet. 2010;77:464–473. doi: 10.1111/j.1399-0004.2009.01351.x. [DOI] [PubMed] [Google Scholar]
- 11.Ferreira-Gonzalez A, Teutsch S, Williams MS, Au SM, Fitzgerald KT, et al. US system of oversight for genetic testing: a report from the Secretary's Advisory Committee on Genetics, Health and Society. Per Med. 2008;5:521–528. doi: 10.2217/17410541.5.5.521. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12.Williams MS. Quality in clinical genetics. Am J Med Genet C Semin Med Genet. 2009;151C:175–178. doi: 10.1002/ajmg.c.30216. [DOI] [PubMed] [Google Scholar]
- 13.Ramensky V, Bork P, Sunyaev S. Human non-synonymous SNPs: server and survey. Nucleic Acids Res. 2002;30:3894–3900. doi: 10.1093/nar/gkf493. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Ng PC, Henikoff S. SIFT: Predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003;31:3812–3814. doi: 10.1093/nar/gkg509. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 15.Ferrer-Costa C, Gelpi JL, Zamakola L, Parraga I, de la Cruz X, et al. PMUT: a web-based tool for the annotation of pathological mutations on proteins. Bioinformatics. 2005;21:3176–3178. doi: 10.1093/bioinformatics/bti486. [DOI] [PubMed] [Google Scholar]
- 16.Spencer DS, Stites WE. The M32L substitution of staphylococcal nuclease: disagreement between theoretical prediction and experimental protein stability. J Mol Biol. 1996;257:497–499. doi: 10.1006/jmbi.1996.0180. [DOI] [PubMed] [Google Scholar]
- 17.Kang HH, Williams R, Leary J, Ringland C, Kirk J, et al. Evaluation of models to predict BRCA germline mutations. Br J Cancer. 2006;95:914–920. doi: 10.1038/sj.bjc.6603358. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Engelhardt BE, Jordan MI, Muratore KE, Brenner SE. Protein molecular function prediction by Bayesian phylogenomics. PLoS Comput Biol. 2005;1:e45. doi: 10.1371/journal.pcbi.0010045. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Eng C, Clayton D, Schuffenecker I, Lenoir G, Cote G, et al. The relationship between specific RET proto-oncogene mutations and disease phenotype in multiple endocrine neoplasia type 2. International RET mutation consortium analysis. Jama. 1996;276:1575–1579. [PubMed] [Google Scholar]
- 20.Kouvaraki MA, Shapiro SE, Perrier ND, Cote GJ, Gagel RF, et al. RET proto-oncogene: a review and update of genotype-phenotype correlations in hereditary medullary thyroid cancer and associated endocrine tumors. Thyroid. 2005;15:531–544. doi: 10.1089/thy.2005.15.531. [DOI] [PubMed] [Google Scholar]
- 21.Attie T, Pelet A, Edery P, Eng C, Mulligan LM, et al. Diversity of RET proto-oncogene mutations in familial and sporadic Hirschsprung disease. Hum Mol Genet. 1995;4:1381–1386. doi: 10.1093/hmg/4.8.1381. [DOI] [PubMed] [Google Scholar]
- 22.Kloos RT, Eng C, Evans DB, Francis GL, Gagel RF, et al. Medullary thyroid cancer: management guidelines of the American Thyroid Association. Thyroid. 2009;19:565–612. doi: 10.1089/thy.2008.0403. [DOI] [PubMed] [Google Scholar]
- 23.Margraf RL, Crockett DK, Krautscheid PM, Seamons R, Calderon FR, et al. Multiple endocrine neoplasia type 2 RET protooncogene database: repository of MEN2-associated RET sequence variation and reference for genotype/phenotype correlations. Hum Mutat. 2009;30:548–556. doi: 10.1002/humu.20928. [DOI] [PubMed] [Google Scholar]
- 24.Crockett DK, Piccolo SR, Narus SP, Mitchell JA, Facelli JC. Computational feature selection and classification of RET phenotypic severity. J Data Mining in Genom Proteomics. 2010;1:1–4. [Google Scholar]
- 25.Ng PC, Henikoff S. Accounting for human polymorphisms predicted to affect protein function. Genome Res. 2002;12:436–446. doi: 10.1101/gr.212802. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 26.Sunyaev S, Ramensky V, Koch I, Lathe W, III, Kondrashov AS, et al. Prediction of deleterious human alleles. Hum Mol Genet. 2001;10:591–597. doi: 10.1093/hmg/10.6.591. [DOI] [PubMed] [Google Scholar]
- 27.Kawashima S, Kanehisa M. AAindex: amino acid index database. Nucleic Acids Res. 2000;28:374. doi: 10.1093/nar/28.1.374. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 28.Frank E, Hall M, Trigg L, Holmes G, Witten IH. Data mining in bioinformatics using Weka. Bioinformatics. 2004;20:2479–2481. doi: 10.1093/bioinformatics/bth261. [DOI] [PubMed] [Google Scholar]