Shihab et al., Human Mutation

Supp. Figure S1. Scoring the Magnitude of Effect of Amino Acid Substitutions. The expected input for an unweighted prediction is the protein sequence and substitution, whereas the expected input for a weighted prediction is the SwissProt/TrEMBL protein ID and substitution. Next, protein domain annotations from the SUPERFAMILY and Pfam databases are made. In addition, if an unweighted prediction is requested, an ab initio HMM is built from the alignment of homologous sequences collected as part of the JackHMMER algorithm. The amino acid substitution is then mapped onto the corresponding HMM match states, where the information gain (as measured by the Kullback-Leibler divergence from the SwissProt/TrEMBL amino acid composition) is then calculated. This is then used to deduce the most informative HMM and a prediction is made accordingly.

Supp. Figure S2. In this study, we interrogate the amino acid conservation within homologous (both orthologous and paralogous) sequences using HMMs. The JackHMMER algorithm takes a query sequence and iteratively searches a sequence database for homologous sequences (akin to the PSI-BLAST algorithm). In order to establish the optimal parameters for the JackHMMER algorithm, i.e. the sequence database to search (SwissProt/TrEMBL or UniRef90) and the number of iterations required, we randomly sampled 100 proteins from the SwissVar dataset (involving 143 disease-causing and 311 functionally neutral AASs) and compared the performance of FATHMM at each JackHMMER iteration using Equation 1 (Figures A & B). In general, we observed no significant improvements in the performance of FATHMM at later iterations in either sequence database, indicating that a single iteration is sufficient when searching for homologous sequences for the computational prediction of the functional effects of amino acid substitutions.
Therefore, the final version of FATHMM implements one JackHMMER iteration across UniRef90.

Supp. Figure S3. A Receiver Operating Characteristic (ROC) curve showing the accumulated true positives plotted against the accumulated false positives for all unweighted (A) and weighted (B) computational prediction algorithms evaluated using the SwissVar benchmark dataset. Here, the “HumVar” and “Profile” versions of PolyPhen v2 and PhD-SNP, respectively, are plotted as they performed best (in terms of accuracy).

Supp. Table S1. Mutation Submission Procedure for Computational Prediction Methods

Prediction Method  Methodology
SIFT               Local Installation
PolyPhen 1         Automatic Web Submission/Scraping
PolyPhen 2         Batch Submission
PANTHER            Local Installation
PhD-SNP            Local Installation
PMut               Author Request
SNPs&GO            Automatic Web Submission/Scraping
MutPred            Author Request

For methods without a batch submission/download facility, we developed custom web-scraping scripts in the Python programming language (available upon request) which submitted the mutations, one at a time, and parsed the predictions. For prediction methods where this was not possible, e.g. MutPred, the authors of the method kindly processed our mutation dataset.

Supp. Table S2. Performance of Our Weighted Method using a Leave-One-Out Analysis

            Accuracy  Precision  Specificity  Sensitivity  NPV   MCC
VariBench†  0.86      0.86       0.86         0.85         0.85  0.71
SwissVar†   0.81      0.84       0.85         0.77         0.79  0.63
BRCA1       -         -          0.60         0.47         -     -
MSH2        -         -          0.50         0.74         -     -
MLH1        -         -          0.19         0.95         -     -
TP53        -         -          NA           1.00         -     -

For this analysis, we adjusted our pathogenicity weights, Wd and Wn, if and only if the AAS being evaluated was present in either the HGMD [Stenson et al., 2009] or UniProt [Apweiler et al., 2004] datasets.
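The measures reported in Supp. Table S2 can all be derived from a confusion matrix. The following is a minimal Python sketch, not the authors' code: the function name `performance` and the `normalise` flag are hypothetical, and reading "normalised numbers" as rescaling the disease-causing and neutral classes to equal weight is an assumption.

```python
import math


def _ratio(numerator, denominator):
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def performance(tp, fp, tn, fn, normalise=True):
    """Binary classification measures from confusion-matrix counts.

    tp/fn count disease-causing AASs predicted correctly/incorrectly;
    tn/fp count neutral AASs predicted correctly/incorrectly.
    With normalise=True each class is rescaled to sum to 1, so both
    classes carry equal weight (an assumed reading of "normalised numbers").
    """
    if normalise:
        positives, negatives = tp + fn, tn + fp
        if positives == 0 or negatives == 0:
            raise ValueError("both classes need at least one example")
        tp, fn = tp / positives, fn / positives
        tn, fp = tn / negatives, fp / negatives
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "specificity": _ratio(tn, tn + fp),
        "sensitivity": _ratio(tp, tp + fn),
        "npv": _ratio(tn, tn + fn),
        "mcc": _ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

As a check with hypothetical counts, `performance(85, 14, 86, 15)` corresponds to a sensitivity of 0.85 and a specificity of 0.86, and yields a precision of about 0.86, an NPV of about 0.85 and an MCC of about 0.71, which is close to the VariBench row.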
We observed no significant deviations in the performance measures and concluded that the performances observed in our benchmarks were not biased towards the pathogenicity weights employed.

† The performance measurements reported are calculated from normalised numbers.

Supp. Table S3. Availability of Computational Prediction Methods

                                 SIFT            PolyPhen v1  PolyPhen v2   PANTHER     PhD-SNP (Profile)  PMut  SNPs&GO  MutPred  FATHMM
Web-Server                       ✓               *            ✓             ✓           ✓                  ✓     ✓        ✓        ✓
Average Run-Time (Single Query)  †               -            †             < 1 Minute  2 Minutes          †     †        †        †
Batch Facility Available         ✓               -            ✓             ✗           ✗                  ✗     ✗        ✗        ✓
Batch Facility Limitation        1,000 Proteins  -            150,000 AASs  -           -                  -     -        -        Unlimited
Phenotypic Associations          ✗               -            ✗             ✗           ✗                  ✗     ✗        ✗        ✓
Download Available               ✓               -            ✓             ✓           ✓                  ✗     ✗        ✗        ✓
Optional Pre-Computed Database‡  ✗               -            ✓             ✗           ✗                  -     -        -        ✓
Open Source                      ✗               -            ✗             ✓           ✗                  -     -        -        ✓

* PolyPhen v1 has now been discontinued and is no longer accepting user submissions
† pre-computed / near-instant predictions available (restrictions may apply)
‡ optional pre-computed database for near-instant predictions while running locally

Supp. Table S4.
The Predicted Phenotypic Consequences of Disease-Associated AASs against their Associated Diseases/Abnormalities

Gene & Amino Acid Substitution / Associated Disease & MIM Identifier / Phenotypic Inference

FBN1 C1971Y / Marfan syndrome MIM# 154700
  • Dilatation of the Ascending Aorta
  • Abnormality of the Aortic Valve
  • Emphysema
  • Mitral Regurgitation

HEXA W485R / GM2-Gangliosidosis type 1 MIM# 272800
  • Abnormality of Metabolism/Homeostasis
  • Angiokeratoma
  • Mental Deterioration
  • Cardiomegaly
  • Beaking of Vertebral Bodies

PSAP C388F / Atypical Gaucher disease MIM# 610539
  • Abnormality of the Musculoskeletal System
  • Abnormality of Metabolism/Homeostasis
  • Abnormality of the Immune System
  • Functional Respiratory Abnormality
  • Abnormality of the Lung
  • Respiratory Insufficiency

CHRNG R239C / Escobar syndrome MIM# 265000
  • Intermittent Episodes of Respiratory Insufficiency Due to Muscle Weakness
  • Prolonged Miniature Endplate Currents
  • Decreased Size of Nerve Terminals
  • Generalized Muscle Weakness due to Defect at the Neuromuscular Junction
  • Muscle Fiber Atrophy
  • Multiple Pterygia
  • Generalized Amyoplasia
  • Decreased Fetal Movement
  • Hypoplastic Heart
  • Cystic Hygroma
  • Poor Feeding due to Muscle Weakness
  • Easy Fatigability
  • Gower Sign
  • Flat Nose
  • Abnormal Cervical Curvature
  • Thin Ribs
  • Vertebral Fusion
  • Weak Cry
  • Ophthalmoparesis
  • Ptosis
  • High-Arched Palate
  • Poor Suck
  • Macrotia
  • Bulbar Palsy
  • Intrauterine Growth Restriction
  • Myopathy
  • Malignant Hyperthermia
  • Umbilical Hernia
  • Joint Dislocation
  • Abnormality of Temperature Regulation
  • Abnormality of Prenatal Development or Birth
  • Abnormality of Muscle Fibers
  • Abnormality of the Nervous System
  • Functional Respiratory Abnormality

Supp. Table S5.
Interesting Single Nucleotide Variants (SNVs) Between the “Elite” and “Landrace” Wheat Varieties

Wheat Contig & nsSNP Position / Phenotypic Inference

Contig F0Z7V0F01D2DA5, nsSNP Position 127
  • 1 Main Shoot Growth
  • Seed Development Stages
  • Plant Structure Development Stage
  • Flower Development Stages
  • Corolla Developmental Stages
  • 4 Anthesis
  • 3 Flower Organ Development Stages
  • 4 Leaf Senescence Stage
  • A Vegetative Growth
  • Embryo Development Stages
  • Whole Plant Growth Stage
  • Leaf Production
  • LP.06 Six Leaves Visible
  • LP.12 Twelve Leaves Visible
  • LP.04 Four Leaves Visible
  • F Mature Embryo Stage
  • D Bilateral Stage
  • E Expanded Cotyledon Stage
  • LP.02 Two Leaves Visible
  • LP.10 Ten Leaves Visible
  • C Globular Stage
  • Leaf Development Stages
  • LP.08 Eight Leaves Visible

Contig 09781, nsSNP Position 386
  • Plant Structure Development Stage
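
The information gain described in Supp. Figure S1 is the Kullback-Leibler divergence of an HMM match state's amino acid distribution from a background composition. The following is a minimal Python sketch of that calculation, not the authors' implementation. Several things here are assumptions: the function and model names are hypothetical, the uniform background is a placeholder for the SwissProt/TrEMBL composition, the base-2 logarithm is an arbitrary choice of unit, and "most informative" is taken to mean the largest gain.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def information_gain(match_state, background):
    """Kullback-Leibler divergence, in bits, of a match state from a background."""
    gain = 0.0
    for aa in AMINO_ACIDS:
        p = match_state.get(aa, 0.0)
        if p > 0.0:  # terms with p == 0 contribute nothing to the sum
            gain += p * math.log2(p / background[aa])
    return gain


def most_informative(match_states, background):
    """Return (model name, gain) for the match state with the largest gain."""
    gains = {name: information_gain(dist, background)
             for name, dist in match_states.items()}
    best = max(gains, key=gains.get)
    return best, gains[best]


# Placeholder background: uniform over the 20 amino acids (assumption).
background = {aa: 1 / 20 for aa in AMINO_ACIDS}

# Hypothetical match states at the substituted position in two models:
# one dominated by cysteine, one indistinguishable from the background.
conserved = {aa: 0.01 for aa in AMINO_ACIDS}
conserved["C"] = 0.81
variable = dict(background)

print(most_informative({"domain_hmm": conserved, "ab_initio_hmm": variable},
                       background))
```

A match state identical to the background has zero gain, so the strongly conserved position is the one selected in this toy comparison.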