New genomic and large-scale data hold the promise of revolutionizing our understanding and treatment of human disease, and are already influencing clinical practice. Multiple barriers stand between the acquisition of the data and fully realizing these and other benefits. In particular, we need powerful and well-characterized computational methods for deducing the phenotypic impact of genomic and system-level perturbations. Many such methods have been developed, but currently, even though some are already deployed in clinical settings, we often remain ignorant of how they actually perform, as well as how and when they should be applied. Further, it is already clear that new and more sophisticated approaches must be developed to fully meet these challenges.
The Critical Assessment of Genome Interpretation (CAGI, \'kā-jē\) conducts community experiments to objectively assess computational methods for determining the phenotypic impacts of genomic variation. The primary goals are to establish the state of the art, to show where future progress may best be made, to highlight innovations and progress, and to build a strong collaborative community. In the CAGI experiments, participants are typically provided genetic variants and make blind predictions of resulting phenotypes. These predictions are evaluated against gold-standard experimental or clinical data by independent assessors. Four CAGI experiments have been conducted to date – a pilot in 2010, and three full-scale events in 2011, 2013, and 2016. Each edition of CAGI involves about 10 challenges. The experiment is conducted over a period of about a year, starting with the identification and development of suitable challenges, followed by a period during which participants are invited to submit their predictions, then a term in which the independent assessors evaluate the results, and concluding with a meeting to discuss the outcomes.
CAGI challenges span a wide range of relationships between genetic variation and disease. For single base variants, there are challenges that address the problem of interpreting the impact of missense mutations on protein activity using a variety of molecular and cellular phenotypes; challenges that test the ability to predict the effect of mutations in cancer driver genes on cell growth; and challenges on the effect of single base variants on RNA expression levels and splicing (including Carraro, Minervini, et al., 2017; Xu, Tang, et al., 2017; Zhang, Linch, et al., 2017; Niroula and Vihinen 2017; Katsonis and Lichtarge 2017; Capriotti et al. 2017; Pejaver et al. 2017; Yin et al. 2017; Tang et al., 2017; Tang and Fenton, 2017; Kreimer, Zeng, et al., 2017; Beer, 2017; Zeng, Edwards, et al., 2017). At the level of full exome and genome sequence, there are challenges that assess methods for assigning complex trait phenotypes and that evaluate the ability to associate genome sequence with an extensive profile of phenotypic traits (including Daneshjou, Wang, et al., 2017; Giollo, Jones, et al., 2017; Pal, Kundu, et al., 2017a; Laksshman, Bhat, et al., 2017; Wang, Chang, et al., 2017; Daneshjou, Gamazon, et al., 2014; Cai, Li, et al., 2017). CAGI also included challenges in which participants were asked to identify causative variants for rare diseases in gene-panel, exome and whole genome sequence data (including Chandonia, Adhikari, et al., 2017; Kundu, Pal, et al., 2017; Pal, Kundu, et al., 2017b). Many challenges have focused on cancer, given its prevalence and the impact of genetics.
This special issue of Human Mutation contains a selection of papers reporting the assessments of challenge results, as well as papers from some individual participating teams describing their methods and the results obtained. Most papers report on the recent challenges from CAGI4, held in 2016. Because CAGI best furthers method development when challenges recur year after year, some manuscripts discuss results from the earlier editions of CAGI and their development over time.
Together, these results from CAGI offer powerful insights into the appropriate level of confidence to place in variant annotations and interpretation methods, and which classes of approaches are most suitable for a particular application. They reveal limitations of current data collection and analysis approaches and point to areas for future research and new approaches.
The fifth CAGI edition is presently underway. Full information about this and the previous CAGI editions is at http://www.genomeinterpretation.org.
Acknowledgments
We are most grateful to all CAGI participants. The primary contributors, whose work is assessed in CAGI, are the predictors: Allison Abad, Ogun Adebali, Ivan Adzhubey, Talal Amin, Johnathan R. Azaria, Giulia Babbi, Eraan Bachar, Benjamin Bachman, Minkyung Baek, Greet De Baets, Michael Beer, Violeta Beleva-Guthrie, Bonnie Berger, Brady Bernard, Rajendra Bhat, Rohit Bhattacharya, Samuele Bovo, Marcus Breese, Aharon S. Brodie, Yana Bromberg, Binghuang Cai, Colin Campbell, Chen Cao, Emidio Capriotti, Marco Carraro, Rita Casadio, Hannah Carter, Billy H. W. Chang, Shann-Ching Chen, Yun-Ching Chen, Chien-Yuan Chen, Melissa Cline, Andrea Corredor, Chen Cui, Carla Davis, Mark Diekhans, Rezarta I. Dogan, Christopher Douville, Ian Driver, Roland Dunbrack, Joost van Durme, Andrea Eakin, Matthew Edwards, Gokcen Eraslan, Hai Fang, Carlo Ferrari, Anna Flynn, Lukas Folkman, Colby T. Ford, Adam Frankish, Zaneta Franklin, Yao Fu, Alessandra Gasparini, Tom Gaunt, David Gifford, Manuel Giollo, Nina Gonzaludo, Valer Gotea, Julian Gough, Yuchun Guo, Jennifer Harrow, Marcia Hasenahuer, Lim Heo, Ramin Homayouni, Raghavendra Hosur, Cheng L. V. Huang, Peter Huwe, Sohyun Hwang, Tadashi Imanishi, Jules Jacobsen, Chan-Seok Jeong, Yuxiang Jiang, David T. Jones, Daniel Jordan, Beomchang Kang, Rachel Karchin, Panagiotis Katsonis, Sunduz Keles, Manolis Kellis, Nikki Kiga, Dongsup Kim, Eiru Kim, Jack F. Kirsch, Michael Kleyman, Andreas Kraemer, Anshul Kundaje, Kunal Kundu, Pui-Yan Kwok, Ernest Lam, Dae Lee, Gyu Rie Lee, Insuk Lee, Pietro Di Lena, Emanuela Leonardi, Andy Li, Mulin Jun Li, Yue Li, Biao Li, Olivier Lichtarge, Chiao-Feng Lin, Rhonald C. Lua, Angel Mak, Pier L. Martelli, David Masica, Zev Medoff, Aziz M. Mezlini, Rahul Mohan, Alexander M. Monzon, Sean D. Mooney, Matthew Mort, John Moult, Steve Mount, Eliseos Mucaki, Jonathan Mudge, Nikola Mueller, Chris Mungall, Katsuhiko Murakami, Yoko Nagai, Noushin Niknafs, Abhishek Niroula, Conor M. L. Nodzak, Yanay Ofran, Ayodeji Olatubosun, Kymberleigh Pagel, Lipika R. Pal, Taeyong Park, Nathaniel Pearson, Vikas Pejaver, Jian Peng, Alexandra Piryatinska, Catherine Plotts, Predrag Radivojac, Aditya R. Rao, Aliz Rao, Graham Ritchie, Peter Rogan, Frederic Rousseau, Jana M. Schwarz, Joost Schymkowitz, Chaok Seok, George Shackelford, Sohela Shah, Maxim Shatsky, Ron Shigeta, Hashem A. Shihab, Jung E. Shim, Junha Shin, Sunyoung Shin, Ilya Shmulevich, Bradford R. Silver, Nasa Sinnott-Armstrong, Ben Smithers, Yesim A. Son, Mario Stanke, Nathan Stitziel, Andrew Su, Laksshman Sundaram, Paul Tang, Nuttinee Teerakulkittipong, Natalie Thurlby, Janita Thusberg, Kevin Tian, Collin Tokheim, Silvio C. E. Tosatto, Yemliha Tuncel, Tychele Turner, Ron S. Unger, Aneeta Uppal, Gurkan Ustunkar, Jouni Valiaho, Mauno Vihinen, Mary Wahl, Michael Wainberg, Meng Wang, Maggie Wang, Yanran Wang, Xinyuan Wang, Li-San Wang, Liping Wei, Qiong Wei, Rene Welch, Stephen Wilson, Chunlei Wu, Lijing Xu, Qifang Xu, Yuedong Yang, Christopher Yates, Yizhou Yin, Chen-Hsin Yu, Dejian Yuan, Jan Zaucha, Haoyang Zeng, and Maya Zuhl.
We are deeply grateful to the many researchers who shared their data, typically before publication and often requiring extensive permissions and review, to create the CAGI challenges: Russ Altman, Adam P. Arkin, Madeleine P. Ball, Jason Bobe, Paolo Bonvini, Bethany Buckley, George Church, Garry R. Cutting, Emma D’Andrea, Lisa Elefanti, Aron W. Fenton, Andre Franke, Nina Gonzaludo, Joe W. Gray, Linnea Jannson, John P. Kane, Pui-Yan Kwok, Rick Lathrop, Jonathan H. LeBowitz, Federica Lovisa, Angel C. Y. Mak, Mary J. Malloy, Richard McCombie, Chiara Menin, M. Stephen Meyn, John Moult, Robert Nussbaum, Lipika R. Pal, Britt-Sabina Petersen, Mehdi Pirooznia, James B. Potash, Clive R. Pullinger, Jasper Rine, Frederick Roth, Pardis Sabeti, Jeremy Sanford, Maria C. Scaini, Nicole Schmitt, Jay Shendure, Molly Sheridan, Michael Snyder, Tim Sterne-Weiler, Paul L. F. Tang, Sean Tavtigian, Ryan Tewhey, Silvio C. E. Tosatto, Jochen Weile, G. Karen Yu, and Peter Zandi.
The CAGI experiment also depended upon the assessors who evaluated each challenge: Aashish Adhikari, Marco Carraro, John-Marc Chandonia, Rui Chen, Wyatt T. Clark, Roxana Daneshjou, Roland Dunbrack, Iddo Friedberg, Gad Getz, Manuel Giollo, Nick Grishin, Rachel Karchin, Anat Kreimer, M. Stephen Meyn, Sean D. Mooney, Alexander A. Morgan, John Moult, Robert Nussbaum, Jeremy Sanford, David B. Searls, Artem Sokolov, Josh Stuart, Shamil Sunyaev, Sean Tavtigian, Silvio C. E. Tosatto, Qifang Xu, and Nir Yosef.
We have been the beneficiaries of many who offered insights and guidance essential to CAGI’s success, including our advisory board members over the years: Russ Altman, George Church, Tim Hubbard, Scott Kahn, Sean D. Mooney, Pauline Ng, Susanna Repo, and John Shon; our Scientific Council: Patricia Babbitt, Atul Butte, Garry R. Cutting, Laura Elnitski, Reece Hart, Ryan Hernandez, Rachel Karchin, Robert Nussbaum, Michael Snyder, Shamil Sunyaev, Joris Veltman, and Liping Wei; and the CAGI Ethics Forum: Wylie Burke, Lawrence R Carr, Flavia Chen, Julie Harris-Wai, Kirsten Isgro, Barbara A. Koenig, Selena Martinez, Robert Nussbaum, and Mark Yarborough.
We also acknowledge those who helped organize the CAGI experiment and helped with its technology: John-Marc Chandonia, Ajithavalli Chellappan, Flavia Chen, Navya Dabbiru, Reece Hart (who coined the term ‘CAGI’), Melissa K. Ly, Andrew J. Neumann, Gaurav Pandey, Sadhna Rana, Rajgopal Srinivasan, Stephen Yee, Sri Jyothsna Yeleswarapu, and Maya Zuhl. We especially recognize Tata Consultancy Services, which has been a generous collaborator in organizing the CAGI experiment. We also acknowledge the efforts of Brenner and Moult lab researchers who contributed to CAGI.
We greatly appreciate the efforts of the anonymous peer reviewers and of Lead Guest Editor Rachel Karchin, and of Atul Butte, Scott Kahn, Sean D. Mooney, Robert Nussbaum, and Predrag Radivojac who served as Consulting Guest Editors and oversaw the peer-review process. Steven E. Brenner and John Moult were the Special Issue Editors, and Gaia Andreoletti was the Organizing Guest Editor, for this special issue of Human Mutation. We sincerely thank Christine Murray, Stephanie Serraon and Sean Yaftali for their efforts coordinating the editorial and production operations, respectively, at Wiley.
Finally, we also wish to acknowledge our profound debt to those many individuals who shared their private genetic and phenotypic or clinical information as participants in the research studies and clinical datasets that comprise the CAGI challenges.
Funding
The CAGI experiment coordination is supported by NIH U41 HG007346 and the CAGI conference by NIH R13 HG006650. SR was supported by a Marie Curie International Outgoing Fellowship (PIOF-GA-2009-237751). Support has also been provided via a research agreement with Tata Consultancy Services.
References
- Beer MA. Predicting enhancer activity and variant impact using gkm-SVM. Human Mutation. 2017;38(9):1251–1258. doi: 10.1002/humu.23185.
- Cai B, Li B, Kiga N, Thusberg J, Bergquist T, Chen Y-C, Niknafs N, Carter H, Tokheim C, Beleva-Guthrie V, Douville C, Bhattacharya R, Yeo HTG, Fan J, Sengupta S, Kim D, Cline M, Turner T, Diekhans M, Zaucha J, Pal LR, Cao C, Yu C-H, Yin Y, Carraro M, Giollo M, Ferrari C, Leonardi E, Tosatto SCE, Bobe J, Ball M, Hoskins RA, Repo S, Church G, Brenner SE, Moult J, Gough J, Stanke M, Karchin R, Mooney SD. Matching phenotypes to whole genomes: Lessons learned from four iterations of the personal genome project community challenges. Human Mutation. 2017;38(9):1266–1276. doi: 10.1002/humu.23265.
- Capriotti E, Martelli PL, Fariselli P, Casadio R. Blind prediction of deleterious amino acid variations with SNPs&GO. Human Mutation. 2017;38(9):1064–1071. doi: 10.1002/humu.23179.
- Carraro M, Minervini G, Giollo M, Bromberg Y, Capriotti E, Casadio R, Dunbrack R, Elefanti L, Fariselli P, Ferrari C, Gough J, Katsonis P, Leonardi E, Lichtarge O, Menin C, Martelli PL, Niroula A, Pal LR, Repo S, Scaini MC, Vihinen M, Wei Q, Xu Q, Yang Y, Yin Y, Zaucha J, Zhao H, Zhou Y, Brenner SE, Moult J, Tosatto SCE. Performance of in silico tools for the evaluation of p16INK4a (CDKN2A) variants in CAGI. Human Mutation. 2017;38(9):1042–1050. doi: 10.1002/humu.23235.
- Chandonia J-M, Adhikari A, Carraro M, Chhibber A, Cutting GR, Fu Y, Gasparini A, Jones DT, Kramer A, Kundu K, Lam HYK, Leonardi E, Moult J, Pal LR, Searls DB, Shah S, Sunyaev S, Tosatto SCE, Yin Y, Buckley BA. Lessons from the CAGI-4 Hopkins clinical panel challenge. Human Mutation. 2017;38(9):1155–1168. doi: 10.1002/humu.23225.
- Daneshjou R, Gamazon ER, Burkley B, Cavallari LH, Johnson JA, Klein TE, Limdi N, Hillenmeyer S, Percha B, Karczewski KJ, Langaee T, Patel SR, Bustamante CD, Altman RB, Perera MA. Genetic variant in folate homeostasis is associated with lower warfarin dose in African Americans. Blood. 2014;124:2298–2305. doi: 10.1182/blood-2014-04-568436.
- Daneshjou R, Wang Y, Bromberg Y, Bovo S, Martelli PL, Babbi G, Lena PDi, Casadio R, Edwards M, Gifford D, Jones DT, Sundaram L, Bhat R, Li X, Pal LR, Kundu K, Yin Y, Moult J, Jiang Y, Pejaver V, Pagel KA, Li B, Mooney SD, Radivojac P, Shah S, Carraro M, Gasparini A, Leonardi E, Giollo M, Ferrari C, Tosatto SCE, Bachar E, Azaria JR, Ofran Y, Unger R, Niroula A, Vihinen M, Chang B, Wang MH, Franke A, Petersen B-S, Pirooznia M, Zandi P, McCombie R, Potash JB, Altman R, Klein TE, Hoskins R, Repo S, Brenner SE, Morgan AA. Working towards precision medicine: Predicting phenotypes from exomes in the Critical Assessment of Genome Interpretation (CAGI) challenges. Human Mutation. 2017;38(9):1182–1192. doi: 10.1002/humu.23280.
- Giollo M, Jones DT, Carraro M, Leonardi E, Ferrari C, Tosatto SCE. Crohn disease risk prediction-Best practices and pitfalls with exome data. Human Mutation. 2017;38(9):1193–1200. doi: 10.1002/humu.23177.
- Katsonis P, Lichtarge O. Objective assessment of the evolutionary action equation for the fitness effect of missense mutations across CAGI-blinded contests. Human Mutation. 2017;38(9):1072–1084. doi: 10.1002/humu.23266.
- Kreimer A, Zeng H, Edwards MD, Guo Y, Tian K, Shin S, Welch R, Wainberg M, Mohan R, Sinnott-Armstrong NA, Li Y, Eraslan G, Amin T Bin, Goke J, Mueller NS, Kellis M, Kundaje A, Beer MA, Keles S, Gifford DK, Yosef N. Predicting gene expression in massively parallel reporter assays: A comparative study. Human Mutation. 2017;38(9):1240–1250. doi: 10.1002/humu.23197.
- Kundu K, Pal LR, Yin Y, Moult J. Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge. Human Mutation. 2017;38(9):1201–1216. doi: 10.1002/humu.23249.
- Laksshman S, Bhat RR, Viswanath V, Li X. Deep bipolar: Identifying genomic mutations for bipolar disorder via deep learning. Human Mutation. 2017;38(9):1217–1224. doi: 10.1002/humu.23272.
- Niroula A, Vihinen M. PON-P and PON-P2 predictor performance in CAGI challenges: Lessons learned. Human Mutation. 2017;38(9):1085–1091. doi: 10.1002/humu.23199.
- Pal LR, Kundu K, Yin Y, Moult J. CAGI4 Crohn’s exome challenge: Marker SNP versus exome variant models for assigning risk of Crohn disease. Human Mutation. 2017a;38(9):1225–1234. doi: 10.1002/humu.23256.
- Pal LR, Kundu K, Yin Y, Moult J. CAGI4 SickKids clinical genomes challenge: A pipeline for identifying pathogenic variants. Human Mutation. 2017b;38(9):1169–1181. doi: 10.1002/humu.23257.
- Pejaver V, Mooney SD, Radivojac P. Missense variant pathogenicity predictors generalize well across a range of function-specific prediction challenges. Human Mutation. 2017;38(9):1092–1108. doi: 10.1002/humu.23258.
- Tang Q, Alontaga AY, Todd H, Fenton AW. Exploring the limits of the usefulness of mutagenesis in studies of allosteric mechanisms. Human Mutation. 2017;38(9):1144–1154. doi: 10.1002/humu.23239.
- Tang Q, Fenton AW. Whole-protein alanine-scanning mutagenesis of allostery: A large percentage of a protein can contribute to mechanism. Human Mutation. 2017;38(9):1132–1143. doi: 10.1002/humu.23231.
- Wang MH, Chang B, Sun R, Hu I, Xia X, Wu WKK, Chong KC, Zee BC-Y. Stratified polygenic risk prediction model with application to CAGI bipolar disorder sequencing data. Human Mutation. 2017;38(9):1235–1239. doi: 10.1002/humu.23229.
- Xu Q, Tang Q, Katsonis P, Lichtarge O, Jones D, Bovo S, Babbi G, Martelli PL, Casadio R, Lee GR, Seok C, Fenton AW, Dunbrack RL. Benchmarking predictions of allostery in liver pyruvate kinase in CAGI4. Human Mutation. 2017;38(9):1123–1131. doi: 10.1002/humu.23222.
- Yin Y, Kundu K, Pal LR, Moult J. Ensemble variant interpretation methods to predict enzyme activity and assign pathogenicity in the CAGI4 NAGLU (Human N-acetyl-glucosaminidase) and UBE2I (Human SUMO-ligase) challenges. Human Mutation. 2017;38(9):1109–1122. doi: 10.1002/humu.23267.
- Zeng H, Edwards MD, Guo Y, Gifford DK. Accurate eQTL prioritization with an ensemble-based framework. Human Mutation. 2017;38(9):1259–1265. doi: 10.1002/humu.23198.
- Zhang J, Linch L, Cong Q, Jochen W, Song S, Cote A, Roth F, Grishin N. Assessing predictions of fitness effects of missense mutations in SUMO-conjugating enzyme UBE2I. Human Mutation. 2017;38(9):1051–1063. doi: 10.1002/humu.23293.