Abstract
We present here a vision of individualized Knowledge Graphs (iKGs) in cardiovascular medicine: a modern informatics platform of exchange and inquiry that comprehensively integrates biological knowledge with medical histories and health outcomes of individual patients. We envision that this could transform how clinicians and scientists together discover, communicate, and apply new knowledge.
Keywords: artificial intelligence, data mining, informatics, machine learning, precision medicine
What if physicians had their own personal artificial intelligence assistant, similar to Tony Stark's J.A.R.V.I.S., to help prescribe individualized strategies for patient care? In our era of digital biomedicine, this symbiotic human–machine interaction has, for the first time, become a feasible reality. We have at our fingertips the depth, dimension, and scale of information needed to piece together a holistic picture of disease and the human condition; however, the human capacity to analyze data and extract knowledge is limited, and collaboration with computers could drastically enhance the power and speed with which an integrated view of disease is synthesized.1
Beginning in 2017, 1 million Americans will participate in the Precision Medicine Initiative All of Us Research Program. Moreover, the National Institutes of Health Precision Medicine Initiative Cohort Program2–4 calls for a knowledge network that uses biomedical informatics to bridge basic biological knowledge of molecular disease drivers with higher-level phenotypic abstractions representing a patient's clinical manifestations. Such a network could link evidence across the methods used to characterize a disease process, refine disease classification, sharpen treatment regimens, and tailor preventive strategies to specific individuals.
Both established and emerging informatics developments have shaped everyday clinical practice, including, but not limited to, echocardiographic pattern recognition,5 the Framingham Risk Score,6 and random survival forests for predicting survival in systolic heart failure.7 In our view, these studies have only begun to explore the promise of modern biomedical informatics. In the last decade, integrated omics science has provided unmatched power for the unbiased characterization of patient features based on molecular events. Through pattern deconvolution, these omics data signatures will likely translate into substantially better classifiers of cardiac phenotypes.
Although unprecedented opportunities exist, focused efforts are required to translate the findings of big data studies into clinically actionable approaches. To this end, efforts to realize this knowledge network would benefit from shared platforms that could transform how researchers and clinicians together discover, communicate, and apply new biomedical knowledge. We are thus at a critical point of biomedical advancement, where our community must face the challenges and opportunities involved in developing informatics systems that integrate the biological and clinical knowledge of individual patients.
Such a modern paradigm could foster comprehension and exploration of disease states in a few key areas. First, efforts could break through cross-data boundaries and merge spatiotemporal scales of description. For example, what inferences about peripheral blood cell epigenetics can be made from different types of coronary computed tomographic imaging, and how does variability in expression levels affect quantitative imaging-based findings? Second, the paradigm could combine new data sources to better contextualize biomarkers. For example, how strongly do multiomics markers of inflammation correlate with myocardial infarction, given an individual's physical activity levels as measured by mHealth sensors? Third, efforts could focus on linking sources of biological insight with clinical outcomes. For instance, what is the molecular basis of a statin's action on metabolic pathways, and how effective is it in a patient with particular genetic and treatment-response profiles?
iKG: A Computational System to Integrate Personalized Health Information
In 2012, Google introduced its Knowledge Graph (KG), a knowledge base used to enhance search engine results with semantic information gathered from a variety of sources.8 Herein, we put forth a provocative idea to the cardiovascular community: creating iKGs, a system of cardiovascular knowledge networks for aggregating and depicting individualized cardiovascular health data in new and informative ways. An iKG framework (Figure) could be a 2-pronged system supporting both translational research and clinical practice, each with a customized platform. The Translational Research Arm could facilitate new discoveries, whereas the Clinical Practice Arm could streamline the integration of experimental findings into clinical practice. Unanswered clinical questions could be fed back into the research arm to generate new, testable hypotheses. Networks of this type are being actively developed to bridge research and clinical practice in other areas of medicine.9 We submit that iKG-like networks could synergistically enhance translational research and clinical decision-making, yielding substantial benefits for our community. Importantly, this framework could facilitate team science, uniting experts in epidemiology, biomedical informatics, clinical cardiology, and cardiovascular biology. A commons containing iKGs could become a shared resource, coordinating with national efforts in data discovery and resource indexing and helping to establish consensus among key stakeholders.
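To make the idea more concrete, one minimal way to picture an iKG is as a set of typed (subject, relation, object) facts, each tagged with the data source that asserted it, so that clinical facts about one patient and curated biological knowledge live in the same graph. The sketch below assumes this triple representation; the class name, relation names, entity identifiers, and source labels are all hypothetical placeholders rather than a proposed schema, and a real system would more likely build on an established graph database or RDF store.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Triple:
    subject: str
    relation: str
    obj: str
    source: str  # provenance: which data source asserted this fact


class IndividualizedKG:
    """Toy iKG (hypothetical): curated biological knowledge plus one patient's clinical facts."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()
        self._outgoing: dict[str, list[Triple]] = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str, source: str) -> None:
        """Add a fact once; an exact duplicate (same source) is ignored."""
        triple = Triple(subject, relation, obj, source)
        if triple not in self.triples:
            self.triples.add(triple)
            self._outgoing[subject].append(triple)

    def neighbors(self, node: str, relation: Optional[str] = None) -> list[Triple]:
        """Outgoing facts about a node, optionally filtered by relation type."""
        return [t for t in self._outgoing.get(node, []) if relation is None or t.relation == relation]


kg = IndividualizedKG()
# Clinical Practice Arm inputs (hypothetical EHR and mHealth facts)
kg.add("patient:001", "has_diagnosis", "disease:CHF", source="EHR")
kg.add("patient:001", "takes_drug", "drug:metoprolol", source="EHR")
kg.add("patient:001", "has_phenotype", "phenotype:low_activity", source="mHealth")
# Translational Research Arm inputs (hypothetical curated knowledge)
kg.add("drug:metoprolol", "targets", "gene:ADRB1", source="drug_database")
kg.add("gene:ADRB1", "participates_in", "pathway:beta_adrenergic_signaling", source="pathway_database")

for fact in kg.neighbors("patient:001"):
    print(fact.relation, fact.obj, f"[{fact.source}]")
```

Keeping the source on every fact is one simple way to preserve provenance, so that a clinician-facing view can show where each statement came from.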
Figure. Overview of the 2-pronged integrated architecture of individualized Knowledge Graphs (iKGs).
A 2-pronged system with Clinical Practice and Translational Research Arms is shown. The Clinical Practice Arm could support clinicians in accessing and interactively viewing observations and phenotypic abstractions of a patient (eg, electronic health record [EHR], biomedical literature, and mHealth data). The Translational Research Arm could benefit researchers by providing tools for analyzing biological observations and pathways (eg, molecular, omics, laboratory, and imaging data). Three main interacting elements of iKGs, representing opportunities outlined in the text, are defined here. First, shown in blue (left) are the different data sources (eg, Data source 1–3) and types (eg, omics, imaging, and text data). Second, shown in yellow (middle) is an exemplary integrated platform addressing both portability and interoperability, and detailing how data sources could be used to construct iKGs. Last, shown in red (right) is how one may develop use cases and applications to validate iKGs, thereby supporting clinical translation.
Computational Opportunities and Challenges
How could iKG-like platforms add value to the care of patients with congestive heart failure (CHF)? The promise lies in providing clinicians with live representations of personalized health (Online Figure I) and researchers with an informatics architecture for catalyzing discoveries. Examples of inquiries that iKGs could help resolve include the following: What molecular observations in relevant cohorts or case reports help us predict changes in CHF patients? How will patients respond to therapies at the molecular level, according to drug-specific information? What is the biological basis of frailty, and how can it inform personalized treatments for CHF? Which CHF therapy (eg, a mechanical device) would most benefit an individual patient, given that patient's genetic information? To answer such questions, we must overcome the following challenges and pursue the corresponding opportunities:
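Many of these inquiries can be read as path queries over a graph: start at a patient and follow a fixed sequence of relation types (for example, patient to drug to target gene to pathway). The sketch below assumes a toy edge list and hypothetical relation names; it shows only the traversal idea, not a validated clinical query engine, and the facts it contains are illustrative rather than curated evidence.

```python
from collections import defaultdict

# Hypothetical toy facts as (subject, relation, object); not real patient data.
EDGES = [
    ("patient:001", "has_variant", "gene:ADRB1"),
    ("patient:001", "takes_drug", "drug:metoprolol"),
    ("drug:metoprolol", "targets", "gene:ADRB1"),
    ("drug:ivabradine", "targets", "gene:HCN4"),
    ("gene:ADRB1", "participates_in", "pathway:beta_adrenergic_signaling"),
]


def build_index(edges):
    """Map (node, relation) to the set of nodes it points to."""
    index = defaultdict(set)
    for subject, relation, obj in edges:
        index[(subject, relation)].add(obj)
    return index


def follow(index, start, relation_path):
    """Return every node reachable from `start` by following the relations in order."""
    frontier = {start}
    for relation in relation_path:
        frontier = {obj for node in frontier for obj in index.get((node, relation), ())}
    return frontier


index = build_index(EDGES)

# "Which pathways are acted on by the drugs this patient takes?"
pathways = follow(index, "patient:001", ["takes_drug", "targets", "participates_in"])

# "Does any current drug target a gene in which the patient carries a variant?"
flagged_genes = follow(index, "patient:001", ["takes_drug", "targets"]) & follow(
    index, "patient:001", ["has_variant"]
)
print(pathways, flagged_genes)
```

Intersecting the results of two paths is one simple way to surface a candidate drug–genotype interaction for expert review; it does not by itself establish a clinical effect.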
Challenge 1
Cardiovascular data are fragmented and noncommensurate in that they span different scales, data types, and sources. Both the implementation of data standards for quality and provenance and conformance to such standards are lacking.
Opportunity 1
Identification, Characterization, and Integration of High-Value Data Sources That Provide a Multifaceted View of CHF
A high-value data set10 is characterized by its accessibility, size, completeness, acquisition features, and information gain. Multiple computational approaches, including machine learning, can make cardiovascular data sets more findable, accessible, interoperable, and reusable (FAIR), thereby adding value. For example, algorithms can be trained via supervised learning on expert-annotated examples and subsequently applied automatically to new data, whereas unsupervised learning can reveal structure in data that lack annotations. Moreover, crowdsourcing could support data quality and annotation tasks, such as encoding semantic relationships, enhancing metadata quality, identifying common features, and achieving data integration.
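Two of the criteria listed above, completeness and information gain, can be scored directly from the data. The sketch below assumes completeness means the fraction of non-missing cells and information gain means the reduction in Shannon entropy of an outcome once a categorical feature is known. The field names and records are hypothetical, the other criteria (accessibility, size, acquisition features) are not covered, and this is not the scoring approach of reference 10.

```python
import math
from collections import Counter, defaultdict


def completeness(records, fields):
    """Fraction of (record, field) cells that are present (not None)."""
    cells = [record.get(field) for record in records for field in fields]
    return sum(value is not None for value in cells) / len(cells)


def entropy(labels):
    """Shannon entropy, in bits, of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(records, feature, outcome):
    """Expected reduction in outcome entropy from knowing a categorical feature.

    Records missing either the feature or the outcome are skipped.
    """
    usable = [r for r in records if r.get(feature) is not None and r.get(outcome) is not None]
    if not usable:
        return 0.0
    groups = defaultdict(list)
    for r in usable:
        groups[r[feature]].append(r[outcome])
    conditional = sum(len(g) / len(usable) * entropy(g) for g in groups.values())
    return entropy([r[outcome] for r in usable]) - conditional


# Hypothetical toy CHF records; None marks a missing value.
records = [
    {"nyha_class": "III", "bnp_elevated": True, "readmitted": True},
    {"nyha_class": "III", "bnp_elevated": True, "readmitted": True},
    {"nyha_class": "II", "bnp_elevated": False, "readmitted": False},
    {"nyha_class": "II", "bnp_elevated": None, "readmitted": False},
]
fields = ["nyha_class", "bnp_elevated", "readmitted"]
print(completeness(records, fields))                          # 11 of 12 cells present
print(information_gain(records, "nyha_class", "readmitted"))  # 1.0 bit
```

In this toy set, the feature perfectly separates the outcome, so it recovers the full 1 bit of outcome entropy; on real data, such scores would be one input among several when ranking data sources.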
Challenge 2
Traditional models for mapping relationships within data do not consider semantic inferences, and contextual information is often high-dimensional, sparse, incomplete, and noisy.
Opportunity 2
Contextualization of Data and Knowledge and Phenotypic Integration Over Heterogeneous Information Networks
There is a great need for machine learning–based mapping algorithms that infer connections across data types and establish different kinds of relations (eg, semantic, correlative, and causal) among disparate data. Moreover, new methods for feature extraction and deep learning could identify, in an unbiased manner, novel predictive variables and unexpected patterns within the data sets. Novel approaches for mining bio-text corpora and heterogeneous information networks can capture and categorize possible connections among data concepts,11 thereby enabling relationship prediction.
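Reference 11 describes PathSim, a meta path-based similarity measure for heterogeneous information networks. For a symmetric meta path, PathSim scores two entities as twice the number of path instances connecting them, divided by the sum of each entity's self-connecting path counts. The sketch below applies that formula to patients along the hypothetical meta path Patient, Gene, Pathway, Gene, Patient; the entities, relations, and edges are invented, and this is a minimal pure-Python rendering of the published measure, not a pipeline from this article.

```python
from collections import Counter, defaultdict

# Hypothetical heterogeneous network: patients -> genes -> pathways.
EDGES = [
    ("patient:p1", "has_variant", "gene:G1"),
    ("patient:p1", "has_variant", "gene:G2"),
    ("patient:p2", "has_variant", "gene:G1"),
    ("patient:p2", "has_variant", "gene:G3"),
    ("patient:p3", "has_variant", "gene:G4"),
    ("gene:G1", "participates_in", "pathway:A"),
    ("gene:G2", "participates_in", "pathway:A"),
    ("gene:G3", "participates_in", "pathway:B"),
    ("gene:G4", "participates_in", "pathway:C"),
]


def build_index(edges):
    """Map (node, relation) to the list of nodes it points to."""
    index = defaultdict(list)
    for subject, relation, obj in edges:
        index[(subject, relation)].append(obj)
    return index


def half_path_counts(index, start, half_path):
    """Number of path instances from `start` to each node along `half_path`."""
    counts = Counter({start: 1})
    for relation in half_path:
        step = Counter()
        for node, n_paths in counts.items():
            for obj in index.get((node, relation), []):
                step[obj] += n_paths
        counts = step
    return counts


def pathsim(index, x, y, half_path):
    """PathSim for the symmetric meta path half_path followed by its reverse.

    s(x, y) = 2 * paths(x, y) / (paths(x, x) + paths(y, y)); each path count is
    the dot product of the two endpoints' half-path count vectors.
    """
    vx = half_path_counts(index, x, half_path)
    vy = half_path_counts(index, y, half_path)
    xy = sum(n * vy[node] for node, n in vx.items())
    xx = sum(n * n for n in vx.values())
    yy = sum(n * n for n in vy.values())
    return 2 * xy / (xx + yy) if xx + yy else 0.0


index = build_index(EDGES)
gene_pathway = ["has_variant", "participates_in"]
print(pathsim(index, "patient:p1", "patient:p2", gene_pathway))  # 2*2 / (4 + 2)
print(pathsim(index, "patient:p1", "patient:p2", ["has_variant"]))  # gene level: 2*1 / (2 + 2)
```

Changing the meta path changes what "similar" means: at the gene level, p1 and p2 share one of two genes, but at the pathway level they converge on the same pathway, which raises their similarity.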
Challenge 3
Many existing informatics pipelines have not been tested in cardiovascular use cases and lack proper validation, making them refractory to translation.
Opportunity 3
Validation and Iterative Improvements in Cardiovascular Informatics Platforms (eg, iKGs)
To add value, informatics platforms must be rigorously validated to assess their use, impact, and sustainability. New applications are needed that support the prediction of novel disease signatures across multiple spatiotemporal observations and independent CHF populations, including endotype discovery, predictive modeling, belief networks, and convolutional neural networks. These approaches show great promise for strengthening inferences and empowering clinical decision-making. Mechanisms for systematically evaluating informatics platforms are also needed, such as soliciting active feedback from the scientific and clinical communities to guide enhancements.
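One basic ingredient of such validation is checking whether a model's discrimination holds up in an independent population. The sketch below assumes each cohort provides risk scores from one fixed, hypothetical model together with observed events, computes the ROC area under the curve (AUC) with the Mann-Whitney formulation, and flags cohorts whose AUC falls by more than an arbitrary, hypothetical threshold. The scores are invented, and a real evaluation would also assess calibration and uncertainty.

```python
def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the probability that a randomly
    chosen event case scores higher than a randomly chosen non-event case
    (ties count as one half)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def external_validation(cohorts, max_drop=0.05):
    """Compare discrimination in the development cohort with every cohort.

    Returns {cohort_name: (auc, flagged)}, where flagged means the AUC fell by
    more than `max_drop` (a hypothetical threshold) relative to development.
    """
    dev_auc = auc(*cohorts["development"])
    report = {}
    for name, (scores, labels) in cohorts.items():
        value = auc(scores, labels)
        report[name] = (value, dev_auc - value > max_drop)
    return report


# Hypothetical risk scores from one fixed model and observed events (1 = event).
cohorts = {
    "development": ([0.9, 0.8, 0.3, 0.2], [1, 1, 0, 0]),
    "independent_chf": ([0.9, 0.4, 0.5, 0.1], [1, 1, 0, 0]),
}
for name, (value, flagged) in external_validation(cohorts).items():
    print(f"{name}: AUC={value:.2f} flagged={flagged}")
```

A flagged cohort is a prompt for investigation (case mix, measurement differences, model drift) and a natural point at which community feedback could guide iterative improvement.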
Challenge 4
Continuing education of clinicians on the guidelines for use of omics and other big data to individualize patient treatment is lacking.
Opportunity 4
Implementation of Big Data Education Programs for Physicians in Medical School Curricula and Continuing Educational Formats
This requires changes in both the knowledge and the mindset of clinicians. Traditional human–machine interactions in the clinic have involved a relatively passive use of computers to obtain simple read-outs that guide general practice. In contrast, iKG-like platforms would provide greater depth and diversity of information, requiring that clinicians understand the breadth of omics data sets and the patterns embedded in heterogeneous data sources. Continuing education would keep clinicians current on the most relevant information and discoveries; educational modules could be built directly into an iKG-like commons.
Conclusions
We have presented a vision of partnership between humans and machines, 2 indispensable components for precision cardiovascular medicine to become a reality. Information architectures like iKGs link valuable data sets across repositories and offer a continuum of observations across disease pathogenesis. This creates opportunities for individual research teams to contribute incrementally and cumulatively to the global cardiovascular data resource, empowering our collective ability to discern disease phenotypes. Opportunities abound for identifying high-value data and metadata, creating computational architectures, integrating heterogeneous data sources, validating methods for cardiovascular applications, and educating the scientific and medical communities to take part in this informatics vision. Given the many well-established cardiovascular cohorts (eg, the Jackson Heart Study and the Multi-Ethnic Study of Atherosclerosis) with rich clinical and molecular data sets,12,13 our cardiovascular community is well poised to forge new discoveries and innovations in the era of precision medicine.
Acknowledgments
We thank Drs Wei Wang, William Hsu, and Sarah B. Scruggs at the University of California, Los Angeles (UCLA) for their critical input on the content.
Sources of Funding
This work was supported in part by National Institutes of Health U54 GM114833 (to P. Ping, K. Watson, and A. Bui); R37 HL63901 and R35 HL135772 (to P. Ping); R01 EB00362 (to A. Bui); and the T.C. Laubisch endowment at UCLA (to P. Ping).
Footnotes
Disclosures
None.
The opinions expressed in this article are not necessarily those of the editors or of the American Heart Association.
The online-only Data Supplement is available with this article at http://circres.ahajournals.org/lookup/suppl/doi:10.1161/CIRCRESAHA.116.310024/-/DC1.
References
1. Bui AA, Hsu W, Arnold C, El-Saden S, Aberle DR, Taira RK. Imaging-based observational databases for clinical problem solving: the role of informatics. J Am Med Inform Assoc. 2013;20:1053–1058. doi: 10.1136/amiajnl-2012-001340.
2. Institute of Medicine. Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease. Washington, DC: The National Academies Press; 2011.
3. US Department of Health and Human Services. Precision Medicine Initiative Cohort Program. (March 1, 2017). https://www.nih.gov/precision-medicine-initiative-cohort-program.
4. Collins FS, Varmus H. A new initiative on precision medicine. N Engl J Med. 2015;372:793–795. doi: 10.1056/NEJMp1500523.
5. Narula S, Shameer K, Salem Omar AM, Dudley JT, Sengupta PP. Machine-learning algorithms to automate morphological and functional assessments in 2D echocardiography. J Am Coll Cardiol. 2016;68:2287–2295. doi: 10.1016/j.jacc.2016.08.062.
6. Kannel WB, Doyle JT, McNamara PM, Quickenton P, Gordon T. Precursors of sudden coronary death. Factors related to the incidence of sudden death. Circulation. 1975;51:606–613.
7. Hsich E, Gorodeski EZ, Blackstone EH, Ishwaran H, Lauer MS. Identifying important risk factors for survival in patient with systolic heart failure using random survival forests. Circ Cardiovasc Qual Outcomes. 2011;4:39–45. doi: 10.1161/CIRCOUTCOMES.110.939371.
8. Google Inc. The Knowledge Graph. (March 1, 2017). https://www.google.com/intl/es419/insidesearch/features/search/knowledge.html.
9. Evans JP, Wilhelmsen KC, Berg J, Schmitt CP, Krishnamurthy A, Fecho K, Ahalt SC. A new framework and prototype solution for clinical decision support and research in genomics and other data-intensive fields of medicine. EGEMS. 2016;4:1198. doi: 10.13063/2327-9214.1198.
10. Musen MA, Bean CA, Cheung KH, Dumontier M, Durante KA, Gevaert O, Gonzalez-Beltran A, Khatri P, Kleinstein SH, O'Connor MJ, Pouliot Y, Rocca-Serra P, Sansone SA, Wiser JA; CEDAR Team. The center for expanded data annotation and retrieval. J Am Med Inform Assoc. 2015;22:1148–1152. doi: 10.1093/jamia/ocv048.
11. Sun Y, Han J, Yan X, Yu PS, Wu T. PathSim: meta path-based top-k similarity search in heterogeneous information networks. Proceedings of the VLDB Endowment. 2011;4:992–1003.
12. Bidulescu A, Liu J, Hickson DA, Hairston KG, Fox ER, Arnett DK, Sumner AE, Taylor HA, Gibbons GH. Gender differences in the association of visceral and subcutaneous adiposity with adiponectin in African Americans: the Jackson Heart Study. BMC Cardiovasc Disord. 2013;13:9. doi: 10.1186/1471-2261-13-9.
13. Kaufman JD, Adar SD, Barr RG, et al. Association between air pollution and coronary artery calcification within six metropolitan areas in the USA (the Multi-Ethnic Study of Atherosclerosis and Air Pollution): a longitudinal cohort study. Lancet. 2016;388:696–704. doi: 10.1016/S0140-6736(16)00378-0.