We are clearly living in the age of big data, and it is only going to get bigger. Health care systems around the world are becoming increasingly reliant on electronic health records (EHRs) and associated information technology to function on a day-to-day basis. Although the use of EHRs has some associated downsides, such as copied forward notes with potentially inaccurate or old data (1), EHRs also house enormous volumes of physiologic data, which, to date, are vastly underutilized. The simultaneous expansion of information technology and machine learning capabilities provides an extraordinary opportunity to leverage these data to improve patient care.
For many decades, clinical prediction rules have been successfully used by physicians to identify patients at higher risk of adverse outcomes (e.g., Framingham risk score and pooled cohort equations for cardiovascular [CV] disease). Hemodynamic instability (or intradialytic hypotension [IDH]) during hemodialysis (HD) is a common problem experienced by patients with ESKD. Given the frequency of maintenance HD treatments, the repeated measurements of BP, and the association of IDH with adverse outcomes (2,3), this represents an obvious area for the development of prediction models. The holy grail of such endeavors would be to develop algorithms for use in real time, with the goal of instituting measures to safely prevent IDH from happening in the first place. However, prediction of IDH may not be as simple as that achieved for CV risk scores, where a set of baseline clinical variables has been translated into a predicted probability of a future CV event. This is partly because HD is a dynamic process, where physiologic variables (such as heart rate, BP, temperature, and patient symptoms) and provider-modifiable variables (such as ultrafiltration rate, dialysis dose, flow rates, dialysate concentration, and medications) are continuously changing and interacting over the course of several hours. As such, for any individual patient, the risk of IDH may vary from one time point to another within the same HD treatment and may also differ at similar time points in two different treatments. However, the promise of new analytic approaches, such as deep learning, may bring us a step closer to the holy grail of real-time prediction.
Machine learning is a branch of artificial intelligence where applications automatically learn from data and improve from experience, without being explicitly programmed to do so. Broadly, machine learning algorithms can be categorized as supervised, semisupervised, or unsupervised, according to the degree of human supervision required during the training phase. In supervised learning, outcomes of interest are “labeled” in the training dataset, allowing the algorithm to train using this prepopulated knowledge. In neural network analyses, a subset of supervised learning, the weights applied to the input data are iteratively adjusted according to how far the algorithm-derived output differs from the desired output for any given task. An artificial neural network is formed by sequentially connected layers of neurons: an input layer, at least one middle layer, and an output layer. The nomenclature of “deep” learning models refers to having more than one middle (hidden) layer. These approaches are well suited for time series data and are, therefore, commonly used in the analysis of physiologic data, such as BP changes. Metrics of algorithm performance are most easily assessed for binary outcomes (e.g., IDH) and can be analyzed in 2×2 tables (known as confusion matrices) using concepts such as sensitivity, specificity, and receiver operating characteristic curves. There are also more complex metrics of algorithm performance (e.g., precision-recall curves and F1 scores), which facilitate more advanced understanding of model performance.
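To make the 2×2 table concrete, the following is a minimal sketch of how sensitivity, specificity, precision, and the F1 score can be derived from binary predictions. The function names (`confusion_matrix`, `summarize`) and the toy labels are illustrative assumptions, not taken from Lee et al. (4).

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int = 0  # event predicted and event occurred
    fp: int = 0  # event predicted but no event occurred
    fn: int = 0  # event missed by the model
    tn: int = 0  # no event predicted and none occurred


def confusion_matrix(y_true, y_pred):
    """Tally the four cells of the 2x2 table from paired binary labels."""
    cm = ConfusionMatrix()
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            cm.tp += 1
        elif pred:
            cm.fp += 1
        elif truth:
            cm.fn += 1
        else:
            cm.tn += 1
    return cm


def safe_div(numerator, denominator):
    """Return NaN rather than raising when a denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def summarize(cm):
    """Derive common performance metrics from the 2x2 table."""
    sensitivity = safe_div(cm.tp, cm.tp + cm.fn)  # also called recall
    specificity = safe_div(cm.tn, cm.tn + cm.fp)
    precision = safe_div(cm.tp, cm.tp + cm.fp)    # positive predictive value
    f1 = safe_div(2 * precision * sensitivity, precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical toy data: 1 = IDH occurred (or was predicted), 0 = it did not.
observed = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(summarize(confusion_matrix(observed, predicted)))
```

Because precision and F1 ignore true negatives, they can be more informative than specificity when the outcome is uncommon, which is one reason they are often reported alongside the receiver operating characteristic curve.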
In this issue of CJASN, Lee et al. (4) use a deep learning model to predict the development of IDH using 1.6 million data points in over 260,000 individual HD sessions from over 9000 adult patients. The evaluation of any research study (including machine learning approaches) should begin with an assessment of the research question, which in this case, is supported by a clear rationale to develop better models to predict IDH. Next, one should pay close attention to the participants, predictors, outcomes, and analytic approach. In this case, the participants were all from a single hospital in Korea and differ considerably from patients receiving maintenance HD in the United States and Europe (e.g., lighter, lower blood flow rates, and lower BP in the study by Lee et al. [4]), which has important implications for the generalizability of the findings. Furthermore, around 7.5% of sessions were hemodiafiltration or other non-HD modalities, and both incident and prevalent patients were included. Assessment of predictors and outcomes is more complex in neural network approaches, as the training set requires labeling of outcomes of interest to begin with. In this respect, an important feature of the data was that the BP measurements were not protocolized, such that abnormal BPs tended to be checked more frequently. This clustering of data around an adverse event of interest could, itself, guide an algorithm’s predictive capability. For these analyses, binary definitions of IDH were chosen: although the primary outcome of a nadir systolic BP <90 mm Hg has been shown to be associated with mortality (2), the secondary outcome of systolic BP decline ≥20 mm Hg has typically been used in association with patient symptoms and is less robust in terms of adverse associations (2,5). Hence, there is a certain element of subjectivity in this key step, which has important consequences for the algorithm development and the final prediction model characteristics.
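The effect of the outcome definition on labeling can be illustrated with a short sketch that applies both thresholds to the same session. This is not the labeling code of Lee et al. (4): the function name `label_idh_session` is hypothetical, and treating the first reading as the baseline for the decline criterion is an assumption made only for illustration.

```python
def label_idh_session(systolic_readings, nadir_threshold=90, decline_threshold=20):
    """Label one HD session under two binary IDH definitions.

    systolic_readings: systolic BP values in mm Hg, in time order.
    Assumption: the first value is the baseline and the remaining
    values are intradialytic measurements.
    """
    if len(systolic_readings) < 2:
        raise ValueError("need a baseline and at least one intradialytic reading")
    baseline = systolic_readings[0]
    nadir = min(systolic_readings[1:])  # lowest intradialytic systolic BP
    return {
        # Primary definition in the text: nadir systolic BP < 90 mm Hg
        "nadir_below_90": nadir < nadir_threshold,
        # Secondary definition in the text: systolic BP decline >= 20 mm Hg
        "decline_at_least_20": (baseline - nadir) >= decline_threshold,
    }


# Hypothetical sessions showing that the two definitions can disagree.
hypertensive_start = [135, 128, 112, 104, 118]  # large drop, nadir stays above 90
low_start = [100, 95, 88, 92]                   # small drop, nadir falls below 90
print(label_idh_session(hypertensive_start))
print(label_idh_session(low_start))
```

The same session can be a positive case under one definition and a negative case under the other, so the choice of label changes what the model is trained to predict.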
In terms of the analytic approach, within the parent dataset, randomizing at the patient level, the authors constructed training, calibration, and test datasets, which is an important strength of the approach. Furthermore, the recurrent neural network approach was compared with other models (multilayer perceptron, Light Gradient Boosting Machine, and logistic regression), although the comparisons were not ideal, as the other approaches could not handle the quantity of temporal data appropriately.
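Randomizing at the patient level means that all sessions from one patient fall into the same dataset, so that a model is never tested on a patient it has already seen. A minimal sketch of this idea follows; the split fractions, the seed, and the function names are assumptions for illustration and are not reported here as the values used by Lee et al. (4).

```python
import random


def split_patients(patient_ids, fractions=(0.7, 0.1, 0.2), seed=0):
    """Randomly assign unique patients to training, calibration, and test sets.

    The fractions are hypothetical; the test set takes whatever remains
    after the training and calibration sets are filled.
    """
    unique_ids = sorted(set(patient_ids))  # sort first for reproducibility
    random.Random(seed).shuffle(unique_ids)
    n_train = int(len(unique_ids) * fractions[0])
    n_cal = int(len(unique_ids) * fractions[1])
    return {
        "train": set(unique_ids[:n_train]),
        "calibration": set(unique_ids[n_train:n_train + n_cal]),
        "test": set(unique_ids[n_train + n_cal:]),
    }


def assign_sessions(sessions, split):
    """Route each (patient_id, session_id) pair to its patient's dataset."""
    assigned = {name: [] for name in split}
    for patient_id, session_id in sessions:
        for name, ids in split.items():
            if patient_id in ids:
                assigned[name].append((patient_id, session_id))
                break
    return assigned


# Hypothetical data: 10 patients with 3 sessions each.
sessions = [(p, s) for p in range(10) for s in range(3)]
split = split_patients(p for p, _ in sessions)
datasets = assign_sessions(sessions, split)
print({name: len(rows) for name, rows in datasets.items()})
```

Splitting by session instead would let sessions from the same patient appear in both training and test data, which tends to inflate apparent performance.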
Notwithstanding these caveats, the area under the receiver operating characteristic curve for the primary definition of IDH (nadir systolic BP <90 mm Hg) of 0.94 was remarkably high for the recurrent neural network analysis. Although the absolute differences in the area under the receiver operating characteristic curve between models were minimal, the recurrent neural network analysis statistically outperformed other approaches on a consistent basis and in multiple sensitivity analyses. Interestingly, in the feature set-ablation analyses, the removal of BP and other vital sign data led to the largest reduction in model performance, highlighting the obvious importance of BP data in predicting a BP outcome. Overall, these results are intriguing and highlight the potential of such approaches in the HD research and clinical environment.
However, what limitations and challenges are associated with the acceptance and implementation of machine learning approaches into routine clinical practice? First, there are very real logistic challenges, including algorithm refinement for new populations and determination of optimal thresholds (trade-off between false-positive and false-negative errors), the requirement of digital ascertainment of data input, data privacy issues, information technology infrastructure and support, and buy-in from (and sharing across) large dialysis organizations and health care systems. Machine learning models may inadvertently reproduce discriminatory biases, which require significant effort in identification and implementation of corrective measures, or may miss important contextual information that is not recorded accurately in EHRs. Buy-in from clinicians and patients has also been identified as a major concern with these analyses, as challenges in understanding what happens inside the “black box” of machine learning may breed distrust for the answers it provides (6). Indeed, the importance of understanding and being able to interpret the output from machine learning algorithms has been highlighted in European data legislation, where individuals are entitled to a right to explanation for clinical decision making; this requirement could be rather arduous to meet with deep learning approaches (7). Concerns about over-reliance on technology and deskilling of physicians have been raised but are often countered by arguments that machine learning will augment the ability of physicians to assimilate ever-increasing amounts of data and to provide the most up-to-date and appropriate therapy (6,8).
Armed with the rather striking findings of excellent model performance, the authors highlight that “continuous and real-time prediction of IDH during hemodialysis may be achievable” (4). However, further refinement is clearly needed to narrow the prediction window to a shorter time frame and would likely require development of accurate, noninvasive, and continuous measurements of BP. Furthermore, as the authors admit, just being able to predict IDH accurately and in real time is an excellent first step but is not enough on its own. Rather, the overall goal is for prospective testing of interventions on the basis of such warning systems and proof that they reduce adverse outcomes in a safe manner.
Disclosures
All authors report employment with Brigham and Women’s Hospital. Support was provided by US Department of Health and Human Services, National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases grants U01DK096189 and R03DK122240, outside the submitted work.
Funding
None.
Acknowledgments
The content of this article reflects the personal experience and views of the author(s) and should not be considered medical advice or recommendation. The content does not reflect the views or opinions of the American Society of Nephrology (ASN) or CJASN. Responsibility for the information and views expressed herein lies entirely with the author(s).
Footnotes
Published online ahead of print. Publication date available at www.cjasn.org.
See related article, “Deep Learning Model for Real-Time Prediction of Intradialytic Hypotension,” on pages 396–406.
References
- 1. Wang MD, Khanna R, Najafi N: Characterizing the source of text in electronic health record progress notes. JAMA Intern Med 177: 1212–1213, 2017
- 2. Flythe JE, Xue H, Lynch KE, Curhan GC, Brunelli SM: Association of mortality risk with various definitions of intradialytic hypotension. J Am Soc Nephrol 26: 724–734, 2015
- 3. Reeves PB, Mc Causland FR: Mechanisms, clinical implications, and treatment of intradialytic hypotension. Clin J Am Soc Nephrol 13: 1297–1303, 2018
- 4. Lee H, Yun D, Yoo J, Yoo K, Kim YC, Kim DK, Oh K-H, Joo KW, Kim YS, Kwak N, Han SS: Deep learning model for real-time prediction of intradialytic hypotension. Clin J Am Soc Nephrol 16: 396–406, 2021
- 5. K/DOQI Workgroup: K/DOQI clinical practice guidelines for cardiovascular disease in dialysis patients. Am J Kidney Dis 45: S1–S153, 2005
- 6. Cabitza F, Rasoini R, Gensini GF: Unintended consequences of machine learning in medicine. JAMA 318: 517–518, 2017
- 7. Goodman B, Flaxman S: European Union regulations on algorithmic decision-making and a “right to explanation.” AI Mag 38: 50–57, 2017
- 8. Fogel AL, Kvedar JC: Artificial intelligence powers digital medicine. NPJ Digit Med 1: 5, 2018