Physiological Reviews. 2023 Apr 27;103(4):2423–2450. doi: 10.1152/physrev.00033.2022

Leveraging physiology and artificial intelligence to deliver advancements in health care

Angela Zhang 1,2,3, Zhenqin Wu 4, Eric Wu 5, Matthew Wu 3, Michael P Snyder 2, James Zou 6,7, Joseph C Wu 1,3,8,9
PMCID: PMC10390055  PMID: 37104717


Keywords: artificial intelligence, health care, medicine, physiology

Abstract

Artificial intelligence in health care has experienced remarkable innovation and progress in the last decade. Significant advancements can be attributed to the use of artificial intelligence to transform physiology data into improvements in health care. In this review, we explore how past work has shaped the field and defined future challenges and directions. In particular, we focus on three areas of development. First, we give an overview of artificial intelligence, with special attention to the most relevant artificial intelligence models. We then detail how physiology data have been harnessed by artificial intelligence to advance the main areas of health care: automating existing health care tasks, increasing access to care, and augmenting health care capabilities. Finally, we discuss emerging concerns surrounding the use of individual physiology data and detail an increasingly important consideration for the field, namely the challenges of deploying artificial intelligence models to achieve meaningful clinical impact.


CLINICAL HIGHLIGHTS.

  • 1) We discuss the historical milestones in artificial intelligence (AI) and physiology that have led to the recent resurgence of AI and its innovations in health care. We define AI, machine learning, and deep learning and delve into the origins of deep learning. We highlight how the synergy among innovations in neural networks, convolutions, backpropagation, GPUs, and large training datasets catalyzed the deep learning revolution of the early 2010s and made recent progress possible.

  • 2) We give an overview of the procedure for training neural networks. Training a modern neural network involves forward propagation, backpropagation, and an optimization step that minimizes loss.

  • 3) We discuss the current classes of deep learning neural networks. The fundamental goal of deep learning models is to convert the input data into a faithful vector representation. The earliest classes of deep learning neural networks can be defined by the type of data they were designed to handle: feedforward networks for tabular data, convolutional neural networks for images, and recurrent neural networks for text. The development of attention models and transformers has revolutionized natural language processing, making ChatGPT possible, and is now also being used to handle imaging datasets. Additionally, graph neural networks may be the most adept models to date at modeling the networks of biological processes.

  • 4) Physiology has been leveraged to create datasets, correlations, and insights into underlying mechanisms for AI in health care applications. This has led to advancements in automating existing health care tasks, increasing access to health care, and augmenting existing capabilities.

  • 5) The defining challenge of AI in health care has shifted from model development to model deployment. Implementation hurdles such as demonstrating robustness, generalizability, and interpretability of models will become increasingly important. As AI health care platforms mature and demonstrate the potential for real clinical impact, how these platforms will be approved and regulated is now a leading question. Finally, we discuss how including metrics beyond performance could help develop tools that integrate better into clinical workflows.

1. INTRODUCTION

The last decade has seen unprecedented applications of artificial intelligence (AI) to health care (14). Advancements and innovations in AI in the past several years have been critical to this success (FIGURE 1). There is now an unprecedented ability to represent and model complex data, resulting in advancements in computer vision (5), natural language processing (6), and robotics (7). This has created the opportunity to automate tasks and augment the ability to learn from increasingly large datasets. Computer vision models have been applied to classify and segment medical imaging from nearly every organ system, including retinal optical coherence tomography (OCT) scans (8), brain computed tomography (CT), magnetic resonance imaging (MRI) (9), and chest X-rays (10). Breakthroughs in natural language processing have allowed for the translation of patient interactions to clinical text, interpretation and summarization of electronic medical records, and captioning of medical images (11–13). Emerging advancements in the field of reinforcement learning have enabled prospective identification of health problems and the development of semiautonomous surgical robots (14), whereas graph neural networks provide the capability to model non-Euclidean data.

Figure 1.

Timeline of innovations in physiology and artificial intelligence (AI) for health care. Innovations in AI, with special focus on deep learning, are shown on left, and advancements in physiology, particularly the creation of large physiology datasets, are shown on right. 2012 marked the reinvigoration of the current wave of deep learning. AlexNet successfully leveraged and harmonized a number of prior developments in deep learning [neural networks (1943, 1957), convolutional neural networks (1980, 2008), backpropagation (1986, 2008), training neural nets on graphics processing units (GPUs) (2008), and the importance of large training datasets (2009)] to demonstrate for the first time that deep neural networks [convolutional neural networks (CNNs)] trained via backpropagation on GPUs could realistically be implemented and deliver large increases in performance. Since 2012, the field of deep learning has undergone rapid expansion and innovation. AlexNet and subsequent work have led to consistent, stable, and mature high-performing models in computer vision; the development of generative adversarial networks (GANs) (2014) revolutionized generative modeling, and the developments of bidirectional encoder representations from transformers (BERT) and generative pretrained transformer (GPT) have been breakthroughs in natural language processing. Simultaneously, a number of international, national, and institutional initiatives have led to the creation of detailed, high-resolution and/or high-frequency physiology datasets (1995, 2000, 2001, 2006, 2009, 2015, 2017, 2018). Whereas innovations in AI have been the catalyst for current innovations of AI in health care, large physiology datasets represent the foundation. Work done by Gulshan et al. and Esteva et al. marked some of the first robust applications of AI to health care. Image created with BioRender.com, with permission.

Equally important have been the datasets used to train the models, namely rich, clinically relevant physiology datasets. Concurrent with advancements in AI, a wealth of large, rich physiological datasets has been generated (15). Advancements in technologies such as genome sequencing, medical imaging, and personalized smart devices have allowed unrivaled characterization of human physiology. Their decreasing costs and widespread use have led to dramatic growth in the size and resolution of data (16). Simultaneously, initiatives in precision medicine promoted detailed collection of an individual’s physiology to deliver datasets that were varied and spanned the spectrum of human health (17–19). Advancements in data storage and digitization of health records resulted in large databases and a way to interact with clinically relevant physiology data.

By leveraging large physiology datasets, AI models can analyze, interpret, and extract physiological patterns and map them to clinically meaningful outputs, helping to tackle and mitigate health care’s most pressing problems: rising costs, an increasing shortage of physicians, unequal access to care, and inefficiencies and errors that harm patient outcomes. In this review we aim to explore the advancements in AI and physiology data that have made the last decade of AI in health care possible. We begin with an overview of AI and discuss emerging and popular models (for terminology, see TABLE 1). We then discuss how AI has harnessed physiology data to deliver three primary impacts in health care: automating existing health care tasks, increasing access to care, and augmenting health care capabilities. We end the review with an overview of emerging trends and future directions. As AI health care platforms transition into clinical use, attention has shifted from the creation of AI platforms to their implementation, making deployment the next defining challenge. Matters such as generalizability, interpretability, and meaningful clinical impact are now driving considerations (20). Additionally, concerns have emerged about maintaining the privacy of an individual’s physiology data used to develop health care AI models.

Table 1.

Glossary of artificial intelligence terminology

Artificial intelligence: A field dedicated to building computational entities that mimic human intelligence and capabilities, namely natural language processing (communication), knowledge representation (understanding), automated reasoning (thinking), machine learning (learning), computer vision (sight), and robotics (movement)
Machine learning: A subfield of artificial intelligence that focuses on learning patterns within a dataset
Deep learning: An approach to machine learning that involves learning the features that best represent the data within a dataset
Supervised learning: A class of machine learning that trains on labeled datasets in order to map an input (data) to an output (label)
Unsupervised learning: A class of machine learning that trains on unlabeled datasets in order to learn inherent patterns and structure within a dataset
Reinforcement learning: A class of machine learning that involves learning to interact with an environment in order to optimize a goal
Features: Characteristics of a dataset that the model utilizes to find patterns within a dataset
Perceptron: The fundamental unit of a deep learning model. It takes input signals, aggregates and processes the inputs, passes the result through an activation function, and disseminates an output signal.
Neural net: Multilayer perceptrons
Weights: Values that are learned through training. A weight effectively determines how much each feature affects the prediction.
Biases: Constants that are added to the product of weights and features
Forward propagation: Deep learning training involves forward propagation and backpropagation. In forward propagation, inputs are propagated forward throughout the network by combining them with the current weights and biases and thresholding via the activation function.
Gradient descent: An optimization method that computes the local gradient/slope (which direction a function is increasing or decreasing most rapidly) of the loss function via backpropagation to determine in which direction a step should be taken to move closer to the minimum
Saliency maps: A method used to interpret how a model makes predictions. It highlights the features or areas of an image that are used to make model predictions.
Hyperparameters: Parameters that are determined by the individual training the model and that control how the model is trained

2. ARTIFICIAL INTELLIGENCE

2.1. Machine Learning

AI aims to build computational entities that mimic human intelligence and capabilities, namely natural language processing (communication), knowledge representation (understanding), automated reasoning (thinking), machine learning (learning), computer vision (sight), and robotics (movement) (21) (FIGURE 2). Machine learning, a subfield of AI, has garnered attention in the past decade owing to achievements in the field that have allowed it to nearly parallel human abilities in learning and reasoning. At its core, machine learning can be defined as identifying patterns and structure in a dataset. Thus, a machine learning problem is defined by 1) the data and 2) the model used to learn from the data.

FIGURE 2.

Overview of artificial intelligence. An overview of artificial intelligence, machine learning, deep learning, and their relationship to one another. Machine learning is a subfield of artificial intelligence and is composed of 3 predominant classes. Furthermore, deep learning is a subfield of and approach to machine learning. Image created with BioRender.com, with permission.

A dataset is a representative sampling of the domain space. Datasets can be unlabeled or labeled, e.g., a pathology sample that is labeled “malignant” or “benign.” In health care, the primary sources of data are rooted in capturing the physiology of an individual and include 1) medical images, 2) text or electronic health records (EHRs), and 3) genomic sequences (22).

There are three predominant classes of models in machine learning: 1) Supervised learning learns from labeled datasets to identify patterns that map the input data to the output label. For example, supervised learning can be used to identify patterns that would map a pathology slide to its label, benign or malignant. There are two primary forms of supervised learning. Regression maps input features to an output that is numerical and continuous (e.g., predicting oxygen saturation levels), whereas classification maps input features to outputs that are discrete and categorical (e.g., classifying an EKG as tachycardic or not) (23). 2) Unsupervised learning, which deals with unlabeled data, aims to find inherent structure within the data, such as subclusters, outliers, or low-dimensional representations (3). As an example, dimensionality reduction and clustering are used to learn the structure within single-cell transcriptomic datasets to identify clusters of cell populations (24). A brief code sketch contrasting these two classes follows this enumeration.

Both supervised and unsupervised learning deal with fixed data in a static, nonchanging environment. However, many tasks in the world, and particularly in health care, are dynamic and interactive. 3) The third class of machine learning, reinforcement learning, involves learning to interact with an environment to optimize a goal (25, 26). Reinforcement learning is adapted from reinforcement in psychology, where correct actions taken in an environment result in a reward, leading to learning of the actions that maximize the reward. Reinforcement learning is the basis of training AI platforms to beat opponents in games such as Go (as in AlphaGo) and, in health care, to learn treatment strategies that optimize patient outcomes (4, 27).
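
To make the first two classes concrete, the following minimal sketch (Python with scikit-learn, using synthetic data and illustrative feature names rather than any real clinical dataset) applies supervised classification and unsupervised clustering to the same feature matrix:

```python
# A minimal sketch (synthetic data, scikit-learn) contrasting supervised
# classification with unsupervised clustering; feature names are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical features: heart rate and QRS duration for 200 "patients"
X = rng.normal(loc=[75.0, 100.0], scale=[12.0, 15.0], size=(200, 2))
y = (X[:, 0] > 100).astype(int)  # toy label: 1 = "tachycardic", 0 = "not"

# Supervised learning: map labeled inputs to a discrete output (classification)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Unsupervised learning: find structure (clusters) without using the labels
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print("cluster sizes:", np.bincount(clusters))
```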

2.2. Deep Learning

The resurgence of AI in the past decade can be attributed to the successful implementation of deep learning, a subfield of, and approach to, machine learning (28) (FIGURE 2). Before the widespread use of deep learning, machine learning required domain expertise and hand-engineering to select the features that the models learned from. Features are characteristics used to define the dataset and are the inputs that machine learning models use to find patterns. For example, features of a pathology slide, such as cell size or number of cell nuclei, can be used by the model to determine whether there is malignancy. Creating a proper featurization is a core problem of machine learning, as features dictate the patterns that can be learned. However, it can be difficult to achieve high performance by hand-selecting features because determining which features should be selected and how to define them can be challenging and require extensive domain expertise. Furthermore, traditional machine learning models must rely on previously defined features and cannot utilize raw data such as an image of a pathology sample.

Deep learning provides a solution by optimizing feature selection through learning not just a mapping of the features to a pattern but also which features best represent the data (28, 29). This is achieved by passing raw data through the layers of a neural network, where each layer receives the outputs of the previous layer and learns its own representation. As the raw data progress from initial layers to deeper layers of the network, representations are extracted and used to form more complex representations at each step. As an image of an eye passes through a deep learning network, the raw pixels of the image are fed into the first few layers, which extract edges that are then combined to form corners and contours in subsequent layers and are further combined to form an eye in the last few layers. Deep learning models can leverage larger datasets than ever before, facilitating higher performance that can be implemented more rapidly and widely, without heavy human involvement or domain expertise for feature engineering.

2.2.1. Deep learning foundations.

Historically, deep learning had been built around optimizing performance on supervised learning tasks: training a model to learn how to map an input to an expected output (28). To do so, deep learning models have taken inspiration from neuroscience (4, 28). The fundamental unit of a deep learning model is a perceptron. The perceptron, one of the first iterations of artificial neural networks, dates back to the 1950s and aims to recapitulate the mechanics of a depolarizing neuron (30). Similar to a neuron, the perceptron receives input signals, aggregates and processes the inputs, passes the result through an activation function, and outputs and disseminates an output signal. Importantly, the perceptron was the first neural net that introduced learnable weights and biases to functions that were used to aggregate and process inputs. Weights and biases were learned to more accurately map inputs to the desired corresponding outputs. Learning and updating weights and biases is a fundamental goal when training neural network models (29).
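
The perceptron's forward pass can be written in a few lines. The sketch below (NumPy; the input values, weights, and bias are illustrative, not taken from any cited model) mirrors the description above: inputs are weighted, summed with a bias, and thresholded by an activation function:

```python
import numpy as np

def perceptron(x, w, b):
    """Weighted sum of inputs plus bias, passed through a step activation."""
    z = np.dot(w, x) + b          # aggregate and process the inputs
    return 1 if z > 0 else 0      # activation: propagate (1) or not (0)

x = np.array([0.2, 0.7, 0.1])     # input signals
w = np.array([0.5, -0.3, 0.8])    # learnable weights
b = -0.1                          # learnable bias
print(perceptron(x, w, b))        # binary output, as illustrated in FIGURE 3A
```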

The process of training a neural network is analogous to how individuals typically learn: given a new task, an individual makes an attempt, compares the attempt to the anticipated output, and analyzes the differences to obtain feedback on how to better execute the task in future iterations. Deep learning formalizes and gives numerical structure to each component of this process (29) (FIGURE 3). Broadly, this process begins with forward propagation, in which inputs are propagated forward throughout the network (combined with the current, initially random, weights and biases and thresholded via the activation function) to deliver an output. Next, the difference between the predicted model output and the ground-truth expected output, also known as the loss, is computed by the loss function, a function that takes the predicted values and expected values (label) and outputs a level of discrepancy (31). The goal of training neural networks is to determine the weights that minimize the loss. To minimize loss and find the minimum of the loss function, optimization techniques are used, of which gradient descent has been the most popular. Gradient descent involves computing the local gradient/slope (which direction a function is increasing or decreasing most rapidly) of the loss function via backpropagation to determine in which direction a step should be taken to move closer to the minimum. This information is then used to update the weights. For a more rigorous explanation of model training, we refer the reader to Goodfellow et al. (29). With the weights updated, the model prediction is expected to deviate less from the ground truth. This process can be performed repeatedly, iteratively updating the weights and biases until the difference between the model predictions and desired outputs is minimized, resulting in a model that makes predictions with high performance and remains faithful to the desired output.
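
The cycle described above can be written compactly. The following sketch (PyTorch, with placeholder synthetic data and an arbitrary small architecture) shows one version of forward propagation, loss computation, backpropagation, and a gradient descent update:

```python
import torch
import torch.nn as nn

# Toy data: 128 samples with 10 features and a binary label (placeholder values)
X = torch.randn(128, 10)
y = torch.randint(0, 2, (128,)).float().unsqueeze(1)

# A small feedforward network: input layer -> hidden layer -> output layer
model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
loss_fn = nn.BCEWithLogitsLoss()                          # loss function
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # gradient descent

for epoch in range(100):
    optimizer.zero_grad()      # clear gradients from the previous iteration
    y_pred = model(X)          # forward propagation
    loss = loss_fn(y_pred, y)  # discrepancy between prediction and ground truth
    loss.backward()            # backpropagation: compute gradients of the loss
    optimizer.step()           # update weights to move down the loss surface
```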

FIGURE 3.

Anatomy of a deep neural network. A: the fundamental unit of a deep neural network is a perceptron. A perceptron aims to model the dynamics of a neuron while also delivering the capability to learn weights to deliver binary classification. Similar to a neuron, the perceptron takes inputs from multiple sources. The inputs are multiplied by weights (W1–W3), summed (Σ), and then added to a learned constant (b). The result is passed through an activation function that determines whether the signal meets the threshold to be propagated. If the threshold is met, the signal is propagated. If it is not met, the signal is not propagated. This is analogous to binary classification, where signals that are propagated are classified as “1” and signals that are not propagated are classified as “0.” Stacked perceptrons make up a neural network layer. B: neural network layers make up a neural network. There are 3 main types of neural network layers: 1) input layer, where the input is fed into the network; 2) hidden layers, where weights and biases are learned; and 3) output layer, where the internal representation is mapped to the model prediction. Image created with BioRender.com, with permission.

2.2.2. Deep learning models.

Deep learning neural networks are stacked multilayer perceptrons and are composed of three main types of layers: 1) input layer (which feeds the input into the model), 2) hidden layers (which learn to featurize the input), and 3) output classification or regression layer (which maps learned features and patterns to a prediction/output). Throughout the past decade, many design decisions have been engineered into neural networks to improve general performance (e.g., batch normalization, dropout, skip connections) and/or improve performance on specialized data (e.g., convolutions for spatial datasets and images, recurrent units for sequential datasets, attention). There are now many classes of deep learning models that are variations of the fundamental feedforward model architecture. The models can be defined by the unique modules they possess to handle specific classes of data (e.g., images, text, graphs) (FIGURE 4). Here, we survey a few of the most popular classes of deep learning models that have been used to leverage physiology data for health care applications.

FIGURE 4.

There are numerous classes of deep learning models. The different classes can be defined by the data type that the model has been designed to handle. Feedforward networks are the bread-and-butter neural networks and best handle tabular data. Recurrent neural networks possess a recurrent node, which allows the model to exhibit memory for previously seen information. This renders recurrent neural networks adept at handling sequential data. Transformers are a class of models that are composed of attention mechanisms. This allows transformers to exhibit longer range memory and greater efficiency and parallelization, which allows these models to handle longer sequences and larger datasets. Convolutional neural networks are defined by repeating convolutional layers, which are adept at analyzing spatially arranged patterns commonly seen in images. Autoencoders are composed of encoders and decoders, and they are adept at featurizing the input and recreating the input to learn underlying patterns and structure within the data. Generative adversarial networks are composed of a pair of adversarial models, where one model endeavors to generate synthetic data and the other endeavors to detect synthetic data. Image created with BioRender.com, with permission.

2.2.2.1. feedforward neural networks.

Feedforward networks are the foundational deep learning neural networks. They are composed of fully connected layers and are deemed feedforward because information flows forward from the input to the hidden layers and on to the output. They are used for tabular data or data that do not have specific temporal or spatial structure.

2.2.2.2. recurrent neural networks and transformers.

Recurrent neural networks (RNNs) are defined by their recurrent module (32, 33), which allows information to flow in cycles, giving the network memory of previously seen information. As a result, RNNs are particularly adept at handling datasets with sequential information, such as language, genomic sequences, or clinical time series data (34, 35). Despite their successes, RNNs can be difficult to train, particularly on larger datasets, and can exhibit loss of memory when handling longer sequences.

Transformers were introduced by a team at Google Brain in 2017 to address the limitations of RNNs (36). Transformers, in contrast to RNNs, are able to capture longer-range dependencies and train on larger datasets. This is primarily owing to the decision not to use recurrent modules for handling sequential data and instead to utilize attention mechanisms. Attention mechanisms featurize sequences by systematically relating pairs of tokens within a sequence. Importantly, attention mechanisms perform these operations in parallel, which allows for increased computing efficiency and featurization of longer-range relations within the input sequence (1). As a result, transformers allow for more parallelization than RNNs and therefore reduced training time. Using transformers, researchers have been able to train on increasingly large datasets, leading to state-of-the-art performance on natural language processing tasks (6, 37). Improvements in natural language processing enable the development of text-oriented tasks in health care, such as clinical text summarization or clinical question-answering databases (11, 38). Outside of language, the attention module has been instrumental in a number of tasks, including the development of AlphaFold2.0, the deep learning model capable of predicting protein structure from amino acid sequences (39). For a more in-depth review of transformers, we refer the reader to Zhang et al. (1).
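
The central operation, scaled dot-product attention, can be sketched in a few lines. The example below (PyTorch, a single head with no learned projections, toy tensor sizes) shows how every token is related to every other token in parallel:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    """Relate every query token to every key token and mix the values."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # pairwise token similarities
    weights = F.softmax(scores, dim=-1)                # attention weights per token pair
    return weights @ V                                 # weighted sum of value vectors

# Toy sequence: 5 tokens, each embedded in 8 dimensions (illustrative sizes)
tokens = torch.randn(5, 8)
out = scaled_dot_product_attention(tokens, tokens, tokens)  # self-attention
print(out.shape)  # torch.Size([5, 8])
```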

2.2.2.3. convolutional neural networks.

Convolutional neural networks (CNNs) are defined by convolutional layers that make CNNs adept at handling spatially related data, such as images (40). CNN models are adept at three primary computer vision tasks: image classification, image segmentation, and object detection (4). CNNs were critical in defining much of the early and current deep learning landscape. In 2012, CNNs were utilized to achieve state-of-the-art performance at the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC), a competition that challenges researchers to develop models capable of classifying millions of images (40). This feat returned attention to neural networks, as it demonstrated that neural nets could deliver leaps in performance and be practically and feasibly trained, which addressed a long-standing criticism (41–43).

As work on CNNs continued, CNNs contributed to the conceptualization and dissemination of transfer learning, a staple method in current deep learning training. Transfer learning involves pretraining deep neural networks on large datasets that are frequently difficult to obtain or create, transferring what has been learned in the earlier layers, and retraining the final layers on a domain-specific dataset (3). Transfer learning takes advantage of the observation that earlier layers of the CNN have learned fundamental visual structures, such as curves, lines, and shapes, whereas the later layers learn domain-specific features, such as ground-glass opacities in chest X-rays. The development of transfer learning is one of the primary reasons that CNNs and deep learning have been able to propagate rapidly through health care applications. It reduces the barrier to entry from a multimillion-image dataset for training a deep model from scratch to a dataset in the thousands for retraining the final few layers of the model. Applications of transfer learning have led to models that can screen for diabetic retinopathy (8) and classify pathology slides in 1/1,000th of the time it takes a pathologist to do so (44). Whereas the application of CNNs to analyze images is now a mature field, the use of CNNs to analyze videos, which adds an additional dimension of time, has presented itself as a new frontier (45, 46), with EchoNet-Dynamic being the largest publicly available medical video dataset (47).
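
As a minimal sketch of the transfer learning recipe described above (PyTorch/torchvision, assuming a hypothetical two-class retinal screening task; data loading and the training loop are omitted), pretrained weights are loaded, earlier layers are frozen, and only a new final layer is retrained:

```python
import torch.nn as nn
from torchvision import models

# Load a CNN pretrained on ImageNet (weights API as in torchvision >= 0.13)
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the earlier layers, which have already learned edges, curves, and shapes
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer so it maps learned features to 2 domain-specific classes
model.fc = nn.Linear(model.fc.in_features, 2)

# Only the new final layer is trained on the (much smaller) domain-specific dataset
trainable_params = [p for p in model.parameters() if p.requires_grad]
```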

2.2.2.4. generative modeling.

In machine learning, models can be classified as discriminative or generative. Both classes of models learn the characteristics and statistics of datasets. Discriminative models utilize the learned patterns to distinguish between types of data, for example, chest X-rays that would be considered diseased versus healthy. Generative modeling, instead, uses the learned characteristics and statistics of the dataset to generate additional synthetic data. There are a number of models that exist for generative learning, including variational autoencoders (48), normalizing flows (49), and autoregressive models (50), but among the most popular in health care applications have been generative adversarial networks (GANs) (1, 51). GANs aim to learn the implicit distribution of a dataset using two adversarial models. GANs are composed of 1) a generator that generates synthetic samples and 2) a discriminator that classifies the sample as fake/generated by the generator or real/not generated by the generator (52). Intertwining the two models pushes them to work “adversarially” against each other. The generator attempts to generate realistic samples with the hope of fooling the discriminator, while the discriminator works to distinguish real from fake, pushing the generator to create increasingly realistic samples. The result is a model that has learned the implicit distribution of the data.
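
A single adversarial training step can be sketched as follows (PyTorch, small fully connected generator and discriminator operating on toy vectors; all sizes and data are illustrative placeholders):

```python
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 64  # illustrative sizes

G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))

loss_fn = nn.BCEWithLogitsLoss()
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.randn(32, data_dim)   # stand-in for a batch of real samples
z = torch.randn(32, latent_dim)    # random noise fed to the generator

# Discriminator step: label real samples 1 and generated samples 0
opt_D.zero_grad()
d_loss = loss_fn(D(real), torch.ones(32, 1)) + \
         loss_fn(D(G(z).detach()), torch.zeros(32, 1))
d_loss.backward()
opt_D.step()

# Generator step: try to fool the discriminator into outputting 1
opt_G.zero_grad()
g_loss = loss_fn(D(G(z)), torch.ones(32, 1))
g_loss.backward()
opt_G.step()
```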

In health care, GANs can serve many functions. GANs can create additional realistic data to augment a dataset (53). This overcomes limitations commonly seen in health care datasets, such as small sample sizes or restrictions imposed by patient privacy (1, 54). Furthermore, GANs can be used to explore the full distribution of a clinical state (55). GANs have also facilitated the annotation of medical images with text, as well as image-to-image translation, such as translating MRI images to CT images (56).

2.2.2.5. graph neural networks.

The aforementioned models are designed to handle data that internally relate to each other via grids (such as images) or sequences (text). However, a vast majority of data exist not in grids or sequences but in irregular structures such as networks or graphs. Graph neural networks are a class of neural networks that have been developed to handle data that exist as graphs (57, 58). They have been used to perform contact tracing for COVID-19, identify individuals who were exposed, and inform decisions about potential reopenings (59).
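
The basic operation of a graph convolution, in which each node updates its representation by aggregating features from its neighbors over a normalized adjacency matrix, can be sketched in plain PyTorch (the toy graph, node features, and layer size below are arbitrary):

```python
import torch
import torch.nn as nn

def gcn_layer(X, A, linear):
    """One graph convolution: normalize adjacency, aggregate neighbors, transform."""
    A_hat = A + torch.eye(A.size(0))           # add self-loops
    deg = A_hat.sum(dim=1)
    D_inv_sqrt = torch.diag(deg.pow(-0.5))     # D^{-1/2}
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt   # symmetric normalization
    return torch.relu(linear(A_norm @ X))      # aggregate, then learn a transform

# Toy graph: 4 nodes with 3 features each, edges given by a symmetric adjacency
X = torch.randn(4, 3)
A = torch.tensor([[0., 1., 0., 0.],
                  [1., 0., 1., 1.],
                  [0., 1., 0., 1.],
                  [0., 1., 1., 0.]])
layer = nn.Linear(3, 8)
H = gcn_layer(X, A, layer)   # updated node embeddings, shape (4, 8)
```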

2.2.3. Reinforcement learning.

Reinforcement learning involves learning a series of decisions within a specific environment that optimize an outcome. In recent years, the capability to create complex, nonlinear featurizations of raw data has led to advancements in the field of reinforcement learning (60). Deep reinforcement learning has been used to succeed at tasks that require complex dynamic decision making, such as learning to defeat chess masters or win at 98 different video games (60). In health care, it is particularly well suited to deal with scenarios with inherent time delays, where decisions are performed without immediate knowledge of their effectiveness and are instead evaluated by a long-term future reward (61). It has been utilized to select treatment regimens for chronic diseases, plan clinical decision strategies in cases like sepsis, and perform automated robotic surgery.
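
The core of many reinforcement learning methods is a value update of the following form. The sketch below shows tabular Q-learning with epsilon-greedy exploration; the environment (the `step` function, together with the state and action counts) is a placeholder standing in for whatever task is being modeled:

```python
import numpy as np

n_states, n_actions = 10, 4
Q = np.zeros((n_states, n_actions))      # estimated value of each action in each state
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration

rng = np.random.default_rng(0)

def step(state, action):
    """Placeholder environment: returns (next_state, reward, done)."""
    return rng.integers(n_states), rng.normal(), rng.random() < 0.05

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done = step(state, action)
        # Q-learning update: move toward reward plus discounted future value
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
```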

2.2.4. Deep learning model limitations.

Although deep learning models have delivered significant advancements, they are not without limitations. Deep learning models often require tuning of hyperparameters (such as learning rate, number of epochs, etc.), which can be hard to optimize because convergence is difficult for models as complex as those used in deep learning. The problem of hyperparameter tuning is further amplified by the large computing resources needed to train models. The computing resources needed reflect the increasingly large size of models and their demand for large datasets. Similarly, deep learning models are often described as “black box” models because they are difficult to interpret. One of the greatest strengths of deep learning models, learned features, also contributes to one of their greatest limitations. Learned features are typically not linearly correlated with model outputs and can be difficult to correlate with physiological mechanisms. Although methods are being developed to increase the interpretability of models, this remains one of the greatest hurdles to translation. Another significant limitation is the tendency for deep learning models to overfit to the training dataset and therefore perform poorly on datasets that differ slightly in data collection, demographics, or distributions.

Traditional machine learning (ML) models, such as logistic regression, support vector machines, and random forest classifiers, are not hampered by these limitations as severely. This is primarily because of the reduced complexity of traditional ML models (i.e., fewer parameters) and their reliance on hand-engineered features. Traditional machine learning models are frequently interpretable: the relationships between input features and model outputs are explicit and correlated. Less hyperparameter tuning is required because of the reduced number of parameters and the ease of convergence of less complex models. Additionally, because ML models are less complex and include more regularization, in some situations they can exhibit less overfitting compared with deep learning models. Thus, careful consideration of whether traditional machine learning models (logistic regression, support vector machines, random forest classifiers, etc.) are more suitable for the physiology problem at hand is frequently warranted.

3. AI IN HEALTH CARE: HARNESSING PHYSIOLOGY FOR HEALTH CARE

Health care is rich in large datasets that characterize the physiology of a patient, as evidenced by the hundreds of datasets that now populate PhysioNet (15), a repository of open source physiology datasets. Here, we define physiology as the characterization of the normal functions of human organ systems and cells. Increasingly, physiology datasets characterize not only normal function of human organ systems and cells but also pathophysiology, or when organ systems and cells become aberrant. In the past several decades, we have seen a revolution in the ability to collect physiology data at a patient level. This can be attributed to smart devices such as smartphones, watches, and glasses that are able to capture high-frequency, longitudinal readouts of an individual and new medical technology that is able to resolve an individual’s physiology at a higher resolution (FIGURE 5). The physiology data that are collected are inherently meaningful: they reflect clinical status and clinical outcomes and can be correlated with underlying clinical states (62–64). AI and machine learning have been leveraged in recent years to unearth patterns from these physiology datasets, identifying the features that distinguish between normal and pathological physiology and making correlations that map physiology data to their clinical interpretations. In doing so, our understanding of both normal physiology and the physiology of disease has deepened. For example, using deep learning models, a team at Google Brain used retinal fundus images to predict underlying cardiovascular health (65). The capability to harness physiology data with AI has manifested in three main forms in health care: 1) automating existing health care tasks, where the interpretation of physiology data is automated; 2) increasing access to care, where easy-to-collect physiology data are mapped to underlying clinical states that would typically require more intensive equipment to determine; and 3) augmenting existing health care capabilities, where raw physiology data are aggregated to intuit more complex, unknown, underlying disease mechanisms and clinical outcomes. Here, we discuss how physiology data have been leveraged by AI in its applications to improve patient care (FIGURE 6).

FIGURE 5.

Data in physiology. In the last three decades, there has been an increase in the size and resolution of physiology datasets. Owing to international, national, and institutional initiatives, there are now high-resolution and high-frequency datasets that describe the physiology of many organ systems within the body. Furthermore, the rise of smart devices in the last decade has seen an increase in individual physiology data. Image created with BioRender.com, with permission.

FIGURE 6.

Application of artificial intelligence (AI) in health care. Through leveraging physiology data, AI models are capable of mapping physiology patterns to clinical impacts. Main applications of AI in health care include automating existing health care tasks (triage, second reader, and patient health records); increasing access to health care (empowering the individual and increasing access to care); and augmenting health care capabilities (optimizing treatment and predicting risk and outcomes). EHR, electronic health record. Image created with BioRender.com, with permission.

3.1. Automating Existing Health Care Tasks

3.1.1. Triage.

In health care, where patient cases frequently outnumber health care providers, triage can be used to prioritize critical patients, decrease the time to treat, and give greater clarity to patients as to whether medical attention is needed. The process of triaging patients can be reframed as a supervised learning classification task, where a model learns to classify patients into categories, such as “urgent” or “not urgent,” based on the underlying physiology and degree of pathology on clinical readouts. Machine learning has historically been adept at classification tasks, and with the advent of deep learning larger datasets have been leveraged to improve model performance (5, 66). In fields where triage is based on images or physical appearance, machine learning has matched the performance of physicians across a multitude of specialties, classifying skin lesions (66), referring retinopathy cases (67), and determining risk assessments of fractures (68).

An important extension has been the application of AI platforms to triage emergent cases, where delaying intervention by minutes can lead to irreversible damage and effective triage can shorten time to intervention. A focus of the field has been triaging emergent neurological events, where “time is brain” (69). In one study, a three-dimensional (3-D) CNN, developed to identify and then reorder acute neurological events (head trauma and stroke) according to their predicted criticality, matched the average sensitivity of specialists and flagged critical cases 150 times faster than humans (1.2 s vs. 177 s) (70).

Similar technologies are currently being deployed in the clinic. Viz LVO is the first Food and Drug Administration (FDA)-approved AI-based triage system for large vessel occlusions and is being used in >100 centers in the United States (71). Viz LVO analyzes incoming CT angiograms and alerts specialists through mobile notification when a stroke has been detected, with a median time of 6 min. In a small retrospective trial, patients triaged by Viz LVO saw a reduction in triage time of 22.5 min and an overall reduced length of hospital stay (72, 73).

Viz.ai is expanding to include triage of other acute neurological diseases (74) and COVID, whereas Aidoc (75, 76), Zebra Medical (77), and others have also developed FDA-approved triage platforms to handle emergent neurological (74), orthopedic (78), and pulmonary (75) events. AI-driven triage platforms are among the first forays of AI platforms into the clinic. Their deployment will give much-needed insight into whether AI can be deployed on a large scale, successfully integrate into the workflow of a clinician, and improve outcomes for patients.

In nonemergent cases, AI triage platforms can be used to offset physician workload. A number of studies have found that when using AI platforms to triage and reorder a physician’s workload according to severity, more than half and up to 88% of the workload could be excluded without sacrificing sensitivity (44, 79, 80). When multiple rounds of screening were used for diagnosis, AI triage platforms could ensure that more severe cases were detected in earlier rounds (79).

A final important role for AI triage is triaging of patients before they arrive at the clinic. Initial triage is based on obtaining an accurate patient history, which can be automated through advancements in natural language processing and question-answering models. A number of AI platforms have utilized questionnaires or chatbots to interact with patients to assess their symptoms and then determine whether further medical attention is needed (81). Typically, these models are trained on existing electronic health records (EHRs) to learn patterns of a disease and are further combined with question-answering models to create an interactive platform. A few (82–84) pretriage AI platforms are currently being deployed, with Babylon (85–87), in collaboration with the UK National Health Service, being one of the most notable.

3.1.2. Second reader.

Diagnostic error is one of the most common patient care problems (88). It affects at least 1 in 20 United States adults each year, and most individuals are likely to encounter one or more diagnostic errors in their lifetime (88–90). Having more than one health care reader review a case can decrease rates of diagnostic variation and error (89, 90), but because of a continual shortage of health care providers, having multiple readers can be difficult to achieve.

Spanning across specialties, AI has matched the performance of physicians in diagnostic tasks (8, 44, 91, 92), allowing AI platforms to take on the role of second readers. In a number of diagnostic tasks, models were found to outperform physicians (93–96). This is owing to a number of advantages of AI systems. AI systems excel at tasks that require meticulous, laborious, redundant calculations that are error prone for humans (47). Furthermore, models are capable of identifying cases that are missed by physicians, reducing the false negative rate significantly with only a moderate increase in false positives (94–96). AI platforms excel in this regard because, when trained on large datasets, they can efficiently leverage the collective experience from thousands of patient cases and physician diagnoses, experience that would take physicians years to acquire. AI second readers also excel in diagnostic scenarios where diagnosis can be difficult for the physician, as in endoscopy for colon cancer screening where views are obstructed (97, 98). Even with partial views of polyps, AI models were capable of alerting physicians to malignant lesions that would have gone undetected otherwise (99).

AI platforms are now being increasingly integrated into advanced imaging modalities (MRI, CT) to create computer-aided detection (CADe) and computer-aided diagnosis (CADx) (100). Many are now being actively utilized in clinics (101). Encouragingly, it has been reported that physicians assisted by AI second readers achieved greater performance than when acting alone, demonstrating that AI second readers can be successfully integrated into a clinical workflow to have meaningful impact (102–106). However, how and when physicians are notified will require optimization, as one study found that up to a fifth of alerts from the AI second reader were ignored, with redundancy and alarm fatigue cited as the most common reasons (102).

3.1.3. Electronic health records.

Electronic health records (EHRs) are a rich resource documenting patient clinical history and a method for interacting with clinical data (4). However, physician dissatisfaction with EHRs is well documented (107), as EHR documentation is often laborious and is cited as a leading cause of physician burnout (108, 109). Furthermore, scribing patient notes has been shown to limit patient interactions and consequently negatively affect patient care (110–112). Advancements in natural language processing have enabled high performance in text translation, comprehension, and generation (6). When applied to health care, this has the capability to facilitate the EHR process through automated transcription of patient interactions (113), automated summarization of patient interactions into notes (114), and extrapolation of diagnoses (115, 116), outcomes (115, 116), or summaries from existing notes (117, 118).

Others have also combined two powerful fields of AI, computer vision and natural language processing, to achieve image-to-text translation (119). When utilized in medicine, image captioning can alleviate the bottleneck of creating summary reports for image-based fields such as pathology (120) and radiology (121–123).

A future frontier for EHRs enabled by machine learning is video- or audio-based EHRs. Continuous recordings of patients with thermal sensors or videos that characterize the physiology of a patient (heart rate, body temperature, mobility, food intake, bowel movements) can be analyzed by CNNs and RNNs to translate videos into text summaries of patient activity. The result is a living clinical document of unparalleled resolution of patient activity during a hospital stay (7, 45, 46). And just as Apple’s Siri or Amazon’s Alexa have streamlined information retrieval, many have extended these pocket assistants into the medical sector (124–126). This would allow health care providers to verbally access information about the patient and relevant clinical information in real time.

3.2. Increasing Access to Care

3.2.1. Increasing access to care in developing countries.

Low- and middle-income countries (LMICs) account for nearly 90% of the global burden of disease but have only a fraction of the health care workers compared with developed countries (127–129). Many current AI health care platforms can be deployed in low-cost settings, needing only the computing power of a smartphone and internet access. Utilizing machine learning to increase access to health care in LMICs has been a primary goal for the field (130).

One of the main applications of AI in LMICs is to leverage low-cost physiology readouts with AI models to develop a screening tool for diseases that can be managed or treated. A main focus has been building AI screening platforms for diabetic retinopathy (8, 91, 96, 131, 132), a leading cause of preventable blindness in LMICs. An important problem in creating AI platforms is ensuring high and consistent performance across countries, where incidence, presentation, and patient demographics can vary widely. To address this, researchers have evaluated model performance on large multiethnic population datasets (91) and prospectively in countries with low resource settings (96, 131, 132). Importantly, models trained on datasets from developed countries were found to generalize well when evaluated in LMICs (96, 131, 132).

Oftentimes a significant bottleneck for health care delivery in LMICs is the lack of equipment. Increasingly, researchers have developed low-cost equipment that is integrated with AI screening platforms to combat the dual problem of a dearth of health care providers and resources. Recently, one group developed a low-cost ($180) contrast-enhanced microscope integrated with a machine learning model for the molecular diagnosis of lymphoma. A diagnosis can be achieved in <1 min with access to the internet and <10 min without (133).

Similarly, the Prakash laboratory, a champion of frugal science, recently developed the Octopi, a $500 bright-field and fluorescent microscope capable of automated scanning (130, 134). When combined with a CNN trained on 20,000 blood smear slides of malaria parasites, the Octopi can be used for automated and real-time detection of malaria 120 times faster than manual analysis (134). Octopi are now being deployed in Peru, Uganda, and India (130).

3.2.2. Increasing access to care in developed countries.

Many developed countries anticipate a shortage in physicians within the next decade (135, 136). At the same time, the disease burden is expected to continue to grow. These issues are further compounded by uneven access to health care.

One hope is that AI health care diagnostic platforms can assist practitioners in providing care comparable to trained specialists, thus increasing the number of areas with access to specialized care (93). Additionally, other groups have utilized AI to develop models to bypass parts of the clinical workflow that may need specialized expertise or resources to accomplish, such as “augmented microscopy” to assist with pathology slide analysis (137) and real-time diagnosis of tumor biopsies (138, 139) or “digital stains” of histology slides (140) to bypass sample preparation.

Of the shortage of physicians anticipated, a quarter is projected to be in surgical specialties (136). Currently, many simple surgical tasks, such as suturing (14, 141) and knot tying, can be automated by AI platforms (142, 143). Automated robotic surgery uses a combination of computer vision to understand the surgical terrain and reinforcement learning to optimize the series of steps (4). AI-automated surgery can be used to complement telesurgery operations and, perhaps in the future, to automate short minimally invasive or laparoscopic surgeries.

3.2.3. Empowering the individual.

A current revolution in health care is the unparalleled ability for individuals to collect real-time, dynamic, and noninvasive health data from wearable technology. This puts information back into the hands of the individual and can enable lifestyle changes (144), increased compliance with medical practices, precision and preventative medicine, and early diagnosis and interventions (145). Additionally, environmental and lifestyle factors are often not well captured in the current EHR but could be an important data type when considering the role of physiology in the application of AI in health care. Machine learning plays a pivotal role in translating the raw readouts from smart technology into understandable and actionable information for the user.

An important application of wearable technologies is the capability to use noninvasive physiology readouts to detect diseases that are frequently asymptomatic or often require continuous monitoring to confirm diagnosis (146). A great deal of work has involved training machine learning models to detect arrhythmias (62, 147), such as atrial fibrillation (148), tachycardia (149), or ectopic beats (150). Other readouts from wearable technology have allowed inference of hemoglobin levels from images of fingernails (151), prediction of diabetes risk with photoplethysmography (PPG) (152), diagnosis of sleep apnea (153), and characterization of motor and neurological diseases (154, 155).

Outside of disease detection, wearable technologies play an important role in staying vigilant about one’s health, enabling precision and preventative health. Researchers have developed a machine learning model to predict blood triglyceride and glycemic responses after meals in an effort to facilitate precision nutrition to combat cardiometabolic disease (156). Additional applications have included medication compliance (157) and mental health monitoring (158, 159).

A current challenge for wearable technology is determining the optimal data collection source and frequency of collection to obtain noise-free and representative data for analysis. New technologies such as smart glasses, contact lenses, and toilets (64) have been developed to increase either the quality or scope of data collected (160). Machine learning models play an important role in processing the data, integrating multiple data sources, and mapping patterns with health outcomes.

3.3. Augmenting Existing Capabilities

3.3.1. Early prediction of risk and outcomes.

The capability to identify which individuals are at risk for disease can improve patient outcomes and optimize allocation of health resources. For many decades, models have been developed to predict health outcomes for these purposes (161). The increasing number of large clinical datasets, paired with advancements in deep learning, has led to models capable of predicting outcomes earlier and with higher accuracy (65, 162–164).

An area of focus has been predicting outcomes of acute adverse events (165, 166). This is owing to the large number of datasets with rich medical data on intensive care unit (ICU) patients (16, 167–169). In one instance, a recurrent neural network trained on EHRs from US Department of Veterans Affairs sites was capable of predicting the risk of acute kidney injury with a lead time of up to 48 h (35).

Many risk prediction models are beginning to be evaluated for clinical use. A great body of work has been done on building models to predict sepsis, with models capable of predicting sepsis hours before it develops (170–173). One such model resulted in a 39.5% reduction of in-hospital mortality, a 32.3% reduction in hospital length of stay, and a 22.7% reduction in 30-day readmission rate for sepsis-related patient stays (174) (NCT03960203). Many of these studies have utilized the Medical Information Mart for Intensive Care dataset, a dataset of >40,000 patients who were admitted to the critical care unit between 2001 and 2012. Given the abundance of quantitative physiological data collected in critical care units, this dataset is one of the largest clinically annotated physiological datasets.

Another application has been the use of AI risk models to identify previously unknown biomarkers and underlying disease mechanisms (65, 175) from physiology data. In one instance, saliency maps derived from a model developed to predict the risk of developing diabetic retinopathy consistently identified regions that later progressed to microaneurysms, illuminating a potential early, uninvestigated biomarker for diabetic retinopathy (176). Similar analysis done on models used to classify mesothelioma pathology slides identified epithelioid components, which recent reports have correlated with aggressive mesothelioma (177).

3.3.2. Optimizing and personalizing treatment.

Treatment often involves multistep decision making that takes into account traditional treatment strategies and the patient’s individual presentation. Current AI models have leveraged the collective experience captured in large clinical datasets and deep reinforcement learning models to select optimal, personalized treatment strategies that improve patient outcomes (170–172, 178).

One such example has been the development of treatment strategies for sepsis (173), where the management of intravenous fluids and vasopressors has been a key clinical challenge. A group of researchers created “AI Clinician,” a reinforcement learning model that suggests optimal treatments for adult patients with sepsis in the intensive care unit (ICU) using the MIMIC-III dataset. To train the AI Clinician, researchers tracked 48 physiology variables from the individuals during their hospital admission in increments of 4 h over 72 h (179). Using these data, they then simulated different possible trajectories for the patient. Trajectories that led to patient survival were rewarded with a positive score, while patient deaths were given negative scores. Having mapped out numerous different treatment strategies, the model could then determine the optimal sequence of treatment strategies to take given the patient and the clinical state. When evaluated on a test dataset, the treatments selected by the AI Clinician had on average reliably higher estimated value than those of human clinicians, and its treatment recommendations matched the treatment strategies that resulted in the lowest mortality (172).
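
The essence of this setup, with discretized patient states, a treatment action every 4 h, and a terminal reward tied to survival, can be sketched as an offline value update over recorded trajectories. The code below is a simplified illustration only, not the published AI Clinician implementation; the state and action encodings, reward values, and example trajectories are placeholders:

```python
import numpy as np

n_states, n_actions = 750, 25   # e.g., clustered patient states x discretized dose bins
gamma, alpha = 0.99, 0.1
Q = np.zeros((n_states, n_actions))

# Each recorded trajectory: a list of (state, action) steps taken every 4 h,
# plus a terminal reward (+100 for survival, -100 for death are placeholder values)
trajectories = [
    {"steps": [(12, 3), (40, 7), (40, 2)], "terminal_reward": +100.0},
    {"steps": [(5, 0), (88, 11)],          "terminal_reward": -100.0},
]

for traj in trajectories:
    steps, R = traj["steps"], traj["terminal_reward"]
    for t, (s, a) in enumerate(steps):
        if t + 1 < len(steps):
            s_next = steps[t + 1][0]
            target = 0.0 + gamma * np.max(Q[s_next])   # intermediate reward of 0
        else:
            target = R                                 # reward assigned at trajectory end
        Q[s, a] += alpha * (target - Q[s, a])

# The learned policy recommends, for each patient state, the highest-value action
recommended_action = np.argmax(Q, axis=1)
```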

Similar modeling has been applied to optimizing cancer treatments, where the decisions of when to start treatment and which combination of treatments to select are critical for patient outcomes (180–185). As cancer treatments become more targeted and molecular based, being able to map a patient’s underlying physiology and pathology to treatments and treatment outcomes is becoming increasingly important. Owing to advancements in genomic technologies, there is now a growing field that utilizes genomics and machine learning to identify targeted treatments, predict treatment outcomes, and monitor treatment results. The emergence of genomic technologies is revolutionizing tumor pathology. Single-cell transcriptomics, proteomics, and chromatin state profiling allow for unparalleled characterization and resolution of tumor physiology. A number of studies have utilized machine learning and single-cell sequencing of tumor biopsies to characterize and map the tumor’s physiology, underlying cell composition, and tumor environment to targeted therapies (186, 187). The emergence of spatial technologies now allows for the incorporation of the spatial features of tumor physiology, such as cell communities, cell-cell interactions, and tumor microenvironments. Owing to the potential complexity of the cell spatial relationships and interactions, deep learning models, in particular graph neural networks, have been leveraged to map spatial datasets to treatment outcomes (188).

In addition to leveraging tumor biopsies, cancer diagnostics and targeted therapeutics have also leveraged another key aspect of tumor physiology: circulating tumor DNA (ctDNA). Here, machine learning has played a key role in mapping ctDNA to clinical outputs and enhancing our understanding of tumor physiology. Ensemble machine learning models have been developed to map ctDNA tumor burden to anticipated treatment outcomes (189). In doing so, these models have also elucidated underlying mechanisms relating tumor physiology to pathology, such as the roles that fragmentomics (the length distribution of cell-free DNA fragments, with tumors releasing shorter fragments) and DNA methylation states play in characterizing tumor burden (190, 191).
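The sketch below illustrates the general idea of an ensemble model over fragmentomics-style features: each sample is represented as a histogram of cell-free DNA fragment lengths, with “tumor” samples shifted toward shorter fragments, and a gradient-boosted ensemble is scored by cross-validated AUC. The simulated histograms and labels are purely illustrative and do not reproduce the cited pipelines.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Minimal sketch of an ensemble classifier on ctDNA fragmentomics-style
# features. The feature construction (a histogram of cell-free DNA fragment
# lengths per sample) and the synthetic labels are illustrative assumptions.

rng = np.random.default_rng(1)
n_samples, n_bins = 200, 40            # 40 fragment-length bins per sample

# Simulate fragment-length histograms: "tumor" samples are shifted toward
# shorter fragments, consistent with tumors releasing shorter ctDNA fragments.
healthy = rng.normal(loc=25, scale=4, size=(n_samples // 2, 1))
tumor = rng.normal(loc=21, scale=4, size=(n_samples // 2, 1))
centers = np.vstack([healthy, tumor])
X = np.exp(-0.5 * ((np.arange(n_bins) - centers) / 5.0) ** 2)   # per-sample histograms
X /= X.sum(axis=1, keepdims=True)
y = np.array([0] * (n_samples // 2) + [1] * (n_samples // 2))   # 1 = high tumor burden

model = GradientBoostingClassifier()
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"cross-validated AUC: {scores.mean():.2f}")
```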

An exciting prospective application of machine learning to optimizing treatment strategies is the creation of a “google for patients” or a “digital twin.” A “google for patients” would enable searches for patients with similar demographics, omics profiles, physiology, and disease history to better inform disease treatment strategies. A small-scale version, SMILY (Similar Medical Images Like Yours), a searchable, annotatable database for pathology slides, has already been created, demonstrating that such a system may be feasible (192).

3.3.3. COVID and AI.

COVID-19, the worst global health pandemic in the last hundred years, created many technical and medical problems, and AI has been used to develop solutions for a number of them, illustrating the many indispensable roles that AI now fills in health care (193, 194). CNNs were used to develop models for rapid triage and diagnosis of COVID patients (195). Natural language processing and EHRs were used to identify patients at greatest risk for mortality, while smartwatch data were used to detect patients at risk of developing COVID (63). AI had a critical role in discovering therapies for COVID, from neural networks for drug repurposing to virtual screens for potential therapeutic targets (196). In an effort to increase public understanding, Salesforce developed “CO-search” to aggregate COVID literature into an interactive website with semantic search, question answering, and abstractive summarization (197).

4. CURRENT CHALLENGES

Whereas the previous decade has focused on demonstrating that AI models can leverage physiology data to deliver clinical interpretations, work in the field of AI health care over the next few years will be defined by whether these initial findings can be translated into the clinic. Before AI can be translated into the clinic, we must address current concerns of 1) demonstrating generalizability and robustness through validation on multiple datasets, in multiple clinics, and prospectively; 2) garnering the trust, understanding, and usability of health care providers; and 3) integrating into the clinical workflow to tangibly improve care. Below we discuss challenges in translating AI models into a health care setting (FIGURE 7).

FIGURE 7.

Timeline and challenges of translating artificial intelligence (AI) into the clinic. The last decade of AI in health care has been focused on developing, applying, and optimizing models for applications in health care. Large physiology datasets and advancements in deep learning models have delivered AI models that can match or surpass human performance on a number of health care tasks. The next decade of AI in health care will focus on translating these models from research to clinical impact. Obstacles to deployment of AI models include demonstrating the robustness and generalizability of AI models, increasing interpretability of models, and integrating AI models into clinical workflows. ROC, receiver operating characteristic curve. Image created with BioRender.com, with permission.

4.1. Evaluating Models in the Wild

After developing an AI model, the next major milestone is to characterize the generalizability of the model. Generalization is the ability to maintain model performance on previously unseen data. Clinical datasets used to train models are incredibly rich but may also unknowingly carry institution-specific (how and where the data were collected) or patient-specific (e.g., pediatric vs. adult profiles) characteristics. Current deep learning models may learn subtle patterns that do not reflect true or generalizable clinical patterns, resulting in decreased performance when evaluated on external datasets (198). It is critical to rigorously test AI models in the wild to answer these questions: 1) In a different institution, under different data collection workflows, in the hands of a different individual, how does the model perform? 2) For a different patient population, is model performance maintained? 3) Given that most models are trained on retrospective datasets, how does the model perform prospectively?

To address these questions, models in development are increasingly being evaluated on multihospital, multicountry, multipatient population datasets (91, 96, 131, 132, 199). The ultimate desired output from these evaluations is “thoughtful informed reporting” (20). It is necessary to state not only performance but also in which environments, with which data collection techniques and instruments, and in which patient demographics the model succeeds and therefore can be deployed (200). Informed reporting is a critical component of recently released SPIRIT-AI (201) and CONSORT-AI (202) guidelines for clinical trials of AI platforms.
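A minimal sketch of what informed reporting can look like in practice is shown below: instead of a single headline metric, performance is reported per site and per patient subgroup, with strata that contain only one outcome class skipped because their AUC is undefined. The column names (site, age_group, y_true, y_score) are assumptions about how an external validation table might be organized, not a prescribed format.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

# Minimal sketch of "informed reporting": rather than one headline AUC,
# report performance stratified by site and patient subgroup. Column names
# are illustrative assumptions about an external evaluation table.

def stratified_report(df, group_cols=("site", "age_group")):
    rows = []
    for col in group_cols:
        for value, grp in df.groupby(col):
            if grp["y_true"].nunique() < 2:       # AUC undefined with one class
                continue
            rows.append({"stratum": f"{col}={value}",
                         "n": len(grp),
                         "auc": roc_auc_score(grp["y_true"], grp["y_score"])})
    return pd.DataFrame(rows)

# Example with a toy external-validation table.
df = pd.DataFrame({
    "site": ["A", "A", "A", "B", "B", "B", "B", "A"],
    "age_group": ["adult", "adult", "pediatric", "adult", "pediatric",
                  "pediatric", "adult", "pediatric"],
    "y_true": [1, 0, 1, 0, 1, 0, 1, 0],
    "y_score": [0.9, 0.2, 0.7, 0.4, 0.6, 0.3, 0.8, 0.5],
})
print(stratified_report(df))
```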

To facilitate model generalization, a range of solutions has been proposed: embedding safeguards directly into the model to improve generalizability (203); combatting the phenomenon of “data shift,” in which the distributions of the training and test datasets diverge because of evolving patient populations and clinical practices (204); harmonizing data collection among institutions to facilitate multicenter evaluations; and creating gold standard datasets that reflect patient populations and are updated regularly (205). Gold standard datasets will be critical for comparing different models with similar clinical goals and have already been adopted in other fields (206, 207).

However, data sharing and the creation of multicenter, large-scale datasets are rife with obstacles. Issues such as lack of data standardization, incompatible systems and proprietary software, and the risk of violating patient privacy are common (205, 208, 209).

Federated learning has arisen as a possible solution (210). Federated learning removes the requirement that data be aggregated in a single location to train a model; instead, the model is sent to where the training data are located and trained locally. Each locally trained version of the model, but not the training data itself, is then sent back to a central location and aggregated. The result is a single overall model that functions as if it had been trained on the entire dataset at once. Federated learning has already begun delivering promising results in AI health care applications (211–213).
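The sketch below illustrates the federated averaging idea with a simple logistic regression standing in for a deep model: each site trains on its own data, only the weights travel, and a size-weighted average forms the shared model. The site data are synthetic, and real deployments add secure aggregation, privacy protections, and many more communication rounds.

```python
import numpy as np

# Minimal sketch of federated averaging (FedAvg): each site trains a local
# copy of the model on its own data, and only the model weights (never the
# data) are sent back and averaged. Logistic regression trained with plain
# gradient descent stands in for the deep models used in practice.

def local_train(weights, X, y, lr=0.1, epochs=20):
    w = weights.copy()
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)          # gradient of the logistic loss
    return w

def federated_round(global_w, site_datasets):
    local_ws, sizes = [], []
    for X, y in site_datasets:                    # training happens at each site
        local_ws.append(local_train(global_w, X, y))
        sizes.append(len(y))
    return np.average(local_ws, axis=0, weights=np.array(sizes, dtype=float))

# Two hospitals with different (synthetic) patient data, never pooled.
rng = np.random.default_rng(0)
def make_site(n):
    X = rng.normal(size=(n, 5))
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
    return X, y

sites = [make_site(300), make_site(120)]
w = np.zeros(5)
for _ in range(10):                               # ten communication rounds
    w = federated_round(w, sites)
print(w.round(2))
```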

4.2. Interpretability, Usability, and Trust

A significant barrier to implementing current AI models, namely deep learning models, is the difficulty of understanding what models learn and how they arrive at their predictions. Deep learning models are frequently seen as black boxes for a number of reasons. Features learned by deep learning models are not usually understandable in terms of domain knowledge (i.e., clinical or biological concepts). Furthermore, the relationships among features and between features and outputs are typically nonlinear. Compared with linear relationships, which can be understood as an increase in x leading to an increase in y, nonlinear relationships are difficult to explain semantically or in terms of simple concepts.

Although the lack of interpretability is problematic for any application of deep learning, it is particularly consequential in health care, where clinical decisions are based on explanations and evidence. Model interpretability plays a critical role in facilitating trust and usability by safeguarding against model errors such as training on artifacts or algorithmic biases (214, 215). If left undetected, such errors have the potential to harm patient outcomes.

Given its importance, the creation of methods to interpret AI models is an active field (216–221). In health care, saliency maps have been particularly useful in determining “what the machine is looking at” and “what the machine thinks and sees” (222–225). In one notable instance, saliency maps revealed that a deep learning model utilized nondisease features, such as the clinical center where the data were collected, to diagnose pneumonia from chest X-rays (198).
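As one concrete way of asking “what is the machine looking at,” the sketch below computes an occlusion-based saliency map: a masking patch is slid over the image and the drop in the model’s output is recorded wherever the occlusion hurts the prediction. The cited studies often use gradient-based saliency or integrated gradients instead; the toy model here is a stand-in for any trained image classifier.

```python
import numpy as np

# Minimal occlusion-based saliency sketch: mask patches of the input and
# measure how much the predicted score drops. Regions whose occlusion hurts
# the prediction most are what "the machine is looking at."
# `model_predict` is a stand-in for any trained image classifier.

def occlusion_saliency(image, model_predict, patch=8, stride=8, fill=0.0):
    h, w = image.shape
    baseline = model_predict(image)
    saliency = np.zeros((h, w))
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = fill
            drop = baseline - model_predict(occluded)
            saliency[top:top + patch, left:left + patch] = drop
    return saliency

# Toy example: a "model" that only responds to brightness in the top-left corner.
def toy_model(img):
    return float(img[:16, :16].mean())

img = np.random.default_rng(0).random((64, 64))
sal = occlusion_saliency(img, toy_model)
print(sal[:16, :16].mean() > sal[32:, 32:].mean())   # top-left is most salient -> True
```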

Another strategy that has become increasingly common has been reporting intermediary model steps or model predictions of related clinical outcomes. Doing so has been found to better mimic the workflow of a clinician. When creating a model to predict acute renal failure, researchers reported model predictions for laboratory values that clinicians customarily use to characterize renal failure (35). Similarly, in developing a model to diagnose heart failure from echocardiograms, researchers developed a left ventricle segmenter, beat-to-beat evaluation, and ejection fraction classification, calculations that are traditionally done before diagnosing heart failure from echocardiograms (47). Importantly, when AI models are paired with interpretation mechanisms, physicians see increases in usability and performance, validating the role that interpretation plays in implementing AI in health care (67, 104).
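A minimal sketch of this multi-output idea is shown below: a shared encoder feeds one head that predicts the diagnosis and a second head that predicts a laboratory value clinicians already use, so the model’s outputs are reported in clinically familiar terms. The architecture, feature counts, and the “creatinine” label are illustrative assumptions, not a reproduction of the cited models.

```python
import torch
from torch import nn

# Minimal sketch of reporting intermediate, clinically familiar quantities
# alongside the main prediction: a shared encoder feeds two heads, one
# predicting the diagnosis and one predicting a laboratory value (labeled
# "creatinine" purely for illustration).

class MultiHeadModel(nn.Module):
    def __init__(self, n_inputs=20, hidden=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_inputs, hidden), nn.ReLU())
        self.diagnosis_head = nn.Linear(hidden, 1)   # e.g., acute kidney injury (logit)
        self.lab_head = nn.Linear(hidden, 1)         # e.g., predicted creatinine value

    def forward(self, x):
        h = self.encoder(x)
        return self.diagnosis_head(h).squeeze(-1), self.lab_head(h).squeeze(-1)

model = MultiHeadModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()

# Synthetic batch: 64 patients, 20 physiology features.
x = torch.randn(64, 20)
y_dx = (x[:, 0] > 0).float()                        # synthetic diagnosis label
y_lab = 1.0 + 0.5 * x[:, 0]                         # synthetic lab value

for _ in range(200):
    opt.zero_grad()
    logit, lab = model(x)
    loss = bce(logit, y_dx) + mse(lab, y_lab)        # joint objective over both heads
    loss.backward()
    opt.step()

logit, lab = model(x[:1])
print(torch.sigmoid(logit).item(), lab.item())       # diagnosis prob + supporting lab value
```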

4.3. AI Health Care and Regulation

Before AI models can be legally marketed and sold in the United States, they must receive regulatory approval from the US Food and Drug Administration (FDA) (226). AI software is evaluated by the FDA as a medical device and thus must be granted de novo classification (if no prior approved device of the same nature exists), receive 510(k) clearance (if such a predicate device has already been approved), or be granted premarket approval (PMA) (if the device is considered high risk) (100). In each case, the AI device must be deemed safe and effective for use on patients on the basis of evidence from a clinical study conducted and presented by the AI developers.

However, FDA evaluation standards for clinical studies have somewhat lagged behind the pace of technical development in medical AI. A study of public report summaries of FDA-approved medical devices from 2015–2020 revealed shortcomings in the evaluation process (227): most studies were performed retrospectively on data from a single clinical site, which can mask true clinical outcomes (227, 228); demographic subgroup analyses, which are important for detecting model biases (229–235), were missing in most study reports. Across all studies, evaluation datasets were relatively small, and none was made publicly available.

After a device is approved, the FDA does not require postmarket surveillance, as it does for drug approvals, making performance degradation over time challenging to measure and regulate (236, 237). Additionally, models cannot be updated or modified after approval (238, 239). In response to these concerns, the FDA has released public statements and action plans calling for more rigorous evaluation of model biases, continuous postmarket monitoring, and model updating (240, 241).

Beyond FDA regulation, there exist additional challenges to the successful clinical deployment of medical AI. Legal liability for malpractice, especially in the case of high-risk, fully autonomous diagnostic AI, remains unresolved (242, 243); insurance reimbursement schemes for AI algorithms are still in their nascent stages (244, 245); and publicly available, multisite gold standard evaluation datasets remain limited (246), in part because of ongoing concerns about patient data privacy (247).

4.4. AI Health Care and Quality Metrics

Although current metrics exist for FDA regulation and evaluation of AI models, many additional metrics should be considered to obtain a holistic assessment of AI health care models. Historically, AI models have focused on maximizing sensitivity, specificity, and accuracy. However, as AI models are deployed in real-world settings, it has increasingly been demonstrated that these metrics do not always correlate with how well models integrate into and assist clinical workflows (248). Numerous works have explored alternative quality metrics for evaluating model performance: rather than solely optimizing accuracy, these models are instead engineered to optimize additional quality metrics such as cost (249), mortality (172), or medical ethics (250). CoAI was developed to optimize model performance in the setting of cost, yielding cost-aware, low-cost predictions (249); its developers found that doing so could also improve model robustness, a current limitation hindering many AI health care models. In the AI Clinician, reinforcement learning was used to find the treatment policy that minimizes mortality or complications in patients with sepsis using the MIMIC-III dataset (172). Whether a model widens racial and gender health disparities is another important consideration when introducing novel technologies into clinical workflows; work by Pierson et al. (250) found that a deep learning approach to assessing pain can provide an unbiased, quantitative metric that reduces health inequities. These works raise an important consideration: many current AI models focus on achieving high accuracy, particularly in determining a diagnosis or label, yet in clinical workflows AI-physiology models are used to make decisions and guide the management of patients. Thus, models trained to optimize additional quality metrics will become increasingly important in the coming years.
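To illustrate what optimizing for a metric beyond accuracy can look like, the sketch below performs greedy, budget-constrained feature selection: at each step it adds the clinical measurement with the best AUC gain per unit acquisition cost. This is a simplified stand-in for cost-aware prediction, not the CoAI method itself; the features, costs, and budget are invented for exposition.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Minimal sketch of cost-aware prediction: greedily add the clinical feature
# with the best AUC gain per unit acquisition cost, stopping at a budget.
# Features, costs, and labels are synthetic and illustrative.

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 4))                       # e.g., heart rate, lactate, imaging, biopsy
y = ((0.8 * X[:, 1] + 0.6 * X[:, 2] + rng.normal(scale=0.5, size=n)) > 0).astype(int)
costs = np.array([1.0, 5.0, 50.0, 500.0])         # cheap vitals -> expensive procedures
budget = 60.0

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def auc_with(features):
    if not features:
        return 0.5                                # chance performance with no features
    clf = LogisticRegression().fit(X_tr[:, features], y_tr)
    return roc_auc_score(y_te, clf.predict_proba(X_te[:, features])[:, 1])

chosen, spent = [], 0.0
while True:
    best, best_gain = None, 0.0
    for j in range(X.shape[1]):
        if j in chosen or spent + costs[j] > budget:
            continue
        gain = (auc_with(chosen + [j]) - auc_with(chosen)) / costs[j]
        if gain > best_gain:
            best, best_gain = j, gain
    if best is None:                              # no affordable feature still helps
        break
    chosen.append(best)
    spent += costs[best]

print("features:", chosen, "cost:", spent, "AUC:", round(auc_with(chosen), 2))
```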

4.5. Integrating into the Workflow and Evaluating Improved Care

A final hurdle is to successfully integrate AI platforms into the health care workflow and evaluate whether they have meaningful impact. Implementation questions, such as where to store the data and model, how the user interfaces with the model, how and when to integrate the platform into the clinical workflow, how to educate users and beneficiaries of the system, what the follow-up action plans are, and how to proceed when the health care provider and the AI platform disagree, must be answered before use (251–254). Furthermore, whether the AI model results in meaningful improvements in patient outcomes or clinical care must also be evaluated, as past deployments of computer-assisted devices saw increased costs without benefits to patients (255).

A number of AI models currently being deployed in clinics or in clinical trials are addressing these issues (100). One general trend concerns how best to position the AI platform so that it does not succumb to alert fatigue, in which health care providers overburdened by information begin to ignore important notifications (102). Another common observation has been that high model performance and accuracy do not always translate to improved patient outcomes or streamlined care (256, 257). This may be because customary performance metrics, such as the area under the ROC curve, sensitivity, specificity, or accuracy, are not always the most informative measures of patient outcomes. Identifying metrics beyond performance to describe AI health care models is increasingly being practiced and is an active area of study (253, 258, 259).

A final consideration is to educate invested parties about AI in health care. Machine learning scientists who develop AI models must become increasingly informed about the limitations of physiology and clinical data and about how models can best be developed to solve impactful clinical problems. Health care providers have a duty not only to learn how to deploy AI platforms but also to understand the assumptions models make and the datasets they are trained on, to safeguard against patient harm. Finally, patients must have a working understanding of how AI models will affect their health care and the role they play in the AI-health care ecosystem.

GRANTS

This work was supported in part by National Institutes of Health Grants F30HL156478 (to A.Z.), P30AG059307 (to J.Z.), U01MH098953 (to J.Z.), R01HL163680 (to J.C.W.), R01HL130020 (to J.C.W.), R01HL146690 (to J.C.W.), and R01HL126527 (to J.C.W.); by National Science Foundation Grant CAREER1942926 (to J.Z.); and by American Heart Association Grant 17MERIT3361009 (to J.C.W.).

DISCLOSURES

J.C.W. is a founder of Greenstone Biosciences, and A.Z. is a consultant of Greenstone Biosciences.

AUTHOR CONTRIBUTIONS

A.Z. and J.C.W. conceived and designed research; A.Z. and M.W. prepared figures; A.Z., Z.W., E.W., and M.W. drafted manuscript; A.Z., Z.W., E.W., M.W., M.P.S., J.Z., and J.C.W. edited and revised manuscript; A.Z., Z.W., E.W., M.P.S., J.Z., and J.C.W. approved final version of manuscript.

ACKNOWLEDGMENTS

Figures were created with BioRender.com.

REFERENCES

  • 1. Zhang A, Xing L, Zou J, Wu JC. Shifting machine learning for healthcare from development to deployment and from models to data. Nat Biomed Eng 6: 1330–1345, 2022. doi: 10.1038/s41551-022-00898-y. [DOI] [PubMed] [Google Scholar]
  • 2. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 25: 44–56, 2019. doi: 10.1038/s41591-018-0300-7. [DOI] [PubMed] [Google Scholar]
  • 3. Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng 2: 719–731, 2018. doi: 10.1038/s41551-018-0305-z. [DOI] [PubMed] [Google Scholar]
  • 4. Esteva A, Robicquet A, Ramsundar B, Kuleshov V, DePristo M, Chou K, Cui C, Corrado G, Thrun S, Dean J. A guide to deep learning in healthcare. Nat Med 25: 24–29, 2019. doi: 10.1038/s41591-018-0316-z. [DOI] [PubMed] [Google Scholar]
  • 5. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25, edited by Pereira F, Burges CJ, Bottou L, Weinberger KQ. Red Hook, NY: Curran Associates, Inc., 2012, p. 1097–1105. [Google Scholar]
  • 6. Devlin J, Chang MW, Lee K, Toutanova K. BERT: pre-training of deep bidirectional transformers for language understanding (Preprint). arXiv 1810.04805, 2018. doi: 10.48550/arxiv.1810.04805. [DOI]
  • 7. Yeung S, Downing NL, Fei-Fei L, Milstein A. Bedside computer vision—moving artificial intelligence from driver assistance to patient safety. N Engl J Med 378: 1271–1273, 2018. doi: 10.1056/NEJMp1716891. [DOI] [PubMed] [Google Scholar]
  • 8. Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, Venugopalan S, Widner K, Madams T, Cuadros J, Kim R, Raman R, Nelson PC, Mega JL, Webster DR. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316: 2402–2410, 2016. doi: 10.1001/jama.2016.17216. [DOI] [PubMed] [Google Scholar]
  • 9. Xue Y, Xu T, Zhang H, Long LR, Huang X. SegAN: adversarial network with multi-scale L1 loss for medical image segmentation. Neuroinformatics 16: 383–392, 2018. doi: 10.1007/s12021-018-9377-x. [DOI] [PubMed] [Google Scholar]
  • 10. Wang G, Liu X, Shen J, Wang C, Li Z, Ye L, , et al. A deep-learning pipeline for the diagnosis and discrimination of viral, non-viral and COVID-19 pneumonia from chest X-ray images. Nat Biomed Eng 5: 509–521, 2021. doi: 10.1038/s41551-021-00704-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 11. Huang K, Altosaar J, Ranganath R. ClinicalBERT: modeling clinical notes and predicting hospital readmission. CHIL Workshop, 2020. doi: 10.48550/arXiv.1904.05342. [DOI]
  • 12. Smit A, Jain S, Rajpurkar P, Pareek A, Ng AY, Lungren MP. CheXbert: combining automatic labelers and expert annotations for accurate radiology report labeling using BERT. EMNLP, 2020. doi: 10.48550/arXiv.2004.09167. [DOI]
  • 13. Alsentzer E, Murphy J, Boag W, Weng W-H, Jindi D, Naumann T, McDermott M. Publicly available clinical BERT embeddings. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop. Minneapolis, MN: Association for Computational Linguistics, 2019, p. 72–78. doi: 10.18653/v1/W19-1909. [DOI] [Google Scholar]
  • 14. Luongo F, Hakim R, Nguyen JH, Anandkumar A, Hung AJ. Deep learning-based computer vision to recognize and classify suturing gestures in robot-assisted surgery. Surgery 169: 1240–1244, 2021. doi: 10.1016/j.surg.2020.08.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15. Goldberger AL, Amaral LA, Glass L, Hausdorff JM, Ivanov PC, Mark RG, Mietus JE, Moody GB, Peng CK, Stanley HE. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101: E215–E220, 2000. doi: 10.1161/01.cir.101.23.e215. [DOI] [PubMed] [Google Scholar]
  • 16. Johnson AE, Pollard TJ, Shen L, Lehman LW, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, Mark RG. MIMIC-III, a freely accessible critical care database. Sci Data 3: 160035, 2016. doi: 10.1038/sdata.2016.35. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17. Collins FS, Varmus H. A new initiative on precision medicine. N Engl J Med 372: 793–795, 2015. doi: 10.1056/NEJMp1500523. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18. Arges K, Assimes T, Bajaj V, Balu S, Bashir MR, Beskow L, , et al. The Project Baseline Health Study: a step towards a broader mission to map human health. NPJ Digit Med 3: 84, 2020. doi: 10.1038/s41746-020-0290-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19. Bycroft C, Freeman C, Petkova D, Band G, Elliott LT, Sharp K, Motyer A, Vukcevic D, Delaneau O, O’Connell J, Cortes A, Welsh S, Young A, Effingham M, McVean G, Leslie S, Allen N, Donnelly P, Marchini J. The UK Biobank resource with deep phenotyping and genomic data. Nature 562: 203–209, 2018. doi: 10.1038/s41586-018-0579-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20. Wiens J, Saria S, Sendak M, Ghassemi M, Liu VX, Doshi-Velez F, Jung K, Heller K, Kale D, Saeed M, Ossorio PN, Thadaney-Israni S, Goldenberg A. Do no harm: a roadmap for responsible machine learning for health care. Nat Med 25: 1337–1340, 2019. doi: 10.1038/s41591-019-0548-6. [DOI] [PubMed] [Google Scholar]
  • 21. Turing AM. On computable numbers, with an application to the Entscheidungsproblem (Online). 1936. https://www.cs.virginia.edu/~robins/Turing_Paper_1936.pdf.
  • 22. Shilo S, Rossman H, Segal E. Axes of a revolution: challenges and promises of big data in healthcare. Nat Med 26: 29–38, 2020. doi: 10.1038/s41591-019-0727-5. [DOI] [PubMed] [Google Scholar]
  • 23. James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning. New York: Springer, 2013. [Google Scholar]
  • 24. Lopez R, Regier J, Cole MB, Jordan MI, Yosef N. Deep generative modeling for single-cell transcriptomics. Nat Methods 15: 1053–1058, 2018. doi: 10.1038/s41592-018-0229-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25. Botvinick M, Ritter S, Wang JX, Kurth-Nelson Z, Blundell C, Hassabis D. Reinforcement learning, fast and slow. Trends Cogn Sci 23: 408–422, 2019. doi: 10.1016/j.tics.2019.02.006. [DOI] [PubMed] [Google Scholar]
  • 26. Hassabis D, Kumaran D, Summerfield C, Botvinick M. Neuroscience-inspired artificial intelligence. Neuron 95: 245–258, 2017. doi: 10.1016/j.neuron.2017.06.011. [DOI] [PubMed] [Google Scholar]
  • 27. Silver D, Schrittwieser J, Simonyan K, Antonoglou I, Huang A, Guez A, Hubert T, Baker L, Lai M, Bolton A, Chen Y, Lillicrap T, Hui F, Sifre L, van den Driessche G, Graepel T, Hassabis D. Mastering the game of Go without human knowledge. Nature 550: 354–359, 2017. doi: 10.1038/nature24270. [DOI] [PubMed] [Google Scholar]
  • 28. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 521: 436–444, 2015. doi: 10.1038/nature14539. [DOI] [PubMed] [Google Scholar]
  • 29. Goodfellow I, Bengio Y, Courville A. Deep Learning. Cambridge, MA: MIT Press, 2016. [Google Scholar]
  • 30. Rosenblatt F. The perceptron: a probabilistic model for information storage and organization in the brain. Psychol Rev 65: 386–408, 1958. doi: 10.1037/h0042519. [DOI] [PubMed] [Google Scholar]
  • 31. Wang Q, Ma Y, Zhao K, Tian Y. A comprehensive survey of loss functions in machine learning. Ann Data Sci 9: 187–212, 2022. doi: 10.1007/s40745-020-00253-5. [DOI] [Google Scholar]
  • 32. Gers FA, Schmidhuber J, Cummins F. Learning to forget: continual prediction with LSTM. Neural Comput 12: 2451–2471, 2000. doi: 10.1162/089976600300015015. [DOI] [PubMed] [Google Scholar]
  • 33. Cho K, van Merriënboer B, Bahdanau D, Bengio Y. On the properties of neural machine translation: encoder–decoder approaches. In: Proceedings of SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, 2014, p. 103–111. doi: 10.3115/v1/W14-4012. [DOI] [Google Scholar]
  • 34. Chen B, Khodadoust MS, Olsson N, Wagar LE, Fast E, Liu CL, Muftuoglu Y, Sworder BJ, Diehn M, Levy R, Davis MM, Elias JE, Altman RB, Alizadeh AA. Predicting HLA class II antigen presentation through integrated deep learning. Nat Biotechnol 37: 1332–1343, 2019. doi: 10.1038/s41587-019-0280-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35. Tomašev N, Glorot X, Rae JW, Zielinski M, Askham H, Saraiva A, Mottram A, Meyer C, Ravuri S, Protsyuk I, Connell A, Hughes CO, Karthikesalingam A, Cornebise J, Montgomery H, Rees G, Laing C, Baker CR, Peterson K, Reeves R, Hassabis D, King D, Suleyman M, Back T, Nielson C, Ledsam JR, Mohamed S. A clinically applicable approach to continuous prediction of future acute kidney injury. Nature 572: 116–119, 2019. doi: 10.1038/s41586-019-1390-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 36. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need (Online). arXiv 1706.03762, 2017. doi: 10.48550/arXiv.1706.03762. [DOI]
  • 37. Fedus W, Zoph B, Shazeer N. Switch transformers: scaling to trillion parameter models with simple and efficient sparsity (Online). arXiv 2101.03961, 2021. doi: 10.48550/arXiv.2101.03961. [DOI]
  • 38. Kieuvongngam V, Tan B, Niu Y. Automatic text summarization of COVID-19 medical research articles using BERT and GPT-2 (Online). arXiv 2006.01997, 2020. doi: 10.48550/arXiv.2006.01997. [DOI]
  • 39. The AlphaFold team. AlphaFold: a solution to a 50-year-old grand challenge in biology (Online). 2020. https://deepmind.com/blog/article/alphafold-a-solution-to-a-50-year-old-grand-challenge-in-biology. [2020 Dec 31].
  • 40. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM 60: 84–90, 2017. doi: 10.1145/3065386. [DOI] [Google Scholar]
  • 41. Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, Erhan D, Vanhoucke V, Rabinovich A. Going deeper with convolutions (Online). arXiv 1409.4842, 2014. doi: 10.48550/arXiv.1409.4842. [DOI]
  • 42. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition (Online). arXiv 1512.03385, 2015. doi: 10.48550/arXiv.1512.03385. [DOI]
  • 43. Ronneberger O, Fischer P, Brox T. U-Net: convolutional networks for biomedical image segmentation (Online). arXiv 1505.04597, 2015. doi: 10.48550/arXiv.1505.04597. [DOI]
  • 44. Campanella G, Hanna MG, Geneslaw L, Miraflor A, Werneck Krauss Silva V, Busam KJ, Brogi E, Reuter VE, Klimstra DS, Fuchs TJ. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med 25: 1301–1309, 2019. doi: 10.1038/s41591-019-0508-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 45. Haque A, Milstein A, Fei-Fei L. Illuminating the dark spaces of healthcare with ambient intelligence. Nature 585: 193–202, 2020. doi: 10.1038/s41586-020-2669-y. [DOI] [PubMed] [Google Scholar]
  • 46. Yeung S, Rinaldo F, Jopling J, Liu B, Mehra R, Downing NL, Guo M, Bianconi GM, Alahi A, Lee J, Campbell B, Deru K, Beninati W, Fei-Fei L, Milstein A. A computer vision system for deep learning-based detection of patient mobilization activities in the ICU. NPJ Digit Med 2: 11, 2019. doi: 10.1038/s41746-019-0087-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47. Ouyang D, He B, Ghorbani A, Yuan N, Ebinger J, Langlotz CP, Heidenreich PA, Harrington RA, Liang DH, Ashley EA, Zou JY. Video-based AI for beat-to-beat assessment of cardiac function. Nature 580: 252–256, 2020. doi: 10.1038/s41586-020-2145-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 48. Kingma DP, Welling M. Auto-encoding variational Bayes (Online). arXiv 1312.6114v10, 2013. doi: 10.48550/arXiv.1312.6114v10. [DOI]
  • 49. Rezende DJ, Mohamed S. Variational inference with normalizing flows (Online). arXiv 1505.05770, 2015. doi: 10.48550/arXiv.1505.05770. [DOI]
  • 50. van den Oord A, Kalchbrenner N, Vinyals O, Espeholt L, Graves A, Kavukcuoglu K. Conditional image generation with PixelCNN decoders (Online). arXiv 1606.05328, 2016. doi: 10.48550/arXiv.1606.05328. [DOI]
  • 51. Yi X, Walia E, Babyn P. Generative adversarial network in medical imaging: review. Med Image Anal 58: 101552, 2019. doi: 10.1016/j.media.2019.101552. [DOI] [PubMed] [Google Scholar]
  • 52. Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. Adv Neural Inf Process Syst 27, 2014. [Google Scholar]
  • 53. Abdelhalim IS, Mohamed MF, Mahdy YB. Data augmentation for skin lesion using self-attention based progressive generative adversarial network. Expert Syst Appl 165: 113922, 2021. doi: 10.1016/j.eswa.2020.113922. [DOI] [Google Scholar]
  • 54. Chartsias A, Joyce T, Dharmakumar R, Tsaftaris SA. Adversarial image synthesis for unpaired multi-modal cardiac data. In: Simulation and Synthesis in Medical Imaging, edited by Tsaftaris SA, Gooya A, Frangi AF, Prince JL. Cham, Switzerland: Springer International Publishing, 2017, p. 3–13. [Google Scholar]
  • 55. Chen RJ, Lu MY, Chen TY, Williamson DF, Mahmood F. Synthetic data in machine learning for medicine and healthcare. Nat Biomed Eng 5: 493–497, 2021. doi: 10.1038/s41551-021-00751-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 56. Armanious K, Jiang C, Fischer M, Küstner T, Hepp T, Nikolaou K, Gatidis S, Yang B. MedGAN: medical image translation using GANs. Comput Med Imaging Graph 79: 101684, 2020. doi: 10.1016/j.compmedimag.2019.101684. [DOI] [PubMed] [Google Scholar]
  • 57. Zhou J, Cui G, Hu S, Zhang Z, Yang C, Liu Z, Wang L, Li C, Sun M. Graph neural networks: a review of methods and applications. AI Open 1: 57–81, 2020. doi: 10.1016/j.aiopen.2021.01.001. [DOI] [Google Scholar]
  • 58. Kipf TN, Welling M. Semi-supervised classification with graph convolutional networks (Online). arXiv 1609.02907, 2016. doi: 10.48550/arXiv.1609.02907. [DOI]
  • 59. Chang S, Pierson E, Koh PW, Gerardin J, Redbird B, Grusky D, Leskovec J. Mobility network models of COVID-19 explain inequities and inform reopening. Nature 589: 82–87, 2021. doi: 10.1038/s41586-020-2923-3. [DOI] [PubMed] [Google Scholar]
  • 60. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, Graves A, Riedmiller M, Fidjeland AK, Ostrovski G, Petersen S, Beattie C, Sadik A, Antonoglou I, King H, Kumaran D, Wierstra D, Legg S, Hassabis D. Human-level control through deep reinforcement learning. Nature 518: 529–533, 2015. doi: 10.1038/nature14236. [DOI] [PubMed] [Google Scholar]
  • 61. Yu C, Liu J, Nemati S. Reinforcement learning in healthcare: a survey (Online). arXiv 1908.08796, 2019. doi: 10.48550/arXiv.1908.08796. [DOI]
  • 62. Turakhia MP, Desai M, Hedlin H, Rajmane A, Talati N, Ferris T, Desai S, Nag D, Patel M, Kowey P, Rumsfeld JS, Russo AM, Hills MT, Granger CB, Mahaffey KW, Perez MV. Rationale and design of a large-scale, app-based study to identify cardiac arrhythmias using a smartwatch: the Apple Heart Study. Am Heart J 207: 66–75, 2019. doi: 10.1016/j.ahj.2018.09.002. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 63. Mishra T, Wang M, Metwally AA, Bogu GK, Brooks AW, Bahmani A, Alavi A, Celli A, Higgs E, Dagan-Rosenfeld O, Fay B, Kirkpatrick S, Kellogg R, Gibson M, Wang T, Hunting EM, Mamic P, Ganz AB, Rolnik B, Li X, Snyder MP. Pre-symptomatic detection of COVID-19 from smartwatch data. Nat Biomed Eng 4: 1208–1220, 2020. doi: 10.1038/s41551-020-00640-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 64. Park SM, Won DD, Lee BJ, Escobedo D, Esteva A, Aalipour A, Ge TJ, Kim JH, Suh S, Choi EH, Lozano AX, Yao C, Bodapati S, Achterberg FB, Kim J, Park H, Choi Y, Kim WJ, Yu JH, Bhatt AM, Lee JK, Spitler R, Wang SX, Gambhir SS. A mountable toilet system for personalized health monitoring via the analysis of excreta. Nat Biomed Eng 4: 624–635, 2020. doi: 10.1038/s41551-020-0534-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 65. Poplin R, Varadarajan AV, Blumer K, Liu Y, McConnell MV, Corrado GS, Peng L, Webster DR. Prediction of cardiovascular risk factors from retinal fundus photographs via deep learning. Nat Biomed Eng 2: 158–164, 2018. doi: 10.1038/s41551-018-0195-0. [DOI] [PubMed] [Google Scholar]
  • 66. Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, Thrun S. Dermatologist-level classification of skin cancer with deep neural networks. Nature 542: 115–118, 2017. doi: 10.1038/nature21056. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 67. De Fauw J, Ledsam JR, Romera-Paredes B, Nikolov S, Tomasev N, Blackwell S, , et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat Med 24: 1342–1350, 2018. doi: 10.1038/s41591-018-0107-6. [DOI] [PubMed] [Google Scholar]
  • 68. Dagan N, Elnekave E, Barda N, Bregman-Amitai O, Bar A, Orlovsky M, Bachmat E, Balicer RD. Automated opportunistic osteoporotic fracture risk assessment using computed tomography scans to aid in FRAX underutilization. Nat Med 26: 77–82, 2020. doi: 10.1038/s41591-019-0720-z. [DOI] [PubMed] [Google Scholar]
  • 69. Titano JJ, Badgeley M, Schefflein J, Pain M, Su A, Cai M, Swinburne N, Zech J, Kim J, Bederson J, Mocco J, Drayer B, Lehar J, Cho S, Costa A, Oermann EK. Automated deep-neural-network surveillance of cranial images for acute neurologic events. Nat Med 24: 1337–1341, 2018. doi: 10.1038/s41591-018-0147-y. [DOI] [PubMed] [Google Scholar]
  • 70. Chilamkurthy S, Ghosh R, Tanamala S, Biviji M, Campeau NG, Venugopal VK, Mahajan V, Rao P, Warier P. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet 392: 2388–2396, 2018. doi: 10.1016/S0140-6736(18)31645-3. [DOI] [PubMed] [Google Scholar]
  • 71. FDA approves stroke-detecting AI software. Nat Biotechnol 36: 290, 2018. doi: 10.1038/nbt0418-290. [DOI] [PubMed] [Google Scholar]
  • 72. Hassan AE, Ringheanu VM, Rabah RR, Preston L, Tekle WG, Qureshi AI. Early experience utilizing artificial intelligence shows significant reduction in transfer times and length of stay in a hub and spoke model. Interv Neuroradiol 26: 615–622, 2020. doi: 10.1177/1591019920953055. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 73. Morey JR, Fiano E, Yaeger KA, Zhang X, Fifi JT. Impact of Viz LVO on time-to-treatment and clinical outcomes in large vessel occlusion stroke patients presenting to Primary Stroke Centers (Preprint). medRxiv 2020.07.02.20143834, 2020. doi: 10.1101/2020.07.02.20143834. [DOI]
  • 74. Hale C. Viz.ai software gains FDA clearance for automatically spotting brain aneurysms (Online). 2022. https://www.fiercebiotech.com/medtech/vizai-software-gains-fda-clearance-automatically-spotting-brain-aneurysms. [2022 Jul 6].
  • 75. Aidoc. Aidoc granted AI industry-first FDA clearance for triage of incidental pulmonary embolism (Online). 2020. https://www.prnewswire.com/il/news-releases/aidoc-granted-ai-industry-first-fda-clearance-for-triage-of-incidental-pulmonary-embolism-301156870.html. [2021 Jan 21].
  • 76. Gormley B. Aidoc, an AI healthcare startup, nabs $110 million expansion round (Online). WSJ-PRO-WSJ.com, 2022. https://www.wsj.com/articles/aidoc-an-ai-healthcare-startup-nabs-110-million-expansion-round-11655373604. [2022 Jul 6].
  • 77. Zebra Medical Vision (Online). 2019. https://www.prnewswire.com/il/news-releases/zebra-medical-vision-receives-fda-approval-for-worlds-first-ai-chest-x-ray-triage-product-300848867.html. [2021 Jan 21].
  • 78. Hale C. Nanox nets FDA clearance for osteoporosis and spine fracture AI (Online). 2022. https://www.fiercebiotech.com/medtech/nanox-nets-fda-clearance-osteoporosis-and-spine-fracture-ai. [2022 Jul 6].
  • 79. Dembrower K, Wåhlin E, Liu Y, Salim M, Smith K, Lindholm P, Eklund M, Strand F. Effect of artificial intelligence-based triaging of breast cancer screening mammograms on cancer detection and radiologist workload: a retrospective simulation study. Lancet Digit Health 2: e468–e474, 2020. doi: 10.1016/S2589-7500(20)30185-0. [DOI] [PubMed] [Google Scholar]
  • 80. McKinney SM, Sieniek M, Godbole V, Godwin J, Antropova N, Ashrafian H, , et al. International evaluation of an AI system for breast cancer screening. Nature 577: 89–94, 2020. doi: 10.1038/s41586-019-1799-6. [DOI] [PubMed] [Google Scholar]
  • 81. Razzaki S, Baker A, Perov Y, Middleton K, Baxter J, Mullarkey D, Sangar D, Taliercio M, Butt M, Majeed A, DoRosario A, Mahoney M, Johri S. A comparative study of artificial intelligence and human doctors for the purpose of triage and diagnosis (Online). arXiv 1806.10698, 2018. doi: 10.48550/arXiv.1806.10698. [DOI]
  • 82. Buoy Health and CVS Health Provide Easy Access to Affordable Care (Online). 2018. https://www.businesswire.com/news/home/20180508005075/en/Buoy-Health-and-CVS-Health-Provide-Easy-Access-to-Affordable-Care. [2021 Jan 14].
  • 83. HITInfrastructure. Diagnostic robotics, Mayo Clinic bring triage platform to patients (Online). 2020. https://hitinfrastructure.com/news/diagnostic-robotics-mayo-clinic-bring-triage-platform-to-patients. [2021 Jan 2].
  • 84. Three preeminent health systems select GYANT Vaccine Digital Health Solution to facilitate COVID-19 vaccine rollout (Online). 2020. https://www.businesswire.com/news/home/20201222005402/en/Three-Preeminent-Health-Systems-Select-GYANT-Vaccine-Digital-Health-Solution-to-Facilitate-COVID-19-Vaccine-Rollout. [2021 Jan 14].
  • 85. Iacobucci G. Babylon Health holds talks with “significant” number of NHS trusts. BMJ 368: m266, 2020. doi: 10.1136/bmj.m266. [DOI] [PubMed] [Google Scholar]
  • 86. Iacobucci G. London GP clinic sees big jump in patient registrations after Babylon app launch. BMJ 359: j5908, 2017. doi: 10.1136/bmj.j5908. [DOI] [PubMed] [Google Scholar]
  • 87. Oliver D. David Oliver: lessons from the Babylon Health saga. BMJ 365: l2387, 2019. doi: 10.1136/bmj.l2387. [DOI] [PubMed] [Google Scholar]
  • 88. Singh H, Graber ML. Improving diagnosis in health care—the next imperative for patient safety. N Engl J Med 373: 2493–2495, 2015. doi: 10.1056/NEJMp1512241. [DOI] [PubMed] [Google Scholar]
  • 89. Singh H, Meyer AN, Thomas EJ. The frequency of diagnostic errors in outpatient care: estimations from three large observational studies involving US adult populations. BMJ Qual Saf 23: 727–731, 2014. doi: 10.1136/bmjqs-2013-002627. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 90. Institute of Medicine, National Academies of Sciences. Improving Diagnosis in Health Care. Washington, DC: The National Academies Press, 2015. [Google Scholar]
  • 91. Ting DS, Cheung CY, Lim G, Tan GS, Quang ND, Gan A, Hamzah H, Garcia-Franco R, San Yeo IY, Lee SY, Wong EY, Sabanayagam C, Baskaran M, Ibrahim F, Tan NC, Finkelstein EA, Lamoureux EL, Wong IY, Bressler NM, Sivaprasad S, Varma R, Jonas JB, He MG, Cheng CY, Cheung GC, Aung T, Hsu W, Lee ML, Wong TY. Development and validation of a deep learning system for diabetic retinopathy and related eye diseases using retinal images from multiethnic populations with diabetes. JAMA 318: 2211–2223, 2017. doi: 10.1001/jama.2017.18152. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 92. Ehteshami Bejnordi B, Veta M, Johannes van Diest P, van Ginneken B, Karssemeijer N, Litjens G, , et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318: 2199–2210, 2017. doi: 10.1001/jama.2017.14585. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 93. Liu Y, Jain A, Eng C, Way DH, Lee K, Bui P, Kanada K, de Oliveira Marinho G, Gallegos J, Gabriele S, Gupta V, Singh N, Natarajan V, Hofmann-Wellenhof R, Corrado GS, Peng LH, Webster DR, Ai D, Huang SJ, Liu Y, Dunn RC, Coz D. A deep learning system for differential diagnosis of skin diseases. Nat Med 26: 900–908, 2020. doi: 10.1038/s41591-020-0842-3. [DOI] [PubMed] [Google Scholar]
  • 94. Brinker TJ, Hekler A, Enk AH, Klode J, Hauschild A, Berking C, Schilling B, Haferkamp S, Schadendorf D, Holland-Letz T, Utikal JS, von Kalle C; Collaborators. Deep learning outperformed 136 of 157 dermatologists in a head-to-head dermoscopic melanoma image classification task. Eur J Cancer 113: 47–54, 2019. doi: 10.1016/j.ejca.2019.04.001. [DOI] [PubMed] [Google Scholar]
  • 95. Maron RC, Weichenthal M, Utikal JS, Hekler A, Berking C, Hauschild A, Enk AH, Haferkamp S, Klode J, Schadendorf D, Jansen P, Holland-Letz T, Schilling B, von Kalle C, Fröhling S, Gaiser MR, Hartmann D, Gesierich A, Kähler KC, Wehkamp U, Karoglan A, Bär C, Brinker TJ; Collaborators. Systematic outperformance of 112 dermatologists in multiclass skin cancer image classification by convolutional neural networks. Eur J Cancer 119: 57–65, 2019. doi: 10.1016/j.ejca.2019.06.013. [DOI] [PubMed] [Google Scholar]
  • 96. Raumviboonsuk P, Krause J, Chotcomwongse P, Sayres R, Raman R, Widner K, , et al. Deep learning versus human graders for classifying diabetic retinopathy severity in a nationwide screening program. NPJ Digit Med 2: 25, 2019. doi: 10.1038/s41746-019-0099-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 97. Wu L, Zhang J, Zhou W, An P, Shen L, Liu J, Jiang X, Huang X, Mu G, Wan X, Lv X, Gao J, Cui N, Hu S, Chen Y, Hu X, Li J, Chen D, Gong D, He X, Ding Q, Zhu X, Li S, Wei X, Li X, Wang X, Zhou J, Zhang M, Yu HG. Randomised controlled trial of WISENSE, a real-time quality improving system for monitoring blind spots during esophagogastroduodenoscopy. Gut 68: 2161–2169, 2019. doi: 10.1136/gutjnl-2018-317366. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 98. Wang P, Berzin TM, Glissen Brown JR, Bharadwaj S, Becq A, Xiao X, Liu P, Li L, Song Y, Zhang D, Li Y, Xu G, Tu M, Liu X. Real-time automatic detection system increases colonoscopic polyp and adenoma detection rates: a prospective randomised controlled study. Gut 68: 1813–1819, 2019. doi: 10.1136/gutjnl-2018-317500. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 99. Wang P, Xiao X, Glissen Brown JR, Berzin TM, Tu M, Xiong F, Hu X, Liu P, Song Y, Zhang D, Yang X, Li L, He J, Yi X, Liu J, Liu X. Development and validation of a deep-learning algorithm for the detection of polyps during colonoscopy. Nat Biomed Eng 2: 741–748, 2018. doi: 10.1038/s41551-018-0301-3. [DOI] [PubMed] [Google Scholar]
  • 100. Benjamens S, Dhunnoo P, Meskó B. The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. NPJ Digit Med 3: 118, 2020. doi: 10.1038/s41746-020-00324-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 101. Abràmoff MD, Lavin PT, Birch M, Shah N, Folk JC. Pivotal trial of an autonomous AI-based diagnostic system for detection of diabetic retinopathy in primary care offices. NPJ Digit Med 1: 39, 2018. doi: 10.1038/s41746-018-0040-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 102. Wijnberge M, Geerts BF, Hol L, Lemmers N, Mulder MP, Berge P, Schenk J, Terwindt LE, Hollmann MW, Vlaar AP, Veelo DP. Effect of a machine learning-derived early warning system for intraoperative hypotension vs standard care on depth and duration of intraoperative hypotension during elective noncardiac surgery: the HYPE Randomized Clinical Trial. JAMA 323: 1052–1060, 2020. doi: 10.1001/jama.2020.0592. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 103. Sayres R, Taly A, Rahimy E, Blumer K, Coz D, Hammel N, Krause J, Narayanaswamy A, Rastegar Z, Wu D, Xu S, Barb S, Joseph A, Shumski M, Smith J, Sood AB, Corrado GS, Peng L, Webster DR. Using a deep learning algorithm and integrated gradients explanation to assist grading for diabetic retinopathy. Ophthalmology 126: 552–564, 2019. doi: 10.1016/j.ophtha.2018.11.016. [DOI] [PubMed] [Google Scholar]
  • 104. Lundberg SM, Nair B, Vavilala MS, Horibe M, Eisses MJ, Adams T, Liston DE, Low DK, Newman SF, Kim J, Lee SI. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat Biomed Eng 2: 749–760, 2018. doi: 10.1038/s41551-018-0304-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 105. Kiani A, Uyumazturk B, Rajpurkar P, Wang A, Gao R, Jones E, Yu Y, Langlotz CP, Ball RL, Montine TJ, Martin BA, Berry GJ, Ozawa MG, Hazard FK, Brown RA, Chen SB, Wood M, Allard LS, Ylagan L, Ng AY, Shen J. Impact of a deep learning assistant on the histopathologic classification of liver cancer. NPJ Digit Med 3: 23, 2020. doi: 10.1038/s41746-020-0232-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 106. Lakhani P, Sundaram B. Deep learning at chest radiography: automated classification of pulmonary tuberculosis by using convolutional neural networks. Radiology 284: 574–582, 2017. doi: 10.1148/radiol.2017162326. [DOI] [PubMed] [Google Scholar]
  • 107. Khairat S, Coleman C, Ottmar P, Jayachander DI, Bice T, Carson SS. Association of electronic health record use with physician fatigue and efficiency. JAMA Netw Open 3: e207385, 2020. doi: 10.1001/jamanetworkopen.2020.7385. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 108. Downing NL, Bates DW, Longhurst CA. Physician burnout in the electronic health record era: are we ignoring the real cause? Ann Intern Med 169: 50–51, 2018. doi: 10.7326/M18-0139. [DOI] [PubMed] [Google Scholar]
  • 109. Kapoor M. Physician burnout in the electronic health record era. Ann Intern Med 170: 216, 2019. doi: 10.7326/L18-0601. [DOI] [PubMed] [Google Scholar]
  • 110. Verghese A, Shah NH, Harrington RA. What this computer needs is a physician: humanism and artificial intelligence. JAMA 319: 19–20, 2018. doi: 10.1001/jama.2017.19198. [DOI] [PubMed] [Google Scholar]
  • 111. Tawfik DS, Profit J, Morgenthaler TI, Satele DV, Sinsky CA, Dyrbye LN, Tutty MA, West CP, Shanafelt TD. Physician burnout, well-being, and work unit safety grades in relationship to reported medical errors. Mayo Clin Proc 93: 1571–1580, 2018. doi: 10.1016/j.mayocp.2018.05.014. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 112. Singh H, Spitzmueller C, Petersen NJ, Sawhney MK, Sittig DF. Information overload and missed test results in electronic health record-based settings. JAMA Intern Med 173: 702–704, 2013. doi: 10.1001/2013.jamainternmed.61. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 113. Chiu CC, Tripathi A, Chou K, Co C, Jaitly N, Jaunzeikare D, Kannan A, Nguyen P, Sak H, Sankar A, Tansuwan J, Wan N, Wu Y, Zhang X. Speech recognition for medical conversations. arXiv 1711.07274, 2017. doi: 10.48550/arXiv.1711.07274. [DOI]
  • 114. Rajkomar A, Kannan A, Chen K, Vardoulakis L, Chou K, Cui C, Dean J. Automatically charting symptoms from patient-physician conversations using machine learning. JAMA Intern Med 179: 836–838, 2019. doi: 10.1001/jamainternmed.2018.8558. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 115. Rajkomar A, Oren E, Chen K, Dai AM, Hajaj N, Hardt M, , et al. Scalable and accurate deep learning with electronic health records. NPJ Digit Med 1: 18, 2018. doi: 10.1038/s41746-018-0029-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 116. Liang H, Tsui BY, Ni H, Valentim CC, Baxter SL, Liu G, , et al. Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence. Nat Med 25: 433–438, 2019. doi: 10.1038/s41591-018-0335-9. [DOI] [PubMed] [Google Scholar]
  • 117. Hassanpour S, Langlotz CP. Information extraction from multi-institutional radiology reports. Artif Intell Med 66: 29–39, 2016. doi: 10.1016/j.artmed.2015.09.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 118. Goff DJ, Loehfelm TW. Automated radiology report summarization using an open-source natural language processing pipeline. J Digit Imaging 31: 185–192, 2018. doi: 10.1007/s10278-017-0030-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 119. Karpathy A, Fei-Fei L. Deep visual-semantic alignments for generating image descriptions (Online). arXiv 1412.2306, 2014. doi: 10.48550/arXiv.1412.2306. [DOI] [PubMed]
  • 120. Zhang Z, Xie Y, Xing F, McGough M, Yang L. MDNet: a semantically and visually interpretable medical image diagnosis network (Online). arXiv 1707.02485, 2017. doi: 10.48550/arXiv.1707.02485. [DOI]
  • 121. Wang X, Peng Y, Lu L, Lu Z, Summers RM. TieNet: text-image embedding network for common thorax disease classification and reporting in chest X-rays (Online). arXiv 1801.04334, 2018. doi: 10.48550/arXiv.1801.04334. [DOI]
  • 122. ChestX-Ray8: hospital-scale chest X-ray database (Online). 2017. https://www.semanticscholar.org/paper/ChestX-Ray8%3A-Hospital-Scale-Chest-X-Ray-Database-on-Wang-Peng/05e882679d61f4c64a68ebe21826251a39f87e98.
  • 123. Shin HC, Roberts K, Lu L, Demner-Fushman D, Yao J, Summers RM. Learning to read chest X-rays: recurrent neural cascade model for automated image annotation (Online). arXiv 1603.08486, 2016. doi: 10.48550/arXiv.1603.08486. [DOI]
  • 124. Sezgin E, Huang Y, Ramtekkar U, Lin S. Readiness for voice assistants to support healthcare delivery during a health crisis and pandemic. NPJ Digit Med 3: 122, 2020. doi: 10.1038/s41746-020-00332-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 125. Sakthive V, Kesaven MP, William JM, Kumar SK. Integrated platform and response system for healthcare using Alexa. Int J Commun Comput Technol 7: 14–22, 2019. [Google Scholar]
  • 126. Kowatsch T, Nißen M, Shih CH, Rüegger D, Volland D, Filler A, Künzler F, Barata F, Büchter D, Brogle B, Heldt K, Gindrat P, Farpour-Lambert N, l’Allemand D. Text-based Healthcare Chatbots Supporting Patient and Health Professional Teams: Preliminary Results of a Randomized Controlled Trial on Childhood Obesity (Online). Persuasive Embodied Agents for Behavior Change (PEACH2017) Workshop, co-located with the 17th International Conference on Intelligent Virtual Agents (IVA 2017). 2017. https://www.alexandria.unisg.ch/id/eprint/252944. [2021 Jan 14].
  • 127. Peters DH, Garg A, Bloom G, Walker DG, Brieger WR, Rahman MH. Poverty and access to health care in developing countries. Ann NY Acad Sci 1136: 161–171, 2008. doi: 10.1196/annals.1425.011. [DOI] [PubMed] [Google Scholar]
  • 128. Naicker S, Plange-Rhule J, Tutt RC, Eastwood JB. Shortage of healthcare workers in developing countries—Africa. Ethn Dis 19: S1–S60, 2009. [PubMed] [Google Scholar]
  • 129. Chang AY, Cowling K, Micah AE, Chapin A, Chen CS, Ikilezi G, , et al. Past, present, and future of global health financing: a review of development assistance, government, out-of-pocket, and other private spending on health for 195 countries, 1995–2050. Lancet 393: 2233–2260, 2019. doi: 10.1016/S0140-6736(19)30841-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 130. Yong E. An ingenious microscope could change how quickly disease is detected (Online). The Atlantic, 2019. https://www.theatlantic.com/science/archive/2019/08/cheap-automatic-microscope-could-change-how-diseases-are-detected/596440. [2020 Dec 23].
  • 131. Gulshan V, Rajan RP, Widner K, Wu D, Wubbels P, Rhodes T, Whitehouse K, Coram M, Corrado G, Ramasamy K, Raman R, Peng L, Webster DR. Performance of a deep-learning algorithm vs manual grading for detecting diabetic retinopathy in India. JAMA Ophthalmol 137: 987–993, 2019. doi: 10.1001/jamaophthalmol.2019.2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 132. Bellemo V, Lim ZW, Lim G, Nguyen QD, Xie Y, Yip MY, Hamzah H, Ho J, Lee XQ, Hsu W, Lee ML, Musonda L, Chandran M, Chipalo-Mutati G, Muma M, Tan GS, Sivaprasad S, Menon G, Wong TY, Ting DS. Artificial intelligence using deep learning to screen for referable and vision-threatening diabetic retinopathy in Africa: a clinical validation study. Lancet Digit Health 1: e35–e44, 2019. doi: 10.1016/S2589-7500(19)30004-4. [DOI] [PubMed] [Google Scholar]
  • 133. Im H, Pathania D, McFarland PJ, Sohani AR, Degani I, Allen M, Coble B, Kilcoyne A, Hong S, Rohrer L, Abramson JS, Dryden-Peterson S, Fexon L, Pivovarov M, Chabner B, Lee H, Castro CM, Weissleder R. Design and clinical validation of a point-of-care device for the diagnosis of lymphoma via contrast-enhanced microholography and machine learning. Nat Biomed Eng 2: 666–674, 2018. doi: 10.1038/s41551-018-0265-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 134. Li H, Soto-Montoya H, Voisin M, Valenzuela LF, Prakash M. Octopi: open configurable high-throughput imaging platform for infectious disease diagnosis in the field (Preprint). bioRxiv 684423, 2019. doi: 10.1101/684423. [DOI]
  • 135. Kirch DG, Petelle K. Addressing the physician shortage: the peril of ignoring demography. JAMA 317: 1947–1948, 2017. doi: 10.1001/jama.2017.2714. [DOI] [PubMed] [Google Scholar]
  • 136. The complexities of physician supply and demand: projections from 2018 to 2033 (Online). https://www.aamc.org/system/files/2020-06/stratcomm-aamc-physician-workforce-projections-june-2020.pdf.
  • 137. Chen PH, Gadepalli K, MacDonald R, Liu Y, Kadowaki S, Nagpal K, Kohlberger T, Dean J, Corrado GS, Hipp JD, Mermel CH, Stumpe MC. An augmented reality microscope with real-time artificial intelligence integration for cancer diagnosis. Nat Med 25: 1453–1457, 2019. doi: 10.1038/s41591-019-0539-7. [DOI] [PubMed] [Google Scholar]
  • 138. Orringer DA, Pandian B, Niknafs YS, Hollon TC, Boyle J, Lewis S, Garrard M, Hervey-Jumper SL, Garton HJ, Maher CO, Heth JA, Sagher O, Wilkinson DA, Snuderl M, Venneti S, Ramkissoon SH, McFadden KA, Fisher-Hubbard A, Lieberman AP, Johnson TD, Xie XS, Trautman JK, Freudiger CW, Camelo-Piragua S. Rapid intraoperative histology of unprocessed surgical specimens via fibre-laser-based stimulated Raman scattering microscopy. Nat Biomed Eng 1: 0027, 2017. doi: 10.1038/s41551-016-0027. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 139. Hollon TC, Pandian B, Adapa AR, Urias E, Save AV, Khalsa SS, , et al. Near real-time intraoperative brain tumor diagnosis using stimulated Raman histology and deep neural networks. Nat Med 26: 52–58, 2020. doi: 10.1038/s41591-019-0715-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 140. Rivenson Y, Liu T, Wei Z, Zhang Y, de Haan K, Ozcan A. PhaseStain: the digital staining of label-free quantitative phase microscopy images using deep learning. Light Sci Appl 8: 23, 2019. doi: 10.1038/s41377-019-0129-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 141. Mikada T, Kanno T, Kawase T, Miyazaki T, Kawashima K. Suturing support by human cooperative robot control using deep learning. IEEE Access 8: 167739–167746, 2020. doi: 10.1109/ACCESS.2020.3023786. [DOI] [Google Scholar]
  • 142. Wang Z, Majewicz Fey A. Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery. Int J Comput Assist Radiol Surg 13: 1959–1970, 2018. doi: 10.1007/s11548-018-1860-1. [DOI] [PubMed] [Google Scholar]
  • 143. Khalid S, Goldenberg M, Grantcharov T, Taati B, Rudzicz F. Evaluation of deep learning models for identifying surgical actions and measuring performance. JAMA Netw Open 3: e201664, 2020. doi: 10.1001/jamanetworkopen.2020.1664. [DOI] [PubMed] [Google Scholar]
  • 144. Jakicic JM, Davis KK, Rogers RJ, King WC, Marcus MD, Helsel D, Rickman AD, Wahed AS, Belle SH. Effect of wearable technology combined with a lifestyle intervention on long-term weight loss: the IDEA randomized clinical trial. JAMA 316: 1161–1171, 2016. doi: 10.1001/jama.2016.12858. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 145. Tang PC, Smith MD. Democratization of health care. JAMA 316: 1663–1664, 2016. doi: 10.1001/jama.2016.14060. [DOI] [PubMed] [Google Scholar]
  • 146. Hershman SG, Bot BM, Shcherbina A, Doerr M, Moayedi Y, Pavlovic A, Waggott D, Cho MK, Rosenberger ME, Haskell WL, Myers J, Champagne MA, Mignot E, Salvi D, Landray M, Tarassenko L, Harrington RA, Yeung AC, McConnell MV, Ashley EA. Physical activity, sleep and cardiovascular health data for 50,000 individuals from the MyHeart Counts Study. Sci Data 6: 24, 2019. doi: 10.1038/s41597-019-0016-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 147. Torres-Soto J, Ashley EA. Multi-task deep learning for cardiac rhythm detection in wearable devices. NPJ Digit Med 3: 116, 2020. doi: 10.1038/s41746-020-00320-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 148. Gotlibovych I, Crawford S, Goyal D, Liu J, Kerem Y, Benaron D, Yilmaz D, Marcus G, Li Y. End-to-end deep learning from raw sensor data: atrial fibrillation detection using wearables (Online). arXiv 1807.10707, 2018. doi: 10.48550/arXiv.1807.10707. [DOI]
  • 149. Acharya UR, Fujita H, Lih OS, Hagiwara Y, Tan JH, Adam M. Automated detection of arrhythmias using different intervals of tachycardia ECG segments with convolutional neural network. Inf Sci 405: 81–90, 2017. doi: 10.1016/j.ins.2017.04.012. [DOI] [Google Scholar]
  • 150. Mathews SM, Kambhamettu C, Barner KE. A novel application of deep learning for single-lead ECG classification. Comput Biol Med 99: 53–62, 2018. doi: 10.1016/j.compbiomed.2018.05.013. [DOI] [PubMed] [Google Scholar]
  • 151. Mannino RG, Myers DR, Tyburski EA, Caruso C, Boudreaux J, Leong T, Clifford GD, Lam WA. Smartphone app for non-invasive detection of anemia using only patient-sourced photos. Nat Commun 9: 4924, 2018. doi: 10.1038/s41467-018-07262-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 152. Avram R, Olgin JE, Kuhar P, Hughes JW, Marcus GM, Pletcher MJ, Aschbacher K, Tison GH. A digital biomarker of diabetes from smartphone-based vascular signals. Nat Med 26: 1576–1582, 2020. doi: 10.1038/s41591-020-1010-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 153. Chiu H, Lee P, Ku B, Liu Y. Automated sleep apnea assessment based on machine learning and wearable technology (Abstract). Sleep 43: A461, 2020. doi: 10.1093/sleep/zsaa056.1200. [DOI] [Google Scholar]
  • 154. De Vos M, Prince J, Buchanan T, FitzGerald JJ, Antoniades CA. Discriminating progressive supranuclear palsy from Parkinson’s disease using wearable technology and machine learning. Gait Posture 77: 257–263, 2020. doi: 10.1016/j.gaitpost.2020.02.007. [DOI] [PubMed] [Google Scholar]
  • 155. Zhan A, Mohan S, Tarolli C, Schneider RB, Adams JL, Sharma S, Elson MJ, Spear KL, Glidden AM, Little MA, Terzis A, Dorsey ER, Saria S. Using smartphones and machine learning to quantify Parkinson disease severity: the mobile Parkinson Disease score. JAMA Neurol 75: 876–880, 2018. doi: 10.1001/jamaneurol.2018.0809. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 156. Berry SE, Valdes AM, Drew DA, Asnicar F, Mazidi M, Wolf J, Capdevila J, Hadjigeorgiou G, Davies R, Al Khatib H, Bonnett C, Ganesh S, Bakker E, Hart D, Mangino M, Merino J, Linenberg I, Wyatt P, Ordovas JM, Gardner CD, Delahanty LM, Chan AT, Segata N, Franks PW, Spector TD. Human postprandial responses to food and potential for precision nutrition. Nat Med 26: 964–973, 2020. doi: 10.1038/s41591-020-0934-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 157. Morawski K, Ghazinouri R, Krumme A, Lauffenburger JC, Lu Z, Durfee E, Oley L, Lee J, Mohta N, Haff N, Juusola JL, Choudhry NK. Association of a smartphone application with medication adherence and blood pressure control: the MedISAFE-BP randomized clinical trial. JAMA Intern Med 178: 802–809, 2018. doi: 10.1001/jamainternmed.2018.0447. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 158. Mohr DC, Zhang M, Schueller SM. Personal sensing: understanding mental health using ubiquitous sensors and machine learning. Annu Rev Clin Psychol 13: 23–47, 2017. doi: 10.1146/annurev-clinpsy-032816-044949. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 159. Sultana M, Al-Jefri M, Lee J. Using machine learning and smartphone and smartwatch data to detect emotional states and transitions: exploratory study. JMIR Mhealth Uhealth 8: e17818, 2020. doi: 10.2196/17818. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 160. Krittanawong C, Rogers AJ, Johnson KW, Wang Z, Turakhia MP, Halperin JL, Narayan SM. Integration of novel monitoring devices with machine learning technology for scalable cardiovascular management. Nat Rev Cardiol 18: 75–91, 2021. doi: 10.1038/s41569-020-00445-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 161. McGinley A, Pearse RM. A national early warning score for acutely ill patients. BMJ 345: e5310, 2012. doi: 10.1136/bmj.e5310. [DOI] [PubMed] [Google Scholar]
  • 162. Artzi NS, Shilo S, Hadar E, Rossman H, Barbash-Hazan S, Ben-Haroush A, Balicer RD, Feldman B, Wiznitzer A, Segal E. Prediction of gestational diabetes based on nationwide electronic health records. Nat Med 26: 71–76, 2020. doi: 10.1038/s41591-019-0724-8. [DOI] [PubMed] [Google Scholar]
  • 163. Schultebraucks K, Shalev AY, Michopoulos V, Grudzen CR, Shin SM, Stevens JS, Maples-Keller JL, Jovanovic T, Bonanno GA, Rothbaum BO, Marmar CR, Nemeroff CB, Ressler KJ, Galatzer-Levy IR. A validated predictive algorithm of post-traumatic stress course following emergency department admission after a traumatic stressor. Nat Med 26: 1084–1088, 2020. doi: 10.1038/s41591-020-0951-z. [DOI] [PubMed] [Google Scholar]
  • 164. Yim J, Chopra R, Spitz T, Winkens J, Obika A, Kelly C, Askham H, Lukic M, Huemer J, Fasler K, Moraes G, Meyer C, Wilson M, Dixon J, Hughes C, Rees G, Khaw PT, Karthikesalingam A, King D, Hassabis D, Suleyman M, Back T, Ledsam JR, Keane PA, De Fauw J. Predicting conversion to wet age-related macular degeneration using deep learning. Nat Med 26: 892–899, 2020. doi: 10.1038/s41591-020-0867-7. [DOI] [PubMed] [Google Scholar]
  • 165. Hyland SL, Faltys M, Hüser M, Lyu X, Gumbsch T, Esteban C, Bock C, Horn M, Moor M, Rieck B, Zimmermann M, Bodenham D, Borgwardt K, Rätsch G, Merz TM. Early prediction of circulatory failure in the intensive care unit using machine learning. Nat Med 26: 364–373, 2020. doi: 10.1038/s41591-020-0789-4. [DOI] [PubMed] [Google Scholar]
  • 166. Soleimani H, Henry K, Zhan A, Pronovost P. Early identification of gastrointestinal bleeding requiring critical care using machine learning (Abstract). Am J Respir Crit Care Med 207: A7145, 2023. [Google Scholar]
  • 167. Begoli E, Kistler D, Bates J. Towards a heterogeneous, polystore-like data architecture for the US Department of Veteran Affairs (VA) enterprise analytics. In: 2016 IEEE International Conference on Big Data (Big Data), 2016, p. 2550–2554. doi: 10.1109/BigData.2016.7840896. [DOI]
  • 168. Sohn M-W, Arnold N, Maynard C, Hynes DM. Accuracy and completeness of mortality data in the Department of Veterans Affairs. Popul Health Metr 4: 2, 2006. doi: 10.1186/1478-7954-4-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 169. Pollard TJ, Johnson AE, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data 5: 180178, 2018. doi: 10.1038/sdata.2018.178. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 170. Futoma J, Hariharan S, Sendak M, Brajer N, Clement M, Bedoya A, O’Brien C, Heller K. An improved multi-output Gaussian process RNN with real-time validation for early sepsis detection (Online). arXiv 1708.05894, 2017. doi: 10.48550/arXiv.1708.05894. [DOI]
  • 171. Raghu A, Komorowski M, Celi LA, Szolovits P, Ghassemi M. Continuous state-space models for optimal sepsis treatment—a deep reinforcement learning approach (Online). arXiv 1705.08422, 2017. doi: 10.48550/arXiv.1705.08422. [DOI]
  • 172. Komorowski M, Celi LA, Badawi O, Gordon AC, Faisal AA. The Artificial Intelligence Clinician learns optimal treatment strategies for sepsis in intensive care. Nat Med 24: 1716–1720, 2018. doi: 10.1038/s41591-018-0213-5. [DOI] [PubMed] [Google Scholar]
  • 173. Henry KE, Hager DN, Pronovost PJ, Saria S. A targeted real-time early warning score (TREWScore) for septic shock. Sci Transl Med 7: 299ra122, 2015. doi: 10.1126/scitranslmed.aab3719. [DOI] [PubMed] [Google Scholar]
  • 174. Burdick H, Pino E, Gabel-Comeau D, McCoy A, Gu C, Roberts J, Le S, Slote J, Pellegrini E, Green-Saxena A, Hoffman J, Das R. Effect of a sepsis prediction algorithm on patient mortality, length of stay and readmission: a prospective multicentre clinical outcomes evaluation of real-world patient data from US hospitals. BMJ Health Care Inform 27: e100109, 2020. doi: 10.1136/bmjhci-2019-100109. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 175. Mutlu U, Colijn JM, Ikram MA, Bonnemaijer PW, Licher S, Wolters FJ, Tiemeier H, Koudstaal PJ, Klaver CC, Ikram MK. Association of retinal neurodegeneration on optical coherence tomography with dementia: a population-based study. JAMA Neurol 75: 1256–1263, 2018. doi: 10.1001/jamaneurol.2018.1563. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 176. Bora A, Balasubramanian S, Babenko B, Virmani S, Venugopalan S, Mitani A, de Oliveira Marinho G, Cuadros J, Ruamviboonsuk P, Corrado GS, Peng L, Webster DR, Varadarajan AV, Hammel N, Liu Y, Bavishi P. Predicting the risk of developing diabetic retinopathy using deep learning. Lancet Digit Health 3: e10–e19, 2021. doi: 10.1016/S2589-7500(20)30250-8. [DOI] [PubMed] [Google Scholar]
  • 177. Courtiol P, Maussion C, Moarii M, Pronier E, Pilcer S, Sefta M, Manceron P, Toldo S, Zaslavskiy M, Le Stang N, Girard N, Elemento O, Nicholson AG, Blay JY, Galateau-Sallé F, Wainrib G, Clozel T. Deep learning-based classification of mesothelioma improves prediction of patient outcome. Nat Med 25: 1519–1525, 2019. doi: 10.1038/s41591-019-0583-3. [DOI] [PubMed] [Google Scholar]
  • 178. Katzman JL, Shaham U, Cloninger A, Bates J, Jiang T, Kluger Y. DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network. BMC Med Res Methodol 18: 24, 2018. doi: 10.1186/s12874-018-0482-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 179. Saria S. Individualized sepsis treatment using reinforcement learning. Nat Med 24: 1641–1642, 2018. doi: 10.1038/s41591-018-0253-x. [DOI] [PubMed] [Google Scholar]
  • 180. Lou B, Doken S, Zhuang T, Wingerter D, Gidwani M, Mistry N, Ladic L, Kamen A, Abazeed ME. An image-based deep learning framework for individualising radiotherapy dose: a retrospective analysis of outcome prediction. Lancet Digit Health 1: e136–e147, 2019. doi: 10.1016/S2589-7500(19)30058-5. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 181. Rampášek L, Hidru D, Smirnov P, Haibe-Kains B, Goldenberg A. Dr.VAE: improving drug response prediction via modeling of drug perturbation effects. Bioinformatics 35: 3743–3751, 2019. doi: 10.1093/bioinformatics/btz158. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 182. Padmanabhan R, Meskin N, Haddad WM. Reinforcement learning-based control of drug dosing for cancer chemotherapy treatment. Math Biosci 293: 11–20, 2017. doi: 10.1016/j.mbs.2017.08.004. [DOI] [PubMed] [Google Scholar]
  • 183. Shen C, Gonzalez Y, Klages P, Qin N, Jung H, Chen L, Nguyen D, Jiang SB, Jia X. Intelligent inverse treatment planning via deep reinforcement learning, a proof-of-principle study in high dose-rate brachytherapy for cervical cancer. Phys Med Biol 64: 115013, 2019. doi: 10.1088/1361-6560/ab18bf. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 184. Daoud S, Mdhaffar A, Jmaiel M, Freisleben B. Q-Rank: reinforcement learning for recommending algorithms to predict drug sensitivity to cancer therapy. IEEE J Biomed Health Inform 24: 3154–3161, 2020. doi: 10.1109/JBHI.2020.3004663. [DOI] [PubMed] [Google Scholar]
  • 185. Jalalimanesh A, Shahabi Haghighi H, Ahmadi A, Soltani M. Simulation-based optimization of radiotherapy: Agent-based modeling and reinforcement learning. Math Comput Simul 133: 235–248, 2017. doi: 10.1016/j.matcom.2016.05.008. [DOI] [Google Scholar]
  • 186. Xu Y, Su GH, Ma D, Xiao Y, Shao ZM, Jiang YZ. Technological advances in cancer immunity: from immunogenomics to single-cell analysis and artificial intelligence. Signal Transduct Target Ther 6: 312, 2021. doi: 10.1038/s41392-021-00729-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 187. Borcherding N, Voigt AP, Liu V, Link BK, Zhang W, Jabbari A. Single-cell profiling of cutaneous T-cell lymphoma reveals underlying heterogeneity associated with disease progression. Clin Cancer Res 25: 2996–3005, 2019. doi: 10.1158/1078-0432.CCR-18-3309. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 188. Wu Z, Trevino AE, Wu E, Swanson K, Kim HJ, D’Angio HB, Preska R, Charville GW, Dalerba PD, Egloff AM, Uppaluri R, Duvvuri U, Mayer AT, Zou J. Graph deep learning for the characterization of tumour microenvironments from spatial protein profiles in tissue specimens. Nat Biomed Eng 6: 1435–1448, 2022. doi: 10.1038/s41551-022-00951-w. [DOI] [PubMed] [Google Scholar]
  • 189. Esfahani MS, Hamilton EG, Mehrmohamadi M, Nabet BY, Alig SK, King DA, Steen CB, Macaulay CW, Schultz A, Nesselbush MC, Soo J, Schroers-Martin JG, Chen B, Binkley MS, Stehr H, Chabon JJ, Sworder BJ, Hui AB-Y, Frank MJ, Moding EJ, Liu CL, Newman AM, Isbell JM, Rudin CM, Li BT, Kurtz DM, Diehn M, Alizadeh AA. Inferring gene expression from cell-free DNA fragmentation profiles. Nat Biotechnol 40: 585–597, 2022. doi: 10.1038/s41587-022-01222-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 190. Zviran A, Schulman RC, Shah M, Hill ST, Deochand S, Khamnei CC, , et al. Genome-wide cell-free DNA mutational integration enables ultra-sensitive cancer monitoring. Nat Med 26: 1114–1124, 2020. doi: 10.1038/s41591-020-0915-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 191. Tang W, Wan S, Yang Z, Teschendorff AE, Zou Q. Tumor origin detection with tissue-specific miRNA and DNA methylation markers. Bioinformatics 34: 398–406, 2018. doi: 10.1093/bioinformatics/btx622. [DOI] [PubMed] [Google Scholar]
  • 192. Hegde N, Hipp JD, Liu Y, Emmert-Buck M, Reif E, Smilkov D, Terry M, Cai CJ, Amin MB, Mermel CH, Nelson PQ, Peng LH, Corrado GS, Stumpe MC. Similar image search for histopathology: SMILY. NPJ Digit Med 2: 56, 2019. doi: 10.1038/s41746-019-0131-z. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 193. Kadakia K, Patel B, Shah A. Advancing digital health: FDA innovation during COVID-19. NPJ Digit Med 3: 161, 2020. doi: 10.1038/s41746-020-00371-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 194. Bachtiger P, Peters NS, Walsh SL. Machine learning for COVID-19—asking the right questions. Lancet Digit Health 2: e391–e392, 2020. doi: 10.1016/S2589-7500(20)30162-X. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 195. Mei X, Lee HC, Diao KY, Huang M, Lin B, Liu C, Xie Z, Ma Y, Robson PM, Chung M, Bernheim A, Mani V, Calcagno C, Li K, Li S, Shan H, Lv J, Zhao T, Xia J, Long Q, Steinberger S, Jacobi A, Deyer T, Luksza M, Liu F, Little BP, Fayad ZA, Yang Y. Artificial intelligence-enabled rapid diagnosis of patients with COVID-19. Nat Med 26: 1224–1228, 2020. doi: 10.1038/s41591-020-0931-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 196. Richardson P, Griffin I, Tucker C, Smith D, Oechsle O, Phelan A, Rawling M, Savory E, Stebbing J. Baricitinib as potential treatment for 2019-nCoV acute respiratory disease. Lancet 395: e30–e31, 2020. doi: 10.1016/S0140-6736(20)30304-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 197. Esteva A, Kale A, Paulus R, Hashimoto K, Yin W, Radev D, Socher R. CO-Search: COVID-19 information retrieval with semantic search, question answering, and abstractive summarization (Online). arXiv 2006.09595, 2020. doi: 10.48550/arXiv.2006.09595. [DOI] [PMC free article] [PubMed]
  • 198. Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med 15: e1002683, 2018. doi: 10.1371/journal.pmed.1002683. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 199. Long E, Lin H, Liu Z, Wu X, Wang L, Jiang J, An Y, Lin Z, Li X, Chen J, Li J, Cao Q, Wang D, Liu X, Chen W, Liu Y. An artificial intelligence platform for the multihospital collaborative management of congenital cataracts. Nat Biomed Eng 1: 0024, 2017. doi: 10.1038/s41551-016-0024. [DOI] [Google Scholar]
  • 200. Nestor B, McDermott MB, Chauhan G, Naumann T, Hughes MC, Goldenberg A, Ghassemi M. Rethinking clinical prediction: why machine learning must consider year of care and feature aggregation (Online). arXiv 1811.12583, 2018. doi: 10.48550/arXiv.1811.12583. [DOI]
  • 201. Cruz Rivera S, Liu X, Chan AW, Denniston AK, Calvert MJ; SPIRIT-AI and CONSORT-AI Working Group, SPIRIT-AI and CONSORT-AI Steering Group, SPIRIT-AI and CONSORT-AI Consensus Group. Guidelines for clinical trial protocols for interventions involving artificial intelligence: the SPIRIT-AI extension. Nat Med 26: 1351–1363, 2020. doi: 10.1038/s41591-020-1037-7. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 202. Liu X, Cruz Rivera S, Moher D, Calvert MJ, Denniston AK; SPIRIT-AI and CONSORT-AI Working Group. Reporting guidelines for clinical trial reports for interventions involving artificial intelligence: the CONSORT-AI extension. Nat Med 26: 1364–1374, 2020. doi: 10.1038/s41591-020-1034-x. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 203. Amodei D, Olah C, Steinhardt J, Christiano P, Schulman J, Mané D. Concrete problems in AI safety (Online). arXiv 1606.06565, 2016. doi: 10.48550/arXiv.1606.06565. [DOI]
  • 204. Saria S, Subbaswamy A. Tutorial: safe and reliable machine learning (Online). arXiv 1904.07204, 2019. doi: 10.48550/arXiv.1904.07204. [DOI]
  • 205. He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med 25: 30–36, 2019. doi: 10.1038/s41591-018-0307-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 206. Clifford GD, Liu C, Moody B, Lehman LW, Silva I, Li Q, Johnson AE, Mark RG. AF classification from a short single lead ECG recording: the PhysioNet/Computing in Cardiology Challenge 2017. Comput Cardiol 44: 10.22489/CinC.2017.065-469, 2017. doi: 10.22489/CinC.2017.065-469. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 207. Goel V. Netflix challenge—improving movie recommendations. In: Recommender System with Machine Learning and Artificial Intelligence, edited by Mohanty SN, Chatterjee JM, Jain S, Elngar AA, Gupta P.. Beverly, MA: Scrivener, 2020, p. 251–267. [Google Scholar]
  • 208. Lehne M, Sass J, Essenwanger A, Schepers J, Thun S. Why digital medicine depends on interoperability. NPJ Digit Med 2: 79, 2019. doi: 10.1038/s41746-019-0158-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 209. Panch T, Mattie H, Celi LA. The “inconvenient truth” about AI in healthcare. NPJ Digit Med 2: 77, 2019. doi: 10.1038/s41746-019-0155-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 210. McMahan HB, Moore E, Ramage D, Agüera y Arcas B. Federated learning of deep networks using model averaging (Online). ArXiv 1602.05629, 2017. doi: 10.148550/arXiv.1602.05629. [DOI]
  • 211. Sheller MJ, Reina GA, Edwards B, Martin J, Bakas S. Multi-institutional deep learning modeling without sharing patient data: a feasibility study on brain tumor segmentation. Brainlesion 11383: 92–104, 2019. doi: 10.1007/978-3-030-11723-8_9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 212. Brisimi TS, Chen R, Mela T, Olshevsky A, Paschalidis IC, Shi W. Federated learning of predictive models from federated electronic health records. Int J Med Inform 112: 59–67, 2018. doi: 10.1016/j.ijmedinf.2018.01.007. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 213. Xu J, Glicksberg BS, Su C, Walker P, Bian J, Wang F. Federated learning for healthcare informatics (Online). arXiv 1911.06270, 2019. doi: 10.48550/arXiv.1911.06270. [DOI] [PMC free article] [PubMed]
  • 214. Zou J, Schiebinger L. AI can be sexist and racist—it’s time to make it fair. Nature 559: 324–326, 2018. doi: 10.1038/d41586-018-05707-8. [DOI] [PubMed] [Google Scholar]
  • 215. Chen I, Johansson FD, Sontag D. Why is my classifier discriminatory? (Online) arXiv 1805.12002, 2018. doi: 10.48550/arXiv.1805.12002. [DOI]
  • 216. Selvaraju RR, Cogswell M, Das A, Vedantam R, Parikh D, Batra D. Grad-CAM: visual explanations from deep networks via gradient-based localization (Online). arXiv 1610.02391, 2016. doi: 10.48550/arXiv.1610.02391. [DOI]
  • 217. Shrikumar A, Greenside P, Kundaje A. Learning important features through propagating activation differences (Online). arXiv 1704.02685, 2017. doi: 10.48550/arXiv.1704.02685. [DOI]
  • 218. Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?”: explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: Association for Computing Machinery, 2016, p. 1135–1144. [Google Scholar]
  • 219. Koh PW, Liang P. Understanding black-box predictions via influence functions (Online). arXiv 1703.04730, 2017. doi: 10.48550/arXiv.1703.04730. [DOI]
  • 220. Kim B, Wattenberg M, Gilmer J, Cai C, Wexler J, Viegas F, Sayres R. Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV) (Online). arXiv 1711.11279, 2017. doi: 10.48550/aeXiv.1711.11279. [DOI]
  • 221. Reyes M, Meier R, Pereira S, Silva CA, Dahlweid FM, von Tengg-Kobligk H, Summers RM, Wiest R. On the interpretability of artificial intelligence in radiology: challenges and opportunities. Radiol Artif Intell 2: e190043, 2020. doi: 10.1148/ryai.2020190043. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 222. Lee H, Yune S, Mansouri M, Kim M, Tajmir SH, Guerrier CE, Ebert SA, Pomerantz SR, Romero JM, Kamalian S, Gonzalez RG, Lev MH, Do S. An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets. Nat Biomed Eng 3: 173–182, 2019. doi: 10.1038/s41551-018-0324-9. [DOI] [PubMed] [Google Scholar]
  • 223. Li W, Yang Y, Zhang K, Long E, He L, Zhang L, Zhu Y, Chen C, Liu Z, Wu X, Yun D, Lv J, Liu Y, Liu X, Lin H. Dense anatomical annotation of slit-lamp images improves the performance of deep learning for the diagnosis of ophthalmic disorders. Nat Biomed Eng 4: 767–777, 2020. doi: 10.1038/s41551-020-0577-y. [DOI] [PubMed] [Google Scholar]
  • 224. Winkler JK, Fink C, Toberer F, Enk A, Deinlein T, Hofmann-Wellenhof R, Thomas L, Lallas A, Blum A, Stolz W, Haenssle HA. Association between surgical skin markings in dermoscopic images and diagnostic performance of a deep learning convolutional neural network for melanoma recognition. JAMA Dermatol 155: 1135–1141, 2019. doi: 10.1001/jamadermatol.2019.1735. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 225. Badgeley MA, Zech JR, Oakden-Rayner L, Glicksberg BS, Liu M, Gale W, McConnell MV, Percha B, Snyder TM, Dudley JT. Deep learning predicts hip fracture using confounding patient and healthcare variables. NPJ Digit Med 2: 31, 2019. doi: 10.1038/s41746-019-0105-1. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 226.Center for Devices and Radiological Health. Artificial intelligence and machine learning in software as a medical device (Online). US Food and Drug Administration, 2021. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device. [2022 Aug 13]. [Google Scholar]
  • 227. Wu E, Wu K, Daneshjou R, Ouyang D, Ho DE, Zou J. How medical AI devices are evaluated: limitations and recommendations from an analysis of FDA approvals. Nat Med 27: 582–584, 2021. doi: 10.1038/s41591-021-01312-x. [DOI] [PubMed] [Google Scholar]
  • 228. Angus DC. Randomized clinical trials of artificial intelligence. JAMA 323: 1043–1045, 2020. doi: 10.1001/jama.2020.1039. [DOI] [PubMed] [Google Scholar]
  • 229. Kaushal A, Altman R, Langlotz C. Geographic distribution of US cohorts used to train deep learning algorithms. JAMA 324: 1212–1213, 2020. doi: 10.1001/jama.2020.12067. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 230. Seyyed-Kalantari L, Liu G, McDermott MB, Ghassemi M. CheXclusion: fairness gaps in deep chest X-ray classifiers. In: Pacific Symposium on Biocomputing, 2020, p. 232–243. [PubMed] [Google Scholar]
  • 231. Seyyed-Kalantari L, Zhang H, McDermott MB, Chen IY, Ghassemi M. Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations. Nat Med 27: 2176–2182, 2021. doi: 10.1038/s41591-021-01595-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 232. Banerjee I, Bhimireddy AR, Burns JL, Celi LA, Chen LC, Correa R, Dullerud N, Ghassemi M, Huang SC, Kuo PC, Lungren MP, Palmer L, Price BJ, Purkayastha S, Pyrros A, Oakden-Rayner L, Okechukwu C, Seyyed-Kalantari L, Trivedi H, Wang R, Zaiman Z, Zhang H, Gichoya JW. Reading race: AI recognises patient’s racial identity in medical images (Online). arXiv 2107.10356, 2021. doi: 10.48550/arXiv.2107.10356. [DOI]
  • 233. Thambawita V, Jha D, Hammer HL, Johansen HD, Johansen D, Halvorsen P, Riegler MA. An extensive study on cross-dataset bias and evaluation metrics interpretation for machine learning applied to gastrointestinal tract abnormality classification. ACM Trans Comput Healthcare 1: 1–29, 2020. doi: 10.1145/3386295. [DOI] [Google Scholar]
  • 234. Larrazabal AJ, Nieto N, Peterson V, Milone DH, Ferrante E. Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis. Proc Natl Acad Sci USA 117: 12592–12594, 2020. doi: 10.1073/pnas.1919012117. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 235. Chirra P, Leo P, Yim M, Bloch BN, Rastinehad AR, Purysko A, Rosen M, Madabhushi A, Viswanath SE. Multisite evaluation of radiomic feature reproducibility and discriminability for identifying peripheral zone prostate tumors on MRI. J Med Imaging (Bellingham) 6: 024502, 2019. doi: 10.1117/1.JMI.6.2.024502. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 236. Ferryman K. Addressing health disparities in the Food and Drug Administration’s artificial intelligence and machine learning regulatory framework. J Am Med Inform Assoc 27: 2016–2019, 2020. doi: 10.1093/jamia/ocaa133. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 237.Center for Devices and Radiological Health. Postmarket requirements (devices) (Online). U.S. Food and Drug Administration, 2018. https://www.fda.gov/medical-devices/device-advice-comprehensive-regulatory-assistance/postmarket-requirements-devices. [2022 Aug 8]. [Google Scholar]
  • 238.Food and Drug Administration (FDA). Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD)—Discussion Paper and Request for Feedback (Online), 2021. https://www.fda.gov/media/122535/download.
  • 239.Center for Devices and Radiological Health. Artificial Intelligence and Machine Learning Program: research on AI/ML-based medical devices (Online). US Food and Drug Administration, 2021. https://www.fda.gov/medical-devices/medical-device-regulatory-science-research-programs-conducted-osel/artificial-intelligence-and-machine-learning-program-research-aiml-based-medical-devices. [2022 Aug 8]. [Google Scholar]
  • 240.US FDA. Artificial intelligence/machine learning (AI/ML)-based software as a medical device (SaMD) action plan (Online), 2021. https://www.fda.gov/media/145022/download.
  • 241. Forrest S. Artificial intelligence/machine learning (AI/ML)-enabled medical devices: tailoring a regulatory framework to encourage responsible innovation in AI/ML (Online). FDA, 2022. https://www.fda.gov/media/160125/download. [2022 Aug 14]. [Google Scholar]
  • 242. Price WN 2nd, Gerke S, Cohen IG. Potential liability for physicians using artificial intelligence. JAMA 322: 1765–1766, 2019. doi: 10.1001/jama.2019.15064. [DOI] [PubMed] [Google Scholar]
  • 243. Liu X, Glocker B, McCradden MM, Ghassemi M, Denniston AK, Oakden-Rayner L. The medical algorithmic audit. Lancet Digit Health 4: e384–e397, 2022. doi: 10.1016/S2589-7500(22)00003-6. [DOI] [PubMed] [Google Scholar]
  • 244. Abràmoff MD, Roehrenbeck C, Trujillo S, Goldstein J, Graves AS, Repka MX, Silva E 3rd.. A reimbursement framework for artificial intelligence in healthcare. NPJ Digit Med 5: 72, 2022. doi: 10.1038/s41746-022-00621-w. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 245. Parikh RB, Helmchen LA. Paying for artificial intelligence in medicine. NPJ Digit Med 5: 63, 2022. doi: 10.1038/s41746-022-00609-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 246. Cerrato P, Halamka J, Pencina M. A proposal for developing a platform that evaluates algorithmic equity and accuracy. BMJ Health Care Inform 29: e100423, 2022. doi: 10.1136/bmjhci-2021-100423. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 247. Pesapane F, Volonté C, Codari M, Sardanelli F. Artificial intelligence as a medical device in radiology: ethical and regulatory issues in Europe and the United States. Insights Imaging 9: 745–753, 2018. doi: 10.1007/s13244-018-0645-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 248. Chen IY, Pierson E, Rose S, Joshi S, Ferryman K, Ghassemi M. Ethical machine learning in healthcare. Annu Rev Biomed Data Sci 4: 123–144, 2021. doi: 10.1146/annurev-biodatasci-092820-114757. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 249. Erion G, Janizek JD, Hudelson C, Utarnachitt RB, McCoy AM, Sayre MR, White NJ, Lee SI. A cost-aware framework for the development of AI models for healthcare applications. Nat Biomed Eng 6: 1384–1398, 2022. doi: 10.1038/s41551-022-00872-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 250. Pierson E, Cutler DM, Leskovec J, Mullainathan S, Obermeyer Z. An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nat Med 27: 136–140, 2021. doi: 10.1038/s41591-020-01192-7. [DOI] [PubMed] [Google Scholar]
  • 251. Emanuel EJ, Wachter RM. Artificial intelligence in health care: will the value match the hype? JAMA 321: 2281–2282, 2019. doi: 10.1001/jama.2019.4914. [DOI] [PubMed] [Google Scholar]
  • 252. Wong TY, Bressler NM. Artificial intelligence with deep learning technology looks into diabetic retinopathy screening. JAMA 316: 2366–2367, 2016. doi: 10.1001/jama.2016.17563. [DOI] [PubMed] [Google Scholar]
  • 253. Morse KE, Bagley SC, Shah NH. Estimate the hidden deployment cost of predictive models to improve patient care. Nat Med 26: 18–19, 2020. doi: 10.1038/s41591-019-0651-8. [DOI] [PubMed] [Google Scholar]
  • 254. Kelly CJ, Karthikesalingam A, Suleyman M, Corrado G, King D. Key challenges for delivering clinical impact with artificial intelligence. BMC Med 17: 195, 2019. doi: 10.1186/s12916-019-1426-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 255. Lehman CD, Wellman RD, Buist DS, Kerlikowske K, Tosteson AN, Miglioretti DL; Breast Cancer Surveillance Consortium. Diagnostic accuracy of digital screening mammography with and without computer-aided detection. JAMA Intern Med 175: 1828–1837, 2015. doi: 10.1001/jamainternmed.2015.5231. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 256.INFANT Collaborative Group. Computerised interpretation of fetal heart rate during labour (INFANT): a randomised controlled trial. Lancet 389: 1719–1729, 2017. doi: 10.1016/S0140-6736(17)30568-8. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 257. Liu X, Faes L, Kale AU, Wagner SK, Fu DJ, Bruynseels A, Mahendiran T, Moraes G, Shamdas M, Kern C, Ledsam JR, Schmid MK, Balaskas K, Topol EJ, Bachmann LM, Keane PA, Denniston AK. A comparison of deep learning performance against health-care professionals in detecting diseases from medical imaging: a systematic review and meta-analysis. Lancet Digit Health 1: e271–e297, 2019. doi: 10.1016/S2589-7500(19)30123-2. [DOI] [PubMed] [Google Scholar]
  • 258. Shah NH, Milstein A, Bagley SC. Making machine learning models clinically useful. JAMA 322: 1351–1352, 2019. doi: 10.1001/jama.2019.10306. [DOI] [PubMed] [Google Scholar]
  • 259. Jung K, Kashyap S, Avati A, Harman S, Shaw H, Li R, Smith M, Shum K, Javitz J, Vetteth Y, Seto T, Bagley SC, Shah NH. A framework for making predictive models useful in practice. J Am Med Inform Assoc 28: 1149–1158, 2021. doi: 10.1093/jamia/ocaa318. [DOI] [PMC free article] [PubMed] [Google Scholar]