Deutsches Ärzteblatt International. 2023 Jul 10;120(27-28):463–469. doi: 10.3238/arztebl.m2023.0124

The Quality and Utility of Artificial Intelligence in Patient Care

Kai Wehkamp 1,2,*, Michael Krawczak 3, Stefan Schreiber 1,4
PMCID: PMC10487679  PMID: 37218054

Abstract

Background

Artificial intelligence (AI) is increasingly being used in patient care. In the future, physicians will need to understand not only the basic functioning of AI applications, but also their quality, utility, and risks.

Methods

This article is based on a selective review of the literature on the principles, quality, limitations, and benefits of AI applications in patient care, along with examples of individual applications.

Results

The number of AI applications in patient care is rising, with more than 500 approvals in the United States to date. Their quality and utility are based on a number of interdependent factors, including the real-life setting, the type and amount of data collected, the choice of variables used by the application, the algorithms used, and the goal and implementation of each application. Bias (which may be hidden) and errors can arise at all these levels. Any evaluation of the quality and utility of an AI application must, therefore, be conducted according to the scientific principles of evidence-based medicine—a requirement that is often hampered by a lack of transparency.

Conclusion

AI has the potential to improve patient care while meeting the challenge of dealing with an ever-increasing surfeit of information and data in medicine with limited human resources. The limitations and risks of AI applications require critical and responsible consideration. This can best be achieved through a combination of scientific evaluation according to the principles of evidence-based medicine and the critical scrutiny of human judgment.


cme plus

This article has been certified by the North Rhine Academy for Continuing Medical Education. Participation in the CME certification program is possible only over the internet: cme.aerzteblatt.de. The deadline for submission is 09 July 2024.

Human intelligence is one of the most remarkable results of evolution. Of crucial importance for the intellectual performance of our brain is its ability to build models that are able to provide a detailed representation of complex reality with the goal of making predictions in order to successfully interact with our environment (1). Artificial intelligence (AI), in contrast, is a collective term for processes that enable computers to fulfill tasks that normally require human intelligence. As such, even the algorithms in a simple chess computer constitute AI. One form of AI is what is known as machine learning (ML), in which patterns are derived from data in order to either better interpret underlying data or make certain predictions based on these data.

Significant advances have been made in the field of ML over the last 10 years, not least through the development of multilayer ("deep") artificial neural networks (deep neural networks, DNN) (2). It will likely take years or decades, if ever, before ML or AI can fully match the broad spectrum of human intelligence (3). Nevertheless, AI in the form of ML is already achieving results that exceed human performance in some areas of medicine. The future development of such methods must therefore be accompanied by critical and independent expertise, so that medicine can continue to live up to the maxim of providing the best possible patient care. The medical profession carries a special responsibility here.

This article provides an overview of the important aspects of assessing the quality, utility, and limitations of AI applications in patient care, not least to contribute to the responsible use of this technology.

Methods

Based on a selective literature search in PubMed, this article presents selected aspects of the evaluation of the quality and benefits of AI applications (in particular, ML-based ones) in patient care. This presentation of the status quo is supplemented with examples of current applications taken from the relevant specialist media and scientific studies.

Results

Data as the basis of machine learning

Machine learning (ML) is based on data that provide an example representation of a particular learning world. Patterns or abstract rules need to be recognized in the learning data and then applied to new data in order to recognize characteristics, make predictions, or generate statements. Thus, conceptually, ML exhibits considerable similarity to human learning from examples and the recognition of similarities and differences.

The more unstructured the data and the greater the need to combine different data modalities, the greater the challenges posed to an AI application (4). For example, there are AI-based techniques that detect breast cancer on mammograms with a sensitivity and specificity comparable to that of an averagely experienced radiologist (but, to date, not to that of an expert) (5). However, AI-based approaches to gaining knowledge from a variety of unstructured data types, such as DNA sequences, histopathological images, and laboratory findings, still fail in practice (6). Moreover, the use of extensive medical datasets always harbors the risk of violating individual personal rights, which can lead to restrictions under data protection law (7). At present, the limited quality and availability of complex, heterogeneous data remain a challenge that, in broad areas, has yet to be satisfactorily solved for the implementation of medical AI applications.

Concepts of machine learning in medicine

ML approaches can be divided primarily into three groups (figure 1):

Figure 1: Principal concepts of machine learning

Unsupervised learning attempts, without concrete specifications, to identify associations, structures, or anomalies in data. This approach is used, for example, to identify subgroups in multiomics datasets (8). In patient care, methods of unsupervised learning are still at an experimental stage, but their use is conceivable in the future, for example, in syndromic surveillance—potentially as part of outbreak monitoring for infectious diseases (9).
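
As a rough sketch of this concept, the following minimal Python example (hypothetical laboratory values; scikit-learn assumed available) groups patients into subgroups without any predefined labels:

```python
# Minimal sketch of unsupervised learning: clustering patients into
# subgroups without predefined labels (hypothetical data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(seed=0)
# Simulated laboratory values for 100 patients (two variables each)
lab_values = np.vstack([
    rng.normal(loc=[5.0, 120.0], scale=[0.5, 10.0], size=(50, 2)),  # subgroup A
    rng.normal(loc=[7.5, 160.0], scale=[0.5, 10.0], size=(50, 2)),  # subgroup B
])

# The algorithm receives no labels; it infers two clusters from structure alone
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(lab_values)
print(model.labels_[:10])  # cluster assignment per patient
```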

Reinforcement learning trains a system by way of rewards given for particular outcomes. In medicine, this approach has so far only been investigated in studies, but it may be suitable in the future, for example, for adapting insulin administration to the individual patient in a closed-loop approach (10).
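
The underlying trial-and-error principle can be sketched with tabular Q-learning in a deliberately simplistic toy environment; the "dosing" states and rewards below are entirely hypothetical and not a clinical model:

```python
# Minimal sketch of reinforcement learning: tabular Q-learning in a toy
# environment (purely illustrative, not a clinical dosing model).
import numpy as np

n_states, n_actions = 5, 3          # discretized "glucose level" x "dose choice"
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

def step(state, action):
    # Toy dynamics: the "right" action nudges the state toward the target (2)
    next_state = int(np.clip(state + action - 1, 0, n_states - 1))
    reward = 1.0 if next_state == 2 else -abs(next_state - 2)  # reward near target
    return next_state, reward

state = 4
for _ in range(5000):
    # epsilon-greedy: mostly exploit current knowledge, sometimes explore
    action = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[state].argmax())
    next_state, reward = step(state, action)
    # Q-learning update: learn from the reward signal by trial and error
    Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
    state = next_state

print(Q.round(2))  # learned action values per state
```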

Approaches in supervised learning are often aimed at classifying data or predicting future events. The respective algorithms are trained on data in which the learning objective is specified (for example, X-ray images with marked masses and comparison images that contain no masses). The patterns recognized are then validated for quality on test datasets. The majority of AI applications that already have marketing authorization are based on supervised learning from uniform, unimodal data (for example, analyzing only images of possible skin lesions to identify malignant ones) (11, 12).
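
A minimal sketch of this supervised workflow, with simulated tabular data standing in for labeled images and a held-out test set used for validation (scikit-learn assumed available):

```python
# Minimal sketch of supervised learning: training on labeled data and
# validating on held-out test data (hypothetical features and labels).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Simulated dataset standing in for, e.g., image-derived features with labels
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)  # learn patterns
print(f"Test accuracy: {accuracy_score(y_test, model.predict(X_test)):.2f}")
```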

Risks and limitations of AI applications

In order to be able to assess the risks and limitations in the validity of ML applications, it is important to be aware of the ML lifecycle that always underlies them and which is based on multiple, strongly interdependent stages (figure 2). The first stage focuses on the real-life conditions which are mapped as representatively as possible in the form of digital data. The related variables need to be selected and prepared (referred to as feature selection and engineering, which are partially dispensed with in DNNs), in order to then be analyzed by the ML algorithm. The results are utilized by users (namely, physicians) and in turn have an effect on the real world (namely, the treatment of patients) (13).

Figure 2: Selected limitations and risks in terms of the quality of artificial intelligence (AI) applications at the stages of the learning and application lifecycle of machine learning (ML)

At each stage of the ML lifecycle, a multitude of partially redundant influencing factors are at work that can significantly distort the results of an AI application and limit its validity. These limitations are mainly responsible for the fact that the practical application of AI in patient care still falls short of expectations and hopes in many areas. Therefore, a critical reflection on the individual stages of the ML lifecycle is essential for a realistic evaluation of the potentials and qualities of ML applications (14, 15).

Real-life world

People live in real-life worlds. These are, as a rule, characterized by socioeconomic, biological, and other inhomogeneities that may be associated with a threat or disadvantage to certain individuals or population groups. When collecting the data that underlie an AI application, potential biases of this kind must be taken into account and, where necessary, compensated for (14, 16, 17).

Digital data

In principle, data can represent the real-life world only incompletely and in partial aspects. To nevertheless obtain an adequately good picture of the real-life world, the data collection itself must be as objective, precise, and accurate as possible. When selecting a data source, it is also important to ensure an appropriate level of representativeness (18). However, much medical information, in particular individual-specific information, can only be recorded in text form using the complexity of natural language, that is to say, as unstructured data that must be preprocessed by means of natural language processing (19). Finally, information that cannot be digitally documented generally cannot be made usable for AI applications (for example, an assessment of a patient's overall clinical picture based on experience and intuition) (15, 20).
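
To illustrate what such preprocessing can look like in its simplest form, the following sketch (hypothetical notes; scikit-learn assumed available) converts free-text clinical notes into a structured numerical representation; production NLP systems today typically rely on deep neural language models rather than simple word counts:

```python
# Minimal sketch: turning unstructured clinical text into structured
# features via a bag-of-words representation (hypothetical notes).
from sklearn.feature_extraction.text import CountVectorizer

notes = [
    "patient reports chest pain and dyspnea",
    "no chest pain, mild cough",
    "dyspnea on exertion, no cough",
]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(notes)        # sparse document-term matrix
print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(X.toarray())                         # word counts per note
```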

Selection and preparation of variables

In order to obtain the most valid models possible of the real world from AI applications, the variables included therein need to be suitably selected and prepared (for example, restricted to X-ray findings and specific clinical parameters in oncology diagnostics). This selection, as well as the subsequent standardization and normalization of data, can reduce their representativeness and limit the validity of the results of an AI application (15).
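
The selection and normalization steps described here can be sketched as a preprocessing pipeline (hypothetical data; scikit-learn assumed available); note that the variables discarded by the selection step are invisible to the downstream model, which is exactly how representativeness can be lost:

```python
# Minimal sketch of feature selection and standardization as a
# preprocessing pipeline ahead of the model (hypothetical data).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=500, n_features=30, n_informative=5, random_state=0)

pipeline = Pipeline([
    ("select", SelectKBest(f_classif, k=5)),  # keep 5 variables; the rest are discarded
    ("scale", StandardScaler()),              # normalize the retained variables
    ("model", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X, y)
print(pipeline.named_steps["select"].get_support().sum(), "of 30 variables reach the model")
```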

Algorithm design

The design of an ML algorithm includes programming the software code and integrating the previously selected variables. Likewise at this stage, errors and biases may occur, for example as a result of inadequate consideration of special features of the data to be used, unclear definitions of targets for pattern recognition, or embedding unsuitable cut-off values (13, 21). To ensure sufficient acceptance and critical reflection on design by the user, the algorithm should also be able to provide explanations of the obtained results in each case (referred to as explainability) (22, 23).
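
Two of the design decisions mentioned here, an embedded cut-off value and a rudimentary form of explainability, can be sketched as follows (hypothetical data; inspecting model coefficients is only a simple stand-in for dedicated explainability methods):

```python
# Minimal sketch: an embedded decision cut-off and a simple "explanation"
# of the result via model coefficients (hypothetical data).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = LogisticRegression().fit(X, y)

CUTOFF = 0.3  # design decision: flag cases above 30% predicted probability
probs = model.predict_proba(X)[:, 1]
flagged = probs >= CUTOFF  # an unsuitable cut-off here would bias all downstream results
print(f"{flagged.mean():.0%} of cases flagged at cut-off {CUTOFF}")

# Rudimentary explainability: which variables drive the prediction, and how strongly
for name, coef in zip([f"feature_{i}" for i in range(4)], model.coef_[0]):
    print(f"{name}: weight {coef:+.2f}")
```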

Application in the real world

At the stage of practical application, namely, patient care, errors and biases from all the aforementioned stages of the ML lifecycle can have a negative effect. Particular risks arise from unconsidered differences between the learning world and the application world and from a failure to orient AI applications to their subsequent practical deployment (24). Imprecise results, technical hurdles, insufficient content transparency, and mistrust quickly prevent the potential of AI applications in patient care from being fully exploited, for example, when programs for the AI-based analysis of histopathological findings cannot be integrated into existing workflows or fail to save time (18, 25, 26). On the other hand, the uncritical, overly confident use of AI applications can lead, for example, to important differential diagnostic considerations being ignored in practice. In principle, the unreflective pursuit of AI-based treatment concepts carries the risk that medicine will be robbed of important human factors through over-technification. For example, the quasi-objective calculation of outcome probabilities poses a major challenge for differentiated communication between physicians and patients (27, 28). Ethical dilemmas can also escalate if the results of AI applications are used, without reflection, as the basis for decisions regarding allocation and prioritization (29).

Quality and utility of clinical AI applications

Evidence base for the assessment of ML applications

A scientific basis is one of the fundamental quality requirements placed on modern medicine. Accordingly, it should also be possible to transparently assess the objectivity (freedom from uncontrolled influencing factors), reliability, and validity of AI-based medical applications. In order to represent the quality of AI algorithms for decision-making, the statistical test variables sensitivity, specificity, and precision (positive predictive value) are mostly used. These should be complemented by a critical assessment of bias and risk in the respective ML lifecycle (figure 2). Furthermore, an evidence-based evaluation of utility includes an investigation of the method in a real-world setting and in comparison to alternative procedures, similar to the approach used in a clinical trial (for example, a prospective intervention study that compares an AI-based with a classic diagnostic procedure) (30).
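
For reference, the named test statistics follow directly from a confusion matrix, as in this minimal sketch (hypothetical labels and predictions):

```python
# Minimal sketch: sensitivity, specificity, and precision (PPV) computed
# from a confusion matrix (hypothetical test labels and predictions).
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # ground truth (e.g., biopsy result)
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]  # AI classification

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # true-positive rate
specificity = tn / (tn + fp)  # true-negative rate
ppv = tp / (tp + fp)          # precision / positive predictive value
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, PPV={ppv:.2f}")
```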

Depending on the application, appropriate patient-specific endpoints such as quality of life, survival, disease progression, and symptom reduction should be assessed in addition to accuracy. Ideally, the improvement of these kinds of patient-specific endpoints should be borne in mind as early on as at the training stage of an AI application. Only in this way can the application have the prospect of becoming better than an evaluation of diagnosis and treatment data undertaken by humans (24). However, to date, only a handful of prospective studies have examined AI applications compared to the status quo of medical care or have already been able to demonstrate a utility in this regard (31, 32). A comprehensive evaluation of the additional benefit of an AI application includes not only an assessment of potential risks to patient safety but also that of cost-effectiveness (including potential savings in terms of time and resources) and ethical and sociocultural consequences (33–36).

Practical implementation of ML-based applications in patient care

The US Food and Drug Administration (FDA) currently lists 521 authorized medical AI applications (37). Official data are lacking in Germany, but one can assume a few dozen authorizations to date.

Judging by the number of AI-related publications, the real-world relevance of the applications approved in Germany is comparatively modest. The reasons for this include, in particular, the aforementioned limitations of the data basis, the limited transferability of applications from the learning world to the application world, and the challenges of integrating them practicably and economically into existing care processes (25). As a rule, AI applications for patient care are medical devices that may only be marketed or used after a conformity assessment for the respective risk class. Technically, their approval usually covers decision support only and requires that responsibility remain with the physicians using them. The unconsidered use of these methods (for example, in the sense of an automation bias) may therefore involve risks (18). Moreover, publishing utility-oriented application studies has not, to date, been mandatory for medical device approval. Rather, the basis for approval is often limited to functionality testing without published scientific research, or the associated studies took place in artificial settings. The quality and utility of many of the ML-based applications already in use in Germany are accordingly non-transparent.

The Table presents a synopsis of some of the systems approved (or in the process of being approved) in Germany, together with examples of the available scientific evidence in each case. The vast majority of applications are based on unimodal, uniform data. Overall, the publication basis with regard to approved AI applications provides a disparate and at times non-transparent picture. For some techniques, one can infer utility from published application studies, for example, ML-facilitated colonoscopic detection of colorectal polyps or photo-based detection of malignant skin lesions. These applications were shown to perform at a level comparable to standard techniques in both cases (12, 38). For other authorized applications, either no data on clinical utility or only selected statistical parameters (for example, sensitivity, specificity) have been published. For other applications, utility is not demonstrated by an additional clinical benefit, but rather by more efficient processes or a lowering of treatment barriers. Examples of this can be taken from areas in which specific specialist knowledge is lacking on site (for example, identifying rare electrocardiographic findings) or in which high throughput is required (for example, mammography screening). Lastly, cost savings and the facilitation of access to certain treatments in underserved regions can significantly contribute to the practical utility of AI applications, such as in the diagnosis of diabetic retinopathy and malignant melanoma (e1–e11).

Summary

A rapidly growing knowledge and information base as well as the diagnostic and therapeutic opportunities resulting therefrom pose the challenge to medicine of either compressing this information in such a way that it remains manageable or using it in its entirety and in the best possible way for the good of patients and society.

Physicians today must expend ever greater effort to stay abreast of the current state of science and technology while, at the same time, coping with economic constraints and meeting the demands of humane medicine. In doing so, they sometimes reach the limits of their capacities. Machine learning, currently the most powerful development in artificial intelligence, mimics human learning and, depending on the quality of data and the available computing power, can provide ever better medical predictions and classifications.

The current evidence shows that preventive, diagnostic, and therapeutic patient care can increasingly benefit from AI support. However, any technology that has indirect effects on medical practice, and thus potentially on people’s lives, diseases, and deaths, must be especially carefully scrutinized in terms of utility and risks. In the learning and application lifecycle of ML, risks can arise at various stages as a result of possible biases, negative reinforcements, and errors. Therefore, AI applications represent a potential risk to patients and, as such, still need to be subjected to the critical scrutiny of human judgment.

A broad array of skills is required for the quality assessment of AI applications, ranging from medical expertise proper and the design of care processes to data science, computer science, ethics, and law. Precisely because not all of these facets belong to the narrower domain of medicine, the medical profession must gain a comprehensive understanding of AI in order to fulfill its social responsibility by implementing AI in patient care in a critically considered manner (for example, via the course: www.ki-campus.org/courses/drmedki_basics_cme) (eBox) (39, 40).

eBOX. Explanations of selected terms and constructs.

Big Data

The term Big Data refers to large and often unstructured volumes of data that, due to their size and complexity, can no longer be readily processed and interpreted by humans or simple algorithms. Applications of → machine learning are developed to be trained with large datasets and recognize patterns therein. It is hoped that in the future these applications will identify hitherto unrecognized patterns in complex medical data (for example, certain medical correlations and risk factors). An example of Big Data would be in the field of → multiomics.

Deep learning, deep neural network (DNN)

Deep learning refers to a particular type of → machine learning based on what are known as deep neural networks. Its hallmark is an external input layer that receives the information to be processed (for example, the pixels of an X-ray image). The individual pieces of input information are weighted, connected, and forwarded via digital connection points (referred to as neurons) through several layers of nodes in order to ultimately obtain as an output the processing or interpretation of the input information (for example, classification of a lung nodule on an X-ray). Deep neural networks get their name from their multilayer design. In supervised learning, the training of a neural network corresponds, in simple terms, to a weighting of the digital neurons. This is achieved by presenting a multitude of examples on the input layer (for example, X-ray images with and without lung nodules) that are optimized to the specified output (for example, “this is a lung nodule,” or “this is a normal finding”). The connections and weightings of neurons are continuously adjusted by the system during training and have a complexity and depth that can generally no longer be made transparent. Optimally, the trained neural network is then capable of linking previously unknown inputs (for example, new X-ray images) with the correct output.
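
In simplified form, such a network is nothing more than layers of weighted connections whose weights are adjusted during training. The following minimal sketch (plain NumPy; hypothetical numeric inputs standing in for image-derived features) implements a two-layer network and a gradient-descent training loop:

```python
# Minimal sketch of a deep neural network: two layers of weighted
# "neurons" trained by gradient descent (hypothetical toy data).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))                 # 8 training examples, 4 input features
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)  # toy binary labels

W1 = rng.normal(size=(4, 6)) * 0.5          # input layer -> hidden layer weights
W2 = rng.normal(size=(6, 1)) * 0.5          # hidden layer -> output layer weights

def forward(X):
    h = np.tanh(X @ W1)                     # hidden "neurons": weighted, nonlinear
    out = 1 / (1 + np.exp(-(h @ W2)))       # output: probability of the positive class
    return out, h

for _ in range(2000):                       # training = repeatedly adjusting weights
    out, h = forward(X)
    grad_out = (out - y) / len(X)           # cross-entropy gradient at the output
    grad_h = (grad_out @ W2.T) * (1 - h**2) # backpropagation through the tanh layer
    W2 -= 0.5 * (h.T @ grad_out)
    W1 -= 0.5 * (X.T @ grad_h)

print(forward(X)[0].round(2).ravel())       # outputs approach the labels 0/1
```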

Conformity assessment for medical devices

Medical devices, which include machine learning-based applications, must demonstrate their compliance with the German Medical Devices Act (which in turn implements the provisions of the European Medical Device Regulation [MDR]). Depending on the → risk class of the medical device, the manufacturer must, among other things, provide technical documentation and a specific quality management system, the content of which must show that the product fulfills the stated functions. To date, no clinical application studies have been required for the conformity assessment of common AI applications. However, there is talk that in the future, much like in the USA, clinical studies will be required for the approval of certain applications.

Machine learning

Machine learning (ML) refers to a particular technique for generating artificial intelligence. The hallmark of ML is digital, technical learning of patterns from datasets with the aim of using these patterns to interpret new, previously unknown data. Supervised learning, unsupervised learning, and reinforcement learning are subgroups of machine learning (figure 1). There are various techniques for programming machine learning systems. These include, for example, deep neural networks (→ deep learning) as well as what are known as decision forests.

Multiomics

Multiomics is the term used to refer to the integration and interpretation of different biological categories of data, that is to say, the genome, transcriptome, proteome, metabolome, epigenome, microbiome, etc. The term is derived from the last syllables that are common to English terms used in these technological fields (for example, genomics). An important branch of research is currently developing here. It is hoped that by connecting these large amounts of data (→ Big Data), in combination with machine learning, it will be possible to identify previously unknown mechanisms and associations for risk factors and diseases.

Natural language processing

Natural language processing (NLP) refers to digital techniques to process and interpret language. Today, NLP is generally based on → deep learning, that is to say, deep neural networks as a technique of → machine learning (ML). This involves training the systems with text or audio data. In complex ML applications, text data (for example, medical notes) are partially processed with NLP, to then be merged with other data in a subsequent step.

Medical device risk classes

In line with the European and German regulations, a distinction is made between four different risk classes, into which machine learning applications are also classified:

  • Class I: Low risk (for example, reading glasses, period tracker apps)

  • Class IIa: Medium risk (for example, ultrasound equipment, software to aid medical diagnosis that poses no immediate risk to patients)

  • Class IIb: High risk (for example, ventilators, software that monitors vital functions)

  • Class III: Very high risk (for example, drugs, autonomous software systems that can directly cause death in the case of malfunction).

Depending on the risk class, different requirements are set for the → conformity assessment. Many machine learning-based applications technically serve only to aid medical decision-making, meaning that responsibility remains with the human. Therefore, only the less stringent safety criteria (Class I or IIa) need to be met, which means that risks arise if these systems are nevertheless used in practice without human supervision.

Due to the complexity of AI applications and their tendency to be opaque, there is also a need for regulatory safeguards that place a strong emphasis on the practical benefits, the application-related and sometimes considerable risks, and on ensuring a high level of content transparency. Used responsibly, AI can promote evidence-based and cost-efficient patient care in the future, while at the same time supporting the human essence (and human intelligence) of medicine.

Table. Examples of AI-based applications in patient care that have been approved or are in the approval process*.

Dermatology: recognition of dermal melanoma
  • Data basis: dermoscopic images
  • Application / AI concept: Moleanalyzer pro, supervised learning
  • Study: Haenssle et al., Ann Oncol, 2020 (e1)
  • Results: sensitivity 95% (vs. expert: 89%); specificity 76.7% (vs. expert: 80.7%); accuracy 84% (vs. expert: 84%)

Diabetology: blood glucose control in intensive care patients
  • Data basis: glucose level, carbohydrate intake, insulin administration
  • Application / AI concept: Space GlucoseControl System, model-based prediction algorithm
  • Study: Blaha et al., BMC Anesthesiology, 2016 (e2)
  • Results: blood glucose in target range 83% of the time; episodes of severe hypoglycemia 0.01% of the time

Gastroenterology: recognition of colorectal neoplasia
  • Data basis: colonoscopy images
  • Application / AI concept: GI-Genius, supervised learning
  • Study: Repici et al., Gastroenterology, 2020 (e3)
  • Results: sensitivity 99.7%; specificity 91.1%; detection rate with AI 54.8% (vs. 40.4% without AI support)

Gynecology: breast cancer recognition
  • Data basis: digital mammography
  • Application / AI concept: Transpara, supervised learning
  • Study: Romero-Martín et al., Radiology, 2022 (e4)
  • Results: sensitivity 70.8% (vs. double expert reading: 67.3%)

Heart surgery: prediction of risk of postoperative bleeding after heart surgery
  • Data basis: structured clinical data (vital parameters, surgery, test results, etc.)
  • Application / AI concept: x-c-bleeding, supervised learning
  • Study: Meyer et al., Lancet Respiratory Medicine, 2018 (e5)
  • Results: sensitivity 74% (vs. Bojar algorithm: 21%); specificity 84% (vs. 95%); accuracy 80% (vs. 58%); PPV 84% (vs. 81%)

Cardiology: myocardial ischemia detection
  • Data basis: vectorcardiography data
  • Application / AI concept: Cardisio, supervised learning
  • Study: Braun et al., Journal of Electrocardiology, 2020 (e6)
  • Results (females/males): sensitivity 90.2%/97.2%; specificity 74.4%/76.1%; accuracy 82.5%/90.7%

Nephrology: prediction of postoperative risk of renal injury after heart surgery
  • Data basis: structured clinical data (vital parameters, surgery, test results, etc.)
  • Application / AI concept: x-c-renal injury, supervised learning
  • Study: Meyer et al., Lancet Respiratory Medicine, 2018 (e5)
  • Results: sensitivity 94% (vs. KDIGO renal failure: 53%); specificity 86% (vs. 92%); accuracy 90% (vs. 73%); PPV 87% (vs. 87%)

Ophthalmology: detection of diabetic retinopathy (mtmDR)
  • Data basis: two-field undilated fundus photography
  • Application / AI concept: EyeArt, supervised learning
  • Study: Ipp et al., JAMA Network Open, 2021 (e7)
  • Results: sensitivity 95.5%; specificity 85.0%; PPV 59.5%

Orthopedics: optimizing functionality of upper limb prostheses
  • Data basis: continuous myoelectric derivation
  • Application / AI concept: Myo Plus, algorithm-based pattern recognition
  • Study: Franzke et al., PLoS One, 2019 (e8)
  • Results: qualitative assessment: intuitive control possible, not always reliable in daily use, extensive training required (better than conventional control)

Pathology: detection of NSCLC tumor cells (lung)
  • Data basis: histopathology with immunohistochemical staining for PD-L1
  • Application / AI concept: Mindpeak Lung (NSCLC) PD-L1, supervised learning
  • Study: Daifalla, Günther, Mindpeak website, 2022 (e9)
  • Results: rate of agreement with expert assessment (at a 1% proportion of tumor cells in the analysis): 85% (vs. conventional assessment: 83%)

Psychiatry: early detection of delirium risk in inpatients
  • Data basis: structured clinical data (vital parameters, laboratory results, diagnoses, etc.)
  • Application / AI concept: Clinalytix, supervised learning
  • Study: Sun et al., Journal of Medical Internet Research, 2022 (e10)
  • Results: sensitivity 80%; specificity > 85%

Radiology: detection of lung nodules
  • Data basis: low-dose chest CT
  • Application / AI concept: AI-Rad Companion Chest CT, supervised learning
  • Study: Chamberlin et al., BMC Medicine, 2021 (e11)
  • Results: sensitivity 100%; specificity 70%; PPV 83.1%; AI agreement vs. expert: 0.741 (Cohen's kappa)

* The results of associated studies are shown; further key figures, reference methods, and study concepts can be found in the respective publications (for references, see Appendix). The applications generally offer only decision-making aids and should never replace a physician's decision. The table shows only a selection of examples of approved applications. CT, computed tomography; AI, artificial intelligence; mtmDR, more than mild diabetic retinopathy; KDIGO, Kidney Disease: Improving Global Outcomes; NSCLC, non-small cell lung cancer; PPV, positive predictive value

Questions on the article in issue 27–28/2023: The Quality and Utility of Artificial Intelligence in Patient Care.

The submission deadline is 09 July 2024. Only one answer is possible per question. Please select the answer that is most appropriate.

Question 1

Which statement on the relationships between human intelligence, artificial intelligence, and machine learning is correct?

  1. Today, artificial intelligence is superior to all facets of human intelligence.

  2. Due to the development of deep neural networks (deep learning), great advances have been made in the field of machine learning in recent years.

  3. A chess computer generally does not possess artificial intelligence.

  4. It is generally anticipated that artificial intelligence will surpass human intelligence in all fields by the end of the current decade.

  5. As a rule, the tasks typically performed by machine learning applications can be performed faster and more accurately by humans.

Question 2

For a machine learning application, findings suspicious for malignancy in a large dataset of mammography images are labeled by an experienced radiologist. An algorithm is trained on the basis of these images and the resulting pattern is tested on new, previously unprocessed mammography images. It has been demonstrated that abnormal findings can be identified with high precision. Which type of machine learning is involved here?

  1. Unsupervised learning

  2. Reinforcement learning

  3. Genetic learning

  4. Supervised learning

  5. Explanatory learning

Question 3

The goal is to train a machine learning algorithm to identify epidemic outbreaks of infection in the data on infectious disease reported to public health authorities. In order to do this, the algorithm must learn to differentiate between the background noise of sporadic infections and an emerging outbreak. Which type of machine learning is involved here?

  1. Unsupervised learning

  2. Reinforcement learning

  3. Genetic learning

  4. Supervised learning

  5. Explanatory learning

Question 4

The machine learning lifecycle represents the relationship between the various interdependent stages. Which sequence correctly reflects the content and order?

  1. Programming → Data collection → Testing → Database creation → Programming

  2. Real-life world → Creation of random data → Programming → Creation of application data

  3. Research, diagnostic score → Programming, data recognition → Calculation of diagnostic scores applied → Publication

  4. Application → Feedback → Data validation → Test environment → Application

  5. Real-life world → Digital data → Preparation of variables → Design of the AI algorithm, application → Real-life world

Question 5

Which argument is put forward in the text to support the view that physicians should embrace the concepts of artificial intelligence?

  1. To be able to assess the risks of AI applications

  2. To impress the patients and their relatives

  3. To be able to develop their own AI applications

  4. To be able to program interfaces

  5. To collect data

Question 6

Approximately how many medical AI applications have already been approved by the FDA?

  1. Approximately 10

  2. Approximately 55

  3. Approximately 180

  4. Approximately 520

  5. Approximately 1200

Question 7

A number of artificial intelligence systems are used in a hospital in the field of diagnostics. As a general rule, who bears the responsibility for treatment decisions made on the basis of these systems?

  1. The developer of the AI application

  2. The commercial management

  3. The treating physicians

  4. The artificial intelligence

  5. The trade supervisory board

Question 8

Which type of AI learning works on a trial-and-error basis?

  1. Supervised learning

  2. Unsupervised learning

  3. Cumulative learning

  4. Dependent learning

  5. Reinforcement learning

Question 9

What type of validation of the utility of a new AI-based medical application is called for in the text?

  1. Prospective interventional studies

  2. Retrospective database analyses

  3. Physician satisfaction questionnaire

  4. Studies in different countries

  5. Cross-sectional studies

Question 10

In which of the following examples in the article was the diagnostic detection rate higher with than without AI?

  1. Preeclampsia

  2. Colorectal neoplasia

  3. Wet macular degeneration

  4. Sjögren syndrome

  5. Attention deficit/hyperactivity disorder (ADHD)

Acknowledgments

Translated from the original German by Christine Rye.

Footnotes

Conflict of interest statement

The authors declare that no conflict of interest exists.

References

  • 1.Hawkins J, Lewis M, Klukas M, Purdy S, Ahmad S. A framework for intelligence and cortical function based on grid cells in the neocortex. Front Neural Circuits. 2019;12 doi: 10.3389/fncir.2018.00121. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 2.Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25:44–56. doi: 10.1038/s41591-018-0300-7. [DOI] [PubMed] [Google Scholar]
  • 3.Katritsis DG. Artificial intelligence, superintelligence and intelligence. Arrhythm Electrophysiol Rev. 2021;10:223–224. doi: 10.15420/aer.2021.61. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 4.Zhang D, Yin C, Zeng J, Yuan X, Zhang P. Combining structured and unstructured data for predictive models: a deep learning approach. BMC Med Inform Decis Mak. 2020;20 doi: 10.1186/s12911-020-01297-6. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 5.Rodriguez-Ruiz A, Lång K, Gubern-Merida A, et al. Stand-alone artificial intelligence for breast cancer detection in mammography: comparison with 101 radiologists. J Natl Cancer Inst. 2019;111:916–922. doi: 10.1093/jnci/djy222. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Acosta JN, Falcone GJ, Rajpurkar P, Topol EJ. Multimodal biomedical AI. Nat Med. 2022;28:1773–1784. doi: 10.1038/s41591-022-01981-2. [DOI] [PubMed] [Google Scholar]
  • 7.Vidalis T. Artificial intelligence in biomedicine: a legal insight. BioTech (Basel) 2021;10 doi: 10.3390/biotech10030015. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 8.Eicher T, Kinnebrew G, Patt A, et al. Metabolomics and multi-omics integration: a survey of computational methods and resources. Metabolites. 2020;10 doi: 10.3390/metabo10050202. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Wen A, Wang L, He H, et al. An aberration detection-based approach for sentinel syndromic surveillance of COVID-19 and other novel influenza-like illnesses. J Biomed Inform. 2021;113 doi: 10.1016/j.jbi.2020.103660. 103660. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Tejedor M, Woldaregay AZ, Godtliebsen F. Reinforcement learning application in diabetes blood glucose control: a systematic review. Artif Intell Med. 2020;104 doi: 10.1016/j.artmed.2020.101836. 101836. [DOI] [PubMed] [Google Scholar]
  • 11.Rajpurkar P, Chen E, Banerjee O, Topol EJ. AI in health and medicine. Nat Med. 2022;28:31–38. doi: 10.1038/s41591-021-01614-0. [DOI] [PubMed] [Google Scholar]
  • 12.Haenssle HA, Fink C, Toberer F, et al. Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Ann Oncol. 2020;31:137–143. doi: 10.1016/j.annonc.2019.10.013. [DOI] [PubMed] [Google Scholar]
  • 13.Kocak B, Kus EA, Kilickesmez O. How to read and review papers on machine learning and artificial intelligence in radiology: a survival guide to key methodological concepts. Eur Radiol. 2021;31:1819–1830. doi: 10.1007/s00330-020-07324-4. [DOI] [PubMed] [Google Scholar]
  • 14.Leslie D, Mazumder A, Peppin A, Wolters MK, Hagerty A. Does “AI“ stand for augmenting inequality in the era of covid-19 healthcare? BMJ. 2021;372 doi: 10.1136/bmj.n304. n304. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Suresh H, Guttag J. A framework for understanding sources of harm throughout the machine learning life cycle. In: ACM International Conference Proceeding Series. 2021. doi: 10.1145/3465416.3483305 (last accessed on 16 March 2022) [Google Scholar]
  • 16.Celi LA, Cellini J, Charpignon ML, et al. Sources of bias in artificial intelligence that perpetuate healthcare disparities—a global review. PLOS Digit Health. 2022;1 doi: 10.1371/journal.pdig.0000022. e0000022. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Pierson E, Cutler DM, Leskovec J, Mullainathan S, Obermeyer Z. An algorithmic approach to reducing unexplained pain disparities in underserved populations. Nat Med. 2021;27:136–140. doi: 10.1038/s41591-020-01192-7. [DOI] [PubMed] [Google Scholar]
  • 18.Challen R, Denny J, Pitt M, Gompels L, Edwards T, Tsaneva-Atanasova K. Artificial intelligence, bias and clinical safety. BMJ Qual Saf. 2019;28:231–237. doi: 10.1136/bmjqs-2018-008370. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Goh KH, Wang L, Yeow AYK, et al. Artificial intelligence in sepsis early prediction and diagnosis using unstructured data in healthcare. Nat Commun. 2021;12 doi: 10.1038/s41467-021-20910-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 20.van der Niet AG, Bleakley A. Where medical education meets artificial intelligence: ‘Does technology care? Med Educ. 2021;55:30–36. doi: 10.1111/medu.14131. [DOI] [PubMed] [Google Scholar]
  • 21.Barboi C, Tzavelis A, Muhammad LN. Comparison of severity of illness scores and artificial intelligence models that are predictive of intensive care unit mortality: meta-analysis and review of the literature. JMIR Med Inform. 2022;10 doi: 10.2196/35293. e35293. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 22.Loftus TJ, Tighe PJ, Ozrazgat-Baslanti T, et al. Ideal algorithms in healthcare: explainable, dynamic, precise, autonomous, fair, and reproducible. PLOS Digit Health. 2022;1 doi: 10.1371/journal.pdig.0000006. e0000006. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 23.Amann J, Vetter D, Blomberg SN, et al. To explain or not to explain?—Artificial intelligence explainability in clinical decision support systems. PLOS Digital Health. 2022;1 doi: 10.1371/journal.pdig.0000016. e0000016. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Obermeyer Z, Topol EJ. Artificial intelligence, bias, and patients’ perspectives. Lancet. 2021;397(10289):2038. doi: 10.1016/S0140-6736(21)01152-1. [DOI] [PubMed] [Google Scholar]
  • 25.Cabitza F, Campagner A, Balsano C. Bridging the “last mile” gap between AI implementation and operation: “data awareness” that matters. Ann Transl Med. 2020;8 doi: 10.21037/atm.2020.03.63. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 26.Gaube S, Suresh H, Raue M, et al. Do as AI say: susceptibility in deployment of clinical decision-aids. NPJ Digit Med. 2021;4 doi: 10.1038/s41746-021-00385-9. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Nagy M, Sisk B. How will artificial intelligence affect patient-clinician relationships? AMA J Ethics. 2020;22:E395–E400. doi: 10.1001/amajethics.2020.395. [DOI] [PubMed] [Google Scholar]
  • 28.Lu SC, Xu C, Nguyen CH, Geng Y, Pfob A, Sidey-Gibbons C. Machine learning-based short-term mortality prediction models for patients with cancer using electronic health record data: systematic review and critical appraisal. JMIR Med Inform. 2022;10 doi: 10.2196/33182. e33182. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Wingfield LR, Ceresa C, Thorogood S, Fleuriot J, Knight S. Using artificial intelligence for predicting survival of individual grafts in liver transplantation: a systematic review. Liver Transpl. 2020;26:922–934. doi: 10.1002/lt.25772. [DOI] [PubMed] [Google Scholar]
  • 30.Caliebe A, Leverkus F, Antes G, Krawczak M. Does big data require a methodological change in medical research? BMC Med Res Methodol. 2019;19 doi: 10.1186/s12874-019-0774-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 31.Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ. 2020;368 doi: 10.1136/bmj.m689. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Zhou Q, Chen ZH, Cao YH, Peng S. Clinical impact and quality of randomized controlled trials involving interventions evaluating artificial intelligence prediction tools: a systematic review. NPJ Digit Med. 2021;4 doi: 10.1038/s41746-021-00524-2. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Ryan M. In AI we trust: ethics, artificial intelligence, and reliability. Sci Eng Ethics. 2020;26:2749–2767. doi: 10.1007/s11948-020-00228-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Collins GS, Dhiman P, Andaur Navarro CL, et al. Protocol for development of a reporting guideline (TRIPOD-AI) and risk of bias tool (PROBAST-AI) for diagnostic and prognostic prediction model studies based on artificial intelligence. BMJ Open. 2021;11 doi: 10.1136/bmjopen-2020-048008. e048008. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Wiens J, Saria S, Sendak M, et al. Do no harm: a roadmap for responsible machine learning for health care. Nat Med. 2019;25:1337–1340. doi: 10.1038/s41591-019-0548-6. [DOI] [PubMed] [Google Scholar]
  • 36.He J, Baxter SL, Xu J, Xu J, Zhou X, Zhang K. The practical implementation of artificial intelligence technologies in medicine. Nat Med. 2019;25:30–36. doi: 10.1038/s41591-018-0307-0. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.FDA US. Artificial Intelligence and Machine Learning (AI/ML)-Enabled Medical Devices. www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-aiml-enabled-medical-devices (last accessed on 5 May 2023) [Google Scholar]
  • 38.Repici A, Spadaccini M, Antonelli G, et al. Artificial intelligence and colonoscopy experience: lessons from two randomised trials. Gut. 2022;71:757–765. doi: 10.1136/gutjnl-2021-324471. [DOI] [PubMed] [Google Scholar]
  • 39.Keane PA, Topol EJ. AI-facilitated health care requires education of clinicians. Lancet. 2021;397(10281):1254. doi: 10.1016/S0140-6736(21)00722-4. [DOI] [PubMed] [Google Scholar]
  • 40.Young AT, Amara D, Bhattacharya A, Wei ML. Patient and general public attitudes towards clinical artificial intelligence: a mixed methods systematic review. Lancet Digital Health. 2021;3:e599–e611. doi: 10.1016/S2589-7500(21)00132-1. [DOI] [PubMed] [Google Scholar]
  • E1.Haenssle HA, Fink C, Toberer F, et al. Man against machine reloaded: performance of a market-approved convolutional neural network in classifying a broad spectrum of skin lesions in comparison with 96 dermatologists working under less artificial conditions. Ann Oncol. 2020;31:137–143. doi: 10.1016/j.annonc.2019.10.013. [DOI] [PubMed] [Google Scholar]
  • E2.Blaha J, Barteczko-Grajek B, Berezowicz P, et al. Space GlucoseControl system for blood glucose control in intensive care patients—a European multicentre observational study. BMC Anesthesiol. 2016;16 doi: 10.1186/s12871-016-0175-4. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • E3.Repici A, Badalamenti M, Maselli R, et al. Efficacy of real-time computer-aided detection of colorectal neoplasia in a randomized trial. Gastroenterology. 2020;159:512–520e7. doi: 10.1053/j.gastro.2020.04.062. [DOI] [PubMed] [Google Scholar]
  • E4.Romero-Martín S, Elías-Cabot E, Raya-Povedano JL, Gubern-Mérida A, Rodríguez-Ruiz A, Álvarez-Benito M. Stand-alone use of artificial intelligence for digital mammography and digital breast tomosynthesis screening: a retrospective evaluation. Radiology. 2022;302:535–542. doi: 10.1148/radiol.211590. [DOI] [PubMed] [Google Scholar]
  • E5.Meyer A, Zverinski D, Pfahringer B, et al. Machine learning for real-time prediction of complications in critical care: a retrospective study. Lancet Respir Med. 2018;6:905–914. doi: 10.1016/S2213-2600(18)30300-X. [DOI] [PubMed] [Google Scholar]
  • E6.Braun T, Spiliopoulos S, Veltman C, et al. Detection of myocardial ischemia due to clinically asymptomatic coronary artery stenosis at rest using supervised artificial intelligence-enabled vectorcardiography—a five-fold cross validation of accuracy. J Electrocardiol. 2020;59:100–105. doi: 10.1016/j.jelectrocard.2019.12.018. [DOI] [PubMed] [Google Scholar]
  • E7.Ipp E, Liljenquist D, Bode B, et al. Pivotal evaluation of an artificial intelligence system for autonomous detection of referrable and vision-threatening diabetic retinopathy. JAMA Netw Open. 2021;4 doi: 10.1001/jamanetworkopen.2021.34254. e2134254. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • E8.Franzke AW, Kristoffersen MB, Bongers RM, et al. Users’ and therapists’ perceptions of myoelectric multi-function upper limb prostheses with conventional and pattern recognition control. PLoS One. 2019;14 doi: 10.1371/journal.pone.0220899. e0220899. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • E9.Daifalla K, Günther S. Eigener Report 2022: Mindpeak Breast HER2 RoI Clinical Performance Evaluation Summary. www.uploads-ssl.webflow.com/60424989e8e0f02a922616f9/631072d2e19725a967c1735f_Mindpeak%20Breast%20HER2%20RoI%20-%20Clinical%20performance%20evaluation%20summary%20-%20APPROVED.pdf (last accessed on 20 November 2022) [Google Scholar]
  • E10.Sun H, Depraetere K, Meesseman L, et al. Machine learning-based prediction models for different clinical risks in different hospitals: evaluation of live performance. J Med Internet Res. 2022;24 doi: 10.2196/34295. e34295. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • E11.Chamberlin J, Kocher MR, Waltz J, et al. Automated detection of lung nodules and coronary artery calcium using artificial intelligence on low-dose CT scans for lung cancer screening: accuracy and prognostic value. BMC Med. 2021;19 doi: 10.1186/s12916-021-01928-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
