Abstract
Since the American Academy of Sleep Medicine published its 2020 position statement on artificial intelligence (AI) in sleep medicine, there has been a tremendous expansion of AI-related software and hardware options for sleep clinicians. To help clinicians understand the current state of AI and sleep medicine, and to further enable these solutions to be adopted into clinical practice, a discussion panel was conducted on June 7, 2022, at the Associated Professional Sleep Societies Sleep Conference in Charlotte, North Carolina. This article summarizes key discussion points from that session, including considerations for the clinician in evaluating AI-enabled solutions: steps that the Food and Drug Administration and clinicians might take to protect patients, logistical issues, technical challenges, billing and compliance considerations, education and training needs, and other challenges unique to AI-enabled solutions. Our summary is meant to support clinicians who use AI-enabled solutions in the clinical care of patients with sleep disorders.
Citation:
Bandyopadhyay A, Bae C, Cheng H, et al. Smart sleep: what to consider when adopting AI-enabled solutions in clinical practice of sleep medicine. J Clin Sleep Med. 2023;19(10):1823–1833.
Keywords: AI-enabled solutions, clinical sleep medicine, artificial intelligence, machine learning
INTRODUCTION
In 2020, the American Academy of Sleep Medicine (AASM) released a position statement on artificial intelligence in sleep medicine.1 In the past two years, several worldwide events have created a huge shift in the world of emerging health technology. While consumer devices have gained worldwide popularity, medical-grade devices have also incorporated new technology, and the interface between consumer and medical-grade devices has started to blur. This increase in popularity has led to a need for increased governance and standardization at various levels.2,3 Adoption of artificial intelligence (AI)-enabled consumer and medical solutions in medicine has unique challenges.4 This is a summary of a discussion panel conducted on June 7, 2022, at the Associated Professional Sleep Societies Sleep Conference in Charlotte, North Carolina. The goal of this discussion panel was to help sleep clinicians make an informed decision when choosing AI-enabled tools in clinical practice. The learning objectives of the discussion panel included:
- Prepare a checklist of considerations for vetting new AI-enabled solutions.
- Describe steps to evaluate AI-enabled solutions, including the Food and Drug Administration (FDA) approval process.
- Identify common biases involved with AI-enabled solutions.
- Prepare a checklist for implementing new AI-enabled solutions at the practice/enterprise level.
- Outline logistical and technical considerations when choosing AI-enabled solutions for a clinical practice.
- Describe steps for dissemination and implementation of AI-enabled solutions in a clinical practice.
- Recognize unique challenges requiring ongoing support for continuously learning AI-enabled solutions.
- Identify coding barriers and challenges in adopting AI-enabled solutions in clinical practice.
METHODS
The discussion group fostered an interactive format by asking the audience questions (Table S1 in the supplemental material) about validation, ethics, and biases associated with implementing AI-enabled solutions and the future of AI. These questions were incorporated into the ensuing discussion and thus represent the general level of concern and curiosity about AI in sleep medicine. The discussion panel was open to all paid attendees of the 36th annual Associated Professional Sleep Societies Sleep Conference. Audience questions were solicited during the panel discussion; after each speaker provided an overview of their respective topic, the chair and speakers answered them. For the purpose of this manuscript, each speaker authored the section they had presented during the panel discussion, including responses to audience questions that fell within the scope of the manuscript. Additional edits were made for readability and to meet the journal's standards for manuscript submission.
The panelists first provided an overview of the following topics:
1A. What should the sleep clinician consider when evaluating AI-enabled solutions?
1B. What are some potential inherent biases associated with AI-enabled solutions?
2A. What are some logistical/technical considerations while choosing AI-enabled solutions for clinical practice?
2B. What are the steps for dissemination and implementation of AI-enabled solutions in your clinical practice?
2C. What support is needed to update AI-enabled solutions as machine learning (ML) continuously learns and relearns on the go?
2D. What are the coding barriers posed by payers that make it difficult to integrate these solutions, and what steps can we take for successful incorporation at the Current Procedural Terminology (CPT) level?
The presented overviews as well as comments from panel discussion were collated and are summarized in this manuscript. This summary of the discussion assumes a modest understanding of AI-related technical terms and was written to help the sleep clinician evaluate and implement AI-enabled solutions in clinical practice.
TOPICS
1A. What should the sleep clinician consider when evaluating AI-enabled solutions?
In 1997, John McCarthy defined artificial intelligence as the science and engineering of making intelligent machines.5 Despite the debate surrounding its definition and scope, AI has been adopted in many fields, including medicine.6,7 However, while AI has gained popularity in clinical research, its widespread clinical adoption has been challenging.8 Lack of regulation, interpretability, interoperability, and evidence-based practice have hindered the use of AI in clinical practice.8 One of the fundamental challenges has been the wide variety of types of AI used in medicine. One of the most common forms of AI is machine learning (ML), the branch of AI that focuses on computers' ability to learn without being explicitly programmed. Deep learning (DL), a subset of ML, is inspired by the structure and function of the brain and learns through artificial neural networks. How the machine is programmed to learn, and the level of complexity required for that knowledge, dictates the rigor of regulation, interpretability, and interoperability. Multiple learning models may be used to program the machine, including supervised learning (when the training dataset has labeled inputs and known outputs) and unsupervised learning (when the training dataset has unlabeled inputs and unknown outputs).9 Validation of these different learning models may involve complex statistical methods and is rarely one-size-fits-all.
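To make the supervised/unsupervised distinction concrete, the short Python sketch below (all data are synthetic, and the scikit-learn library is assumed purely for illustration) fits a supervised classifier on labeled inputs and, separately, lets an unsupervised clustering model find structure in the same inputs without any labels.

```python
# Minimal sketch contrasting supervised and unsupervised learning.
# All data are synthetic and for illustration only (scikit-learn assumed).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
features = rng.normal(size=(200, 4))       # inputs, e.g., signal-derived features
labels = (features[:, 0] > 0).astype(int)  # known outputs for the labeled inputs

# Supervised learning: the model is fit to labeled inputs with known outputs.
classifier = LogisticRegression().fit(features, labels)
print("Supervised predictions:", classifier.predict(features[:5]))

# Unsupervised learning: the model finds structure without any labels.
clusters = KMeans(n_clusters=2, n_init=10).fit_predict(features)
print("Unsupervised cluster assignments:", clusters[:5])
```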
AI-enabled solutions are gradually gaining popularity in the realm of clinical sleep practice. The most common applications have been AI-enabled polysomnogram (PSG) scoring and home sleep apnea testing devices.10 There are several points a sleep clinician should thoughtfully consider when evaluating clinical AI-enabled solutions. Table 1 presents a checklist for sleep clinicians to review before investing in any AI-powered product for clinical application.11–14
Table 1.
Checklist for evaluating AI-enabled solutions.
- Examine the AI model task
- Check reliable resources (eg, FDA Access Data, AASM #SleepTechnology resource website14)
- Understand the AI model design
- Explore the AI model datasets
- Appraise the AI model performance
- Comprehend the AI model application
- Investigate the AI model security
- Understand the FDA regulatory oversight related to an AI model
AI = artificial intelligence, AASM = American Academy of Sleep Medicine, DL = deep learning, FDA = Food and Drug Administration, ML = machine learning.
Examine the AI model task
When evaluating an AI-enabled solution, it is essential to investigate and clearly understand the clinical utility and capability of the AI solution of interest. For example, if one intends to screen for obstructive sleep apnea (OSA) in a specific patient population, a commercially available AI-enabled device for OSA diagnosis may not suffice, as it may lack sensitivity in the population in question. One must also determine the category of the AI-enabled solution: assistive, augmentative, or autonomous.15 These categories are based on the amount of work done by the machine compared with a human for data analysis, data interpretation, and report generation.
Check reliable resources
In the United States, the FDA is an agency of the Department of Health and Human Services that enforces multiple laws protecting public health. The FDA Data Dashboard (datadashboard.fda.gov) allows users to search data submitted to the FDA by pharmaceutical and device manufacturers. The AASM offers a useful resource for clinicians interested in evaluating AI-enabled solutions: sleep clinicians can check the AASM's #SleepTechnology resource website, under the AASM Clinical Resources section, to see whether a specific AI-enabled device, application, or software as a medical device has been reviewed by the AASM Emerging Technology Committee.14 This website contains written assessments of various new, innovative sleep technologies and presents a concise technical review, including FDA status, AI/ML/DL-related information, claimed use, sensors, mechanisms, data outputs, raw data availability, application programming interfaces, and peer-reviewed publications. Other resources include searching for clinical trials through ClinicalTrials.gov.
Understand the AI model design
The next step is understanding the AI/ML/DL method used in the clinical AI model.16 AI includes many different techniques. Unlike traditional rule-based algorithms built from features specifically delineated by humans, ML algorithms transform inputs into outputs using automatically derived, statistical, data-driven rules. DL algorithms are complex mathematical systems that require no manual feature extraction. DL neural networks are fed a large set of raw data to develop their own multilayer representations, with each arithmetic layer continuously feeding the next. With appropriately selected inputs, DL algorithms can identify subtle features in the input signals, potentially discover unknown features, and handle intricate relationships. In clinical medicine, DL can play a central role in assessing complex data, such as medical imaging in radiology, dermatology, ophthalmology, pathology, and cardiology.17 In sleep medicine, AI/ML/DL algorithms have rapidly proliferated in recent years, from automated PSG scoring of sleep stages and respiratory events to numerous other aspects of clinical sleep medicine.10
Explore the AI model datasets
When assessing a specific AI-enabled solution, it is critical to understand how the clinical AI model of interest was constructed by examining all related studies. To assess the robustness of the model, it is important to verify that the training, tuning, and external testing datasets are clearly defined13 (Figure 1). When an AI model is developed, the algorithm is first trained using a training dataset (adjusting parameters to best match the known outputs) and then tuned using an unseen dataset (repeatedly adjusting the trained model to best fit the tuning dataset11). Once the model is built, it must be validated using an external dataset from another source to confirm that the AI model can generalize. Overall, the three datasets should not overlap, and the external testing dataset should be used in the final statistical report.11,13
Figure 1. Artificial intelligence model datasets and potential associated errors.
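As a minimal illustration of the dataset separation described above, the Python sketch below partitions synthetic data into training, tuning (validation), and internal testing sets; the split proportions are arbitrary, and in practice the final test set should be a truly external dataset from another source.

```python
# Illustrative three-way split into training, tuning (validation), and
# held-out testing sets. Data are synthetic; proportions are arbitrary.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 10))
y = rng.integers(0, 2, size=1000)

# 60% training, 20% tuning, 20% internal testing.
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=1)
X_tune, X_test, y_tune, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

# The three sets must not overlap. Final performance should be reported on
# the test set, ideally replaced by an external dataset from another
# institution to demonstrate generalizability.
```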
Another relevant task is to ensure that the data quality of a performance evaluation study is high. The "garbage in, garbage out" adage is relevant in AI model construction: performance problems tend to arise when data quality is poor or data diversity is limited, and a lack of accurately labeled and versatile development datasets can lead to poor performance in real-world environments.12 Several publicly available sleep medicine databases, such as the Sleep Heart Health Study, MGH Sleep Laboratory, and Wisconsin Sleep Cohort, contain raw data from thousands of PSGs (the reference standard) and can be used for training and tuning in AI model construction.18 During training, larger training datasets yield more robust AI model performance than smaller datasets when all other factors (eg, accuracy, bias) are kept equal. In practice, clinical and PSG datasets are typically on the order of hundreds or thousands of studies, far smaller than the datasets used in commercial image identification AI models.19 Using mixed-cohort datasets for algorithm training has been associated with improved performance.20 Additionally, utilizing datasets scored by multiple experts may improve performance of automated sleep scoring algorithms.21 For external validation datasets, the conventional sample size may suffice, but gaps in testing will be present and must be addressed when detected.13
Examine the AI model performance
Next, the performance of an AI model should be compared to a gold standard, if available. For example, for sleep staging, the reference standard will likely be in-laboratory PSG, manually scored using the rules of the latest AASM scoring manual, and an epoch-by-epoch comparison should be performed.22 If a gold standard is not available, additional rigor will be needed to evaluate the application and utility of the model. It is critical to determine which population was studied by an AI model: was it a healthy population or a population of people with confirmed or suspected sleep disorders? The inclusion and exclusion criteria, disease prevalence, and disease severity distribution of a validation study must be carefully reviewed. Furthermore, the testing dataset must not contain previously learned data (ie, data from the training dataset or previously processed data on a continuously learning AI model), as that will artificially inflate the accuracy results.
Standard performance metrics such as accuracy, sensitivity, specificity, positive predictive value, and negative predictive value, as well as detailed measures of agreement such as Cohen's kappa, Bland-Altman plots, and the confusion matrix, should be reviewed. It is crucial to note that correlation coefficients only indicate the relationship between two variables, not diagnostic accuracy or agreement.23 As a result, clinicians should avoid overreliance on correlations when evaluating AI model performance.24,25 Additionally, clinicians should scrutinize misclassification or diagnostic discordance by inspecting the concordance rate (eg, classification agreement). For example, when assessing OSA severity or sleep stage classification accuracy, one can examine the confusion matrix table when available. Clinicians should also be aware of imbalanced classification, where skewed class proportions may distort performance measures unless the minority classes are accounted for.26 The F1 score, precision-recall curve, receiver operating characteristic curve, area under the curve, Cohen's kappa, intraclass correlation coefficient, and Bland-Altman plot can also be used to evaluate agreement.23
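For illustration, the sketch below computes several of these agreement measures for a hypothetical epoch-by-epoch comparison of automated versus manual sleep staging; the stage sequences are fabricated, and scikit-learn is assumed.

```python
# Hypothetical epoch-by-epoch agreement between manual and automated sleep
# staging; the stage sequences below are fabricated for illustration only.
from sklearn.metrics import (classification_report, cohen_kappa_score,
                             confusion_matrix)

manual = ["W", "N1", "N2", "N2", "N3", "REM", "N2", "W", "N2", "REM"]
auto   = ["W", "N2", "N2", "N2", "N3", "REM", "N1", "W", "N2", "N2"]
stages = ["W", "N1", "N2", "N3", "REM"]

# Confusion matrix: rows are manual (reference) stages, columns automated.
print(confusion_matrix(manual, auto, labels=stages))
# Cohen's kappa corrects raw agreement for chance, unlike simple accuracy.
print("Cohen's kappa:", cohen_kappa_score(manual, auto))
# Per-stage sensitivity (recall) and positive predictive value (precision).
print(classification_report(manual, auto, labels=stages, zero_division=0))
```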
Examine the AI model application
As biases are inevitable in AI models, sleep clinicians should be aware that AI models are susceptible to errors in generalizability, and they should be familiar with the strengths and weaknesses of AI-enabled solutions. Notably, AI models are susceptible to overfitting, a scenario in which a model predicts the training data well but fails to generalize.27 Underfitting, on the other hand, can cause an AI algorithm to miss relevant relationships.28
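The gap between training and held-out performance is a simple signal of overfitting. In the sketch below, the labels are pure noise, so no true signal exists; an unconstrained model nonetheless scores near-perfectly on its training data while performing at chance on unseen data (all data synthetic, scikit-learn assumed).

```python
# Simple overfitting check: a large train/test accuracy gap suggests the
# model memorized the training data. Labels here are random noise, so no
# genuine signal exists to learn.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 20))
y = rng.integers(0, 2, size=300)  # pure-noise labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=2)
model = DecisionTreeClassifier().fit(X_tr, y_tr)  # unconstrained depth overfits

print("training accuracy:", model.score(X_tr, y_tr))  # approximately 1.0
print("held-out accuracy:", model.score(X_te, y_te))  # approximately 0.5 (chance)
```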
The FDA encourages independent review of the clinical evaluation of software as a medical device.29 For example, before procuring AI-enabled automated PSG scoring software, clinicians should ensure that the software has been adequately tested, with good performance, on the specific PSG system used by the clinical practice. They should also check that the input data required by the AI model are readily obtainable in the routine clinical workflow. Furthermore, it is ideal to run a "local validation" trial before committing to an AI-enabled solution.
Examine the AI model security
Cybersecurity is increasingly important for the protection of confidential data. One must ensure that the AI model of interest has implemented robust cybersecurity practices that follow Health Insurance Portability and Accountability Act guidelines and meet current standards.30
Understand the FDA regulatory oversight related to an AI model
Devices are classified by the FDA as class I, II, or III based on the degree of regulation deemed necessary for the establishment of reasonable certainty of safety and efficacy, with class III devices having the greatest degree of regulation.31 Classification determines which requirements must be fulfilled by the manufacturer and evaluated by the FDA before a device can be commercially distributed.32 The device class is based on three components: (1) intended use of the device, (2) indications for use (product labeling that may include which populations or under which specific circumstances the device is designed to be used), and (3) risks associated with the device for patient and/or user.32
Classification of a device can be determined by review of FDA regulations. Alternatively, manufacturers can submit a section 513(g) request, through which the FDA provides a recommendation about the classification and the regulatory requirements for a particular device.33 Regulatory requirements and classification can change for a particular device if one of the three components previously described is altered. It is important to understand whether a device is FDA registered (the level of device risk may be sufficiently managed by the least stringent regulatory general controls), FDA cleared (general controls alone are insufficient to assure safety and effectiveness, and special controls are necessary), or FDA approved (insufficient information exists to assure safety and effectiveness solely through general or special controls) (Table 2).
Table 2.
FDA classification of device properties.
FDA Registered: Refers to the process by which device manufacturers must inform the FDA of their facilities, the devices made there, and the activities performed with those devices.34 The FDA does not certify registration information, and the process does not imply FDA clearance or approval. Most class I devices are exempt from the 510(k) premarket submission process, are held to less rigorous controls, and would be FDA registered (but not cleared or approved).32 Class I devices that are low risk and intended for health promotion and education, without reference to diseases or conditions, are considered general wellness products.34

FDA Cleared: Refers to the 510(k) premarket notification process, whereby a manufacturer must inform the FDA of the intent to market a device at least 90 days in advance, and the FDA determines the device's equivalence to a device already legally marketed. The 510(k) process is required for most class II devices. The standard for FDA clearance through the 510(k) process is comparative: the applicant must establish enough similarity to a predicate legally marketed device, ie, substantial equivalence.35 When there is no legally marketed predicate device, class I or II devices can also be cleared by the FDA through a de novo classification request.36

FDA Approved: Refers to the most stringent device marketing application, whereby sufficient scientific evidence must be provided to conclude that the device is safe and effective for its intended use(s).37 Most class III devices must undergo the premarket approval process, which establishes safety and efficacy based on independent review ("FDA approved").38 The FDA has a separate regulatory process for software as a medical device, defined as software intended for medical purposes that performs those purposes without being part of a medical device's hardware.29
FDA = Food and Drug Administration.
It is important to note that FDA clearance or approval may apply to the overall claimed use or data output but may also be limited to a specific function or component sensor of a device.39 Devices@FDA is a database that provides the details of FDA clearance or approval for a particular device and can be helpful for making these distinctions.40 Importantly, emerging technology is blurring the lines between the clinical and consumer realms.3 The presence of such hybrid devices (which contain some components that are FDA cleared) highlights the importance of understanding the quality and limits of available evidence as well as regulatory nuances as they relate to a particular technology.39
In addition to understanding the differences among FDA device classes, it is critical to be cognizant of the FDA's regulatory oversight of AI models. AI-enabled models are highly iterative and adaptive, as AI/ML/DL algorithms can learn from new datasets to continually improve their clinical performance. The FDA has published several informative documents that can help sleep physicians understand how to assess the performance of AI-enabled solutions.41–43 Because software as a medical device extends beyond a traditional medical device or hardware, medical device regulators have formulated a common framework and principles for its regulation. Frameworks for risk categorization, quality management, and continuous clinical evaluation, including postmarket monitoring of software as a medical device, need to be evaluated.29 Additionally, as AI has gained global popularity, it may be beneficial to form international collaborative efforts to gain a deeper understanding of the various regulatory approaches.
1B. What are some potential inherent biases associated with AI-enabled solutions?
It is critical for both developers and clinicians to be aware of the biases that may be inherent in AI-enabled applications. Common types of bias are listed in Table 3; notably, these biases can perpetuate further errors in the application. It is therefore important to establish processes and practices to test for and mitigate bias in AI systems, and clinicians should be prepared to engage in fact-based conversations about potential biases in human decisions. As an industry, we believe there should be a call to action for all developers to invest more in bias research and to make more data available for research (while respecting privacy). This call extends beyond developers, engineers, and clinicians to others in research and development, as we believe a multidisciplinary approach is necessary to protect patients and the public from the biases that may be inherent in these solutions.
Table 3.
Common biases in AI-enabled solutions.
Psychometric and technical biases

Overfitting: Occurs when an algorithm is trained too closely to the training data, becoming too specialized to it and performing poorly on new, unseen data.

Underfitting: Occurs when an algorithm is too simple and does not capture the complexity of the underlying data.

Outliers: Data points that differ significantly from the rest of the data. Including accurately labeled outliers and edge cases is generally important for generalization and accuracy, although the infrequent nature of outliers results in uncertain behavior for inputs that share partial similarity with the outlier.

Measurement bias: Occurs when the data used to train an algorithm are measured in a way that is not accurate or consistent. This can lead the algorithm to make inaccurate predictions or decisions.

Recall bias: Commonly occurs at the data labeling stage, when labels are given inconsistently based on subjective observations.

Observer bias/confirmation bias (choosing wrong labels): Occurs when the person collecting or labeling the data is biased in some way, such as having preconceived notions or beliefs. This can lead to inaccurate or inconsistent labeling and, in turn, to bias in the algorithm's predictions or decisions.

Selection bias: Occurs when the data used to train an algorithm are not representative of the entire population. The algorithm may then make inaccurate predictions or decisions for certain groups of people.

Exclusion bias: Occurs when certain groups of people are excluded from the data used to train an algorithm, which can lead to inaccurate predictions or decisions for those groups.

Racial bias: Occurs when an algorithm is biased toward or against certain racial groups, which can lead to inaccurate predictions or decisions based on race.

Association bias: Occurs when an algorithm is biased toward or against certain associations, such as sex, age, or socioeconomic status, which can lead to inaccurate predictions or decisions based on these associations.

Inferential and computational biases

Automation bias: Occurs when people rely too heavily on automated decision-making systems and trust their output without critically evaluating it, so that inaccurate predictions or decisions are blindly accepted by the end user.

Cognitive bias: Systematic errors in thinking and decision-making that occur due to a range of psychological factors, such as stereotypes, heuristics, and emotions. Cognitive bias present in people, algorithms, and procedures is reflected in the data that are generated and is subsequently learned and perpetuated by AI models. Examples include the higher criminal risk assessment scores assigned by COMPAS based on race and the negative sentiment associated with certain demographics in Google's Cloud Natural Language application programming interface.44,45
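As a concrete illustration of how several of the biases in Table 3 (eg, selection bias and imbalanced data) can mislead, the sketch below uses synthetic labels with 10% disease prevalence: a degenerate model that predicts "no disease" for everyone still achieves 90% accuracy, while a chance-corrected measure such as Cohen's kappa exposes its lack of skill.

```python
# Illustration: with 10% disease prevalence, an "always negative" model
# reaches 90% accuracy while having no diagnostic skill at all.
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

y_true = np.array([0] * 90 + [1] * 10)  # synthetic labels, 10% positive
y_pred = np.zeros(100, dtype=int)       # degenerate "always negative" model

print("accuracy:", accuracy_score(y_true, y_pred))  # 0.90, looks impressive
print("kappa:", cohen_kappa_score(y_true, y_pred))  # 0.0, no skill beyond chance
```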
2A. What are some logistical/technical considerations while choosing AI-enabled solutions for clinical practice?
There are several considerations for the clinician when evaluating AI-enabled solutions, beginning with a careful needs assessment. It is worth taking a deeper look at operational, financial, legal, and quality assurance factors, with emphasis on data processing and handling. Because there are many facets to consider, clinicians and their teams would be wise to use some or all of the checklist in Table 4.
Table 4.
Checklist of considerations by which to evaluate AI solutions for clinical practice.
Domains to consider:
- Clinical
- Operational
- Financial
- Data storage and security (HIPAA)
- Legal
- Quality assurance standards

AI = artificial intelligence, FDA = Food and Drug Administration, HIPAA = Health Insurance Portability and Accountability Act.
2B. What are the steps for dissemination and implementation of AI-enabled solutions in your clinical practice?
Once an AI-enabled solution is determined to be a good fit for a particular clinical practice, clinicians and their practice should pause and reflect on whether the proposed AI solution really solves the problems at hand and question the overall value proposition. It may be worth developing a formal or informal business plan with a gap analysis or a strengths, weaknesses, opportunities, and threats (SWOT) analysis. It would be wise to create a data work group and then evaluate pilot data against objective metrics for performance and success. If the pilot is successful, scaling the process is potentially feasible. While AI has gained increasing popularity worldwide, education regarding AI, and in particular how to critically evaluate AI-enabled algorithms, is sparse in the current medical school curriculum. Integrating professional data experts, including biomedical engineers and data scientists, into the data work group is crucial to help clinicians evaluate these novel technologies.
2C. What support is needed to update AI-enabled solutions as ML continuously learns and relearns on the go?
Continuously learning AI solutions pose additional challenges for maintaining accuracy and preventing bias. These challenges arise because continuously learning AI constantly adjusts the weights in its neural network over time, gradually changing its decision-making. For noncontinuously learning AI, performance on live data changes over time if there is concept drift or covariate shift. For continuously learning models, current performance differs from past performance because the model is continuously learning from the data it is currently exposed to. This is shown in Figure 2.
Figure 2. Challenges of continuously learning artificial intelligence-enabled solutions.
Concept drift occurs when there is an unpredictable change in the relationship of input to output over time, without significant change in the distribution of input combinations. Covariate shift occurs when there is a significant change in the distribution of input combinations without change of the relationship between input and output.
In a standard AI algorithm (without continuous learning), training utilizes specific training datasets at a specific time, subsequently followed by testing and deployment in a live environment. In this kind of ML algorithm, the decision-making of the algorithm is optimized during training and remains fixed afterwards. The resulting consistency in decision-making is ideal in situations without significant concept drift and covariate shift, as it ensures consistent accuracy of the algorithm. Additionally, bias is easier to limit in these algorithms without continuous learning by minimizing its presence in the training dataset, as presence of bias in a live environment will not affect an algorithm’s decision-making.46
In a continuously learning AI algorithm, the neural network uses data from the live environment and continuously adjusts its decision-making to improve its own perceived accuracy (determined by a cost function programmed by the software developers, which is not the same as objectively tested accuracy). These algorithms have the advantage of improved accuracy in the setting of concept drift or covariate shift, owing to continuous adjustment of decision-making based on current data. But because the decision-making is continuously changing, current accuracy will differ from previously tested accuracy statistics. Unless the algorithm receives periodic retraining with previous datasets, continuous learning generally results in higher perceived accuracy with newer data at the expense of lower perceived accuracy with older data, a phenomenon called catastrophic forgetting.47,48 Managing bias is significantly more difficult as well: unless the AI algorithm has an effective process for filtering out bias, any bias present in live data will be learned by the algorithm. There is currently no known effective process to detect all forms of bias algorithmically, with cognitive bias being especially difficult. Nonmedical examples of this drawback of continuously learning AI algorithms have been in the news recently, involving Meta's BlenderBot, Microsoft's Tay, and others.49–51 Minimizing bias for continuously learning algorithms is significantly more challenging, as it requires ongoing identification and correction of bias in all new data the algorithm consumes, rather than only minimizing bias in training datasets.52 Mitigation strategies for continuously learning AI-enabled solutions are discussed in Table 5.
Table 5.
Mitigation strategies for continuously learning AI algorithms to avoid loss in accuracy and limit learning of bias.
- Ensure optimal architecture is utilized
- Repeated testing
- Repeated training on previous datasets
- Specialized architectures
- Bias filtering and bias correction

AI = artificial intelligence.
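One of the strategies in Table 5, repeated testing, can be operationalized as a periodic audit against a frozen, expert-labeled reference set. The sketch below is a minimal illustration of that idea; the model interface, reference data, and alert threshold are all hypothetical.

```python
# Sketch of repeated testing for a continuously learning model: periodically
# re-score a frozen, expert-labeled reference set so that drift in accuracy
# (eg, from catastrophic forgetting) is detected. The model interface,
# reference data, and threshold below are hypothetical.
from sklearn.metrics import accuracy_score

REFERENCE_THRESHOLD = 0.85  # local policy choice, not a published standard

def audit_model(model, X_ref, y_ref):
    """Score the current model against a fixed reference dataset."""
    score = accuracy_score(y_ref, model.predict(X_ref))
    if score < REFERENCE_THRESHOLD:
        # Flag for human review and possible retraining on prior datasets.
        print(f"ALERT: reference-set accuracy fell to {score:.2f}")
    return score
```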
2D. What are the coding barriers posed by payers that make it difficult to integrate these solutions, and what steps can we take for successful incorporation at the CPT level?
There are three categories of CPT codes.53 Category 1 codes cover procedures and current medical practices that are FDA approved or cleared. Category 2 codes are used to collect information about the quality of care delivery. Category 3 codes cover emerging technologies and procedures. Typically, Category 3 codes are temporary and are not reimbursed by Medicare, although some third-party payers may provide some reimbursement. The Centers for Medicare & Medicaid Services tracks the usage of Category 3 codes, and, based on utilization of the procedure or clinical service in addition to peer-reviewed publications, some Category 3 codes can be used for up to 5 years before being converted to a Category 1 code. FDA approval of an AI solution does not mean that a CPT code exists for that solution. Category 1 codes have an associated relative value unit (RVU), which reflects the amount of work a physician performs; Category 3 codes do not have an associated RVU.
In 2021, a taxonomy for AI was created by the American Medical Association.15 It can be found in Appendix S of the CPT code set published on January 1, 2022. There are three categories of AI, determined by the amount of work performed by the machine relative to the physician:
- Assistive: the device detects data.
- Augmentative: the device detects and analyzes data.
- Autonomous: the device detects, analyzes, and interprets data.
Industry representatives should consider the CPT nomenclature early in the process of AI development, in addition to working with content experts to help determine the level of work that is provided by physicians. An AI solution may be FDA cleared or approved but may not have an associated RVU initially; it can be challenging to determine the proper RVU, and the ultimate value may be counterintuitive. For example, a CPT code for a fully autonomous AI solution may not have a high RVU value since the work done by a physician is minimal. There is an American Medical Association resource that is currently available for free: the CPT developer program (https://platform.ama-assn.org/ama/#/dev-program), which grants early access to industry representatives who want to understand CPT codes before a product goes to market.
CONCLUSIONS
AI-enabled solutions have gained popularity in many industries, including sleep medicine. Adopting an AI-enabled solution includes determining its nature and scope as well as whether it is a good fit for one's clinical practice. It is important to understand the accuracy and bias of these AI solutions and to map out current-state workflows before implementing an AI solution in sleep medicine. AI-enabled solutions hold promise, as they may help realize many opportunities for improving efficiency. However, caution is warranted: these solutions have several potential biases and confounders and may pose significant logistical and technical challenges. But if used cautiously and wisely, AI-enabled solutions may significantly improve access to care for patients with sleep disorders.54
DISCLOSURE STATEMENT
Jaspal Singh is a consultant for Intuitive Surgical. Ambrose Chiang has received two research grants from Belun Technology Company for conducting Belun Sleep Platform validation trials at University Hospitals Cleveland Medical Center. Azizi Seixas currently serves as a per diem consultant for Philips’ Sleep Advisory Board, Idorsia Pharmaceutical, and is on the board for Moshi Kids. None of these roles had any influence on the current paper. The other authors report no conflicts of interest.
ABBREVIATIONS
- AASM
American Academy of Sleep Medicine
- AI
artificial intelligence
- CPT
Current Procedural Terminology
- DL
deep learning
- FDA
Food and Drug Administration
- ML
machine learning
- PSG
polysomnogram
REFERENCES
- 1. Goldstein CA, Berry RB, Kent DT, et al. Artificial intelligence in sleep medicine: an American Academy of Sleep Medicine position statement. J Clin Sleep Med. 2020;16(4):605–607.
- 2. de Zambotti M, Menghini L, Grandner MA, Redline S, Zhang Y, Wallace ML, Buxton OM. Rigorous performance evaluation (previously, "validation") for informed use of new technologies for sleep health measurement. Sleep Health. 2022;8(3):263–269.
- 3. Mathews DJH, Balatbat CA, Dzau VJ. Governance of emerging technologies in health and medicine—creating a new framework. N Engl J Med. 2022;386(23):2239–2242.
- 4. de Zambotti M, Cellini N, Goldstone A, Colrain IM, Baker FC. Wearable sleep technology in clinical and research settings. Med Sci Sports Exerc. 2019;51(7):1538–1557.
- 5. McCarthy J. What is artificial intelligence? http://www-formal.stanford.edu/jmc/whatisai/whatisai.html; 1997. Accessed August 31, 2022.
- 6. Ramesh AN, Kambhampati C, Monson JR, Drew PJ. Artificial intelligence in medicine. Ann R Coll Surg Engl. 2004;86(5):334–338.
- 7. Wang P. On defining artificial intelligence. J Artif Gen Intell. 2019;10(2):1–37.
- 8. Varghese J. Artificial intelligence in medicine: chances and challenges for wide clinical adoption. Visc Med. 2020;36(6):443–449.
- 9. Ayyadevara VK. Basics of machine learning. In: Ayyadevara VK, ed. Pro Machine Learning Algorithms: A Hands-On Approach to Implementing Algorithms in Python and R. New York: Springer; 2018:1–15.
- 10. Bandyopadhyay A, Goldstein C. Clinical applications of artificial intelligence in sleep medicine: a sleep clinician's perspective. Sleep Breath. 2023;27(1):39–55.
- 11. Liu Y, Chen PC, Krause J, Peng L. How to read articles that use machine learning: users' guides to the medical literature. JAMA. 2019;322(18):1806–1816.
- 12. Lopez-Jimenez F, Attia Z, Arruda-Olson AM, et al. Artificial intelligence in cardiology: present and future. Mayo Clin Proc. 2020;95(5):1015–1039.
- 13. Bluemke DA, Moy L, Bredella MA, et al. Assessing radiology research on artificial intelligence: a brief guide for authors, reviewers, and readers—from the Radiology Editorial Board. Radiology. 2020;294(3):487–489.
- 14. AASM. Emerging technology. https://aasm.org/clinical-resources/emerging-technology/. Accessed August 31, 2022.
- 15. AMA. CPT Appendix S: AI taxonomy for medical services & procedures. https://www.ama-assn.org/practice-management/cpt/cpt-appendix-s-ai-taxonomy-medical-services-procedures. Accessed August 31, 2022.
- 16. Esteva A, Robicquet A, Ramsundar B, et al. A guide to deep learning in healthcare. Nat Med. 2019;25(1):24–29.
- 17. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44–56.
- 18. NSRR. Datasets. https://sleepdata.org/datasets. Accessed August 31, 2022.
- 19. Varoquaux G, Cheplygina V. Machine learning for medical imaging: methodological failures and recommendations for the future. NPJ Digit Med. 2022;5(1):48.
- 20. Olesen AN, Jørgen Jennum P, Mignot E, Sorensen HBD. Automatic sleep stage classification with deep residual networks in a mixed-cohort setting. Sleep. 2021;44(1):zsaa161.
- 21. Nasiri S, Ganglberger W, Sun H, Thomas RJ, Westover MB. Exploiting labels from multiple experts in automated sleep scoring. Sleep. 2023;46(5):zsad034.
- 22. Depner CM, Cheng PC, Devine JK, et al. Wearable technologies for developing sleep and circadian biomarkers: a summary of workshop discussions. Sleep. 2020;43(2):zsz254.
- 23. Ranganathan P, Pramesh CS, Aggarwal R. Common pitfalls in statistical analysis: measures of agreement. Perspect Clin Res. 2017;8(4):187–191.
- 24. Ioachimescu OC, Allam JS, Samarghandi A, et al. Performance of peripheral arterial tonometry-based testing for the diagnosis of obstructive sleep apnea in a large sleep clinic cohort. J Clin Sleep Med. 2020;16(10):1663–1674.
- 25. Massie F, Van Pee B, Bergmann J. Correlations between home sleep apnea tests and polysomnography outcomes do not fully reflect the diagnostic accuracy of these tests. J Clin Sleep Med. 2022;18(3):871–876.
- 26. Lango M, Stefanowski J. What makes multi-class imbalanced problems difficult? An experimental study. Expert Syst Appl. 2022;199:116962.
- 27. Faes L, Liu X, Wagner SK, et al. A clinician's guide to artificial intelligence: how to critically appraise machine learning studies. Transl Vis Sci Technol. 2020;9(2):7.
- 28. Handelman GS, Kok HK, Chandra RV, et al. Peering into the black box of artificial intelligence: evaluation metrics of machine learning methods. AJR Am J Roentgenol. 2019;212(1):38–43.
- 29. USFDA. Software as a Medical Device (SaMD): Clinical Evaluation. Guidance for Industry and Food and Drug Administration Staff. https://www.fda.gov/media/100714/download. Accessed July 24, 2022.
- 30. NIST. Implementing the Health Insurance Portability and Accountability Act (HIPAA) Security Rule: A Cybersecurity Resource Guide. https://csrc.nist.gov/publications/detail/sp/800-66/rev-2/draft; 2022. Accessed March 8, 2023.
- 31. USDHHS, FDA, CDRH. The 510(k) Program: Evaluating Substantial Equivalence in Premarket Notifications [510(k)]. Guidance for Industry and Food and Drug Administration Staff. https://www.fda.gov/media/82395/download. Accessed July 12, 2022.
- 32. USFDA. Classify your device. https://www.fda.gov/medical-devices/overview-device-regulation/classify-your-medical-device. Accessed July 12, 2022.
- 33. USFDA. FDA and Industry Procedures for Section 513(g) Requests for Information under the Federal Food, Drug, and Cosmetic Act. Guidance for Industry and Food and Drug Administration Staff. https://www.fda.gov/media/78456/download. Accessed July 23, 2022.
- 34. USFDA. Device registration and listing. https://www.fda.gov/medical-devices/how-study-and-market-your-device/device-registration-and-listing. Accessed July 13, 2022.
- 35. USFDA. Premarket notification 510(k). https://www.fda.gov/medical-devices/premarket-submissions-selecting-and-preparing-correct-submission/premarket-notification-510k. Accessed July 22, 2022.
- 36. USFDA. De novo classification request. https://www.fda.gov/medical-devices/premarket-submissions-selecting-and-preparing-correct-submission/de-novo-classification-request. Accessed July 22, 2022.
- 37. USFDA. Device approvals, denials and clearances. https://www.fda.gov/medical-devices/products-and-medical-procedures/device-approvals-denials-and-clearances. Accessed March 11, 2023.
- 38. USFDA. Premarket approval (PMA). https://www.fda.gov/medical-devices/premarket-submissions/premarket-approval-pma. Accessed July 22, 2022.
- 39. Schutte-Rodin S, Deak MC, Khosla S, et al. Evaluating consumer and clinical sleep technologies: an American Academy of Sleep Medicine update. J Clin Sleep Med. 2021;17(11):2275–2282.
- 40. USFDA. Devices@FDA. https://www.accessdata.fda.gov/scripts/cdrh/devicesatfda/index.cfm. Accessed July 19, 2022.
- 41. USFDA. Artificial intelligence and machine learning in software as a medical device. https://www.fda.gov/medical-devices/software-medical-device-samd/artificial-intelligence-and-machine-learning-software-medical-device. Accessed August 31, 2022.
- 42. USFDA. FDA releases artificial intelligence/machine learning action plan. https://www.fda.gov/news-events/press-announcements/fda-releases-artificial-intelligencemachine-learning-action-plan. Accessed August 31, 2022.
- 43. USFDA. Good machine learning practice for medical device development: guiding principles. https://www.fda.gov/medical-devices/software-medical-device-samd/good-machine-learning-practice-medical-device-development-guiding-principles. Accessed August 31, 2022.
- 44. Angwin J, Larson J, Mattu S, Kirchner L. Machine bias. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing; May 23, 2016. Accessed April 15, 2023.
- 45. Google's sentiment analyzer thinks being gay is bad. https://www.vice.com/en/article/j5jmj8/google-artificial-intelligence-bias; 2017. Accessed April 15, 2023.
- 46. Friedler SA, Scheidegger C, Venkatasubramanian S, Choudhary S, Hamilton EP, Roth D. A comparative study of fairness-enhancing interventions in machine learning. In: Proceedings of the Conference on Fairness, Accountability, and Transparency. 2019:329–338.
- 47. McCloskey M, Cohen NJ. Catastrophic interference in connectionist networks: the sequential learning problem. Psychol Learn Motiv. 1989;24:109–165.
- 48. Kemker R, McClure M, Abitino A, Hayes T, Kanan C. Measuring catastrophic forgetting in neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2018;32(1).
- 49. Meta's AI internet chatbot demo quickly starts spewing fake news and racist remarks. www.theregister.com/2022/08/14/in_brief_ai/; 2022. Accessed October 1, 2022.
- 50. In 2016, Microsoft's racist chatbot revealed the dangers of online conversation. spectrum.ieee.org/in-2016-microsofts-racist-chatbot-revealed-the-dangers-of-online-conversation; 2019. Accessed October 1, 2022.
- 51. Fuchs DJ. The dangers of human-like bias in machine-learning algorithms. Missouri S&T's Peer to Peer. 2018;2(1):1.
- 52. Parisi GI, Kemker R, Part JL, Kanan C, Wermter S. Continual lifelong learning with neural networks: a review. Neural Netw. 2019;113:54–71.
- 53. AMA. CPT® overview and code approval. https://www.ama-assn.org/practice-management/cpt/cpt-overview-and-code-approval. Accessed August 31, 2022.
- 54. Watson NF, Rosen IM, Chervin RD; Board of Directors of the American Academy of Sleep Medicine. The past is prologue: the future of sleep medicine. J Clin Sleep Med. 2017;13(1):127–135.