Nature and Science of Sleep
. 2025 Nov 1;17:2807–2818. doi: 10.2147/NSS.S559788

AI-Assisted Awake Endoscopic Video Analysis for Obstructive Sleep Apnea Detection

Wen-Sen Lai 1,2, Ting-Wei Li 3, Chung-Feng Jeffrey Kuo 3, Shao-Cheng Liu 2
PMCID: PMC12587850  PMID: 41200474

Abstract

Objective

Obstructive sleep apnea (OSA) is a common sleep disorder characterized by upper airway obstruction during sleep, leading to hypoxia and serious health consequences. Conventional diagnostic methods such as polysomnography and drug-induced sleep endoscopy (DISE) are often costly, invasive, or time-consuming. This study aimed to develop a safe, rapid, and fully automated AI-assisted platform for OSA detection using nasopharyngoscopic videos acquired during wakefulness, and to propose diagnostic criteria comparable to the apnea–hypopnea index (AHI).

Methods

Flexible nasopharyngoscopic videos of supine, awake patients were analyzed using an AI system comprising four Xception-based image classifiers to identify scan boundaries and classify anatomical regions (nasopharynx, velopharyngeal/oropharyngeal [VO] wall, tongue base, and epiglottis [TE]). Five U-Net-based semantic segmentation models were then applied to extract quantitative airway features. Key variables—including maximum and minimum airway cross-sectional areas, VO wall area ratio, and TE ratio—were entered into a support vector regression model to predict OSA. A total of 103 clinical samples (59 non-OSA, 44 OSA) were analyzed, with 35 cases reserved for testing.

Results

Classification accuracy for the nostril, VO/TE, vocal fold, and nasopharynx regions was 100%, 95.8%, 95%, and 98.5%, respectively. The mean intersection over union (IoU) for the segmentation models reached 82.72%. The prediction model achieved 97.14% accuracy on the test set. OSA-associated thresholds were identified (VO wall area ratio < 0.41 and TE ratio > 36.97), both yielding classifications comparable to AHI-based diagnosis. The complete diagnostic workflow, including video upload, classification, segmentation, and prediction, was completed in an average of 85 seconds.

Conclusion

This study is the first to implement a fully automated, AI-based dynamic endoscopic video analysis for OSA detection in awake patients. The system accurately predicts OSA and localizes potential obstruction sites in a non-invasive, real-time manner, offering a practical outpatient screening tool to help select candidates who require further evaluation with polysomnography or DISE.

Keywords: Müller maneuver, drug-induced sleep endoscopy, obstructive sleep apnea, semantic segmentation, endoscopic video analysis, AI

Introduction

Obstructive sleep apnea (OSA) is an increasingly common sleep disorder associated with cardiovascular disease, metabolic dysfunction, cognitive decline, and reduced quality of life.1–4 Early diagnosis and treatment are crucial to preventing these adverse outcomes. In outpatient settings, awake endoscopy with the Müller maneuver—where the patient forcefully inhales against a closed airway—allows direct visualization of upper airway collapse and helps identify the site and severity of obstruction,5,6 despite limitations arising from its performance while the patient is awake. Drug-induced sleep endoscopy (DISE), performed under sedation to mimic sleep conditions, offers greater accuracy in localizing obstruction sites by enabling real-time observation of the airway.7–9 However, both awake endoscopy and DISE are subject to inter-observer variability, as findings can be interpreted differently by clinicians.10,11 Polysomnography (PSG) remains the gold standard for OSA diagnosis, providing comprehensive physiological measurements and an accurate apnea–hypopnea index (AHI), but its high cost and time requirements hinder its use for rapid or large-scale screening in routine practice.12–14 Therefore, there is an unmet clinical need for a rapid, noninvasive, and operator-independent tool that can efficiently identify patients who truly require further diagnostic evaluation such as PSG or DISE.

A major clinical challenge lies in developing objective and quantitative methods for interpreting endoscopic images. With advancements in image processing, computer vision, and artificial intelligence (AI), researchers have begun exploring machine learning algorithms to assist in the analysis of upper airway endoscopy. Studies have shown that computer-aided systems can facilitate anatomical assessments through techniques such as three-dimensional reconstruction, enabling precise measurement of cross-sectional areas, retropalatal distances, and dynamic airway volumes.15–19 These metrics can help identify obstruction sites and support preoperative planning by quantifying airway dimensions and collapse severity—paving the way for personalized treatment strategies. However, most previous studies have relied on manually selected static images rather than continuous video sequences or complete CT/MRI datasets. Such human selection inevitably introduces bias and limits the reproducibility of AI training and model generalization.

In our previous study, we developed an image analysis system capable of detecting subtle laryngeal anatomical changes before and after endotracheal intubation—differences often imperceptible to the human eye.20 However, that approach was limited to static image analysis and did not incorporate dynamic video or AI-based interpretation. In the present study, we aim to develop a fully automated endoscopic video analysis system using machine learning to assist in the anatomical assessment of the upper airway. This system is designed to process videos obtained during awake endoscopy with the Müller maneuver, performed in a safe, supine position. Our first goal is to establish an AI-driven algorithm capable of autonomously identifying key anatomical regions—such as the nasopharynx and oropharynx—within the recorded video. A core function of the system is to automatically extract clinically relevant frames that represent the maximal and minimal cross-sectional areas of the airway at each anatomical level. Our second goal is to perform quantitative analysis on these extracted images to calculate objective parameters for comparison between healthy subjects and patients with OSA. These extracted features will be used to train diagnostic models aimed at developing an automated screening and evaluation tool. By establishing a fully automated video-based image analysis system, our approach eliminates human bias during image selection and enables consistent, quantitative, and objective assessment. This noninvasive and operator-independent tool has the potential to complement existing diagnostic methods by assisting clinicians in identifying patients who require further invasive evaluation such as DISE or PSG.

Materials and Methods

Endoscopic video samples were prospectively collected at a single institution between January 2023 and December 2024, with all procedures performed by the same technician to ensure consistency. A total of 233 patients (aged 20–70 years) were enrolled, comprising both OSA and non-OSA cases, as determined by PSG results. The OSA group included patients with an AHI > 5, while the non-OSA group consisted of individuals undergoing endoscopic examination primarily for chronic lump sensation, all with no history or symptoms of OSA and an AHI < 5. Exclusion criteria included edentulous jaws, craniomandibular pain, recent upper airway infections, history of oropharyngeal trauma or tumors, prior radiotherapy, any neck surgery, and asthma requiring inhalation therapy. Nasopharyngeal endoscopy videos were recorded for 60 to 90 seconds following a standardized protocol designed to facilitate automated analysis. Video capture began at the nostril (first pause point), followed by sequential pauses at the nasopharynx, retropalatal region, retroglossal region, and finally at the vocal folds (fifth pause point), before termination. Each pause point required the endoscope to remain stationary for at least 10 seconds. The retropalatal pause focused on observing the velopharyngeal/oropharyngeal (VO) wall, while the retroglossal pause targeted the tongue base and epiglottis (TE). This protocol was adapted from the VOTE classification system used in DISE,10,21 aiming to facilitate future applications of our model for automated DISE analysis.

Frames were extracted at a rate of one in every five throughout the video for image analysis. Preprocessing included gamma correction, histogram equalization, binarization, and filtering to enhance image quality. Data augmentation techniques such as normalization (scaling pixel values to [0, 1]), rotation, scaling, cropping, and horizontal flipping were applied to expand dataset diversity and improve model feature extraction. The Sobel operator was employed to detect and remove blurry frames. Additionally, color space conversion, dilation, and erosion were used to optimize segmentation accuracy and reduce noise.

An automated AI-assisted diagnostic system was developed to identify regions of interest (ROI) and assess OSA severity. The system utilized sequential image classifiers for dynamic video frame classification: (1) a nostril classifier initiated positional classification when the endoscope entered the nasal cavity, (2) a position classifier identified key anatomical regions (nasopharynx, VO wall, TE) and non-obstructive areas, (3) a termination classifier detected the vocal folds to conclude positional classification, and (4) an adenoid classifier assessed adenoidal hypertrophy in the nasopharynx. Following classification, five semantic segmentation models were applied to delineate obstructive airway structures, including the adenoids, retropalatal space, oropharyngeal wall, tongue base, and epiglottis. Quantitative measurements, such as the cross-sectional area of the VO region and the spatial relationship of the TE, were computed to facilitate OSA assessment. The system architecture is illustrated in Figure 1, while the definitions of airway measurements (VO and TE) are shown in Figure 2. Detailed model architectures for both the classifiers and segmentation models are provided in the supplementary materials.
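The sequential control flow of the classifiers described above (nostril classifier as trigger, position classifier in between, termination classifier on reaching the vocal folds) can be sketched as a small state machine. The three callables stand in for the Xception models, which are not reproduced here; their names and signatures are assumptions for illustration.

```python
from enum import Enum, auto

class Stage(Enum):
    WAIT_NOSTRIL = auto()   # before classifier 1 fires
    CLASSIFY = auto()       # position classification active
    DONE = auto()           # termination classifier fired

def run_pipeline(frames, is_nostril, classify_position, is_vocal_fold):
    """Sequential classifier control flow (sketch; models are stand-ins).

    is_nostril / classify_position / is_vocal_fold each take one frame and
    return a boolean or a region label.
    """
    stage = Stage.WAIT_NOSTRIL
    labels = []
    for frame in frames:
        if stage is Stage.WAIT_NOSTRIL:
            if is_nostril(frame):            # classifier 1: trigger
                stage = Stage.CLASSIFY
        elif stage is Stage.CLASSIFY:
            if is_vocal_fold(frame):         # classifier 3: terminate
                stage = Stage.DONE
                break
            labels.append(classify_position(frame))  # classifier 2
    return labels, stage
```

Frames before the nostril trigger and after the vocal-fold detection are ignored, mirroring how the system bounds the scan automatically.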

Figure 1.

(A) Diagram of the research workflow. (B) Distribution of data usage across different research phases.

Figure 2.

Airway Measurement Definitions (VO and TE). (A) VO-max and (B) VO-min represent airway cross-sectional areas measured during exhalation and inhalation, respectively. Automatic segmentation of the epiglottis area (E-max) and the region between the epiglottis and tongue base (AET) is demonstrated in a normal subject (C) and an OSA patient (D). Preprocessing involves detecting the highest and most lateral points of the epiglottis, rotating the superior borders of both sides to align horizontally (E), and drawing a vertical line downward to the tongue base to define and calculate the AET area (F).

Sample-Size Calculations and Statistical Analysis

The primary endpoint of this study was the area under the receiver-operating-characteristic curve (AUC) for the automated VO/TE classifier. A priori sample-size calculations were performed with the `pROC::power.roc.test` procedure in R to determine the number of cases and controls required to demonstrate an AUC significantly greater than a clinically relevant benchmark. For planning purposes we assumed a null-hypothesis AUC of 0.70 (a conservative clinical threshold), an anticipated AUC of 0.80 under the alternative hypothesis, two-sided α = 0.05, and power = 0.80. With the observed control-to-case ratio in our dataset (≈1.19), the required sample was estimated at approximately 162 cases. To analyze the predictions in more detail, confusion matrices were computed for position classification, segmentation, and OSA diagnosis, providing a breakdown of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) for each group. Accuracy was defined as (TP + TN) / (TP + TN + FP + FN), recall (sensitivity) as TP / (TP + FN), and specificity as TN / (TN + FP).
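The three confusion-matrix metrics defined above are straightforward to compute; a minimal reference implementation for binary labels (1 = positive, 0 = negative):

```python
def confusion_metrics(y_true, y_pred):
    """Accuracy, recall (sensitivity), and specificity from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }
```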

Ethical Considerations

The research protocol (NO: 1-108-05-132) was reviewed and approved by the Institutional Review Board of the Tri-Service General Hospital, Taipei, Taiwan. All methods were performed in accordance with the relevant guidelines and regulations. All patients provided written informed consent prior to participation.

Results

Image Classifiers and ROI Identification

We developed a series of deep learning-based classifiers using the Xception architecture, which effectively extracted hierarchical features for anatomical identification in endoscopic videos.

Of the positional-classification dataset, 110 cases were used for training and the remaining 20 for validation. A flowchart of the patient selection process is shown in Figure 3.

Figure 3.

Distribution of data usage across different research phases.

The initial classifier detected the nostril with 100% accuracy (trained on 40 videos, validated on 5), serving as the trigger for positional classification (Figure 4A and B). The position classifier categorized images into four regions: nasopharynx, VO wall, TE, and non-obstructive areas, achieving 85.6% accuracy, which improved to 95.8% after image preprocessing (Figure 4C and D). The termination classifier, trained to detect when the endoscope reached the vocal folds, achieved 95% accuracy (Figure 4E and F). The adenoid classifier identified adenoid hypertrophy with 98.5% accuracy after preprocessing. These four classifiers enabled precise and automated anatomical recognition, providing the foundation for subsequent segmentation and OSA risk analysis.

Figure 4.

Representative Images and Classifier Training Results. (A) Nostril image. (B) Initial classifier training outcome. (C) Position classifier input image. (D) Xception classifier result. (E) Vocal fold image. (F) Termination classifier training outcome.

Semantic Segmentation and VO/TE Measurement

Semantic segmentation was performed using U-Net models to delineate four key regions: nasopharynx, adenoid, VO wall, and TE. Preprocessing steps—including grayscale conversion, binarization, and histogram equalization—enhanced boundary clarity and improved segmentation performance. For nasopharynx and adenoid segmentation, the IoU increased from 86% (RGB images) to 90% after preprocessing, using 110 training and 20 validation videos. For VO wall segmentation, color space was converted to YCbCr with grayscale substitution for the Y-channel, resulting in an average IoU of 85.38%. TE segmentation yielded IoUs of 84.28% for the tongue base and 82.89% for the epiglottis, based on 40 representative images.

Quantitative measurements were derived from the segmentation masks. The adenoid-to-nasopharynx proportion was defined as the NA ratio. VO wall analysis involved calculating maximum and minimum airway cross-sectional areas (VO-max and VO-min) (Figure 2A and B), with the VO ratio defined as VO-min / VO-max. For TE, the epiglottis area (E-max) and the area between the epiglottis and tongue base (AET-max) were used to compute the TE ratio: E-max / AET-max (Figure 2C–F). A higher TE ratio indicated a narrower space between the tongue base and epiglottis, suggesting hypertrophy or crowding of these structures. Following segmentation and data filtering, seven numerical features were extracted (Table 1). Pearson correlation analysis (n = 68) revealed that VO-max had the strongest correlation with OSA diagnosis, followed by VO-min, whereas the NA ratio showed a weaker association (Table 2). Chi-square analysis confirmed that adenoid hypertrophy (NA ratio) was not significantly related to OSA presence (Figure 5A).
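The two ratios defined above reduce to simple quotients over the segmented pixel areas. A minimal sketch, checked against the example values in Table 1:

```python
def vo_ratio(vo_min_area, vo_max_area):
    """VO ratio = VO-min / VO-max; lower values indicate greater airway collapse."""
    return vo_min_area / vo_max_area

def te_ratio(e_max_area, aet_max_area):
    """TE ratio = E-max / AET-max; higher values indicate a narrower space
    between the tongue base and the epiglottis."""
    return e_max_area / aet_max_area
```

With the Table 1 pixel counts, `vo_ratio(88518, 303636)` gives about 0.2915 (29.15%) and `te_ratio(190170, 6192)` about 30.71, matching the tabulated values.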

Table 1.

Related Area Values

VO max area (pixels)  VO min area (pixels)  VO_ratio  E_max (pixels)  AET_max (pixels)  TE_ratio  Adenoid
303636                88518                 29.15%    190170          6192              30.71     X

Abbreviations: VO, velopharyngeal wall; VO_ratio, VO-min / VO-max; E-max, the maximum epiglottis area; AET-max, the maximal area between the epiglottis and tongue base; TE ratio, E-max / AET-max.

Table 2.

Pearson Correlation Analysis and p-value

Variable Pearson Correlation P-value
1 VO max area −0.44449 0.00013
2 VO min area −0.44139 0.00015
3 AET_max −0.22385 0.06445
4 E_max −0.20006 0.09932
5 VO_ratio −0.16466 0.17636
6 TE_ratio 0.15341 0.20821
7 NA ratio 0.13125 0.28238

Abbreviations: VO, velopharyngeal wall; AET-max, the maximal area between the epiglottis and tongue base; E-max, the maximum epiglottis area; TE ratio, E-max / AET-max; NA, adenoid-to-nasopharynx.

Figure 5.

Diagnostic Metrics for OSA-Related Upper Airway Obstruction. ROC curves illustrating the diagnostic performance of various machine learning-based metrics: (A) Adenoid occupancy (NA ratio), (B) Support Vector Regression (SVR) model, (C) VO-max, (D) VO-min, (E) VO ratio, and (F) TE ratio.

VO/TE Feature Analysis and OSA Correlation

Using the same 68 datasets, a Support Vector Regression (SVR) model was trained to classify OSA status (OSA = 1, normal = 0) based on VO and TE variables (Figure 3). The model was tested on 35 samples (16 OSA cases), achieving an accuracy of 85.71% and an AUC of 0.924. After excluding E-max and AET-max variables, accuracy improved to 97.14%, with a sensitivity of 100%, specificity of 94.74%, and an AUC of 0.976. The Youden index identified an optimal threshold of 0.0701 for classification (Figure 5B). Threshold analysis indicated that a VO ratio < 0.41 corresponded to airway obstruction at the VO wall, while values > 0.41 suggested patency. However, for certain OSA patients, limited airway movement resulted in VO ratios not reaching 0.41 despite obstruction, making VO-max area (<152,988 pixels) a crucial secondary indicator (Figure 5C–E). For TE analysis, a TE ratio > 36.97 indicated a hypertrophic tongue base causing airflow obstruction, whereas a ratio < 36.97 denoted a non-obstructive condition (Figure 5F).
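The threshold logic described above can be summarized as a simple rule set. This is a readable approximation of the reported cut-offs only; the deployed system combines these features in an SVR model rather than hard rules, and the function name is illustrative.

```python
def predict_obstruction(vo_ratio, vo_max_area, te_ratio,
                        vo_ratio_cut=0.41, vo_max_cut=152_988, te_cut=36.97):
    """Flag likely obstruction sites from the reported thresholds (sketch).

    VO wall: VO ratio below 0.41, or VO-max below 152,988 pixels for
    patients whose limited airway movement keeps the ratio high.
    TE: TE ratio above 36.97 (hypertrophic tongue base / crowding).
    """
    sites = []
    if vo_ratio < vo_ratio_cut or vo_max_area < vo_max_cut:
        sites.append("VO wall")
    if te_ratio > te_cut:
        sites.append("TE")
    return sites
```

For example, a patient with a high VO ratio but a small VO-max area is still flagged at the VO wall, reflecting the secondary-indicator role of VO-max described above.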

Automated System for OSA Recognition

We integrated the image classifiers, semantic segmentation models, and VO/TE feature analysis into a fully automated OSA recognition system (Figure 6). The system used an SVR model to predict OSA presence and to localize airway obstruction, specifying whether it occurred at the VO wall or the TE region. Execution time varied with the number of obstructed regions detected; on average, processing exceeded the original video duration by approximately 9 seconds. Among the 35 test cases, the longest video (91 seconds) required 98.35 seconds of processing, while the shortest (65 seconds) finished in 61.7 seconds because the termination classifier ended analysis early. The average processing time was 85 seconds. On this test set, the system achieved an overall accuracy of 97.14%, with a sensitivity of 100% and a specificity of 94.74% (Table 3).

Figure 6.

Workflow diagram of the proposed automated endoscopic video analysis system. The system integrates classification and segmentation models to evaluate OSA presence and distinguish between velopharyngeal (VO) and tongue base (TE) obstructions.

Table 3.

The Results of the Automated Detection System

Number of Cases Average Time Longest Time Shortest Time Accuracy Sensitivity Specificity
35 85 s 98.35 s 61.7 s 97.1% 100% 94.7%

Discussion

Flexible nasopharyngoscopy is a common diagnostic tool used by otolaryngologists in the evaluation of OSA. This outpatient-based examination offers several clinical advantages: it is radiation-free, requires no sedation (in contrast to DISE), and enables real-time visualization of upper airway structures during natural breathing and voluntary maneuvers. When combined with the Müller maneuver, which simulates negative pressure by asking the patient to inhale forcefully with the mouth closed and nostrils pinched, this technique enhances the ability to localize obstruction “hot spots” (eg, retropalatal or retroglossal regions) and grade the severity of airway collapse.22,23 Static imaging modalities such as CT or MRI can assess airway anatomy but are limited by selection bias due to variability in acquisition levels (eg, hard palate vs epiglottic level) and operator experience.22 Recent advances in digital imaging have enabled quantitative analysis of dynamic endoscopic recordings. Automated video analysis systems using edge-detection algorithms can extract sagittal and transverse airway diameters, calculate a collapsibility index, and provide objective measures of obstruction. However, most previous studies have relied on manually selected endoscopic images showing maximal or minimal airway dimensions during respiration, which cannot eliminate human selection bias. If AI-based analysis is applied only to such pre-selected images, the risk of human error remains. In contrast, our study is the first to analyze the entire endoscopic video sequence directly, automatically identifying the ROI and extracting relevant anatomical features. This fully automated, video-based approach avoids the inherent biases of static image selection and represents a novel step forward in objective airway evaluation. 
By providing quantitative, operator-independent analysis, our system fills an important clinical gap in early OSA screening—particularly in resource-limited or outpatient settings. Our approach could be integrated into routine endoscopic examinations for patients with suspected OSA. Clinicians could rapidly obtain AI-assisted, quantitative assessments during the same visit, allowing them to identify individuals who may benefit from further evaluation with DISE or PSG. Compared to traditional subjective grading systems like the VOTE classification, this approach significantly reduces inter-observer variability and improves diagnostic accuracy, particularly in cases of multilevel obstruction.

Previous studies have explored the correlation between upper airway anatomy and OSA severity. For instance, Sundaram et al24 reported that minimal cross-sectional area and soft palate length were significantly associated with the AHI. Additional studies have confirmed that a narrower airway and longer soft palate are linked to higher AHI values.25,26 Recent research incorporating computer-assisted image analysis and the Müller maneuver has shown promising results in predicting OSA and developing gender- and region-specific predictive models. For example, in males, cross-sectional areas at the retropalatal and retroglossal levels measured in the supine position during the Müller maneuver are reliable indicators of OSA. Similar findings have been observed in female patients. While adeno-tonsillar hypertrophy is a well-established risk factor for pediatric OSA, its impact in adults is limited due to the low prevalence (6–15%) of residual adenoidal tissue. In adults, factors such as obesity, age, and smoking play a more prominent role.27 Consistent with existing literature, our study found no significant correlation between adenoid size and OSA in our adult cohort.

In cases where the oropharynx is already narrowed or obstructed, patients may be unable to compensate by increasing VO-max (a volume-related index we analyzed) through voluntary effort, due to a critically low VO-min. A severely diminished VO-min during sleep indicates that the velopharyngeal region fails to remain patent, leading to airflow obstruction. One study demonstrated that in severe OSA, the oropharyngeal area at end-expiration is markedly narrowed, highlighting this region as a critical site of obstruction.28 Therefore, even if patients can dilate their airway during wakefulness, persistent airway collapse during sleep may require surgical intervention to improve VO-min. Procedures such as UPPP or maxillomandibular advancement aim to enlarge the oropharyngeal space and enhance lateral wall mobility to improve airway patency.

Previous studies have explored the use of endoscopic video analysis and imaging modalities for assessing upper airway dynamics in patients with obstructive sleep apnea (OSA). Su et al29 utilized endoscopic videos to measure cross-sectional collapse ratios at the retropalatal and retrolingual levels, providing detailed, site-specific data that enhanced preoperative anatomical evaluation. Although predictive models were not fully established at that time, their work laid important groundwork for precision in airway assessment. Subsequently, Miyazaki et al30 employed cine MRI to analyze dynamic airway collapsibility in awake individuals, further emphasizing the potential of non-invasive imaging for airway evaluation. Additionally, Calloway et al31 demonstrated that endoscopy offers more accurate airway measurements compared to 3D reconstructions derived from CT scans. However, a common limitation across these studies is the reliance on manual image selection. Determining which frames represent the maximal or minimal cross-sectional area of a dynamically changing airway remains subject to human judgment, inevitably introducing selection bias. This challenge is particularly significant in the context of dynamic airway assessment, where the airway dimensions fluctuate continuously with respiration. In our study, we addressed this limitation by developing an AI-based system that processes the entire endoscopic video to autonomously identify critical anatomical regions, including the velum, oropharynx, tongue base, and epiglottis. The system automatically locates the frames with the largest and smallest cross-sectional areas within these ROI across the full video sequence, eliminating the need for manual frame selection. 
Our methodology is conceptually aligned with the clinically established VOTE classification system used in DISE;11,21 however, unlike traditional VOTE scoring, which relies on subjective visual grading, our system provides objective, quantitative assessments of cross-sectional area changes. The derived VO and TE ratio serve as quantitative analogs to the VOTE framework, enabling an objective threshold-based interpretation that can complement or even recalibrate existing VOTE categories. For instance, while conventional VOTE classification depends on the examiner’s perception of collapse severity, our system defines these grades numerically based on measured proportional changes, thereby minimizing inter-observer variability and improving diagnostic reproducibility. This automated, video-based analysis approach offers a level of precision and consistency that has not been demonstrated in previous studies. By leveraging AI to perform comprehensive frame selection and measurement, we effectively eliminate observer bias and enhance the granularity of airway dynamics evaluation. Furthermore, our predictive model demonstrated a high accuracy of 97.14% in distinguishing between OSA and non-OSA patients in the test cohort, underscoring the clinical applicability of this automated analysis system as a reliable screening and diagnostic tool.

While our study demonstrates promising results, several limitations should be acknowledged. First, the relatively small sample size may affect the generalizability of the AI model. The performance of machine learning algorithms is highly dependent on the diversity and quality of the training data. A dataset lacking comprehensive representation of OSA severity levels, patient demographics, and clinical variations may limit the model’s applicability to broader, real-world scenarios. Second, this study did not include direct correlation between endoscopic findings and DISE results, as not all patients underwent DISE. Although awake endoscopy with the Müller maneuver provides valuable airway dynamics in a supine position, it cannot fully replicate the airway conditions observed during sleep. Consequently, anatomical findings from this method may differ from those obtained during DISE. However, the current outpatient-based endoscopic analysis remains essential, as it is impractical to perform DISE on healthy individuals or all suspected OSA patients. Therefore, developing and training AI models using outpatient awake endoscopy data represents a necessary and pragmatic approach. For broader clinical adoption, our next step will be to integrate this automated analysis system into DISE procedures, enabling AI-based evaluation of DISE videos and direct quantitative validation against the established VOTE framework to ensure consistency with current clinical standards. Briefly, our AI algorithm processes the entire video sequence to automatically identify frames with maximal and minimal cross-sectional areas and calculate VO-max, VO-min, and TE ratios, thereby eliminating operator selection bias. Unlike DISE, which requires sedation and extended procedural time, our method offers a rapid, non-invasive screening tool suitable for outpatient clinics. 
Although it is not intended to replace DISE, this approach can serve as a practical, time-efficient adjunct for early airway assessment and OSA screening.

Conclusion

This study introduces an automated video analysis system for dynamic endoscopy, enabling individualized assessment of VO-max, VO-min, and TE ratios to support targeted surgical planning in OSA patients. By providing early, noninvasive, and accessible screening, the proposed tool has the potential to bridge a key clinical gap in the routine evaluation of OSA. The analysis is both rapid and efficient, requiring approximately 85 seconds per case. Without human intervention, our predictive model achieved an accuracy of 97.14%, with 100% sensitivity and 94.74% specificity in identifying airway obstruction sites and predicting OSA status. While these results demonstrate the feasibility and clinical potential of AI-assisted dynamic airway assessment, further validation using larger, multicenter datasets is essential to ensure generalizability. Future work will also focus on integrating DICOM-format data, clinical parameters (eg, gender, BMI, neck circumference), and endoscopic videos from multiple platforms, including DISE. Incorporating AI-derived parameters such as VO and TE ratios into established classification frameworks like the VOTE system could further facilitate clinical translation.

Funding Statement

This study was partially supported by grants from Tri-Service General Hospital, National Defense Medical University, Taiwan (MND-MAB-D-114081) to Shao-Cheng Liu. This study was also supported by grants from Taichung Armed Forces General Hospital, Taiwan (TCAFGH_E_113048) to Wen-Sen Lai. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Data Sharing Statement

The datasets generated from this study are available from the corresponding author on reasonable request.

Ethical Considerations

The research protocol (NO: 1-108-05-132) has been reviewed and approved by the Institutional Review Board of Tri-Service General Hospital. Written informed consent to publish has been obtained from all participants (including the healthy volunteers), and this study complied with the Declaration of Helsinki.

Author Contributions

All authors have agreed on the journal to which the article will be submitted.

All authors have reviewed and agreed on all versions of the article before submission, during revision, the final version accepted for publication, and any significant changes introduced at the proofing stage.

All authors agree to take responsibility and be accountable for the contents of the article.

The specific roles of the authors are as follows:

1. Wen-Sen Lai: formal analysis, data curation, investigation, writing – original draft

2. Ting-Wei Li: formal analysis, data curation, writing – original draft

3. Chung-Feng Jeffrey Kuo: formal analysis, conceptualization, writing – review and editing

4. Shao-Cheng Liu: conceptualization, data curation, formal analysis, writing – review and editing, supervision, visualization.

Disclosure

The authors have no conflicts of interest to declare.



