J Clin Sleep Med. 2020 Feb 15;16(2):161–162. doi: 10.5664/jcsm.8260

Big data: mind your Ps and Qs—the importance of precision and quality in data validation

Tracey L. Stierer

CITATION

Stierer TL. Big data: mind your Ps and Qs—the importance of precision and quality in data validation. J Clin Sleep Med. 2020;16(2):161–162.


Big data and artificial intelligence have transformed the delivery and quality of United States health care over the past decade.1 Integrated electronic health records, insurance claims, biobanks, and registries have allowed unprecedented sharing of extensive patient data across multiple clinical settings, contributing insights into population health as well as driving precision medicine for the individual patient. While the analysis of real-world evidence and machine learning hold the promise of facilitating earlier diagnosis, improving outcomes, and enhancing patient satisfaction, large-volume data come with inherent challenges, including data verification, population bias, and diagnostic coding accuracy.

In this issue of the Journal of Clinical Sleep Medicine, Keenan et al assess the performance of a simple electronic health record (EHR)-based case-identification algorithm using ICD-9 and ICD-10 codes to distinguish patients with diagnosed obstructive sleep apnea (OSA) from patients not diagnosed with the disorder across six health systems in the United States.2 The authors found that in their multisite cohort, an algorithm defining OSA cases as individuals with at least two separate encounters associated with diagnostic codes for sleep apnea in the EHR and non-OSA cases as individuals with no codes yielded high positive and negative predictive values.
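To make the classification rule concrete, the sketch below applies the same logic described above: patients with sleep apnea diagnostic codes on at least two separate encounters are labeled cases, patients with no such codes are labeled non-cases, and everyone else is left unclassified. This is an illustrative sketch only; the record format and the specific ICD codes shown are assumptions for demonstration, not the authors' actual implementation.

```python
# Illustrative sketch of the two-encounter EHR rule described above.
# The record format and code list are assumptions for demonstration,
# not the implementation used by Keenan et al.

OSA_CODES = {"327.23", "G47.33"}  # example ICD-9 / ICD-10 codes for obstructive sleep apnea

def classify_patient(encounters):
    """encounters: list of (encounter_id, set_of_diagnosis_codes) tuples."""
    coded_encounters = {
        enc_id for enc_id, codes in encounters if codes & OSA_CODES
    }
    if len(coded_encounters) >= 2:
        return "OSA case"      # codes on two or more separate encounters
    if len(coded_encounters) == 0:
        return "non-OSA"       # no sleep apnea codes anywhere in the EHR
    return "unclassified"      # a single coded encounter is neither case nor control

# Example
patient = [("e1", {"G47.33", "I10"}), ("e2", {"G47.33"})]
print(classify_patient(patient))  # -> "OSA case"
```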

This investigation builds upon prior work by Laratta et al, who validated a claims-based algorithm that identified true cases of OSA with a high degree of specificity in a cohort of Canadian patients referred for sleep testing.3 In contrast, McIsaac et al, also working with Canadian data, found that no code or combination of codes provided a likelihood ratio high enough to adequately identify patients with OSA who had undergone preoperative polysomnography (PSG).4

The appeal of a high-performing, simple algorithm that accurately classifies cases in a binary manner as having or not having a disease or condition is obvious. However, questions linger, and the risk of case misclassification in this study may not be fully appreciated by the authors. This is exemplified by the chart review process used to determine true cases, of which only 72% had objective data in the form of results from a nocturnal polysomnogram or home sleep test. The remaining cases were adjudicated using alternate criteria, including the presence of OSA on the patient’s EHR problem list. Clinicians have long been aware that several components of the EHR, including current medication and problem lists, are prone to errors that are carried over and perpetuated with each subsequent visit, which has led to a national effort to promote improved medication reconciliation; it remains unclear, however, who is responsible for updating and maintaining an accurate problem list.5 Primary care providers appear to be better than specialists at ensuring that the problem list reflects the patient’s current condition.6 Nevertheless, Daskivich et al found in 2018 that data derived from an EHR problem list were not accurate enough to be useful in risk adjustment.7 Using the presence of a condition on the problem list as a surrogate for objective data may therefore not be a valid method of case identification for individuals with missing data.

On the other hand, cases identified as non-OSA present an additional source of possible misclassification. Young et al showed that up to 90% of individuals with OSA in a general population had yet to receive a formal diagnosis.8 Validation of non-OSA case status by manual chart review further introduces the potential for human error. Additionally, the absence of coding may simply reflect visits at which screening for signs and symptoms of OSA was not performed; in a study of 44 randomly selected sites within a practice-based research network, only 20% of patients with sleep-related symptoms spontaneously reported their symptoms to their primary care physician.9 The authors employed the Symptomless Multivariable Apnea Prediction Index to estimate cases with a high risk of OSA among those classified as non-OSA. However, Lyons et al’s validation of this prediction tool was conducted in a surgical population restricted to patients who received general anesthesia, and it confirmed cases only by a prior formal diagnosis of OSA, with no objective data to confirm absence of the condition.10 When examined by the US Preventive Services Task Force, studies comparing the Multivariable Apnea Prediction Index, which includes symptoms associated with OSA, against home sleep apnea testing devices oversampled patients at high risk for OSA.11 Furthermore, the home sleep apnea testing used to validate the tool showed sensitivities and specificities ranging from 7% to 100% and from 15% to 100%, respectively, at an apnea-hypopnea index (AHI) ≥ 15 events/h.
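A toy calculation illustrates how undiagnosed disease among code-free patients caps the apparent accuracy of a "no code = no OSA" rule. The prevalence and underdiagnosis figures below are illustrative assumptions loosely motivated by the population estimates cited above; they are not values from the study under discussion.

```python
# Toy calculation: how undiagnosed OSA among code-free patients caps the
# negative predictive value of "no diagnosis code = no OSA".
# All numbers are illustrative assumptions, not study data.

population = 100_000
prevalence = 0.10            # assumed true OSA prevalence
undiagnosed_fraction = 0.90  # assumed fraction of OSA never formally diagnosed (cf. Young et al)

true_osa = population * prevalence
undiagnosed_osa = true_osa * undiagnosed_fraction             # OSA patients with no code
no_code_patients = population - (true_osa - undiagnosed_osa)  # everyone without a code

npv_ceiling = 1 - undiagnosed_osa / no_code_patients
print(f"NPV can be no better than {npv_ceiling:.2%}")  # ~91% under these assumptions
```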

Finally, there is the question of whether the AHI adequately characterizes the different phenotypes of sleep-disordered breathing or reflects the true heterogeneity of the diagnosis of OSA. This is perhaps best illustrated by the patient with severe positional or REM-related sleep-disordered breathing whose overall AHI or respiratory disturbance index is normal because the patient slept in the lateral or prone position for most of the PSG, or spent only a small percentage of total sleep time in REM sleep. Variability in the interpretation of PSG data based solely upon the overall AHI and respiratory disturbance index will continue to confound investigations assessing risk and outcome, especially in the perioperative arena, where patients are frequently required to recover in the supine position and where sleep architecture may be dramatically altered by anesthesia and the surgical procedure, by 24-hour activity with interruption of sleep to obtain vital signs and administer medications, and by exposure to artificial light.
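A simple worked example, with entirely hypothetical numbers, shows how averaging over the whole night can hide REM-dependent disease: a patient whose REM AHI is severe can still have an overall AHI well below common diagnostic thresholds if little of the night is spent in REM sleep.

```python
# Hypothetical example of REM-related OSA masked by the overall AHI.
# Numbers are invented for illustration only.

rem_sleep_hours = 0.8   # small amount of REM sleep captured on the PSG
nrem_sleep_hours = 5.2
rem_events = 40 * rem_sleep_hours    # severe REM AHI of 40 events/h
nrem_events = 2 * nrem_sleep_hours   # nearly normal NREM AHI of 2 events/h

overall_ahi = (rem_events + nrem_events) / (rem_sleep_hours + nrem_sleep_hours)
print(f"Overall AHI: {overall_ahi:.1f} events/h")  # ~7 events/h, despite severe REM-related disease
```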

The exponential growth in knowledge gained by leveraging the vast information available through big data and artificial intelligence has the capacity to dramatically alter our approach to patient care. Studies like Keenan et al’s, which examine and validate the performance of algorithms used to classify disease status, help refine the appropriate application of these data to identify risk and deliver precision treatments that enhance population health. We must nonetheless remain cautious in our interpretation of such studies: a variety of biases have not yet been eliminated, and, if we are not careful, we risk missing the mark when using these data to inform surveillance, treatment, and health policy decisions.

DISCLOSURE STATEMENT

The author reports no conflicts of interest.

REFERENCES

1. Horgan D, Romao M, Morré SA, Kalra D. Artificial intelligence: power for civilisation - and for better healthcare. Public Health Genomics. 2019:1–17. doi: 10.1159/000504785. [Epub ahead of print]
2. Keenan BT, Kirchner HL, Veatch OJ, et al. Multisite validation of a simple electronic health record algorithm for identifying diagnosed obstructive sleep apnea. J Clin Sleep Med. 2020;16(2):175–183. doi: 10.5664/jcsm.8160.
3. Laratta CR, Tsai WH, Wick J, Pendharkar SR, Johannson KA, Ronksley PE. Validity of administrative data for identification of obstructive sleep apnea. J Sleep Res. 2017;26(2):132–138. doi: 10.1111/jsr.12465.
4. McIsaac DI, Gershon A, Wijeysundera D, Bryson GL, Badner N, van Walraven C. Identifying obstructive sleep apnea in administrative data: a study of diagnostic accuracy. Anesthesiology. 2015;123(2):253–263. doi: 10.1097/ALN.0000000000000692.
5. Carpenter JD, Gorman PN. Using medication list–problem list mismatches as markers of potential error. Proc AMIA Symp. 2002:106–110.
6. Luna D, Franco M, Plaza C, et al. Accuracy of an electronic problem list from primary care providers and specialists. Stud Health Technol Inform. 2013;192:417–421.
7. Daskivich TJ, Abedi G, Kaplan SH, et al. Electronic health record problem lists: accurate enough for risk adjustment? Am J Manag Care. 2018;24(1):e24–e29.
8. Young T, Evans L, Finn L, Palta M. Estimation of the clinically diagnosed proportion of sleep apnea syndrome in middle-aged men and women. Sleep. 1997;20(9):705–706. doi: 10.1093/sleep/20.9.705.
9. Mold JW, Quattlebaum C, Schinnerer E, Boeckman L, Orr W, Hollabaugh K. Identification by primary care clinicians of patients with obstructive sleep apnea: a practice-based research network (PBRN) study. J Am Board Fam Med. 2011;24(2):138–145. doi: 10.3122/jabfm.2011.02.100095.
10. Lyons MM, Keenan BT, Li J, et al. Symptomless Multi-Variable Apnea Prediction Index assesses obstructive sleep apnea risk and adverse outcomes in elective surgery. Sleep. 2017;40(3). doi: 10.1093/sleep/zsw081.
11. Jonas DE, Amick HR, Feltner C, et al. Screening for obstructive sleep apnea in adults: evidence report and systematic review for the US Preventive Services Task Force. JAMA. 2017;317(4):415–433. doi: 10.1001/jama.2016.19635.
