TABLE VII.
Selected Methods for EHR Data Pre-processing
Method | Advantages | Limitations |
---|---|---|
Missing data: list-wise deletion, mean filling* [118, 119] | Simple to implement; complete case analysis | Loss of statistical power; introduces biases; underestimates variances |
Missing data: hot deck, nearest neighbor* [120] | Simple to implement and interpret; immune to cross-user inconsistencies | Introduces biases; underestimates variances |
Missing data: interpolation (linear, piece-wise linear, spline, cubic) [121] | Simple to implement and interpret; direct estimation on the basis of neighbors | Does not account for relationships among different features |
Missing data: model-based filling (expectation maximization, maximum likelihood, multiple imputation)* [122] | Accounts for uncertainty in imputations | Does not account for missing data mechanisms (i.e., MCAR, MAR, and MNAR) |
Waveforms: noise filtering (IIR, FIR, PCA, ICA, Kalman filter, wavelets) [50, 123] | Generally simple to implement | Falls short in situations where the “true” waveform is obscured by artifacts such as patient motion |
Waveforms: signal quality indices [123–125] | Human-interpretable metrics of signal quality | Can be complex to implement and computationally intensive; may require ad-hoc calibration based on the features of the target waveform |
Waveforms: sensor fusion [126, 127] | Improves SNR; reduces data dimensionality while increasing data quality | Computationally intensive; loss of detail from individual sensor waveforms |
*Highly impactful method with more than 50,000 relevant papers.
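To make the first missing-data row of Table VII concrete, the following is a minimal sketch of list-wise deletion (complete case analysis) and mean filling on a few hypothetical patient records. The record layout and the feature names `heart_rate` and `sbp` are illustrative assumptions, not part of any specific EHR schema.

```python
from statistics import mean
from typing import Dict, List, Optional

# Hypothetical record layout: one dict per patient, None marks a missing value.
Record = Dict[str, Optional[float]]


def listwise_delete(records: List[Record]) -> List[Record]:
    """Complete case analysis: keep only records with no missing values."""
    return [r for r in records if all(v is not None for v in r.values())]


def mean_fill(records: List[Record]) -> List[Record]:
    """Replace each missing value with the mean of that feature's observed values."""
    keys = list(records[0].keys())
    means = {k: mean(r[k] for r in records if r[k] is not None) for k in keys}
    return [{k: (means[k] if r[k] is None else r[k]) for k in keys} for r in records]


# Hypothetical toy data (illustrative values only).
records = [
    {"heart_rate": 80.0, "sbp": 120.0},
    {"heart_rate": None, "sbp": 130.0},
    {"heart_rate": 100.0, "sbp": None},
    {"heart_rate": 90.0, "sbp": 110.0},
]
print(len(listwise_delete(records)))  # 2
print(mean_fill(records)[1])          # {'heart_rate': 90.0, 'sbp': 130.0}
```

Deletion keeps only half of the records here (the loss of statistical power), while mean filling places every imputed value at the center of the distribution, so the filled feature's spread shrinks (the underestimated variance).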
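For the hot-deck and nearest-neighbor row of Table VII, a minimal sketch of a deterministic nearest-neighbor variant of hot-deck imputation follows: a record missing the target feature receives the value of the most similar "donor" record. Classic hot deck often draws a random donor from a matching class instead; the Euclidean distance, the helper name `nearest_neighbor_fill`, and the feature names are assumptions for illustration.

```python
import math
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def nearest_neighbor_fill(records: List[Record], target: str) -> List[Record]:
    """Fill `target` in each record that lacks it by copying the value from the
    closest donor (a record with `target` observed). Distance is Euclidean over
    the other features observed in both records; in practice features would
    usually be standardized first so no single unit dominates."""
    donors = [r for r in records if r[target] is not None]
    if not donors:
        raise ValueError("no donor records with the target observed")
    filled: List[Record] = []
    for r in records:
        if r[target] is not None:
            filled.append(dict(r))
            continue

        def distance(d: Record) -> float:
            shared = [k for k in r if k != target and r[k] is not None and d[k] is not None]
            if not shared:
                return math.inf  # no overlap: least preferred donor
            return math.sqrt(sum((r[k] - d[k]) ** 2 for k in shared))

        best = min(donors, key=distance)
        filled.append({**r, target: best[target]})
    return filled


# Hypothetical toy records (illustrative values only).
patients = [
    {"age": 65.0, "heart_rate": 88.0, "lactate": 2.1},
    {"age": 30.0, "heart_rate": 70.0, "lactate": 0.9},
    {"age": 63.0, "heart_rate": 90.0, "lactate": None},
]
print(nearest_neighbor_fill(patients, "lactate")[2]["lactate"])  # 2.1
```

Because imputed values are copied from real donors, they stay plausible and easy to explain; repeatedly reusing the same donors, however, narrows the spread of the filled feature.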
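The interpolation row of Table VII can be illustrated with linear interpolation over irregular EHR timestamps. The sketch below assumes strictly increasing times and fills leading or trailing gaps with the nearest observation. That edge rule is a design assumption; other choices are possible.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def linear_interpolate(times: List[float], values: List[Optional[float]]) -> List[float]:
    """Fill missing values (None) by linear interpolation in time between the
    nearest observed neighbors. `times` must be strictly increasing. Gaps before
    the first or after the last observation take that nearest observed value."""
    observed: List[Tuple[float, float]] = [
        (t, v) for t, v in zip(times, values) if v is not None
    ]
    if not observed:
        raise ValueError("series has no observed values")
    obs_times = [t for t, _ in observed]
    filled: List[float] = []
    for t, v in zip(times, values):
        if v is not None:
            filled.append(v)
            continue
        j = bisect_left(obs_times, t)  # index of first observation after t
        if j == 0:
            filled.append(observed[0][1])
        elif j == len(observed):
            filled.append(observed[-1][1])
        else:
            (t0, v0), (t1, v1) = observed[j - 1], observed[j]
            filled.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
    return filled


# Hypothetical temperatures (deg C) at irregular hours, three readings missing.
times = [0.0, 1.0, 3.0, 4.0, 6.0]
temps = [36.8, None, None, 37.6, None]
print([round(x, 2) for x in linear_interpolate(times, temps)])
# [36.8, 37.0, 37.4, 37.6, 37.6]
```

Each gap is estimated only from the same feature's neighbors in time. This is exactly the limitation listed in the table: correlated features (for example, heart rate rising with temperature) are ignored.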
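For the model-based row of Table VII, a minimal sketch of multiple imputation follows. It uses a simplified ("improper") variant: a least-squares regression of y on x is fitted once to the complete cases, each of the `m` imputations adds Gaussian residual noise, and the m estimates of the mean of y are pooled with Rubin's rules, T = W + (1 + 1/m)·B. A proper implementation would also redraw the regression parameters for each imputation. The function names and data are hypothetical.

```python
import random
from statistics import mean, variance
from typing import List, Optional, Tuple


def fit_ols(xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """Return intercept, slope and residual standard deviation of y ~ x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    sigma = (sum(e * e for e in resid) / (len(xs) - 2)) ** 0.5
    return intercept, slope, sigma


def multiple_imputation_mean(
    xs: List[float], ys: List[Optional[float]], m: int = 20, seed: int = 0
) -> Tuple[float, float, float, float]:
    """Estimate the mean of y under m stochastic regression imputations.
    Returns (pooled estimate, within variance W, between variance B, total T)."""
    rng = random.Random(seed)
    cx = [x for x, y in zip(xs, ys) if y is not None]
    cy = [y for y in ys if y is not None]
    intercept, slope, sigma = fit_ols(cx, cy)
    estimates, within = [], []
    for _ in range(m):
        completed = [
            y if y is not None else intercept + slope * x + rng.gauss(0.0, sigma)
            for x, y in zip(xs, ys)
        ]
        estimates.append(mean(completed))
        within.append(variance(completed) / len(completed))  # var of the sample mean
    q_bar = mean(estimates)
    w = mean(within)
    b = variance(estimates)  # spread caused by imputation uncertainty
    return q_bar, w, b, w + (1 + 1 / m) * b


# Hypothetical data: y roughly 2x, with three values missing.
xs = [float(i) for i in range(1, 11)]
ys = [2.1, 3.9, 6.2, None, 9.8, 12.1, None, 16.0, 18.1, None]
q, w, b, t = multiple_imputation_mean(xs, ys, m=20, seed=1)
print(f"estimate={q:.2f} within={w:.3f} between={b:.4f} total={t:.3f}")
```

The between-imputation term B is what "accounts for uncertainty in imputations". With no missing values it is zero and T reduces to W.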
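Among the noise-filtering methods in Table VII, the simplest FIR filter is a moving average with equal coefficients. The sketch below implements the causal form y[n] = (1/N) Σ x[n−k] for k = 0…N−1, treating samples before the start as zero. The tap count and the toy signal are illustrative assumptions.

```python
from typing import List


def moving_average_fir(signal: List[float], taps: int = 5) -> List[float]:
    """Causal FIR low-pass filter with `taps` equal coefficients (1/taps each):
    y[n] = (1/taps) * sum_{k=0}^{taps-1} x[n-k], with x[n] = 0 for n < 0,
    so the first taps-1 outputs are a warm-up transient."""
    if taps < 1:
        raise ValueError("taps must be >= 1")
    coeffs = [1.0 / taps] * taps
    out: List[float] = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out


# Hypothetical baseline of 1.0 with one short high-amplitude spike at index 6.
signal = [1.0] * 10
signal[6] = 11.0
filtered = moving_average_fir(signal, taps=5)
print([round(y, 2) for y in filtered])
```

The spike is spread out and attenuated rather than removed. When an artifact such as patient motion is long and overlaps the frequency band of the true waveform, no linear filter of this kind can separate the two. That is the limitation noted in the table. A symmetric FIR like this one also delays the output by (taps − 1)/2 samples.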
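As one example of the signal quality indices listed in Table VII, the sketch below computes a kurtosis-based index. Clean ECG segments are dominated by sharp QRS peaks and tend to have high kurtosis; Gaussian noise has kurtosis near 3, and a pure sinusoid about 1.5. The acceptance threshold of 5.0 and the synthetic signals are illustrative assumptions. A threshold like this would need calibration for each waveform type, which is the limitation noted in the table.

```python
import math
from statistics import fmean
from typing import List


def kurtosis_sqi(segment: List[float]) -> float:
    """Kurtosis (fourth standardized moment) of a waveform segment.
    Returns 0.0 for a flat line, which carries no usable signal."""
    mu = fmean(segment)
    m2 = fmean([(x - mu) ** 2 for x in segment])
    if m2 == 0.0:
        return 0.0
    m4 = fmean([(x - mu) ** 4 for x in segment])
    return m4 / (m2 ** 2)


def is_acceptable(segment: List[float], threshold: float = 5.0) -> bool:
    """Hypothetical accept/reject rule: keep segments whose kurtosis exceeds threshold."""
    return kurtosis_sqi(segment) > threshold


# Synthetic stand-ins: a peaky impulse train (ECG-like) and a smooth sinusoid.
impulses = ([0.0] * 19 + [1.0]) * 5
sine = [math.sin(2 * math.pi * n / 20) for n in range(100)]
print(round(kurtosis_sqi(impulses), 2), round(kurtosis_sqi(sine), 2))
print(is_acceptable(impulses), is_acceptable(sine))
```

The index is a single number a clinician or engineer can inspect, which is the interpretability advantage listed in the table.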
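The sensor-fusion row of Table VII can be illustrated with inverse-variance weighting, one standard way to combine redundant measurements of the same quantity. The sketch assumes independent, unbiased sensor errors with known variances. The example of heart rate derived from both ECG and PPG uses hypothetical numbers.

```python
from typing import List, Tuple


def inverse_variance_fusion(readings: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse (value, noise_variance) measurements of one quantity by weighting
    each with 1/variance. Returns the fused value and its variance, which is
    never larger than the smallest input variance (the SNR gain)."""
    if not readings:
        raise ValueError("need at least one reading")
    weights = [1.0 / var for _, var in readings]
    total = sum(weights)
    fused = sum(w * v for w, (v, _) in zip(weights, readings)) / total
    return fused, 1.0 / total


# Hypothetical heart-rate estimates (bpm, variance): ECG-derived and PPG-derived.
fused_hr, fused_var = inverse_variance_fusion([(72.0, 1.0), (76.0, 4.0)])
print(round(fused_hr, 2), round(fused_var, 2))  # 72.8 0.8
```

Two inputs become one output with lower variance, which reflects both the dimensionality reduction and the quality gain in the table. The individual sensor readings, and any detail only one of them captured, are no longer available after fusion.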