Author manuscript; available in PMC: 2018 Mar 20.
Published in final edited form as: IEEE Trans Biomed Eng. 2016 Oct 10;64(2):263–273. doi: 10.1109/TBME.2016.2573285

TABLE VII.

Selected Methods for EHR Data Pre-processing

| Method | Advantages | Limitations |
| --- | --- | --- |
| Missing data: list-wise deletion, mean filling* [118, 119] | Simple to implement; complete case analysis | Loss of statistical power; introduces biases; underestimates variances |
| Missing data: hot deck, nearest neighbor* [120] | Simple to implement and interpret; immune to cross-user inconsistencies | Introduces biases; underestimates variances |
| Missing data: interpolation (linear, piece-wise linear, spline, cubic) [121] | Simple to implement and interpret; direct estimation on the basis of neighbors | Does not account for relationships among different features |
| Missing data: model-based filling (expectation maximization, maximum likelihood, multiple imputations)* [122] | Accounts for uncertainty in imputations | Does not account for missing data mechanisms (i.e., MCAR, MAR, and MNAR) |
| Waveforms: noise filtering (IIR, FIR, PCA, ICA, Kalman filter, wavelets) [50, 123] | Generally simple to implement | Falls short in situations where “true” waveform is obscured by artifact such as patient motion |
| Waveforms: signal quality indices [123–125] | Human-interpretable metrics of signal quality | Can be complex to implement and computationally intensive; may require ad-hoc calibration based on the features of the target waveform |
| Waveforms: sensor fusion [126, 127] | Improved SNR; reduces data dimensionality while increasing data quality | Computationally intensive; loss of detail from individual sensor waveforms |
* Highly impactful method with more than 50,000 relevant papers.
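
As an illustration of the simplest missing-data rows in Table VII, the following Python sketch applies list-wise deletion, mean filling, and linear interpolation to a short series with gaps. It is a minimal sketch under stated assumptions, not the implementation used in the cited studies: the function names and the `heart_rate` values are hypothetical, missing entries are assumed to be encoded as `None`, and only the Python standard library is used.

```python
from statistics import mean, variance


def listwise_delete(values):
    """Complete-case analysis: drop every missing (None) entry."""
    return [v for v in values if v is not None]


def mean_fill(values):
    """Replace each missing entry with the mean of the observed entries."""
    observed = listwise_delete(values)
    fill = mean(observed)  # raises StatisticsError if nothing was observed
    return [fill if v is None else v for v in values]


def linear_interpolate(values):
    """Fill interior gaps on a straight line between the nearest observed
    neighbors. Leading and trailing gaps are left as None, because they
    have only one observed neighbor."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical, evenly sampled heart-rate series with three missing readings.
heart_rate = [72.0, None, 80.0, None, None, 92.0, 88.0]

observed = listwise_delete(heart_rate)
mean_filled = mean_fill(heart_rate)
interpolated = linear_interpolate(heart_rate)

print("observed:     ", observed, "sample variance", variance(observed))
print("mean-filled:  ", mean_filled, "sample variance", variance(mean_filled))
print("interpolated: ", interpolated)
```

On this toy series, list-wise deletion keeps four of the seven readings, which is the loss of statistical power noted in the table. Mean filling leaves the sample mean unchanged but lowers the sample variance relative to the observed values, which corresponds to the variance underestimation listed as a limitation. Linear interpolation estimates each gap from its neighbors in time only, so it cannot use information from other features.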