Table 2.
Approaches to handling missing data
Method | Description | Limitations |
---|---|---|
Replacement with mean or median | Inserts the mean or median of the whole dataset in place of missing data | Reduces the variance |
Last observation carried forward/back | Inserts the last observed value (or, when carrying back, the next observed value) in place of the missing data points | Ignores existing trends in the data; reduces variance; weakens covariance and correlations |
Linear interpolation | Assumes a linear relationship between two points and uses non-missing values from adjacent points to compute a value for the missing data points | Inappropriate in oscillatory data |
Regression substitution | Predicts the most likely value of the missing data | May overestimate model fit; does not quantify uncertainty about that value; reduces variance |
Maximum likelihood estimation | Identifies a likely set of values based on the observed data. The maximum likelihood estimate of a parameter is the value of the parameter that is most likely to have resulted in the observed data | Limited to linear models |
Multiple imputation | Plausible values that reflect uncertainty are generated for the missing observations and used to impute them. This process is repeated to create a number of "completed" datasets. Each of these datasets is analyzed separately. The results are then combined, allowing the uncertainty of the imputation to be taken into account | Complex to employ; choosing the correct model can be difficult |
Dimensionality | Views missing data as an additional dimension within the data | Complex to employ. May be difficult to interpret |
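
The three simplest methods in the table (mean replacement, last observation carried forward, and linear interpolation) can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function names and the `series` values are made up for the example, missing entries are represented as `None`, and samples are assumed to be evenly spaced.

```python
from statistics import mean, pvariance


def mean_impute(values):
    """Replace each missing entry (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def locf(values):
    """Carry the last observed value forward; gaps before the first observation stay None."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def linear_interpolate(values):
    """Fill interior gaps on a straight line between the nearest observed neighbours."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = values[left] + step * (i - left)
    return out


# Hypothetical series with two consecutive missing points.
series = [2.0, None, None, 8.0, 5.0]
observed = [v for v in series if v is not None]

print(mean_impute(series))         # [2.0, 5.0, 5.0, 8.0, 5.0]
print(locf(series))                # [2.0, 2.0, 2.0, 8.0, 5.0]
print(linear_interpolate(series))  # [2.0, 4.0, 6.0, 8.0, 5.0]

# Mean replacement shrinks the spread of the data, as the table notes.
print(pvariance(mean_impute(series)) < pvariance(observed))  # True
```

In this toy series the mean-imputed values sit at the centre of the distribution, which is why the variance drops, while carrying the last observation forward flattens the rise from 2.0 to 8.0 that linear interpolation preserves.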