Table 2. Approaches to handling missing data

| Method | Description | Limitations |
|---|---|---|
| Replacement with mean or median | Inserts the mean or median of the observed values of a variable in place of its missing values | Reduces the variance |
| Last observation carried forward/back | Inserts the last observation in place of the missing data points | Ignores existing trends in the data; reduces variance; weakens covariance and correlations |
| Linear interpolation | Assumes a linear relationship between two points and uses non-missing values from adjacent points to compute a value for the missing data points | Inappropriate in oscillatory data |
| Regression substitution | Predicts the most likely value of the missing data | May overestimate model fit; does not quantify uncertainty about that value; reduces variance |
| Maximum likelihood estimation | Identifies the most likely set of values based on the observed data. The maximum likelihood estimate of a parameter is the value of the parameter that is most likely to have resulted in the observed data | Limited to linear models |
| Multiple imputation | Plausible values that reflect uncertainty are created for the missing observations and used to impute them. This process is repeated to create a number of 'completed' datasets. Each of these datasets is analyzed separately. The results are then combined, allowing the uncertainty of the imputation to be taken into account | Complex to employ; choosing the correct model can be difficult |
| Dimensionality | Views missing data as an additional dimension within the data | Complex to employ. May be difficult to interpret |
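
As a rough illustration of the three simplest methods in the table, the sketch below applies mean replacement, last observation carried forward, and linear interpolation to a short series. The function names and the sample values are hypothetical and do not come from the source; missing points are represented as `None`, and only the Python standard library is used.

```python
from statistics import mean, pvariance
from typing import List, Optional

Series = List[Optional[float]]  # None marks a missing observation


def impute_mean(xs: Series) -> List[float]:
    """Replace every missing value with the mean of the observed values."""
    observed = [x for x in xs if x is not None]
    m = mean(observed)
    return [m if x is None else x for x in xs]


def impute_locf(xs: Series) -> Series:
    """Carry the last observed value forward; leading gaps stay None."""
    out: Series = []
    last: Optional[float] = None
    for x in xs:
        if x is not None:
            last = x
        out.append(last)
    return out


def impute_linear(xs: Series) -> Series:
    """Fill gaps between two observed points along a straight line.

    Gaps before the first or after the last observed point stay None.
    """
    out = list(xs)
    known = [i for i, x in enumerate(xs) if x is not None]
    for left, right in zip(known, known[1:]):
        step = (xs[right] - xs[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = xs[left] + step * (i - left)
    return out


# Made-up example series with three missing points.
values: Series = [1.0, None, 3.0, None, None, 6.0]
observed = [x for x in values if x is not None]

print(impute_locf(values))    # [1.0, 1.0, 3.0, 3.0, 3.0, 6.0]
print(impute_linear(values))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Mean replacement shrinks the variance, as noted in the Limitations column.
print(pvariance(observed), pvariance(impute_mean(values)))
```

On this example, last observation carried forward flattens the upward trend that linear interpolation follows, and mean replacement yields a lower variance than the observed values alone. In practice, a library such as pandas offers comparable operations (`Series.fillna`, `Series.ffill`, `Series.interpolate`); the plain-Python version here is only meant to make the mechanics visible.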