2019 Jun 7;9:162. doi: 10.1038/s41398-019-0484-8

Table 2.

Approaches to handling missing data

| Method | Description | Limitations |
|---|---|---|
| Replacement with mean or median | Inserts the mean or median of the whole dataset in place of missing data | Reduces the variance |
| Last observation carried forward/back | Inserts the last observation (or, when carrying back, the next observation) in place of the missing data points | Ignores existing trends in the data; reduces variance; weakens covariance and correlations |
| Linear interpolation | Assumes a linear relationship between two points and uses the non-missing values at adjacent points to compute a value for the missing data points | Inappropriate in oscillatory data |
| Regression substitution | Predicts the most likely value of the missing data | May overestimate model fit; does not quantify uncertainty about that value; reduces variance |
| Maximum likelihood estimation | Identifies the likely set of values based on the observed data. The maximum likelihood estimate of a parameter is the parameter value under which the observed data are most probable | Limited to linear models |
| Multiple imputation | Plausible values that reflect uncertainty are generated for the missing observations and used to impute them. The process is repeated to create a number of 'completed' datasets, each of which is analyzed separately. The results are then combined, so that the uncertainty of the imputation is taken into account | Complex to employ; choosing the correct model can be difficult |
| Dimensionality | Views missing data as an additional dimension within the data | Complex to employ; may be difficult to interpret |
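
The three simplest rows of the table (mean replacement, last observation carried forward, and linear interpolation) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and does not come from the source: the function names, the use of `None` to mark a missing value, the equally spaced sample series, and the choice to leave leading or trailing gaps unfilled.

```python
from statistics import mean, variance


def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_locf(values):
    """Carry the last observed value forward; leading gaps stay None."""
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def impute_linear(values):
    """Fill interior gaps on a straight line between the nearest observed
    neighbours. Assumes equally spaced observations; leading and trailing
    gaps stay None."""
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = values[left] + step * (i - left)
    return filled


# Hypothetical series with three missing observations.
series = [2.0, None, None, 8.0, 6.0, None, 10.0]
observed = [v for v in series if v is not None]

mean_filled = impute_mean(series)
locf_filled = impute_locf(series)
linear_filled = impute_linear(series)

print(mean_filled)
print(locf_filled)
print(linear_filled)

# Mean replacement shrinks the sample variance relative to the observed values,
# which is the limitation listed for it in the table.
print(variance(observed), variance(mean_filled))
```

In this toy series the mean-filled data have a smaller sample variance than the observed values alone, and the carried-forward values flatten the rise between the first and fourth observations, which matches the limitations listed for those two methods. The model-based rows (regression substitution, maximum likelihood, multiple imputation) need a fitted model and are not covered by this sketch.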