Table 3.
Strategies applied in research articles to counter issues of RHIS data
| Type of strategy | Description of strategy |
|---|---|
| Missing data | |
| Exclusion | Exclude facility data if missing data reached a certain threshold (e.g. more than two-thirds of months in a year; more than a sixth of baseline data; facilities with any missing data) |
| | Restrict analysis to a period with a low level of missing data |
| | Sensitivity analysis to compare analysis of restricted period and full period |
| Imputation | Assign missing observations the mean value for the year |
| | Assign missing observations the average of the preceding and subsequent data |
| | Imputation using a conditional autoregressive model |
| | Missing values were coded as positive (binary form) to prevent exaggeration of the fade-out effect |
| | Sensitivity analysis of imputation strategies: (1) single imputation using the mean, trimmed mean, or median; (2) Poisson generalized linear modeling; (3) iterative singular value decomposition |
| Interpolation | Interpolation using space-time kriging |
| | Adjust results by dividing each indicator by the percentage of reports submitted |
| | Adjust the data by calibrating to the total population using proportion reported in a household survey to have occurred in health facilities |
| Verification | Manual verification of the missing data with register at the health facility |
| Account in the modeling method | Missing data was assumed missing at random and accounted for in the mixed-effect models using standard maximum likelihood estimation |
| Identifying extreme values | |
| Specific threshold | Establishing a lower and upper limit based on proportion of the annual average or feasible value |
| | Univariate regression at the individual facility level to identify deviations from the mean time trend (e.g. values exceeding 8 standard deviations) |
| Visual | Visual inspection of outliers |
| Analytic assessment | Jackknifing analysis to assess influence |
| | Studentized residual with an absolute value greater than 2, with influence on the estimated coefficients indicated by a high Cook’s distance |
| Handling of extreme values | |
| Exclusion | Extreme values were excluded from analyses |
| Replacing extreme value with average | Extreme values were assigned the average value for the year, except where the average value was low |
| Replacing extreme value with missing | Outliers set to missing |
| Verification with data source | Any drastic change in monthly data reported electronically was manually verified against the register at the health facility. Discrepancies were replaced with data from the register |
| Discount observation in estimation | Outliers were allocated a dummy coding to discount the observation in the calculation of coefficients |
| Assess reliability | |
| Data validation process | Randomly selected 10% of the total sample to check accuracy and reliability of data with reports and registers |
| | Verify data with another source (e.g. payroll) |
| | Established routine data validation process by health information and records officer (e.g. monthly data review meetings) |
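
To make two of the tabulated strategies concrete, the following is a minimal Python sketch, not the method of any reviewed study. It illustrates (a) imputing a missing monthly count as the average of the preceding and subsequent data and (b) flagging extreme values using lower and upper limits set as proportions of the annual average. The function names, the sample series, the limits of 0.5 and 2.0, and the choice to use the nearest reported neighbours (or a single neighbour at the ends of the series) are all assumptions made for illustration.

```python
from statistics import mean


def impute_neighbor_average(series):
    """Fill each missing month (None) with the mean of the nearest
    preceding and subsequent reported values (assumed rule)."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Nearest reported value before and after the gap, if any.
        prev = next((series[j] for j in range(i - 1, -1, -1)
                     if series[j] is not None), None)
        nxt = next((series[j] for j in range(i + 1, len(series))
                    if series[j] is not None), None)
        neighbors = [v for v in (prev, nxt) if v is not None]
        if neighbors:  # leave the gap if nothing was reported at all
            filled[i] = mean(neighbors)
    return filled


def flag_extreme(series, lower=0.5, upper=2.0):
    """Flag reported values outside [lower, upper] times the annual
    average. The default limits are hypothetical, not from the table."""
    observed = [v for v in series if v is not None]
    avg = mean(observed)
    return [v is not None and not (lower * avg <= v <= upper * avg)
            for v in series]


# Hypothetical monthly counts for one facility; None marks a missing report.
monthly = [100, None, 120, 110, 900, 105, 95, 115, 100, 110, 105, 120]
filled = impute_neighbor_average(monthly)
flags = flag_extreme(monthly)
```

Because a single extreme month inflates the annual average, limits defined this way can also flag ordinary months in short or very noisy series; flagged values would then be handled by one of the strategies listed under "Handling of extreme values".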