Heliyon. 2023 Mar 21;9(4):e14648. doi: 10.1016/j.heliyon.2023.e14648

The impact of mortality underreporting on the association of ambient temperature and PM10 with mortality risk in time series study

Ziqiang Lin a, Wayne R Lawrence b, Weiwei Gong c, Lifeng Lin d, Jianxiong Hu e, Sui Zhu a, Ruilin Meng d, Guanhao He a, Xiaojun Xu d, Tao Liu a, Jieming Zhong c, Min Yu c, Karin Reinhold f, Wenjun Ma a
PMCID: PMC10070596  PMID: 37025823

Abstract

Properly analyzing and reporting data remains a challenging task in epidemiologic research, and underreporting of data is often overlooked. The effect of underreporting remains understudied. In this study, we examined the effect of different scenarios of mortality underreporting on the relationship between PM10, temperature, and mortality. Mortality, temperature, and PM10 data for seven cities were obtained from the provincial Centers for Disease Control and Prevention (CDC), the China Meteorological Data Sharing Service System, and the China National Environmental Monitoring Center, respectively. A time-series design with a distributed lag nonlinear model (DLNM) was used to examine the effects of five mortality underreporting scenarios: 1) Random underreporting of mortality; 2) Underreporting is monotonically increasing (MI) or monotonically decreasing (MD); 3) Underreporting due to holidays and weekends; 4) Underreporting occurs before the 20th day of each month, and the underreported deaths are added after the 20th day of the month; and 5) Underreporting due to holidays, weekends, MI, and MD. We observed that the underreporting at random (UAR) scenario had little effect on the association between PM10, temperature, and daily mortality. However, the other four underreporting not at random (UNAR) scenarios had varying degrees of influence on the association between PM10, temperature, and daily mortality. Additionally, except for imputation under UAR, the variation of minimum mortality temperature (MMT) and attributable fraction (AF) of mortality attributed to temperature within the same imputation scenario was inconsistent across cities. Finally, we observed that the pooled excess risk (ER) below MMT was negatively associated with mortality and the pooled ER above MMT was positively associated with mortality.
This study showed that UNAR impacted the association between PM10, temperature, and mortality, and potential underreporting should be dealt with before analyzing data to avoid drawing invalid conclusions.

Keywords: Underreporting, Mortality, Temperature, Underreporting at random, PM10

1. Introduction

Obtaining meteorological and population data has become easier, and using time series data to study the impact of air pollution or meteorological factors on health is now a common approach in environmental epidemiology [[1], [2], [3]]. However, precisely because data are so readily available, many studies analyze mortality data directly without considering their quality. Underreporting of mortality is an important aspect of data quality. Underreporting refers to an incomplete count of events of interest, which can arise for various reasons. For example, the day of death can be recorded incorrectly because of holidays, continuous improvement of the monitoring system, gradual deterioration of the monitoring system, or centralized reporting at the end of the year or month. In epidemiologic research, properly handling missing data remains an important topic of discussion [[4], [5], [6], [7]]. Underreporting of data can reduce statistical power and produce biased estimates that can lead to invalid conclusions [8]. Despite the large literature on handling missing data, approaches to addressing underreporting in epidemiologic studies remain understudied. More generally, even when data underreporting does not reduce statistical power, it can threaten study validity and lead to invalid conclusions. Until recently, most studies have drawn conclusions based on the assumption of a complete data set. The general discussion of data underreporting remains largely overlooked, despite the adverse effect it can have on statistical inference. Moreover, inaccuracies due to underreporting can adversely impact public health prevention, response, and interventions.

Underreporting can have a large impact on study replicability. Replicability refers to the chance of obtaining consistent results from independent studies with similar designs and research questions. Poor replicability is a major concern because it calls into question the credibility of the research community [9]. Failing to handle underreporting may reduce the chances of achieving high replicability. For example, some previous time series studies of the mortality effects of ambient temperature or air pollutants have reported inconsistent or even contradictory results, and one possible reason may be mortality underreporting. However, it is unclear how data underreporting affects the use of time series data to analyze exposure-response relationships in environmental epidemiology. This is therefore an important scientific question that urgently needs to be addressed in environmental epidemiology.

In this study, we examined the effects of different scenarios of mortality underreporting on the relationships between temperature, PM10, and mortality using time series data from China. To achieve this, we employed five mortality underreporting scenarios to evaluate their impacts on the relationships between temperature or PM10 exposure and mortality: 1) Random underreporting of mortality for a certain number of days; 2) Mortality underreporting is monotonically increasing or monotonically decreasing; 3) Underreporting due to holidays and weekends; 4) Underreporting occurs before the 20th of each month, and the underreported deaths are added after the 20th of the month; and 5) Mortality underreporting is monotonically increasing or monotonically decreasing, while holidays and weekends also produce underreporting. Our aim was to provide a better understanding of the effect of mortality underreporting on the use of time series data to analyze exposure-response relationships, thereby providing guidance for future use of time series designs to assess the mortality burden of ambient temperature and air pollutants in environmental epidemiology.

2. Data collection and method

2.1. Data source

In the present study, seven Chinese cities (Guangzhou, Jiangmen, Zhuhai, Hangzhou, Wenzhou, Ningbo, and Jinhua) were chosen from Guangdong Province and Zhejiang Province. To ensure sufficient statistical power, each city had to have a population of at least 200,000 people or an annual mortality rate greater than 0.44% (the annual mortality rates for Guangzhou, Jiangmen, Zhuhai, Hangzhou, Wenzhou, Ningbo, and Jinhua were 0.60%, 0.65%, 0.59%, 0.64%, 0.51%, 0.45%, and 0.60%, respectively). This study was approved by the Ethics Committee of Guangdong Provincial Center for Disease Control and Prevention (No. 2019025).

Daily mortality data (2013–2017) were obtained from the corresponding provincial Centers for Disease Control and Prevention (CDC). Additionally, we obtained daily mean temperature and relative humidity (RH) from the China Meteorological Data Sharing Service System (http://data.cma.cn/), and particulate matter (PM) with an aerodynamic diameter of 10 μm or less (PM10) from the China National Environmental Monitoring Center.

2.2. Data analysis

According to the missing data generation mechanisms described by Rubin [[10], [11], [12]], we divided data underreporting into the following two mechanisms: i) underreporting at random (UAR); and ii) underreporting not at random (UNAR). UAR implies that data underreporting is random and follows no pattern, while UNAR implies that data underreporting follows a traceable pattern, so the approach to handling it depends heavily on the cause of the underreporting.

2.3. Underreporting scenario

Scenario 1: UAR - Mortality underreporting at random for a certain number of days.

Suppose “a” is the percentage of days that may be underreported, and “b” is the percentage of mortality that may be underreported on each of those days. Then:

  1. Model the original data (d1) using a distributed lag nonlinear model (DLNM) (S1).
  2. Randomly select “a” percent of days and add “b” percent of mortality to each selected day (d2).
  3. Model the imputed data (d2) using DLNM (S2).
  4. Repeat steps 2–3 10,000 times.
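The imputation in step 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `impute_random_days`, the toy series, and the choice to keep non-integer counts after scaling are ours and are not described in the paper.

```python
import random

def impute_random_days(deaths, a, b, rng):
    """Add b% to the death count on a randomly chosen a% of days."""
    n_days = len(deaths)
    n_pick = round(n_days * a / 100)          # number of days to modify
    picked = set(rng.sample(range(n_days), n_pick))
    return [d * (1 + b / 100) if i in picked else d
            for i, d in enumerate(deaths)]

rng = random.Random(42)           # fixed seed so the sketch is reproducible
d1 = [100.0] * 20                 # toy daily counts, not study data
d2 = impute_random_days(d1, a=50, b=10, rng=rng)
changed = [i for i in range(len(d1)) if d2[i] != d1[i]]
```

In a full simulation, the call would sit inside a loop of 10,000 repetitions, with the model refitted to each imputed series.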

Scenario 2: UNAR - Mortality underreporting is monotonically increasing or monotonically decreasing.

Reporting at some data collection sites improves over time, while at others it deteriorates. In this case, we used monotonically increasing (MI) and monotonically decreasing (MD) methods to impute the data.

  1. Modelling original data (d1) using DLNM (S1).
  2. Mortality data increased by 10% for the first 10% of days, 20% for the second 10% of days, and so on (d2).
  3. Modelling imputed data (d2) using DLNM (S2).
  4. Mortality data increased by 100% for the first 10% of days, 90% for the second 10% of days, and so on (d3).
  5. Modelling imputed data (d3) using DLNM (S3).
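As a rough illustration of steps 2 and 4, the sketch below splits the series into ten consecutive blocks of days and scales each block by the stated percentage. The function name `impute_monotone` and the way block boundaries are computed are assumptions made for this example.

```python
def impute_monotone(deaths, increasing=True):
    """Scale deaths by 10%, 20%, ..., 100% (MI) or 100%, 90%, ..., 10% (MD)
    across ten consecutive blocks of days."""
    n = len(deaths)
    out = []
    for i, d in enumerate(deaths):
        block = i * 10 // n                      # 0..9, which tenth of the series
        pct = 10 * (block + 1) if increasing else 10 * (10 - block)
        out.append(d * (1 + pct / 100))
    return out

d1 = [100.0] * 10                                # toy data: one day per block
d2 = impute_monotone(d1, increasing=True)        # MI imputation
d3 = impute_monotone(d1, increasing=False)       # MD imputation
```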

2.4. Scenario 3: UNAR – mortality underreporting during holidays and weekends

Suppose mortality underreporting occurred only on national holidays and weekends, and the amount of underreporting was within 50%. Then:

  1. Model the original data (d1) using DLNM (S1).
  2. Randomly increase mortality by 0% to 100% on holidays and weekends (d2).
  3. Model the imputed data (d2) using DLNM (S2).
  4. Repeat steps 2–3 10,000 times.
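Step 2 might be sketched as follows. The paper does not state how the random increase is drawn, so the uniform draw here is an assumption, as are the function name `impute_holidays_weekends` and the one-entry holiday set used for the toy example.

```python
import datetime as dt
import random

def impute_holidays_weekends(dates, deaths, holidays, rng, max_pct=100):
    """Increase deaths by a random 0..max_pct percent on weekends and holidays."""
    out = []
    for day, d in zip(dates, deaths):
        if day.weekday() >= 5 or day in holidays:    # Saturday=5, Sunday=6
            out.append(d * (1 + rng.uniform(0, max_pct) / 100))
        else:
            out.append(d)                            # ordinary weekday: unchanged
    return out

start = dt.date(2013, 1, 1)
dates = [start + dt.timedelta(days=i) for i in range(14)]
holidays = {dt.date(2013, 1, 1)}                     # hypothetical holiday list
d1 = [100.0] * 14                                    # toy daily counts
d2 = impute_holidays_weekends(dates, d1, holidays, random.Random(1))
```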

Scenario 4: UNAR – Mortality underreporting occurred before the 20th day of each month, and the underreported deaths are added after the 20th day of the month.

In this case, we assumed that mortality counts after the 20th of every month were inflated by n%. We removed n% of the deaths recorded after the 20th of each month and added them back randomly to the 1st–20th (assuming n = 10–50).

  1. Model the original data (d1) using DLNM (S1).
  2. Remove n% of the deaths after the 20th of each month and add them randomly to the 1st–20th (d2).
  3. Model the imputed data (d2) using DLNM (S2).
  4. Repeat steps 2–3 10,000 times.
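A minimal sketch of step 2, assuming integer daily counts and assuming that each removed death is reassigned to a uniformly chosen day among the 1st–20th of the same month. Neither detail is specified in the paper, and the function name `shift_deaths_to_early_month` is hypothetical.

```python
import datetime as dt
import random
from collections import defaultdict

def shift_deaths_to_early_month(dates, deaths, n, rng):
    """Move n% of deaths recorded after the 20th to random days on the 1st-20th."""
    out = list(deaths)
    early = defaultdict(list)                  # (year, month) -> indices of days 1..20
    for i, day in enumerate(dates):
        if day.day <= 20:
            early[(day.year, day.month)].append(i)
    for i, day in enumerate(dates):
        targets = early.get((day.year, day.month), [])
        if day.day > 20 and targets:
            moved = round(deaths[i] * n / 100)  # deaths taken from this late day
            out[i] -= moved
            for _ in range(moved):              # reassign them one at a time
                out[rng.choice(targets)] += 1
    return out

dates = [dt.date(2013, 1, 1) + dt.timedelta(days=i) for i in range(31)]
d1 = [100] * 31                                 # toy daily counts
d2 = shift_deaths_to_early_month(dates, d1, n=10, rng=random.Random(0))
```

Because deaths are only moved within a month, the monthly total is preserved; only the within-month timing changes.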

Scenario 5: UNAR – Mortality underreporting is monotonically increasing or monotonically decreasing, and holidays and weekends also produce underreporting.

We assumed mortality underreporting due to scenario 2 and scenario 3 together.

  1. Modelling original data (d1) using DLNM (S1).
  2. Mortality data increased by 10% for the first 10% of days, 20% for the second 10% of days, and so on, and mortality randomly increased from 0% to 100% on holidays and weekends (d2).
  3. Modelling imputed data (d2) using DLNM (S2).
  4. Mortality data increased by 100% for the first 10% of days, 90% for the second 10% of days, and so on, and mortality randomly increased from 0% to 100% on holidays and weekends (d3).
  5. Modelling imputed data (d3) using DLNM (S3).
  6. Repeat step 2–3 for 10,000 times.

2.5. Statistical analysis

A DLNM [13] and a multivariate meta-analysis [14] were used to examine associations between daily mean temperature and mortality. Since mortality is a count variable, we used a DLNM with a quasi-Poisson distribution to investigate the association between temperature, PM10, and mortality at each location:

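$$E(\mathrm{Mortality}_t) = \alpha + \beta\,T_{t,l}(\mathrm{TM}) + \delta\,P_{t,l}(\mathrm{PM}_{10}) + ns(\mathrm{RH}, df) + ns(\mathrm{time}, df) + \gamma\,\mathrm{DOW}$$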

where $t$ is the day of observation, $\mathrm{Mortality}_t$ is the observed daily mortality on day $t$, $\alpha$ is the intercept indicating the baseline risk, TM is the daily mean temperature, $T_{t,l}$ is a matrix obtained by applying the cross-basis function from DLNM to mean temperature, $\beta$ is the vector of coefficients for $T_{t,l}$, $l$ represents the number of lag days, $P_{t,l}$ is a matrix obtained by applying the cross-basis function from DLNM to PM10, $\delta$ is the vector of coefficients for $P_{t,l}$, $ns(\cdot)$ denotes a natural cubic spline, and $\gamma$ is the coefficient for day of the week (DOW). A quadratic B-spline (bs) was employed to estimate the non-linear and lagged effect of mean temperature, with three internal knots (the 10th, 50th, and 90th percentiles of the location-specific temperature distributions) to model the non-linear effects of temperature. The maximal lag for mean temperature was set to 7 days [[15], [16], [17]]. A linear function was employed to estimate the lagged effect of PM10, and the maximal lag for PM10 was set to 1 day (lags of 0–5 days were examined to determine which lag day's exposure had the greatest association with mortality). Additionally, 7 degrees of freedom (df) per year were used to control for seasonal and long-term trends in mortality, and 3 df were used for RH [16,17]. Finally, a multivariate meta-analysis was used to combine the location-specific cumulative associations of temperature with mortality [13,14,17].
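Two ingredients of the temperature cross-basis described above, the percentile knots and the lagged exposure matrix, can be illustrated with a short stdlib-only sketch. It does not reproduce the spline bases themselves; a full cross-basis (as implemented, for example, in the R package dlnm) additionally applies basis functions along the exposure and lag dimensions. The function names are hypothetical, and the percentile definition used by `statistics.quantiles` may differ slightly from the one used in the study.

```python
import statistics

def temperature_knots(temps):
    """Return the 10th, 50th, and 90th percentiles as internal knot positions."""
    deciles = statistics.quantiles(temps, n=10)   # 9 cut points: 10%, ..., 90%
    return deciles[0], deciles[4], deciles[8]

def lag_matrix(series, max_lag):
    """Row t holds the exposure at lags 0..max_lag; None where history is missing."""
    rows = []
    for t in range(len(series)):
        rows.append([series[t - lag] if t - lag >= 0 else None
                     for lag in range(max_lag + 1)])
    return rows

temps = [float(x) for x in range(1, 100)]         # toy temperatures, not study data
knots = temperature_knots(temps)
lags = lag_matrix([1, 2, 3, 4], max_lag=2)
```

With a maximal lag of 7, as used for temperature, each row of the lag matrix would have eight columns (lags 0 through 7).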

In each scenario, we compared the relationship between mortality, temperature, and PM10 before and after imputation to determine the impact of underreporting. Additionally, the minimum mortality temperature (MMT) of each location was identified from the curve of the risk ratio (RR) of mortality against temperature; that is, the MMT was the temperature corresponding to the lowest RR. The attributable fraction (AF) of mortality attributed to temperature used MMT as the centering value.
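Identifying the MMT from an estimated exposure-response curve amounts to locating the minimum of the RR values over a temperature grid. The sketch below assumes the grid is sorted in ascending order and spans the observed temperature range; the function name and the toy RR values are illustrative only.

```python
def minimum_mortality_temperature(temps, rrs):
    """Return (MMT, at_boundary) for an RR curve evaluated on a temperature grid."""
    idx = min(range(len(rrs)), key=lambda i: rrs[i])   # position of the lowest RR
    at_boundary = idx in (0, len(rrs) - 1)             # MMT at the coldest or hottest value
    return temps[idx], at_boundary

grid = [5, 10, 15, 20, 25, 30]
u_shaped = [1.40, 1.25, 1.10, 1.02, 1.00, 1.12]        # toy RR curve
monotone = [1.50, 1.40, 1.30, 1.20, 1.10, 1.00]        # toy curve with minimum at the edge
mmt_u = minimum_mortality_temperature(grid, u_shaped)
mmt_edge = minimum_mortality_temperature(grid, monotone)
```

The boundary flag matters for the threshold analysis: when the MMT falls at the lowest or highest temperature, only one of the two temperature effects can be estimated.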

Additionally, we used MMT as the threshold to build a threshold regression in DLNM [18]. The resulting DLNM equation is as follows:

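$$E(\mathrm{Mortality}_t) = \alpha + \beta_1\,T_{t,l}\big((\mathrm{TM}-\mathrm{MMT})^{-}\big) + \beta_2\,T_{t,l}\big((\mathrm{TM}-\mathrm{MMT})^{+}\big) + \delta\,P_{t,l}(\mathrm{PM}_{10}) + ns(\mathrm{RH}, df) + ns(\mathrm{time}, df) + \gamma\,\mathrm{DOW}$$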

where $(\mathrm{TM}-\mathrm{MMT})^{-}$ equals TM − MMT when TM < MMT and 0 otherwise, and $(\mathrm{TM}-\mathrm{MMT})^{+}$ equals TM − MMT when TM > MMT and 0 otherwise. The excess risk (ER) per 1 °C increase in each location was reported.
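The two threshold terms can be written as a small helper, shown below as a sketch. The conversion from a coefficient to a percent excess risk assumes a log link, which is the usual choice for quasi-Poisson models but is not stated explicitly in the text; both function names are hypothetical.

```python
import math

def threshold_terms(tm, mmt):
    """Split TM - MMT into a cold term (below MMT) and a hot term (above MMT)."""
    diff = tm - mmt
    cold = diff if tm < mmt else 0.0     # negative below the MMT, otherwise 0
    hot = diff if tm > mmt else 0.0      # positive above the MMT, otherwise 0
    return cold, hot

def excess_risk_percent(beta):
    """Percent change in mortality per 1 °C increase, assuming a log link."""
    return (math.exp(beta) - 1.0) * 100.0

below = threshold_terms(20.0, 25.0)
above = threshold_terms(30.0, 25.0)
at_mmt = threshold_terms(25.0, 25.0)
er = excess_risk_percent(math.log(1.023))   # toy coefficient, not a study estimate
```

Under this convention, a negative ER for the cold term means that mortality rises as temperature falls further below the MMT.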

2.6. Sensitivity analysis

To test the robustness of our findings, we performed analyses using alternative maximum lag periods for temperature of 5, 9, and 21 days. We also conducted sensitivity analyses by changing the df of the time trend to 3 and 5 per year. We further repeated the analysis with PM10 modelled as a nonlinear term with a maximum lag of 5 days.

3. Results

Table 1 summarizes temperature, PM10, and daily mortality by city. The overall average daily mortality was 80.7. Geographically, the average daily mortality was highest in Wenzhou (120.0) and lowest in Zhuhai (5.3). The overall average daily temperature was 19.3 °C, and the highest average daily temperature was in Zhuhai (23.3 °C). Additionally, the overall average PM10 was 70.4 μg/m3, and the highest average PM10 was in Hangzhou (85.3 μg/m3).

Table 1.

Summary of temperature, PM10, and daily mortality rate (per 100,000) by location.a

| Variables | Mean (SD) | P25 | P50 | P75 |
|---|---|---|---|---|
| Daily Mortality | | | | |
| Total | 80.7 (40.3) | 61.0 | 84.0 | 109.0 |
| Hangzhou | 115.6 (21.7) | 100.0 | 112.0 | 129.0 |
| Ningbo | 103.6 (18.7) | 90.0 | 101.0 | 115.0 |
| Wenzhou | 120.0 (20.8) | 106.0 | 118.0 | 132.0 |
| Jinhua | 84.2 (17.1) | 71.0 | 81.0 | 94.0 |
| Guangzhou | 61.5 (13.1) | 54.0 | 61.0 | 69.0 |
| Zhuhai | 5.3 (2.4) | 4.0 | 5.0 | 7.0 |
| Jiangmen | 74.5 (16.4) | 64.0 | 73.0 | 84.0 |
| Temperature (°C) | | | | |
| Overall | 19.3 (7.7) | 13.6 | 20.0 | 25.9 |
| Hangzhou | 17.3 (8.8) | 9.7 | 18.3 | 24.4 |
| Ningbo | 17.5 (8.3) | 10.3 | 18.4 | 24.3 |
| Wenzhou | 18.3 (7.6) | 11.7 | 19.0 | 25.0 |
| Jinhua | 17.8 (8.6) | 10.3 | 18.9 | 24.8 |
| Guangzhou | 22.1 (6.2) | 17.3 | 23.6 | 27.4 |
| Zhuhai | 23.3 (5.7) | 18.8 | 24.7 | 28.3 |
| Jiangmen | 22.9 (5.9) | 18.3 | 24.5 | 27.9 |
| PM10 (μg/m3) | | | | |
| Overall | 70.4 (35.9) | 44.6 | 62.6 | 87.7 |
| Hangzhou | 85.3 (39.8) | 55.7 | 78.5 | 106.0 |
| Ningbo | 71.0 (40.7) | 44.2 | 60.8 | 85.5 |
| Wenzhou | 76.5 (34.4) | 51.6 | 68.8 | 95.8 |
| Jinhua | 79.5 (40.5) | 51.3 | 70.0 | 99.1 |
| Guangzhou | 64.3 (26.8) | 44.3 | 58.4 | 79.9 |
| Zhuhai | 56.1 (27.5) | 35.5 | 50.0 | 71.2 |
| Jiangmen | 59.9 (27.7) | 37.6 | 54.8 | 75.8 |

a SD = Standard deviation, P25 = 25th percentile, P50 = 50th percentile, P75 = 75th percentile.

Fig. 1A presents the pooled impact of mortality underreporting at random on the association between daily ambient temperature and mortality. We observed no difference between random imputation for 10%, 50%, and 90% of days and no imputation. Table 2 presents the pooled impact of mortality underreporting on the association between PM10 and daily mortality; as in Fig. 1A, the findings under underreporting at random showed no difference with or without imputation. Additionally, mortality underreporting at random had no impact at the city level (Fig. S1 and Table S1).

Fig. 1.


Pooled influence of underreporting on the association between ambient temperature and daily mortality. (A) Underreporting at random. (B) Underreporting monotonically increasing or monotonically decreasing. (C) Underreporting due to holidays and weekends. (D) Underreporting occurred before the 20th day of each month, and added after the 20th day of the month. (E) Underreporting is monotonically increasing or monotonically decreasing, while holidays and weekends also produce underreporting.

Table 2.

Pooled influence of underreporting on the association between PM10 and daily mortality.a

| Scenario | Pooled RR (10 μg/m3) |
|---|---|
| Original | 1.002 |
| Scenario 1 | |
| Random Imputation 10% of Days | 1.002 |
| Random Imputation 50% of Days | 1.002 |
| Random Imputation 90% of Days | 1.002 |
| Scenario 2 | |
| MI Imputation | 1.004 |
| MD Imputation | 1.001 |
| Scenario 3 | |
| Holiday Imputation | 0.999 |
| Scenario 4 | |
| 10% of deaths removed after the 20th of each month and added randomly to the 1st–20th | 1.003 |
| 30% of deaths removed after the 20th of each month and added randomly to the 1st–20th | 1.003 |
| 50% of deaths removed after the 20th of each month and added randomly to the 1st–20th | 1.004 |
| Scenario 5 | |
| Holiday + MI Imputation | 0.999 |
| Holiday + MD Imputation | 0.999 |

a RR = risk ratio, MI = monotonically increasing, MD = monotonically decreasing.

Fig. 1B presents the pooled impact of MI and MD mortality underreporting on the association between ambient temperature and daily mortality. We observed that the overall RR for MI imputation was higher than that for no imputation, and the overall RR for MD imputation was lower than that for no imputation. The PM10 results for Scenario 2 in Table 2 were similar, with a higher overall RR for MI imputation and a lower overall RR for MD imputation. Furthermore, the impact patterns of MI and MD underreporting were similar at the city level (Fig. S2 and Table S1).

Fig. 1C shows the pooled impact of underreporting due to holidays and weekends on the association between ambient temperature and daily mortality. We observed that overall RR with imputation was lower than the RR without imputation at low temperatures. Additionally, as the temperature increased the overall RR with imputation became lower than the RR without imputation. We observed a decrease in overall RR with imputation and an increase in overall RR without imputation as PM10 increased (Table 2 scenario 3). Furthermore, the RR with and without imputation was also different at the city-level in the scenario of underreporting due to holidays and weekends (Fig. S3 and Table S1).

Fig. 1D shows the pooled effect of Scenario 4 (underreporting before the 20th day of each month, with the underreported deaths added after the 20th day of the month) on the association between ambient temperature and daily mortality. There was no observable pattern in the results, and the RRs with and without imputation were different. The observation for Scenario 4 in Table 2 was similar to Fig. 1D. Additionally, there was no observable pattern at the city level (Fig. S4 and Table S1).

Fig. 1E presents the pooled impact of underreporting due to holidays, weekends, MI, and MD on the association between ambient temperature and daily mortality. We observed that overall RR with imputation was lower than the RR without imputation at low temperatures, and RR with holiday and MD imputation was higher than RR with holiday and MI imputation. We observed a decrease in overall RR with imputation and an increase in overall RR without imputation as PM10 increased (Table 2 scenario 5). Similar patterns were observed at the city-level (Fig. S5 and Table S1).

Table 3 shows the MMTs overall and for each location under different imputation scenarios. We observed that, except for imputation under UAR, the MMTs for the other imputation scenarios changed compared with the original MMT. However, the variation of MMT within the same imputation scenario was inconsistent across cities, which may be caused by the inconsistent distribution of mortality and temperature data. Furthermore, Hangzhou (−5.5) and Ningbo (−4.7) had negative MMTs under Scenario 4.

Table 3.

The minimum mortality temperature (MMT) overall and for each location under different imputation scenarios.a

| Scenario | Hangzhou | Ningbo | Wenzhou | Jinhua | Guangzhou | Zhuhai | Jiangmen | Pooled |
|---|---|---|---|---|---|---|---|---|
| Original | 26.0 | 25.4 | 26.4 | 26.4 | 25.4 | 29.1 | 27.2 | 26.8 |
| Scenario 1 | | | | | | | | |
| Random Imputation 10% of Days | 26.0 | 25.4 | 26.4 | 26.4 | 25.4 | 29.1 | 27.2 | 26.8 |
| Random Imputation 50% of Days | 26.0 | 25.4 | 26.4 | 26.4 | 25.4 | 29.1 | 27.2 | 26.8 |
| Random Imputation 90% of Days | 26.0 | 25.4 | 26.4 | 26.4 | 25.4 | 29.1 | 27.2 | 26.8 |
| Scenario 2 | | | | | | | | |
| MI Imputation | 20.8 | 22.2 | 23.1 | 23.2 | 24.9 | 27.4 | 26.0 | 23.7 |
| MD Imputation | 10.2 | 11.7 | 12.0 | 27.3 | 28.7 | 29.0 | 28.8 | 27.8 |
| Scenario 3 | | | | | | | | |
| Holiday Imputation | 11.1 | 8.9 | 9.2 | 26.5 | 25.5 | 29.0 | 27.8 | 26.6 |
| Scenario 4 | | | | | | | | |
| 10% of deaths removed after the 20th of each month and added randomly to the 1st–20th | 14.3 | 26.2 | 28.6 | 27.1 | 24.7 | 28.1 | 26.7 | 26.8 |
| 30% of deaths removed after the 20th of each month and added randomly to the 1st–20th | −5.5 | −4.7 | 31.3 | 31.5 | 3.5 | 33.0 | 26.2 | −1.0 |
| 50% of deaths removed after the 20th of each month and added randomly to the 1st–20th | −5.5 | −4.7 | 31.3 | 34.6 | 3.5 | 33.0 | 25.9 | −1.0 |
| Scenario 5 | | | | | | | | |
| Holiday + MI Imputation | 10.3 | 8.1 | 8.8 | 13.7 | 25.7 | 29.1 | 28.1 | 26.8 |
| Holiday + MD Imputation | 12.2 | 10.2 | 25.2 | 26.3 | 25.3 | 28.9 | 27.1 | 26.4 |

a MI = monotonically increasing, MD = monotonically decreasing.

Table 4 shows the AF for mortality attributable to temperature overall and for each location under different imputation scenarios, using MMT as the centering value. As in Table 3, AF from imputation under UAR was unchanged compared with the original AF. In Scenario 2, except in Zhuhai, AF from MI imputation was higher than AF from MD imputation, though both were greater than AF without imputation. In Scenario 4, AF increased as a larger percentage of deaths was removed after the 20th of each month and added randomly to the 1st–20th. In the remaining scenarios, we did not find a clear pattern.

Table 4.

The attributable fraction (AF%) of mortality attributed to temperature overall and for each location under different imputation scenarios.a

| Scenario | Hangzhou | Ningbo | Wenzhou | Jinhua | Guangzhou | Zhuhai | Jiangmen | Pooled |
|---|---|---|---|---|---|---|---|---|
| Original | 4.4 | 3.9 | 6.6 | 10.3 | 4.0 | 18.0 | 8.9 | 7.7 |
| Scenario 1 | | | | | | | | |
| Random Imputation 10% of Days | 4.4 | 3.9 | 6.6 | 10.3 | 4.0 | 18.0 | 8.9 | 7.7 |
| Random Imputation 50% of Days | 4.4 | 3.9 | 6.6 | 10.3 | 4.0 | 18.0 | 8.9 | 7.7 |
| Random Imputation 90% of Days | 4.4 | 3.9 | 6.6 | 10.3 | 4.0 | 18.0 | 8.9 | 7.7 |
| Scenario 2 | | | | | | | | |
| MI Imputation | 10.7 | 13.7 | 14.0 | 13.4 | 12.2 | 18.0 | 16.3 | 13.9 |
| MD Imputation | 7.9 | 10.6 | 7.2 | 12.0 | 8.1 | 21.2 | 13.2 | 11.2 |
| Scenario 3 | | | | | | | | |
| Holiday Imputation | 4.6 | 4.7 | 2.9 | 6.0 | 3.3 | 18.4 | 9.2 | 6.6 |
| Scenario 4 | | | | | | | | |
| 10% of deaths removed after the 20th of each month and added randomly to the 1st–20th | 5.7 | 4.8 | 7.9 | 9.3 | 6.2 | 17.2 | 10.7 | 8.6 |
| 30% of deaths removed after the 20th of each month and added randomly to the 1st–20th | 10.7 | 16.3 | 24.0 | 10.8 | 12.9 | 24.5 | 14.3 | 15.9 |
| 50% of deaths removed after the 20th of each month and added randomly to the 1st–20th | 33.4 | 32.9 | 38.2 | 30.4 | 32.8 | 40.6 | 17.9 | 32.0 |
| Scenario 5 | | | | | | | | |
| Holiday + MI Imputation | 5.4 | 6.5 | 4.5 | 5.3 | 2.1 | 19.3 | 9.0 | 7.1 |
| Holiday + MD Imputation | 3.7 | 3.1 | 2.7 | 7.0 | 4.5 | 17.3 | 9.7 | 6.5 |

a MI = monotonically increasing, MD = monotonically decreasing.

Table 5 presents the AF for mortality attributable to PM10 overall and for each location under different imputation scenarios using 0 as centering value. We observed that AF from imputation under UAR was unchanged compared with the original AF. However, unlike Table 4, we did not find a pattern in scenarios 2 through 5.

Table 5.

The attributable fraction (AF%) of mortality attributed to PM10 (10 μg/m3) overall and for each location under different imputation scenarios.a

| Scenario | Hangzhou | Ningbo | Wenzhou | Jinhua | Guangzhou | Zhuhai | Jiangmen | Pooled |
|---|---|---|---|---|---|---|---|---|
| Original | 1.9 | 1.3 | 0.4 | 2.2 | 3.1 | 2.2 | 3.3 | 2.0 |
| Scenario 1 | | | | | | | | |
| Random Imputation 10% of Days | 1.9 | 1.3 | 0.4 | 2.2 | 3.1 | 2.2 | 3.3 | 2.0 |
| Random Imputation 50% of Days | 1.9 | 1.3 | 0.4 | 2.2 | 3.1 | 2.2 | 3.3 | 2.0 |
| Random Imputation 90% of Days | 1.9 | 1.3 | 0.4 | 2.2 | 3.1 | 2.2 | 3.3 | 2.0 |
| Scenario 2 | | | | | | | | |
| MI Imputation | 4.0 | 2.9 | 1.6 | 3.4 | 2.6 | 1.7 | 2.4 | 2.7 |
| MD Imputation | 0.1 | −0.1 | −0.7 | 1.2 | 3.3 | 2.8 | 3.9 | 1.4 |
| Scenario 3 | | | | | | | | |
| Holiday Imputation | −0.6 | −0.9 | −1.8 | 0.1 | −1.4 | −1.3 | −0.8 | −0.9 |
| Scenario 4 | | | | | | | | |
| 10% of deaths removed after the 20th of each month and added randomly to the 1st–20th | 2.8 | 1.6 | −0.1 | 2.8 | 2.1 | 3.6 | 2.7 | 2.2 |
| 30% of deaths removed after the 20th of each month and added randomly to the 1st–20th | 4.4 | 2.2 | −1.1 | 4.1 | 0.5 | 2.7 | 1.5 | 2.1 |
| 50% of deaths removed after the 20th of each month and added randomly to the 1st–20th | 6.0 | 2.7 | −2.1 | 5.2 | −1.2 | 4.0 | 0.3 | 2.2 |
| Scenario 5 | | | | | | | | |
| Holiday + MI Imputation | −1.0 | −1.1 | −2.1 | −0.1 | −1.3 | −1.1 | −0.7 | −1.0 |
| Holiday + MD Imputation | −0.1 | −0.7 | −1.5 | 0.2 | −1.6 | −1.5 | −1.0 | −0.8 |

a MI = monotonically increasing, MD = monotonically decreasing.

Table 6 and Table 7 present the ER of mortality per 1 °C increase in temperature in each city and overall. We observed that the pooled ER per 1 °C increase below MMT was negatively associated with mortality, and the pooled ER per 1 °C increase above MMT was positively associated with mortality, except in Scenario 4. Except for the first scenario, the ER of each city in the other scenarios differed from the original ER, with no observable pattern between them. In addition, after MD imputation in Jinhua, mortality changed by 9.4% for every 1 °C increase in average temperature above MMT, double the original ER of 4.7%.

Table 6.

Percent change of mortality risk per 1 °C increase of average temperature by city under different imputation scenarios.a

| Scenario | Hangzhou: Cold effect | Hangzhou: Hot effect | Ningbo: Cold effect | Ningbo: Hot effect | Wenzhou: Cold effect | Wenzhou: Hot effect | Jinhua: Cold effect | Jinhua: Hot effect |
|---|---|---|---|---|---|---|---|---|
| Original | −1.6 | 1.0 | −0.4 | 2.5 | −0.8 | 1.5 | −1.8 | 4.7 |
| Scenario 1 | | | | | | | | |
| Random Imputation 10% of Days | −1.6 | 1.0 | −0.4 | 2.5 | −0.8 | 1.5 | −1.8 | 4.7 |
| Random Imputation 50% of Days | −1.6 | 1.0 | −0.4 | 2.5 | −0.8 | 1.5 | −1.8 | 4.7 |
| Random Imputation 90% of Days | −1.6 | 1.0 | −0.4 | 2.5 | −0.8 | 1.5 | −1.8 | 4.7 |
| Scenario 2 | | | | | | | | |
| MI Imputation | −2.2 | 1.9 | −1.4 | 1.8 | −1.7 | 0.8 | −2.4 | 2.0 |
| MD Imputation | −2.6 | 1.8 | −1.7 | 2.5 | −2.4 | 1.5 | −1.5 | 9.4 |
| Scenario 3 | | | | | | | | |
| Holiday Imputation | −1.4 | 0.9 | −0.9 | 1.1 | −1.7 | 0.6 | −1.0 | 5.1 |
| Scenario 4 | | | | | | | | |
| 10% of deaths removed after the 20th of each month and added randomly to the 1st–20th | −1.6 | 0.6 | 0.4 | 1.5 | −0.7 | 1.3 | −1.6 | 3.1 |
| 30% of deaths removed after the 20th of each month and added randomly to the 1st–20th | | −0.9 | | −0.7 | −0.9 | | −1.4 | 3.2 |
| 50% of deaths removed after the 20th of each month and added randomly to the 1st–20th | | −1.2 | | −10.2 | −1.1 | | −2.0 | |
| Scenario 5 | | | | | | | | |
| Holiday + MI Imputation | −0.8 | 1.3 | 0.3 | 3.3 | 0.03 | 2.6 | −0.9 | 5.1 |
| Holiday + MD Imputation | −1.2 | 0.9 | −0.02 | 2.6 | −0.3 | 2.2 | −1.2 | 5.0 |

Note: Empty cells indicate no excess risk estimate, because the minimum mortality temperature equalled the maximum or minimum temperature. The placement of empty cells in the Scenario 4 rows follows the MMTs in Table 3.

a MI = monotonically increasing, MD = monotonically decreasing.

Table 7.

Percent change of mortality risk per 1 °C increase of average temperature by city under different imputation scenarios (continued).a

| Scenario | Guangzhou: Cold effect | Guangzhou: Hot effect | Zhuhai: Cold effect | Zhuhai: Hot effect | Jiangmen: Cold effect | Jiangmen: Hot effect | Pooled: Cold effect | Pooled: Hot effect |
|---|---|---|---|---|---|---|---|---|
| Original | −0.8 | 0.8 | −3.6 | 2.5 | −1.9 | 3.9 | −1.3 | 2.3 |
| Scenario 1 | | | | | | | | |
| Random Imputation 10% of Days | −0.8 | 0.8 | −3.6 | 2.5 | −1.9 | 3.9 | −1.3 | 2.3 |
| Random Imputation 50% of Days | −0.8 | 0.8 | −3.6 | 2.5 | −1.9 | 3.9 | −1.3 | 2.3 |
| Random Imputation 90% of Days | −0.8 | 0.8 | −3.6 | 2.5 | −1.9 | 3.9 | −1.3 | 2.3 |
| Scenario 2 | | | | | | | | |
| MI Imputation | −2.0 | 3.6 | −5.3 | 1.9 | −3.2 | 6.5 | −2.2 | 2.7 |
| MD Imputation | −0.3 | 1.9 | −2.6 | −0.2 | −1.2 | 5.3 | −1.8 | 3.5 |
| Scenario 3 | | | | | | | | |
| Holiday Imputation | −0.1 | 1.0 | −2.8 | 1.7 | −1.1 | 4.7 | −1.1 | 1.6 |
| Scenario 4 | | | | | | | | |
| 10% of deaths removed after the 20th of each month and added randomly to the 1st–20th | −1.2 | 1.4 | −4.3 | 0.8 | −2.1 | 3.5 | −1.4 | 1.7 |
| 30% of deaths removed after the 20th of each month and added randomly to the 1st–20th | | −1.0 | −3.5 | | −2.7 | 3.8 | | −0.2 |
| 50% of deaths removed after the 20th of each month and added randomly to the 1st–20th | | −1.3 | −3.1 | | −3.4 | 4.3 | | −0.9 |
| Scenario 5 | | | | | | | | |
| Holiday + MI Imputation | 0.03 | 0.8 | −2.6 | 1.6 | −0.8 | 3.0 | −1.2 | 1.3 |
| Holiday + MD Imputation | −0.3 | 1.3 | −2.9 | 3.8 | −1.3 | 3.9 | −1.0 | 1.2 |

Note: Empty cells indicate no excess risk estimate, because the minimum mortality temperature equalled the maximum or minimum temperature. The placement of empty cells in the Scenario 4 rows follows the MMTs in Table 3.

a MI = monotonically increasing, MD = monotonically decreasing.

In the sensitivity analyses, similar patterns were observed. Except under UAR, the RRs with and without imputation differed when using maximum temperature lags of 5, 9, and 21 days, when changing the df of the time trend to 3 and 5 per year, and when modelling PM10 as a nonlinear term with a maximum lag of 5 days (Figs. S6–S11).

4. Discussion

Although a large number of studies have evaluated the impact of missing data on results, the effect of underreporting is often overlooked, which can adversely impact study findings [10,[19], [20], [21]]. Missing data usually refers to the absence of independent variables, but underreporting can affect either independent or dependent variables, for example when respondents provide incorrect responses to a questionnaire or when daily mortality cannot be fully counted [22]. Mortality surveillance data are often underreported, especially in developing countries [23]. Failing to correct mortality underreporting may lead to misinformed decision making. In the present study, we used mortality, temperature, and PM10 data to discuss the impact of five different mortality underreporting scenarios on the association between temperature/PM10 exposure and mortality.

First, we observed that UAR had little effect on the association between temperature or PM10 and mortality. This is consistent with previous studies showing that estimated parameters are not biased when data are missing completely at random (MCAR) [24], potentially because the analysis likewise remains unbiased under UAR [24]. Unlike MCAR, which can be tested using Little's test or a t-test [25], there is no specific way to determine whether underreporting is UAR. Since UAR had no effect on the findings, the data can be analyzed without bias once the possibility of UNAR has been excluded.

Second, we observed that each of the UNAR scenarios mentioned above had varying degrees of influence on the association between temperature, PM10, and mortality, as well as on MMT, AF, and ER. As with missing not at random (MNAR) data, data with UNAR can lead to biased parameter estimates [24]. The only way to obtain unbiased parameter estimates in such a case is to impute the underreported data. However, underreporting differs from missing data: underreported data have values, but the values are not recorded correctly. Therefore, it is often not feasible to handle underreporting with methods designed for missing data, such as regression imputation, last observation carried forward, or maximum likelihood [[26], [27], [28]]. There is no easy solution for dealing with underreported data, especially when the data are not UAR. For instance, Wang and colleagues (2011) reported that the total rate of mortality underreporting in China from 2006 to 2008 was 16.68%, including 16.08% in urban areas and 18.14% in rural areas [29]. Based on our hypothetical scenarios, results obtained from underreported data may be underestimated, overestimated, or even reversed in direction. Therefore, we recommend that the integrity of the mortality surveillance system be regularly assessed in time series studies [30].

In low-resource settings with limited diagnostic capacity and barriers to accessing healthcare, data for a large proportion of deaths are frequently underreported [31], and this underreporting is often UNAR. For instance, an estimated 50,000 people die each year from yellow fever in Africa, with less than 1% being recorded [32]. Underreporting occurs not only in resource-poor countries but is also common in developed countries. For instance, according to a United States (U.S.) Centers for Disease Control and Prevention report, the primary reason for coronavirus disease 2019 (COVID-19) case underreporting was a lack of testing, and relying solely on official mortality statistics results in an incomplete and biased understanding of its impact [33]. A study by Silva and colleagues (2020) suggested that significant underreporting of COVID-19 deaths prevented the Brazilian government from taking more effective action to reduce the spread [34]. Therefore, it is crucial to check the data for underreporting prior to analysis.

4.1. Strengths and limitations

To the best of our knowledge, this is the first study to describe in detail the impact of underreporting on the results of time series studies in environmental epidemiology. Our findings imply that ignoring possible mortality underreporting when assessing the association of meteorological factors or air pollutants with health outcomes can lead to false results. Further, we employed a large dataset covering diverse populations from seven cities in China. A data-driven approach ensures that conclusions are supported by factual information and not only by theory. However, this study has limitations that must be noted. We simulated only four relatively simple single underreporting scenarios and one combined (double) underreporting scenario. Underreporting is complex, and several mechanisms may operate at the same time, such as underreporting due to holidays coinciding with a system error. Additionally, we considered underreporting only of the outcome variable, not of the independent variables, such as measurement errors in temperature and PM10. Further, when the MMT falls at the maximum or minimum observed temperature, no hot or cold effect, respectively, can be estimated when the MMT is used as the threshold. Finally, we used only DLNM to evaluate the impact of underreporting, so we could not assess whether underreporting affects other modeling methods differently.
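
The MMT boundary limitation can be made concrete with a minimal sketch. It assumes the exposure-response curve is available as relative risks on a temperature grid; the helper `split_at_mmt` and the two example curves are hypothetical and are not taken from the fitted models in this study.

```python
import numpy as np


def split_at_mmt(temps, rr):
    """Return the MMT and the grid temperatures on its cold and hot sides."""
    temps = np.asarray(temps, dtype=float)
    rr = np.asarray(rr, dtype=float)
    mmt = temps[np.argmin(rr)]   # temperature with the lowest relative risk
    cold = temps[temps < mmt]    # range available for a cold effect
    hot = temps[temps > mmt]     # range available for a hot effect
    return mmt, cold, hot


temps = np.linspace(0.0, 35.0, 8)              # 0, 5, ..., 35 degrees Celsius
u_shaped = 1.0 + 0.002 * (temps - 20.0) ** 2   # minimum risk inside the range
monotone = np.exp(0.02 * (temps - temps.min()))  # minimum risk at the coldest value

for label, curve in [("U-shaped", u_shaped), ("monotone", monotone)]:
    mmt, cold, hot = split_at_mmt(temps, curve)
    print(f"{label}: MMT={mmt:.0f}, cold points={cold.size}, hot points={hot.size}")
```

When the curve is monotone, the MMT coincides with the lowest observed temperature and the cold side is empty, so no cold effect can be estimated relative to that threshold.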

5. Conclusion

The present study described the effect of five different mortality underreporting scenarios on the relationship between temperature exposure, PM10, and mortality. Our results revealed that UAR had little effect on the association between temperature and mortality, whereas the UNAR scenarios altered this association. Therefore, potential underreporting should be addressed before the data are analyzed to avoid drawing invalid conclusions.

Author contribution statement

Wenjun Ma: Conceived and designed the experiments; Wrote the paper.

Ziqiang Lin: Conceived and designed the experiments; Analyzed and interpreted the data; Wrote the paper.

Tao Liu: Conceived and designed the experiments; Contributed reagents, materials, analysis tools or data.

Guanhao He: Conceived and designed the experiments.

Weiwei Gong, Lifeng Lin, Jianxiong Hu, Sui Zhu, Ruilin Meng, Xiaojun Xu, Jieming Zhong, Min Yu: Performed the experiments; Contributed reagents, materials, analysis tools or data.

Wayne Lawrence: Analyzed and interpreted the data; Wrote the paper.

Karin Reinhold: Contributed reagents, materials, analysis tools or data; Wrote the paper.

Funding statement

This work was supported by the National Key Research and Development Program of China [2018YFA0606200] and the National Natural Science Foundation of China [42075173].

Data availability statement

The authors do not have permission to share data.

Acknowledgements

We gratefully acknowledge the contribution of all participants of the present research.

Footnotes

Appendix A

Supplementary data to this article can be found online at https://doi.org/10.1016/j.heliyon.2023.e14648.

Appendix A. Supplementary data

The following is the Supplementary data to this article.

Multimedia component 1
mmc1.docx (255.1KB, docx)

References

  • 1. Gong W., Li X., Zhou M., Zhou C., Xiao Y., Huang B., Lin L., Hu J., Xiao J., Zeng W. Mortality burden attributable to temperature variability in China. J. Expo. Sci. Environ. Epidemiol. 2022:1–7. doi: 10.1038/s41370-022-00424-x.
  • 2. Liu T., Jiang Y., Hu J., Li Z., Guo Y., Li X., Xiao J., Yuan L., He G., Zeng W. Association of ambient PM1 with hospital admission and recurrence of stroke in China. Sci. Total Environ. 2022;828 doi: 10.1016/j.scitotenv.2022.154131.
  • 3. Xiao J., Dai J., Hu J., Liu T., Gong D., Li X., Kang M., Zhou Y., Li Y., Quan Y. Co-benefits of nonpharmaceutical intervention against COVID-19 on infectious diseases in China: a large population-based observational study. Lancet Reg. Health-Western Pacific. 2021;17 doi: 10.1016/j.lanwpc.2021.100282.
  • 4. Bell M.L., Fiero M., Horton N.J., Hsu C.-H. Handling missing data in RCTs; a review of the top medical journals. BMC Med. Res. Methodol. 2014;14(1):1–8. doi: 10.1186/1471-2288-14-118.
  • 5. Eekhout I., de Boer R.M., Twisk J.W., de Vet H.C., Heymans M.W. Missing data: a systematic review of how they are reported and handled. Epidemiology. 2012;23(5):729–732. doi: 10.1097/ede.0b013e3182576cdb.
  • 6. Perkins N.J., Cole S.R., Harel O., Tchetgen Tchetgen E.J., Sun B., Mitchell E.M., Schisterman E.F. Principled approaches to missing data in epidemiologic studies. Am. J. Epidemiol. 2018;187(3):568–575. doi: 10.1093/aje/kwx348.
  • 7. Sterne J.A., White I.R., Carlin J.B., Spratt M., Royston P., Kenward M.G., Wood A.M., Carpenter J.R. Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ. 2009;338 doi: 10.1136/bmj.b2393.
  • 8. Stavseth M.R., Clausen T., Røislien J. How handling missing data may impact conclusions: a comparison of six different imputation methods for categorical questionnaire data. SAGE Open Med. 2019;7 doi: 10.1177/2050312118822912.
  • 9. Leek J.T., Peng R.D. Opinion: reproducible research can still be wrong: adopting a prevention approach. Proc. Natl. Acad. Sci. USA. 2015;112(6):1645–1646. doi: 10.1073/pnas.1421412111.
  • 10. Hughes R.A., Heron J., Sterne J.A., Tilling K. Accounting for missing data in statistical analyses: multiple imputation is not always the answer. Int. J. Epidemiol. 2019;48(4):1294–1304. doi: 10.1093/ije/dyz032.
  • 11. Little R.J., Rubin D.B. The analysis of social science data with missing values. Socio. Methods Res. 1989;18(2–3):292–326. doi: 10.1177/0049124189018002004.
  • 12. Little R.J., Rubin D.B. Statistical Analysis with Missing Data. vol. 793. John Wiley & Sons; 2019.
  • 13. Gasparrini A., Armstrong B., Kenward M.G. Distributed lag non‐linear models. Stat. Med. 2010;29(21):2224–2234. doi: 10.1002/sim.3940.
  • 14. Gasparrini A., Armstrong B., Kenward M.G. Multivariate meta‐analysis for non‐linear and other multi‐parameter associations. Stat. Med. 2012;31(29):3821–3839. doi: 10.1002/sim.5471.
  • 15. Chen R., Yin P., Wang L., Liu C., Niu Y., Wang W., Jiang Y., Liu Y., Liu J., Qi J. Association between ambient temperature and mortality risk and burden: time series study in 272 main Chinese cities. BMJ. 2018;363:k4306. doi: 10.1136/bmj.k4306.
  • 16. Hu J., Zhou M., Qin M., Tong S., Hou Z., Xu Y., Zhou C., Xiao Y., Yu M., Huang B. Long-term exposure to ambient temperature and mortality risk in China: a nationwide study using the difference-in-differences design. Environ. Pollut. 2022;292 doi: 10.1016/j.envpol.2021.118392.
  • 17. Liu T., Zhou C., Zhang H., Huang B., Xu Y., Lin L., Wang L., Hu R., Hou Z., Xiao Y. Ambient temperature and years of life lost: a national study in China. Innovation. 2021;2(1) doi: 10.1016/j.xinn.2020.100072.
  • 18. Fong Y., Huang Y., Gilbert P.B., Permar S.R. chngpt: threshold regression model estimation and inference. BMC Bioinf. 2017;18(1):1–7. doi: 10.1186/s12859-017-1863-x.
  • 19. Allison P.D. Missing Data. Sage publications; 2001.
  • 20. Austin P.C., White I.R., Lee D.S., van Buuren S. Missing data in clinical research: a tutorial on multiple imputation. Can. J. Cardiol. 2021;37(9):1322–1331. doi: 10.1016/j.cjca.2020.11.010.
  • 21. Baraldi A.N., Enders C.K. An introduction to modern missing data analyses. J. Sch. Psychol. 2010;48(1):5–37. doi: 10.1016/j.jsp.2009.10.001.
  • 22. Sechidis K., Sperrin M., Petherick E.S., Luján M., Brown G. Dealing with under-reported variables: an information theoretic solution. Int. J. Approx. Reason. 2017;85:159–177. doi: 10.1016/j.ijar.2017.04.002.
  • 23. de Oliveira G.L., Loschi R.H., Assunção R.M. A random‐censoring Poisson model for underreported data. Stat. Med. 2017;36(30):4873–4892. doi: 10.1002/sim.7456.
  • 24. Kang H. The prevention and handling of the missing data. Kor. J. Anesthesiol. 2013;64(5):402. doi: 10.4097/kjae.2013.64.5.402.
  • 25. Li C. Little's test of missing completely at random. STATA J. 2013;13(4):795–809.
  • 26. Burgette L.F., Reiter J.P. Multiple imputation for missing data via sequential regression trees. Am. J. Epidemiol. 2010;172(9):1070–1076. doi: 10.1093/aje/kwq260.
  • 27. Enders C.K., Bandalos D.L. The relative performance of full information maximum likelihood estimation for missing data in structural equation models. Struct. Equ. Model. 2001;8(3):430–457. https://psycnet.apa.org/doi/10.1207/S15328007SEM0803_5
  • 28. Hedeker D., Mermelstein R.J., Demirtas H. Analysis of binary outcomes with missing data: missing= smoking, last observation carried forward, and a little multiple imputation. Addiction. 2007;102(10):1564–1573. doi: 10.1111/j.1360-0443.2007.01946.x.
  • 29. Wang L., Wang L., Cai Y., Ma L., Zhou M. Analysis of under-reporting of mortality surveillance from 2006 to 2008 in China. Zhonghua Yufang Yixue Zazhi. 2011;45(12):1061–1064. doi: 10.3760/cma.j.issn.0253-9624.2011.12.002.
  • 30. Guo K., Yin P., Wang L., Ji Y., Li Q., Bishai D., Liu S., Liu Y., Astell-Burt T., Feng X. Propensity score weighting for addressing under-reporting in mortality surveillance: a proof-of-concept study using the nationally representative mortality data in China. Popul. Health Metrics. 2015;13(1):1–11. doi: 10.1186/s12963-015-0051-3.
  • 31. Caliendo A.M., Gilbert D.N., Ginocchio C.C., Hanson K.E., May L., Quinn T.C., Tenover F.C., Alland D., Blaschke A.J., Bonomo R.A. Better tests, better care: improved diagnostics for infectious diseases. Clin. Infect. Dis. 2013;57(suppl_3):S139–S170. doi: 10.1093/cid/cit578.
  • 32. Garske T., Van Kerkhove M.D., Yactayo S., Ronveaux O., Lewis R.F., Staples J.E., Perea W., Ferguson N.M., Committee Y.F.E. Yellow fever in Africa: estimating the burden of disease and impact of mass vaccination from outbreak and serological data. PLoS Med. 2014;11(5):e1001638. doi: 10.1371/journal.pmed.1001638.
  • 33. Whittaker C., Walker P.G., Alhaffar M., Hamlet A., Djaafara B.A., Ghani A., Ferguson N., Dahab M., Checchi F., Watson O.J. Under-reporting of deaths limits our understanding of true burden of covid-19. BMJ. 2021;375 doi: 10.1136/bmj.n2239.
  • 34. e Silva L.V., de Andrade Abi M.D.P., Dos Santos A.M.T.B., de Mattos Teixeira C.A., Gomes V.H.M., Cardoso E.H.S., da Silva M.S., Vijaykumar N., Carvalho S.V., Frances C.R.L. COVID-19 mortality underreporting in Brazil: analysis of data from government internet portals. J. Med. Internet Res. 2020;22(8) doi: 10.2196/21413.

Articles from Heliyon are provided here courtesy of Elsevier
