Variability analysis in PM2.5 monitoring

Rashmi Bhardwaj; Dimple Pruthi

doi:10.1016/j.dib.2019.103774

2019 Mar 14;24:103774.

PMCID: PMC6468191 PMID: 31016211

Abstract

The United States Environmental Protection Agency (US EPA) and the Central Pollution Control Board (CPCB) are two major agencies that monitor air quality in India, including the concentration of particulate matter of size up to 2.5 μm (PM2.5). The study of PM2.5 over southern Asia is significant from the environment and ecosystem viewpoint (Abdullah et al., 2007; Dockery and Stone, 2007). For raising alerts and controlling pollutants, not only forecasting but also the accuracy of forecasting has attracted attention from research departments and air quality monitoring agencies, and the quest to reduce forecasting error continues. The precursor to forecasting is data monitoring. Focusing on this initial phase of data analysis, PM2.5 concentrations were collected from both agencies within an area of radius 3.1 miles for the year 2016. Using these data, a variability analysis is carried out to assess the efficiency of these vital environment protection agencies.

Specifications table

Subject area	Mathematics
More specific subject area	Statistics
Type of data	Table
How data was acquired	www.cpcb.nic.in, www.epa.gov
Data format	Raw, processed and analyzed.
Experimental factors	Investigation of monitoring variability and forecasting
Experimental features	Air quality Index assessment of PM2.5 monitored by US EPA and CPCB
Data source location	R. K. Puram, India
Data accessibility	www.cpcb.nic.in, www.epa.gov
Related research article	L. Zhang, J. Lin, R. Qiu, X. Hu, H. Zhang, Q. Chen, H. Tan, D. Lin, J. Wang, Trend analysis and forecast of PM2.5 in Fuzhou, China using the ARIMA model, Ecol. Indic. 95 (2018) 702–710.

Value of the data

• The dataset used in this article reflects the variability in monitoring by United States Environmental Protection Agency and Central Pollution Control Board.
• Air Quality Index calculated using data gives status of air we breathe in.
• The dataset will help to determine the effect of fine particles.
• The information contained in this article can be used to assess environment impact.
• The information provided can form the basis for issuing health advisory.

1. Data

The daily concentration of fine particulate matter monitored by US EPA and CPCB from 1st January 2016 to 19th December 2016 is used in the present study. The data were obtained from http://cpcb.nic.in/ for CPCB and https://www.epa.gov/ for US EPA [1], [3]. The time series is shown in Fig. 1 and the descriptive statistics in Table 1.

Fig. 1. Daily variation of the US EPA and CPCB monitored PM2.5 values.

Table 1.

Description of statistical parameters

	US EPA	CPCB
Mean	122.1859	135.9284
Median	84.16667	112.4575
Standard Deviation	110.502	99.11627
Kurtosis	9.121813	6.596559
Coefficient of Variation	0.904376	0.72918
Skewness	2.461981	2.036923
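
The measures in Table 1 can be computed from a daily series along the following lines. This is a minimal sketch, not the authors' procedure: the function name `describe` is hypothetical, and the bias-adjusted sample formulas for skewness and excess kurtosis are an assumption, since the article does not state which estimators were used.

```python
import statistics


def describe(values):
    """Return the Table 1 style measures for a sequence of at least 4 values.

    Skewness and kurtosis use bias-adjusted sample formulas (an assumption;
    the article does not say which estimators produced Table 1).
    """
    n = len(values)
    if n < 4:
        raise ValueError("at least 4 observations are required")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    z = [(v - mean) / sd for v in values]
    skewness = n / ((n - 1) * (n - 2)) * sum(t ** 3 for t in z)
    kurtosis = (
        n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * sum(t ** 4 for t in z)
        - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    )
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": sd,
        "kurtosis": kurtosis,
        "cv": sd / mean,  # coefficient of variation
        "skewness": skewness,
    }


# Hypothetical toy series, not the monitored PM2.5 data.
summary = describe([1.0, 2.0, 3.0, 4.0, 5.0])
```

The coefficient of variation is simply the standard deviation divided by the mean, which is how the Table 1 values 110.502 / 122.1859 and 99.11627 / 135.9284 relate to the reported 0.904376 and 0.72918.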

1.1. Study area

The U.S. Embassy and Consulates manage airborne fine particulate matter monitoring. PM2.5 is a standard recognized by US EPA, which permits the measurements to be examined against U.S. standard measures [5]. The US EPA monitor covers the Chanakyapuri area in Delhi. The Central Pollution Control Board (CPCB) of India is the apex organization in the country for monitoring pollution [6]. One of its monitoring stations is at RK Puram, Delhi. RK Puram and Chanakyapuri are 3.1 miles apart. New Delhi, the capital of India, has latitude and longitude coordinates 28.644800 and 77.216721 respectively. Chanakyapuri has coordinates 28.593853, 77.188736 and RK Puram 28.566008, 77.176743.

2. Experimental design, materials and methods

The study is divided into two parts: statistical analysis and predictive analysis. Descriptive and inferential statistics are a vital part of data analysis. The descriptive analysis summarizes the data using the measures listed in Table 1.

A further step is to observe whether there is any significant relation between the two data sets. The correlation coefficient measures the strength of the linear relationship between two variables and is calculated as

r_{uv} = \frac{\sum u_{i} v_{i} - m \bar{u} \bar{v}}{(m - 1) δ_{u} δ_{v}} = \frac{m \sum u_{i} v_{i} - \sum u_{i} \sum v_{i}}{\sqrt{m \sum u_{i}^{2} - {(\sum u_{i})}^{2}} \sqrt{m \sum v_{i}^{2} - {(\sum v_{i})}^{2}}}

(1)

where m is the number of paired observations and δ_u, δ_v are the sample standard deviations of u and v. r_uv lies between −1 and +1 inclusive, as discussed in Bhardwaj and Pruthi, 2016 [2]. The value of the Pearson correlation is 0.933, significant at the 0.01 level, indicating the reliability of the observed data.
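
The second form of Eq. (1) translates directly into code. The sketch below is illustrative only: the function name `pearson_r` and the toy series are hypothetical, and the two input sequences stand in for the paired daily US EPA and CPCB values.

```python
import math


def pearson_r(u, v):
    """Pearson correlation from raw sums, following the second form of Eq. (1)."""
    m = len(u)
    if m != len(v) or m < 2:
        raise ValueError("u and v must be paired and contain at least 2 values")
    sum_u, sum_v = sum(u), sum(v)
    sum_uv = sum(a * b for a, b in zip(u, v))
    sum_uu = sum(a * a for a in u)
    sum_vv = sum(b * b for b in v)
    numerator = m * sum_uv - sum_u * sum_v
    denominator = math.sqrt(m * sum_uu - sum_u ** 2) * math.sqrt(m * sum_vv - sum_v ** 2)
    # A constant series gives a zero denominator; r is undefined in that case.
    return numerator / denominator


# Hypothetical toy series: perfectly linear, so r is +1 or -1.
r_up = pearson_r([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_down = pearson_r([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```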

The location and scale (estimated normal distribution parameters) of the US EPA and CPCB data sets, for unweighted cases using Blom's proportion estimation formula, are given in Table 2. The probability-probability plot in Fig. 2 depicts the deviation from the normal distribution. Location and scale values show a persistent trend for the data sets of PM2.5.

Table 2.

Estimated Distribution Parameters of PM2.5 (US EPA and CPCB)

		US EPA	CPCB
Normal Distribution	Location	122.1859	135.9284
Normal Distribution	Scale	110.50204	99.11627

The cases are unweighted.

Fig. 2. P–P plot and Q–Q plot of US EPA and CPCB PM2.5 concentrations.

The widely used t-test is carried out to analyze the difference between the data monitored by US EPA and CPCB. Null hypothesis, $H_{0}$: no mean difference between US EPA and CPCB monitored data, i.e. $μ_{USEPA} - μ_{CPCB} = 0$; alternative hypothesis, $H_{a}: μ_{USEPA} - μ_{CPCB} \neq 0$.

t = \frac{μ_{USEPA} - μ_{CPCB}}{\sqrt{\frac{SD_{USEPA}^{2} + SD_{CPCB}^{2}}{N}}}

(2)

The calculated t-value is compared with the critical t-value for the corresponding degrees of freedom (see Table 3):

df = \frac{{(\frac{SD_{USEPA}^{2} + SD_{CPCB}^{2}}{N})}^{2}}{\frac{1}{N - 1} ({(\frac{SD_{USEPA}^{2} + SD_{CPCB}^{2}}{N})}^{2} - \frac{2 SD_{USEPA}^{2} SD_{CPCB}^{2}}{N^{2}})}

(3)

where $μ$ represents the mean, SD the standard deviation and N the number of observations in each data set. The F-test statistic from one-way ANOVA is evaluated to emphasize the mean difference between the two data sets (Table 4).

F = \frac{\text{regression mean square (MSR)}}{\text{mean square error (MSE)}}

(4)

where MSE is the error sum of squares divided by its degrees of freedom and MSR is the regression sum of squares divided by its degrees of freedom.
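
Eq. (2) and the degrees of freedom can be evaluated from summary statistics alone. The sketch below assumes equal sample sizes N and uses the Welch–Satterthwaite approximation for the degrees of freedom, which is what Eq. (3) appears to express; the function name `t_from_summary` is hypothetical.

```python
import math


def t_from_summary(mean_a, sd_a, mean_b, sd_b, n):
    """Evaluate Eq. (2) and a Welch-Satterthwaite df for two samples of size n.

    Returns (t, standard_error, df). Equal sample sizes are assumed.
    """
    a = sd_a ** 2 / n  # variance of the first sample mean
    b = sd_b ** 2 / n  # variance of the second sample mean
    standard_error = math.sqrt(a + b)
    t = (mean_a - mean_b) / standard_error
    # (a + b)^2 - 2ab = a^2 + b^2, so this matches the usual Welch form.
    df = (a + b) ** 2 / ((a ** 2 + b ** 2) / (n - 1))
    return t, standard_error, df


# Summary statistics taken from Table 3 (US EPA first, CPCB second).
t, se, df = t_from_summary(122.1859, 110.502, 135.9284, 99.11627, 357)
```

With the Table 3 summary statistics, the denominator of Eq. (2) evaluates to about 7.856, which is the ± margin quoted in the concluding remark. Table 3 reports DF = 356 = N − 1, the degrees of freedom of a paired comparison, so its T need not equal the value from this unpaired formula.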

Table 3.

t-test for PM2.5 (US EPA and CPCB)

Sample	N	Mean	Std. Deviation	Std. Error Mean	T	DF	p-value
USEPA	357	122.1859	110.502	5.84839	6.479	356	0.000
CPCB	357	135.9284	99.11627	5.24579	6.479	356	0.000

Table 4.

One-way ANOVA for PM2.5 (US EPA and CPCB)

Source of Variation	Df	SS	MS	F	p-value
Model	1	32382.37	32382.37	2.97299	0.00
Error	710	7733453.143	10892.19	2.97299	0.00

The ANOVA and t-test results emphasize the significant difference between the PM2.5 data monitored by the two major agencies, US EPA and CPCB.
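
As a worked check of Eq. (4), the mean squares and F ratio in Table 4 follow from the sums of squares and degrees of freedom. The helper name `f_statistic` is hypothetical; the numbers are the ones printed in Table 4.

```python
def f_statistic(ss_model, df_model, ss_error, df_error):
    """Return (MSR, MSE, F) from sums of squares and their degrees of freedom."""
    msr = ss_model / df_model  # regression (model) mean square
    mse = ss_error / df_error  # mean square error
    return msr, mse, msr / mse


# Sums of squares and degrees of freedom as listed in Table 4.
msr, mse, f_value = f_statistic(32382.37, 1, 7733453.143, 710)
print(round(mse, 2), round(f_value, 5))
```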

2.1. Auto regressive integrated moving average

The preferred method in time series modeling is the Box-Jenkins method. It combines three components, “AR” (autoregressive), “I” (integrated) and “MA” (moving average), resulting in ARIMA. An ARIMA model can be expressed as

(1 - \sum_{i=1}^{p} θ_{i} L^{i}) {(1 - L)}^{d} y_{t} = (1 + \sum_{i=1}^{q} ϕ_{i} L^{i}) ϵ_{t}

(5)

Eq. (5) was fitted to the PM2.5 data. Here L is the lag operator, p and q are the orders of the AR and MA parts with coefficients θ_i and ϕ_i, d is the order of differencing and ϵ_t is the residual. ARIMA modeling follows an approach of identification, estimation and diagnostics. It describes large-scale variation in the behavior of a stationary time series and is built upon present and past values of the response and the residuals. The main steps of the ARIMA method are:
• Identification: examining the data, then calculating and plotting the auto-correlation and partial auto-correlation functions. The smallest values of the parameters are sought. When a value is 0, the corresponding AR or MA component is not required in the model. The ‘I’ component of ARIMA (trend) is examined first. The objective is to determine whether the process is stationary (d = 0) or not. If not, then it has to be transformed into a stationary one. No connection between any two sequential observations implies p = 0.
• Construction of models and estimation of their parameters [7].
• Diagnostics and selection of model: the residuals and the quality of approximation of the model are examined. Theoretically, it is assumed that residuals are random and normally distributed.
• Application of the predictive model, forecasts, analysis of dependencies, and study of problem-solving capabilities [4].
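
The identification and application steps above can be sketched in plain Python. This is an illustration under stated assumptions, not the software used in the article: the function names are hypothetical, the series is taken as stationary (d = 0), the `mean` argument is an added assumption because Eq. (5) has no constant term, and the signs follow Eq. (5), i.e. y_t = Σ θ_i y_{t−i} + ϵ_t + Σ ϕ_i ϵ_{t−i}. Parameter estimation is not covered; the coefficients are supplied by the caller.

```python
def autocorrelation(y, lag):
    """Sample auto-correlation of y at the given lag (identification step)."""
    n = len(y)
    mean = sum(y) / n
    denominator = sum((x - mean) ** 2 for x in y)
    numerator = sum((y[t] - mean) * (y[t - lag] - mean) for t in range(lag, n))
    return numerator / denominator


def arma_forecast(y, ar, ma, mean=0.0, steps=1):
    """Forecast an ARMA(p, q) series (d = 0) with given coefficients.

    ar holds the AR coefficients (theta in Eq. (5)), ma the MA coefficients
    (phi in Eq. (5)). Residuals before the start of the series are taken as 0.
    """
    z = [x - mean for x in y]  # deviations from the assumed mean
    e = []                     # one-step residuals

    def predict(t):
        value = sum(a * z[t - i - 1] for i, a in enumerate(ar) if t - i - 1 >= 0)
        value += sum(b * e[t - j - 1] for j, b in enumerate(ma) if t - j - 1 >= 0)
        return value

    for t in range(len(z)):
        e.append(z[t] - predict(t))

    forecasts = []
    for _ in range(steps):
        next_value = predict(len(z))
        z.append(next_value)
        e.append(0.0)  # future residuals are replaced by their expectation
        forecasts.append(next_value + mean)
    return forecasts


# Hypothetical toy series and coefficients, not the fitted models of Table 5.
acf_lag1 = autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], 1)
ar_only = arma_forecast([4.0, 2.0], ar=[0.5], ma=[], steps=2)
ma_only = arma_forecast([1.0], ar=[], ma=[0.5], steps=2)
```

An auto-correlation that stays close to zero at every lag would point to p = 0, as noted in the identification step. In practice a statistics package would estimate the coefficients and the confidence limits; the sketch only shows how a fitted model of the form of Eq. (5) turns past values and residuals into forecasts.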

Using the ARIMA algorithm, PM2.5 is forecasted. Tables 5, 6 and 7 summarize the output of the fitted ARIMA models.

Table 5.

ARIMA model

	ARIMA Model	Coefficient AR1	Coefficient MA1	Coefficient MA2	Coefficient MA3
US EPA	(1,0,3)	0.982	0.142	0.236	0.22
CPCB	(1,0,2)	0.994	0.237	0.247	–

Table 6.

Forecasted values

Date	US EPA Observed	US EPA Forecasted	US EPA LCL	US EPA UCL	CPCB Observed	CPCB Forecasted	CPCB LCL	CPCB UCL
20/12/2016	122.08	146.74	41.54	251.93	128.99	177.95	86.85	269.05
21/12/2016	163.34	161.8	24.43	299.17	171.13	192.67	78.39	306.96
22/12/2016	181.08	160.54	9.88	311.19	197.02	191.53	68.28	314.77

LCL and UCL stand for lower and upper confidence limits respectively.

Table 7.

R-squared values

R-squared	Forecasted PM2.5 (USEPA)	Forecasted PM2.5 (CPCB)
Observed PM2.5 (USEPA)	0.9842	0.8520
Observed PM2.5 (CPCB)	0.8656	0.9768

To check the accuracy of the modeling, the following statistic is used:

R^{2} = 1 - \frac{\text{sum of squared error of model}}{\text{sum of squared error of baseline model}}

(6)
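
Eq. (6) can be computed as follows. The sketch assumes that the baseline model is the mean of the observed values, which is the usual choice but is not stated in the article; the function name `r_squared` and the toy numbers are hypothetical.

```python
def r_squared(observed, predicted):
    """R-squared of Eq. (6), with the observed mean as the assumed baseline model."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_observed = sum(observed) / len(observed)
    sse_model = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sse_baseline = sum((o - mean_observed) ** 2 for o in observed)
    return 1 - sse_model / sse_baseline


# Hypothetical toy values: a perfect forecast and a forecast equal to the mean.
perfect = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
baseline_only = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A forecast that is no better than the baseline gives a value of 0, and a forecast worse than the baseline gives a negative value.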

2.2. Air quality index

The Air Quality Index (AQI) is not just a number signifying the quality of air; it also explains what we are inhaling. AQI was introduced in 1968. The objective was to make the public aware of deteriorating air quality and to raise an alarm so that precautionary measures can be taken. AQI is calculated using www.cpcb.nic.in and www.epa.gov. To focus on the effects of monitoring variability, the air quality index is calculated for both data sets. AQI values are divided into categories.

Fig. 3 shows the AQI for those days on which the two data sets fall in different categories. The first and second bars in Fig. 3 represent the calculated AQI corresponding to the US EPA and CPCB data respectively. In a short span of 354 days, the AQI falls in different categories on 58 days.
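
The article obtains AQI from the agencies' websites and does not give a formula. Agency calculators generally use piecewise linear interpolation between concentration breakpoints, and the sketch below illustrates that approach. Everything in it is an assumption to be verified against the agencies' own documents: the function name, the truncation to one decimal place, and the breakpoint table, which lists the US EPA PM2.5 breakpoints that applied in 2016 as best recalled here. A CPCB table would be passed in the same way.

```python
import math

# Assumed US EPA 24-hour PM2.5 breakpoints: (C_low, C_high, I_low, I_high).
US_EPA_PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def aqi_from_concentration(concentration, breakpoints):
    """Linear interpolation of the index within the matching breakpoint band."""
    # Truncate to one decimal so that values fall inside a band, not between bands.
    c = math.floor(concentration * 10 + 1e-9) / 10
    for c_low, c_high, i_low, i_high in breakpoints:
        if c_low <= c <= c_high:
            return round((i_high - i_low) / (c_high - c_low) * (c - c_low) + i_low)
    raise ValueError("concentration outside the breakpoint table")


# Example call with the first observed US EPA value from Table 6.
example_aqi = aqi_from_concentration(122.08, US_EPA_PM25_BREAKPOINTS)
```

Because each agency uses its own breakpoints, the same concentration can land in different categories, and a modest difference between two monitors can shift the category near a band edge.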

2.3. Concluding remark

It is noted that the PM2.5 values monitored by US EPA and CPCB show a significant difference. Mathematically, going by the means in Table 1, CPCB-measured PM2.5 data can be formed from US EPA data by adding 13.7425 ± 7.856331729, and vice versa by subtracting. The calculated AQI values fall in different categories as per National Standards. This might have led to the issuance of wrong public health advisories in the past, and if this difference is not taken into account, as is done in the present study, it may have a significant adverse impact on human health in the future as well.

Acknowledgements

The authors are thankful to Guru Gobind Singh Indraprastha University, Delhi, India for providing research facilities and financial support.

Footnotes

Transparency document associated with this article can be found in the online version at https://doi.org/10.1016/j.dib.2019.103774.

References

1. Abdullah L.C., Wong L.L., Saari M., Salmiaton A., Abdul Rashid M.S. Particulate matter dispersion and haze occurrence potential studies at a local palm oil mill. Int. J. Environ. Sci. Technol. 2007;4(2):271–278.
2. Bhardwaj R., Pruthi D. Predictability and wavelet analysis of air pollutants for commercial and industrial regions in Delhi. Indian J. Ind. Appl. Math. 2016;7(2):165–174.
3. Tiwari S., Chate D.M., Srivastava A.K., Bisht D.S., Padmanabhamurty B. Assessments of PM1, PM2.5 and PM10 concentrations in Delhi at different mean cycles. Geofizika. 2012;29(2):125–141.
4. Stoimenova M.P. Stochastic modeling of problematic air pollution with particulate matter in the city of Pernik, Bulgaria. Ecol. Balk. 2016;8(2):33–41.
5. Dockery D.W., Stone P.H. Cardiovascular risks from fine particulate air pollution. N. Engl. J. Med. 2007;356:511–513. doi: 10.1056/NEJMe068274.
6. CPCB Central Pollution Control Board. 2001. Air quality in Delhi (1989–2000), National Ambient Air Quality Monitoring Series-NAAQMS/17/2000-2001, Parivesh Bhawan, Delhi, India.
7. Zhang L., Lin J., Qiu R., Hu X., Zhang H., Chen Q., Tan H., Lin D., Wang J. Trend analysis and forecast of PM2.5 in Fuzhou, China using the ARIMA model. Ecol. Indic. 2018;95:702–710.
