ABSTRACT
Symbolic data analysis deals with complex data represented by symbolic objects, such as lists, histograms, and intervals. Spatial analysis of symbolic data is relatively underexplored. To fill this gap, this paper proposes a statistical framework for the analysis of spatial interval-valued data (SIVD). We provide geostatistical methods for spatial prediction, predictive performance measures for prediction assessment, and visualization techniques for mapping SIVD. The proposed methods are illustrated with both simulated and real examples.
KEYWORDS: Spatial prediction, interval-valued data, temperature, symbolic data analysis, geostatistics
MATHEMATICS SUBJECT CLASSIFICATION: 62H11
1. Introduction
Spatial analysis has become increasingly popular in fields such as environmental sciences [21], epidemiology [29], social sciences [30], and public health [11]. Spatial data have typically been collected in the form of univariate [21] or multivariate data [17]. As technology advances, data-generating processes have become more complicated and produce data with complex structures, demanding appropriate statistical tools for analysis [3,33]. This paper focuses on spatially referenced symbolic data, specifically spatial interval-valued data (SIVD).
Symbolic data analysis (SDA) is a statistical approach for analyzing massive or complex data whose information is aggregated in the form of intervals, histograms, and lists [8]. A symbolic data point is defined as a hypercube in p-dimensional space or a Cartesian product of distributions [4]. In this paper, we focus on interval-valued data, the most popular form of symbolic data in applications. Real-life examples include temperature, wind speed, energy production, blood pressure, and stock markets [8,24,35]. A distinctive feature of interval-valued data is that each interval carries internal variation, which is typically measured by its range [6]. A key challenge in analyzing interval-valued data is how to account for this within-interval variability. Taking the internal variation into account is important for understanding volatility, variability, and uncertainty in the data [1,12,16]. Accounting for this variability can reduce uncertainty and random variation relative to that found in single-valued data [34]. Interval-valued data are also an efficient way to remove confounding or complex effects, such as time, by aggregating observations over a given time window into an interval [8].
Numerous statistical methodologies have been proposed to analyze interval-valued data. The seminal work of Billard and Diday [7] proposed a regression model for interval-valued data using the center points. Neto and Carvalho [15] introduced an interval-valued linear model that uses inequality constraints to ensure that the lower bounds are always less than or equal to the upper bounds. Billard [5] and Xu [36] presented covariance functions for interval-valued data that measure between- and within-interval variation. Interval-valued data analysis has also branched out into various fields of statistical analysis, such as classification [32], principal component analysis [23], time series analysis [9], Bayesian analysis [2], and functional data analysis [3]. As for spatial interval-valued data, Huang [19] proposed a linear model for areal-type spatial interval-valued data.
The motivating example involves the spatial prediction of daily temperature intervals over a geographic region. Although temperature data are collected in near real time, they are commonly summarized as a single value, such as the daily average or median temperature [20]. This aggregation ignores the daily internal variation in temperature and therefore loses information. Paradoxically, temperature data are already reported in interval-valued form, as minimum and maximum temperatures. However, they have often been analyzed through summary statistics, such as the mean, minimum, and maximum, rather than within an interval-valued data analysis framework. In particular, statistical analysis of SIVD has rarely been conducted, even though it can be applied in a variety of fields, including transportation [25,31], agriculture [18], and energy [26].
The aim of this paper is to propose a statistical framework for the analysis of SIVD. Our proposed methodologies are based on the traditional two-step geostatistical framework. First, variogram analysis is performed to estimate the spatial structure of the data. Second, values at unsampled locations are predicted by kriging based on the estimated parameters. In this paper, we propose three geostatistical methods for SIVD, performance measures for assessing the predictive accuracy for interval-valued data, and novel visualization techniques for mapping SIVD.
The paper is organized as follows. In Section 2 we present the proposed methods for analysis of SIVD. In Section 3, we perform a simulation study to evaluate the proposed methods. In Section 4, we apply our proposed methods to real-world SIVD. Section 5 concludes and discusses our work.
2. Methods
2.1. Geostatistical analysis
Geostatistical analysis proceeds in two stages: variogram analysis and kriging. The first step, variogram analysis, estimates spatial parameters that quantify the spatial structure of the data. Formally, the variogram is defined as the variance of the difference between values separated by a spatial lag $h$,
$$2\gamma(h) = \operatorname{Var}\left[Z(s+h) - Z(s)\right], \tag{1}$$
where $Z(s)$ and $Z(s+h)$ are the observations at two spatial locations, the process $Z(\cdot)$ is assumed to be intrinsically stationary, and $h$ is the spatial lag between the locations. The function $\gamma(h)$ is called the semivariogram; it is half the variogram, and the two terms are often used interchangeably.
The semivariogram is often estimated by the empirical semivariogram, which is plotted against the spatial lag,
$$\hat{\gamma}(h) = \frac{1}{2|N(h)|}\sum_{(i,j) \in N(h)}\left[Z(s_i) - Z(s_j)\right]^2, \tag{2}$$
where $N(h)$ is the set of all pairs of locations separated by $h$ and $|N(h)|$ is the number of distinct pairs in $N(h)$. Spatial dependence is characterized by fitting a theoretical variogram model to the empirical semivariogram estimates.
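For irregularly spaced stations, exact lag matches are rare, so Eq. (2) is usually evaluated over distance classes. The Python sketch below is a minimal illustration under that assumption; the function name `empirical_semivariogram` and the half-open binning scheme $(lo, hi]$ are illustrative choices, not the implementation used in this paper.

```python
import numpy as np

def empirical_semivariogram(coords, z, bin_edges):
    """Binned version of Eq. (2): returns (bin centers, gamma_hat, pair counts)."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    # Every distinct pair (i, j) with i < j, as in the set N(h)
    i, j = np.triu_indices(len(z), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (z[i] - z[j]) ** 2
    centers, gamma_hat, counts = [], [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist > lo) & (dist <= hi)  # lag class (lo, hi]
        n_pairs = int(in_bin.sum())
        if n_pairs == 0:
            continue  # skip empty lag classes
        centers.append(0.5 * (lo + hi))
        gamma_hat.append(sq_diff[in_bin].sum() / (2.0 * n_pairs))
        counts.append(n_pairs)
    return np.array(centers), np.array(gamma_hat), np.array(counts)
```

The same function applies unchanged to the centers, bounds, or ranges of the intervals used in Section 2.2: pass the corresponding vector as `z`.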
The second step of geostatistical analysis is to predict the variable of interest at unsampled locations. Kriging is the best linear unbiased predictor (BLUP): a linear combination of the observed data whose weights reflect their proximity to the prediction location and the spatial dependence estimated from variogram analysis. Several variants of kriging require different assumptions about the data, such as simple kriging (SK) for data with a constant and known mean, ordinary kriging (OK) for data with a constant but unknown mean, and universal kriging (UK) for data with a trend [28].
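In this paper we consider an intrinsically stationary spatial process and adopt the OK methodology. The OK predictor at an unsampled location $s_0$ is a weighted linear combination of the observations, $\hat{Z}(s_0) = \boldsymbol{\lambda}^\top \mathbf{Z}$, where $\mathbf{Z} = (Z(s_1), \ldots, Z(s_n))^\top$ and $\boldsymbol{\lambda} = (\lambda_1, \ldots, \lambda_n)^\top$ is a vector of weights. The weights are chosen to minimize the mean squared prediction error, $E[(\hat{Z}(s_0) - Z(s_0))^2]$, subject to $\mathbf{1}^\top\boldsymbol{\lambda} = \sum_{i=1}^{n}\lambda_i = 1$, which ensures unbiasedness. This constrained optimization problem can be solved by introducing a Lagrange multiplier $m$,
$$\begin{pmatrix} \boldsymbol{\Gamma} & \mathbf{1} \\ \mathbf{1}^\top & 0 \end{pmatrix}\begin{pmatrix} \boldsymbol{\lambda} \\ m \end{pmatrix} = \begin{pmatrix} \boldsymbol{\gamma}_0 \\ 1 \end{pmatrix}, \tag{3}$$
where $\boldsymbol{\Gamma}$ is an $n \times n$ symmetric matrix with entries $\Gamma_{ij} = \gamma(s_i - s_j)$, the semivariogram between $s_i$ and $s_j$, and $\boldsymbol{\gamma}_0 = (\gamma(s_0 - s_1), \ldots, \gamma(s_0 - s_n))^\top$ is the vector of semivariograms between $s_0$ and the observed locations.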
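The OK system in Eq. (3) is a small dense linear solve. The sketch below is a minimal illustration, assuming NumPy, Euclidean distances, and an exponential semivariogram written as $c_0 + \sigma^2(1 - e^{-h/\phi})$. That parameterization and the function names are assumptions for illustration only; the paper does not fix them here.

```python
import numpy as np

def exponential_semivariogram(h, partial_sill, phi, nugget=0.0):
    """Exponential model (assumed parameterization); gamma(0) is set to 0."""
    h = np.asarray(h, dtype=float)
    g = nugget + partial_sill * (1.0 - np.exp(-h / phi))
    return np.where(h > 0, g, 0.0)

def ordinary_kriging_weights(coords, s0, semivariogram):
    """Solve the bordered system of Eq. (3); returns (weights, Lagrange multiplier)."""
    coords = np.asarray(coords, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    n = len(coords)
    # Gamma_ij = gamma(s_i - s_j), evaluated at the Euclidean distance
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = semivariogram(dist)
    A[n, n] = 0.0
    # Right-hand side (gamma_0, 1), with gamma_0 the semivariances to s0
    b = np.append(semivariogram(np.linalg.norm(coords - s0, axis=1)), 1.0)
    solution = np.linalg.solve(A, b)
    return solution[:n], solution[n]

def ordinary_kriging_predict(coords, z, s0, semivariogram):
    """OK predictor lambda^T Z at the unsampled location s0."""
    weights, _ = ordinary_kriging_weights(coords, s0, semivariogram)
    return float(weights @ np.asarray(z, dtype=float))
```

Solving the bordered system directly enforces the unbiasedness constraint. It also avoids forming $\boldsymbol{\Gamma}^{-1}$ explicitly. With a zero nugget, the predictor reproduces the observed value at any sampled location.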
2.2. Geostatistical analysis for SIVD
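In this section, we propose three methodologies for the analysis of SIVD. Let $Z(s_i) = [Z^L(s_i), Z^U(s_i)]$ denote a spatial interval-valued observation at spatial location $s_i$, $i = 1, \ldots, n$. That is, $Z(s_i)$ is an interval with lower bound $Z^L(s_i)$ and upper bound $Z^U(s_i)$, where $Z^L(s_i) \le Z^U(s_i)$. Note that the prediction target is also interval-valued, $Z(s_0) = [Z^L(s_0), Z^U(s_0)]$, where $s_0$ is an unsampled location.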
2.2.1. Center kriging
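Center kriging (CK) uses the centers of the interval-valued data, $Z^C(s_i) = \{Z^L(s_i) + Z^U(s_i)\}/2$, for spatial analysis. Under the assumption of intrinsic stationarity, the semivariogram for CK is defined by
$$\gamma_C(h) = \frac{1}{2}\operatorname{Var}\left[Z^C(s+h) - Z^C(s)\right], \tag{4}$$
where $Z^C(s) = \{Z^L(s) + Z^U(s)\}/2$, and the corresponding empirical semivariogram is given by
$$\hat{\gamma}_C(h) = \frac{1}{2|N(h)|}\sum_{(i,j) \in N(h)}\left[Z^C(s_i) - Z^C(s_j)\right]^2. \tag{5}$$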
Although CK is a traditional and popular approach in interval-valued data analysis, it is well known that this approach results in loss of information and ignores the intrinsic variability of interval-valued data. A theoretical semivariogram model is fitted to the empirical semivariogram to estimate the spatial parameters in the spatial process of the centers. This implies the upper and lower bounds of the interval-valued data are predicted on the basis of a single spatial process to model the centers. The CK approach would provide fairly accurate prediction when the center process accounts for much of the spatial variation of SIVD.
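Analogous to the variogram analysis, the CK predictor incorporates the spatial information retrieved from the centers of the interval-valued data into the prediction of $Z(s_0)$. That is, the upper and lower bounds at location $s_0$ are predicted from the estimated spatial structure of the centers. To this end, the center at $s_0$, $Z^C(s_0)$, is predicted first. The predictor for the interval center is $\hat{Z}^C(s_0) = \boldsymbol{\lambda}_C^\top\mathbf{Z}^C$, where $\mathbf{Z}^C = (Z^C(s_1), \ldots, Z^C(s_n))^\top$ and $\boldsymbol{\lambda}_C = (\lambda_{C,1}, \ldots, \lambda_{C,n})^\top$. The weights are determined by minimizing the center mean squared prediction error $E[(\hat{Z}^C(s_0) - Z^C(s_0))^2]$ subject to $\mathbf{1}^\top\boldsymbol{\lambda}_C = 1$, where $\mathbf{1}$ is a vector of ones. Solving system (3) with $\gamma$ replaced by $\gamma_C$ gives the optimal weights
$$\boldsymbol{\lambda}_C = \boldsymbol{\Gamma}_C^{-1}\left(\boldsymbol{\gamma}_{C,0} + \mathbf{1}\,\frac{1 - \mathbf{1}^\top\boldsymbol{\Gamma}_C^{-1}\boldsymbol{\gamma}_{C,0}}{\mathbf{1}^\top\boldsymbol{\Gamma}_C^{-1}\mathbf{1}}\right), \tag{6}$$
where $(\boldsymbol{\Gamma}_C)_{ij} = \gamma_C(s_i - s_j)$ and $\boldsymbol{\gamma}_{C,0} = (\gamma_C(s_0 - s_1), \ldots, \gamma_C(s_0 - s_n))^\top$. The center weights are then used to predict the interval bounds, $\hat{Z}^L(s_0) = \boldsymbol{\lambda}_C^\top\mathbf{Z}^L$ and $\hat{Z}^U(s_0) = \boldsymbol{\lambda}_C^\top\mathbf{Z}^U$, where $\mathbf{Z}^L = (Z^L(s_1), \ldots, Z^L(s_n))^\top$ and $\mathbf{Z}^U = (Z^U(s_1), \ldots, Z^U(s_n))^\top$. The CK predictor for $Z(s_0)$ is given by $\hat{Z}(s_0) = [\hat{Z}^L(s_0), \hat{Z}^U(s_0)]$.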
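A minimal CK sketch in Python, assuming NumPy, Euclidean distances, and a semivariogram already fitted to the centers. The helper `ok_weights` and the example semivariogram `gamma_c` (exponential, partial sill 2, range 1) are hypothetical choices for illustration. The key point is that one set of weights, computed from the center process, is reused for both bounds.

```python
import numpy as np

def ok_weights(coords, s0, gamma):
    """Ordinary kriging weights from the bordered system of Eq. (3)."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(dist)
    A[n, n] = 0.0
    b = np.append(gamma(np.linalg.norm(coords - s0, axis=1)), 1.0)
    return np.linalg.solve(A, b)[:n]

def center_kriging(coords, lower, upper, s0, gamma_c):
    """CK sketch: weights from the center semivariogram, reused for both bounds."""
    coords = np.asarray(coords, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    lam_c = ok_weights(coords, np.asarray(s0, dtype=float), gamma_c)  # Eq. (6)
    # The centers Z^C are only needed to fit gamma_c; prediction applies
    # the same center weights lam_c to Z^L and Z^U.
    return float(lam_c @ lower), float(lam_c @ upper)

# Hypothetical fitted center semivariogram (exponential, partial sill 2, range 1)
gamma_c = lambda h: 2.0 * (1.0 - np.exp(-np.asarray(h, dtype=float) / 1.0))
```

Because the weights sum to one and are shared, the predicted center equals the kriged center. Both bounds therefore move together, and no separate spatial structure for the interval width is used.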
2.2.2. MinMax kriging
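Min-Max kriging (MMK) is a bivariate method that treats the lower and upper bounds of the interval as independent spatial processes. MMK stands in direct contrast to CK, which assumes that both interval bounds are driven by the same spatial process. MMK performs geostatistical analysis on two independent, intrinsically stationary spatial processes. The semivariograms of the two bounds are $\gamma_L(h) = \frac{1}{2}\operatorname{Var}[Z^L(s+h) - Z^L(s)]$ and $\gamma_U(h) = \frac{1}{2}\operatorname{Var}[Z^U(s+h) - Z^U(s)]$, respectively. The corresponding empirical semivariograms are
$$\hat{\gamma}_L(h) = \frac{1}{2|N(h)|}\sum_{(i,j) \in N(h)}\left[Z^L(s_i) - Z^L(s_j)\right]^2, \qquad \hat{\gamma}_U(h) = \frac{1}{2|N(h)|}\sum_{(i,j) \in N(h)}\left[Z^U(s_i) - Z^U(s_j)\right]^2. \tag{7}$$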
A theoretical semivariogram model is fitted to the empirical semivariogram of each bound, so the variogram analysis produces separate spatial parameter estimates for the lower and upper bounds.
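MMK likewise requires predictors for the interval bounds, $\hat{Z}^L(s_0)$ and $\hat{Z}^U(s_0)$, to predict the interval-valued observation $Z(s_0)$. The predictors are $\hat{Z}^L(s_0) = \boldsymbol{\lambda}_L^\top\mathbf{Z}^L$ and $\hat{Z}^U(s_0) = \boldsymbol{\lambda}_U^\top\mathbf{Z}^U$, where $\boldsymbol{\lambda}_L = (\lambda_{L,1}, \ldots, \lambda_{L,n})^\top$ and $\boldsymbol{\lambda}_U = (\lambda_{U,1}, \ldots, \lambda_{U,n})^\top$. The lower-bound weights are determined by minimizing $E[(\hat{Z}^L(s_0) - Z^L(s_0))^2]$ subject to $\mathbf{1}^\top\boldsymbol{\lambda}_L = 1$. The upper-bound weights are determined by minimizing $E[(\hat{Z}^U(s_0) - Z^U(s_0))^2]$ subject to $\mathbf{1}^\top\boldsymbol{\lambda}_U = 1$. The optimal weights for the lower and upper bounds are, respectively,
$$\boldsymbol{\lambda}_L = \boldsymbol{\Gamma}_L^{-1}\left(\boldsymbol{\gamma}_{L,0} + \mathbf{1}\,\frac{1 - \mathbf{1}^\top\boldsymbol{\Gamma}_L^{-1}\boldsymbol{\gamma}_{L,0}}{\mathbf{1}^\top\boldsymbol{\Gamma}_L^{-1}\mathbf{1}}\right), \tag{8}$$
$$\boldsymbol{\lambda}_U = \boldsymbol{\Gamma}_U^{-1}\left(\boldsymbol{\gamma}_{U,0} + \mathbf{1}\,\frac{1 - \mathbf{1}^\top\boldsymbol{\Gamma}_U^{-1}\boldsymbol{\gamma}_{U,0}}{\mathbf{1}^\top\boldsymbol{\Gamma}_U^{-1}\mathbf{1}}\right), \tag{9}$$
where $(\boldsymbol{\Gamma}_L)_{ij} = \gamma_L(s_i - s_j)$, $\boldsymbol{\gamma}_{L,0} = (\gamma_L(s_0 - s_1), \ldots, \gamma_L(s_0 - s_n))^\top$, $(\boldsymbol{\Gamma}_U)_{ij} = \gamma_U(s_i - s_j)$, and $\boldsymbol{\gamma}_{U,0} = (\gamma_U(s_0 - s_1), \ldots, \gamma_U(s_0 - s_n))^\top$. The MMK predictor is $\hat{Z}(s_0) = [\hat{Z}^L(s_0), \hat{Z}^U(s_0)]$.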
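In MMK, two independent OK solves replace the single shared solve of CK. The sketch below is a minimal illustration assuming NumPy and Euclidean distances. The helper names and the two example semivariograms `gamma_l` and `gamma_u` are hypothetical stand-ins for models fitted to each bound.

```python
import numpy as np

def ok_weights(coords, s0, gamma):
    """Ordinary kriging weights from the bordered system of Eq. (3)."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(dist)
    A[n, n] = 0.0
    b = np.append(gamma(np.linalg.norm(coords - s0, axis=1)), 1.0)
    return np.linalg.solve(A, b)[:n]

def minmax_kriging(coords, lower, upper, s0, gamma_l, gamma_u):
    """MMK sketch: separate OK weights for each bound, as in Eqs. (8) and (9)."""
    coords = np.asarray(coords, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    lam_l = ok_weights(coords, s0, gamma_l)
    lam_u = ok_weights(coords, s0, gamma_u)
    pred_l = float(lam_l @ np.asarray(lower, dtype=float))
    pred_u = float(lam_u @ np.asarray(upper, dtype=float))
    # The bounds are kriged independently, so pred_l > pred_u is possible;
    # callers may want to check for crossed bounds.
    return pred_l, pred_u

# Hypothetical fitted semivariograms for the lower and upper bounds
gamma_l = lambda h: 1.0 * (1.0 - np.exp(-np.asarray(h, dtype=float) / 0.5))
gamma_u = lambda h: 3.0 * (1.0 - np.exp(-np.asarray(h, dtype=float) / 2.0))
```

The returned pair is not guaranteed to be ordered. This is the ordering issue noted in Section 5.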
2.2.3. Center and range kriging
The proposed center and range kriging (CRK) incorporates both the center and the internal variation of SIVD into spatial analysis. Neto and Carvalho [14] proposed transforming interval-valued data into two quantities, the center and the range, for interval-valued regression analysis. In their method, two regression models are fitted to the centers and the ranges of the intervals. The regression coefficients are estimated by minimizing the sum of the squared center errors plus the sum of the squared range errors. In line with this method, the proposed CRK models two intrinsically stationary spatial processes, one for the interval center, $Z^C(s) = \{Z^L(s) + Z^U(s)\}/2$, and one for the range, $Z^R(s) = Z^U(s) - Z^L(s)$. We perform variogram analysis to estimate the spatial parameters of the center and range processes, and use these estimates to krige the center and range at unsampled locations. Note that, among the methods proposed in this paper, only CRK explicitly incorporates the interval variation.
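The variogram analysis for the center is identical to that in Section 2.2.1. The semivariogram for the range is given by $\gamma_R(h) = \frac{1}{2}\operatorname{Var}[Z^R(s+h) - Z^R(s)]$, where $Z^R(s) = Z^U(s) - Z^L(s)$, and the corresponding empirical semivariogram for the range is
$$\hat{\gamma}_R(h) = \frac{1}{2|N(h)|}\sum_{(i,j) \in N(h)}\left[Z^R(s_i) - Z^R(s_j)\right]^2. \tag{10}$$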
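The CRK predictor combines two predictors, one for the center and one for the range. The former is identical to the CK center predictor in Section 2.2.1. The latter is given by $\hat{Z}^R(s_0) = \boldsymbol{\lambda}_R^\top\mathbf{Z}^R$, where $\mathbf{Z}^R = (Z^R(s_1), \ldots, Z^R(s_n))^\top$. The weights $\boldsymbol{\lambda}_R = (\lambda_{R,1}, \ldots, \lambda_{R,n})^\top$ are determined by minimizing the range mean squared prediction error $E[(\hat{Z}^R(s_0) - Z^R(s_0))^2]$ subject to $\mathbf{1}^\top\boldsymbol{\lambda}_R = 1$. The resulting weights are given by
$$\boldsymbol{\lambda}_R = \boldsymbol{\Gamma}_R^{-1}\left(\boldsymbol{\gamma}_{R,0} + \mathbf{1}\,\frac{1 - \mathbf{1}^\top\boldsymbol{\Gamma}_R^{-1}\boldsymbol{\gamma}_{R,0}}{\mathbf{1}^\top\boldsymbol{\Gamma}_R^{-1}\mathbf{1}}\right), \tag{11}$$
where $(\boldsymbol{\Gamma}_R)_{ij} = \gamma_R(s_i - s_j)$ and $\boldsymbol{\gamma}_{R,0} = (\gamma_R(s_0 - s_1), \ldots, \gamma_R(s_0 - s_n))^\top$. The CRK predictor is $\hat{Z}(s_0) = [\hat{Z}^C(s_0) - \hat{Z}^R(s_0)/2,\ \hat{Z}^C(s_0) + \hat{Z}^R(s_0)/2]$, where $\hat{Z}^C(s_0)$ and $\boldsymbol{\lambda}_C$ are given in Section 2.2.1.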
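A minimal CRK sketch in Python, assuming NumPy and Euclidean distances. It also assumes that the range is the full interval length $Z^U - Z^L$; if the range were defined as a half-length, the final step would add and subtract `r_hat` instead of `r_hat / 2`. The helper names are hypothetical.

```python
import numpy as np

def ok_weights(coords, s0, gamma):
    """Ordinary kriging weights from the bordered system of Eq. (3)."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(dist)
    A[n, n] = 0.0
    b = np.append(gamma(np.linalg.norm(coords - s0, axis=1)), 1.0)
    return np.linalg.solve(A, b)[:n]

def center_range_kriging(coords, lower, upper, s0, gamma_c, gamma_r):
    """CRK sketch: krige the center and the range separately, then rebuild bounds."""
    coords = np.asarray(coords, dtype=float)
    s0 = np.asarray(s0, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    center = (lower + upper) / 2.0  # Z^C
    width = upper - lower  # Z^R, assumed to be the full interval length
    c_hat = ok_weights(coords, s0, gamma_c) @ center  # center weights, Eq. (6)
    r_hat = ok_weights(coords, s0, gamma_r) @ width  # range weights, Eq. (11)
    # With negative kriging weights, r_hat could in principle be negative.
    return float(c_hat - r_hat / 2.0), float(c_hat + r_hat / 2.0)

# Hypothetical fitted semivariograms for the center and the range
gamma_c = lambda h: 2.0 * (1.0 - np.exp(-np.asarray(h, dtype=float) / 1.0))
gamma_r = lambda h: 0.5 * (1.0 - np.exp(-np.asarray(h, dtype=float) / 0.3))
```

Unlike CK, the interval width here has its own spatial structure, through `gamma_r`, rather than being inherited from the center weights.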
2.3. Prediction assessment
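The root mean square error (RMSE) is a traditional performance measure for assessing the accuracy of spatial prediction,
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{Z}(s_i) - Z(s_i)\right)^2}, \tag{12}$$
where $\hat{Z}(s_i)$ and $Z(s_i)$ are the predicted and observed values at location $s_i$, respectively [10]. This measure cannot be applied directly to SIVD because of the interval data format. Hence, we adopt performance measures from interval-valued data analysis to evaluate prediction performance. The first two measures are the lower bound RMSE and the upper bound RMSE,
$$\mathrm{RMSE}_L = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{Z}^L(s_i) - Z^L(s_i)\right)^2}, \qquad \mathrm{RMSE}_U = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{Z}^U(s_i) - Z^U(s_i)\right)^2}, \tag{13}$$
where $Z^L(s_i)$ and $Z^U(s_i)$ are the lower and upper bounds of the observed interval $Z(s_i)$, and $\hat{Z}^L(s_i)$ and $\hat{Z}^U(s_i)$ are the lower and upper bounds of the predicted interval $\hat{Z}(s_i)$. $\mathrm{RMSE}_L$ and $\mathrm{RMSE}_U$ measure the prediction performance for each bound separately.
We also consider two measures that combine the bounds. The combined RMSE serves as an overall measure of accuracy by merging both interval bounds into a single measure,
$$\mathrm{RMSE}_C = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\left(\hat{Z}^L(s_i) - Z^L(s_i)\right)^2 + \left(\hat{Z}^U(s_i) - Z^U(s_i)\right)^2\right]}. \tag{14}$$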
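Because Eqs. (13) and (14) average over the same $n$ locations, the combined RMSE is fully determined by the two bound RMSEs. The short derivation below follows directly from the definitions. As a worked check, the cross-validated lower and upper bound RMSEs of CK for the temperature data in Table 3 are 2.7456 and 8.6318. These give $\sqrt{2.7456^2 + 8.6318^2} = \sqrt{82.0463} \approx 9.0579$, which matches the reported combined value of 9.0580 up to rounding. The simulation entries in Table 1 are summaries over 10,000 replicates, so they need not satisfy the identity exactly.

```latex
\begin{aligned}
\mathrm{RMSE}_C^2
  &= \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Z}^L(s_i) - Z^L(s_i)\right)^2
   + \frac{1}{n}\sum_{i=1}^{n}\left(\hat{Z}^U(s_i) - Z^U(s_i)\right)^2 \\
  &= \mathrm{RMSE}_L^2 + \mathrm{RMSE}_U^2,
  \qquad\text{so}\qquad
  \max(\mathrm{RMSE}_L, \mathrm{RMSE}_U) \le \mathrm{RMSE}_C \le \mathrm{RMSE}_L + \mathrm{RMSE}_U .
\end{aligned}
```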
To account for the interval variability, the range RMSE quantifies predictive performance based on the interval length,
$$\mathrm{RMSE}_R = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{Z}^R(s_i) - Z^R(s_i)\right)^2}, \tag{15}$$
where $\hat{Z}^R(s_i) = \hat{Z}^U(s_i) - \hat{Z}^L(s_i)$ is the range of the prediction and $Z^R(s_i) = Z^U(s_i) - Z^L(s_i)$ is the range of the observed interval. Assessing the range of the prediction is of interest because interval variability is a vital feature of SIVD.
These measures provide complementary information for assessing the spatial prediction of SIVD. $\mathrm{RMSE}_C$ measures overall prediction accuracy but provides little insight into the prediction performance of the individual bounds; $\mathrm{RMSE}_L$ and $\mathrm{RMSE}_U$ complement it in this respect. $\mathrm{RMSE}_R$ measures how well the interval variability is predicted, while the other measures focus on the errors of the predicted bounds.
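The four measures translate directly into array operations. The sketch below is a minimal illustration assuming NumPy; the function name `interval_rmse` and the dictionary keys are illustrative, not taken from the paper. It takes observed and predicted bounds at the same set of test locations.

```python
import numpy as np

def interval_rmse(obs_lower, obs_upper, pred_lower, pred_upper):
    """Lower, upper, combined, and range RMSE (Eqs. (13)-(15)) as a dict."""
    oL, oU, pL, pU = (np.asarray(a, dtype=float)
                      for a in (obs_lower, obs_upper, pred_lower, pred_upper))
    eL = pL - oL  # lower bound errors
    eU = pU - oU  # upper bound errors
    eR = (pU - pL) - (oU - oL)  # predicted range minus observed range
    return {
        "L": float(np.sqrt(np.mean(eL ** 2))),
        "U": float(np.sqrt(np.mean(eU ** 2))),
        "C": float(np.sqrt(np.mean(eL ** 2 + eU ** 2))),
        "R": float(np.sqrt(np.mean(eR ** 2))),
    }
```

Note that the range error is computed from the predicted bounds. This applies to all three methods, including CRK, whose bounds are reconstructed from the kriged center and range.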
3. Simulation study
In this section, we conduct a simulation study to evaluate the prediction performance of the proposed methods. Let $Z(s_i) = [Z^L(s_i), Z^U(s_i)]$ denote an SIVD observation at location $s_i$. The lower and upper bounds, $Z^L(s_i)$ and $Z^U(s_i)$, are generated by the following algorithm.

1. We simulate a Gaussian random field (GRF), $W(s)$, with specified covariance parameters, i.e. partial sill, range, and nugget.
2. For a location $s_i$, we generate $m$ replicates $Y_{i1}, \ldots, Y_{im}$ from a family of probability distributions $F$ with expected value $W(s_i)$ and dispersion parameter τ, i.e. $Y_{ij} \sim F(W(s_i), \tau)$. In this study, we set $F$ to the normal distribution.
3. We construct the interval from the smallest and largest of the $m$ generated observations, $Z(s_i) = [Z^L(s_i), Z^U(s_i)]$, where $Z^L(s_i) = \min_j Y_{ij}$ and $Z^U(s_i) = \max_j Y_{ij}$.
4. We repeat steps 2 and 3 for all $n$ spatial locations to generate $n$ SIVD observations, $Z(s_1), \ldots, Z(s_n)$.

An exponential semivariogram model is used with different spatial parameter specifications: range parameter ϕ ∈ {0.5, 1, 2} and partial sill σ² ∈ {1, 2, 3} (Table 1). Two values of the dispersion parameter used to form the interval-valued observations are chosen, τ = 2 and 3. For each combination, we simulate n = 100 spatial observations, each formed from m = 10 replicates drawn from the interval-generating distribution. Each data set is divided into training (80%) and test (20%) sets for validation. We apply each of the proposed methods to each simulated data set, repeat this process 10,000 times for each configuration, and calculate the performance measures proposed in Section 2.3.
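The data-generating algorithm above can be sketched in a few lines of NumPy. Several details are assumptions made only for this illustration and are not stated in the paper: locations drawn uniformly on the unit square, a zero-mean GRF with covariance $\sigma^2 e^{-h/\phi}$ plus a nugget on the diagonal, and τ used as the standard deviation of the normal replicates. The function names are hypothetical.

```python
import numpy as np

def simulate_sivd(n=100, m=10, partial_sill=1.0, phi=0.5, nugget=0.0, tau=2.0, seed=None):
    """Generate n interval-valued observations following the algorithm above."""
    rng = np.random.default_rng(seed)
    # Assumption: locations drawn uniformly on the unit square
    coords = rng.uniform(0.0, 1.0, size=(n, 2))
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Assumed exponential covariance sigma^2 exp(-h/phi), nugget on the diagonal
    cov = partial_sill * np.exp(-dist / phi) + nugget * np.eye(n)
    w = rng.multivariate_normal(np.zeros(n), cov)  # zero-mean GRF W(s_i)
    # m normal replicates per site with mean W(s_i); tau is used as the std. dev.
    y = rng.normal(loc=w[:, None], scale=tau, size=(n, m))
    lower, upper = y.min(axis=1), y.max(axis=1)  # Z^L(s_i), Z^U(s_i)
    return coords, lower, upper

def split_train_test(n, train_frac=0.8, seed=None):
    """Random train/test split of site indices (80/20 by default)."""
    idx = np.random.default_rng(seed).permutation(n)
    k = int(round(train_frac * n))
    return idx[:k], idx[k:]
```

One replicate of the study then amounts to simulating a data set, splitting it, fitting and kriging on the training sites, and scoring the test sites with the measures of Section 2.3.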
The RMSE results are shown in Table 1. The most prominent trend is that the CRK method has the best predictive performance for all parameter specifications and all performance measures in Section 2.3. While MMK mostly outperforms CK, there are a few cases in which CK has a lower RMSE. The difference in RMSE between CK and MMK is smallest when σ² is large and ϕ is small.
Table 1.
Lower (L), upper (U), combined (C) and range (R) root mean square error for center kriging (CK), MinMax kriging (MMK), and center and range kriging (CRK).
| τ | σ² | ϕ | L: CK | L: MMK | L: CRK | U: CK | U: MMK | U: CRK | C: CK | C: MMK | C: CRK | R: CK | R: MMK | R: CRK |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | 1.00 | 0.50 | 1.3584 | 1.3416 | 1.3064 | 1.3602 | 1.3446 | 1.3153 | 1.9354 | 1.9120 | 1.8663 | 1.7210 | 1.6940 | 1.5686 |
| 2 | 1.00 | 1.00 | 1.3273 | 1.3100 | 1.2885 | 1.3335 | 1.3205 | 1.3022 | 1.8947 | 1.8737 | 1.8440 | 1.6879 | 1.6664 | 1.5750 |
| 2 | 1.00 | 2.00 | 1.2728 | 1.2650 | 1.2517 | 1.2625 | 1.2557 | 1.2459 | 1.8039 | 1.7937 | 1.7768 | 1.6374 | 1.6369 | 1.5793 |
| 2 | 2.00 | 0.50 | 1.4562 | 1.4579 | 1.3882 | 1.4737 | 1.4710 | 1.3998 | 2.0857 | 2.0845 | 1.9836 | 1.8368 | 1.8160 | 1.6024 |
| 2 | 2.00 | 1.00 | 1.3977 | 1.3988 | 1.3455 | 1.3937 | 1.3939 | 1.3323 | 1.9867 | 1.9880 | 1.9049 | 1.7510 | 1.7535 | 1.5673 |
| 2 | 2.00 | 2.00 | 1.3454 | 1.3338 | 1.3060 | 1.3558 | 1.3399 | 1.3151 | 1.9234 | 1.9035 | 1.8656 | 1.7156 | 1.6916 | 1.5870 |
| 2 | 3.00 | 0.50 | 1.5051 | 1.4974 | 1.4368 | 1.5021 | 1.5009 | 1.4285 | 2.1390 | 2.1337 | 2.0378 | 1.8062 | 1.8224 | 1.5683 |
| 2 | 3.00 | 1.00 | 1.4546 | 1.4493 | 1.3880 | 1.4595 | 1.4618 | 1.3824 | 2.0731 | 2.0707 | 1.9702 | 1.8005 | 1.7975 | 1.5675 |
| 2 | 3.00 | 2.00 | 1.3898 | 1.3878 | 1.3393 | 1.3687 | 1.3677 | 1.3197 | 1.9625 | 1.9603 | 1.8913 | 1.7599 | 1.7159 | 1.6055 |
| 3 | 1.00 | 0.50 | 1.9311 | 1.8990 | 1.8758 | 1.9591 | 1.9110 | 1.9069 | 2.7693 | 2.7121 | 2.6926 | 2.5676 | 2.4465 | 2.4037 |
| 3 | 1.00 | 1.00 | 1.8670 | 1.8509 | 1.8427 | 1.8898 | 1.8661 | 1.8624 | 2.6744 | 2.6464 | 2.6381 | 2.4840 | 2.4487 | 2.4079 |
| 3 | 1.00 | 2.00 | 1.8194 | 1.8098 | 1.8071 | 1.8311 | 1.8151 | 1.8124 | 2.6018 | 2.5839 | 2.5794 | 2.4306 | 2.4059 | 2.3823 |
| 3 | 2.00 | 0.50 | 2.0693 | 2.0359 | 1.9829 | 2.0709 | 2.0480 | 1.9883 | 2.9483 | 2.9091 | 2.8255 | 2.6475 | 2.5885 | 2.3780 |
| 3 | 2.00 | 1.00 | 1.9968 | 1.9707 | 1.9383 | 1.9946 | 1.9659 | 1.9400 | 2.8412 | 2.8024 | 2.7591 | 2.5391 | 2.4738 | 2.3587 |
| 3 | 2.00 | 2.00 | 1.8871 | 1.8675 | 1.8589 | 1.8732 | 1.8503 | 1.8411 | 2.6772 | 2.6472 | 2.6339 | 2.4309 | 2.4003 | 2.3350 |
| 3 | 3.00 | 0.50 | 2.1345 | 2.1126 | 2.0496 | 2.1492 | 2.1292 | 2.0445 | 3.0499 | 3.0197 | 2.9126 | 2.6947 | 2.6549 | 2.3834 |
| 3 | 3.00 | 1.00 | 2.0273 | 2.0124 | 1.9560 | 2.0254 | 1.9978 | 1.9528 | 2.8860 | 2.8543 | 2.7819 | 2.6129 | 2.5640 | 2.3852 |
| 3 | 3.00 | 2.00 | 1.9316 | 1.9190 | 1.9029 | 1.9241 | 1.9071 | 1.8836 | 2.7444 | 2.7229 | 2.6954 | 2.4783 | 2.4534 | 2.3738 |
As expected, the CRK method is highly competitive in estimating the interval range. Because it incorporates the observed ranges into the prediction of interval values, it yields the smallest range RMSE, which appears to contribute to improved spatial prediction of SIVD. Note that CK and MMK rely only on the predicted interval bounds to determine the ranges.
For all methods, the lower, upper, and combined RMSE decrease as ϕ increases and as σ² decreases. Their values also increase with τ for all parameter specifications. The predictive performance of the three methods becomes more similar as ϕ increases and as σ² and τ decrease. The range RMSE of the CRK method shows no clear pattern with respect to σ² and ϕ, indicating that the CRK prediction of the interval range is robust to changes in the spatial parameters. The range RMSE of the CRK method increases with τ, in agreement with the results for CK and MMK.
4. Application to temperature in eastern Texas
In this section, we apply the proposed methods to a real-world SIVD dataset of daily temperature across eastern and southern Texas. The data consist of 107 interval-valued observations of daily minimum and maximum temperature on 1 March 2019, recorded by the National Centers for Environmental Information of the National Oceanic and Atmospheric Administration (NOAA). Figure 1 shows the 107 station locations over a topographic map and a 3D scatter plot of the interval-valued data.
Figure 1.
Geographical locations of 107 observed stations over a topographic map in Texas (left). Spatial interval-valued data mapping of the temperature data (right).
The interval features (lower bound, upper bound, center, and range) are shown in Figure 2, where point size is proportional to the temperature value. These scatterplots reveal the spatial distribution of each interval feature and provide insight into its spatial structure. For instance, stations in south Texas have similar minimums, whereas their ranges are relatively dissimilar. The spatial distribution of the ranges is distinctly different from that of the minimums, maximums, and centers. As expected, the minimums, maximums, and centers decrease as latitude increases. Interestingly, large ranges are observed in the central and coastal regions.
Figure 2.
Scatterplots of lower bound (top-left), upper bound (top-right), center (bottom-left), and range (bottom-right) of temperature across Texas. The point size is proportional to the value.
The density plots of the minimums, maximums, centers, and ranges are shown in Figure 3. The distribution of the ranges is right-skewed and has the highest peak. The center and minimum distributions have similar variances and a bimodal shape. The distribution of maximum temperatures is also bimodal and has a larger variance than the center and minimum distributions. To enable spatial prediction of SIVD, we perform variogram analysis to estimate the spatial structure. Figure 4 displays the empirical variograms and the corresponding theoretical variogram fits for the interval features. Three theoretical variogram models are considered: exponential, Gaussian, and spherical. The best fits are determined by comparing the sum of squared errors (SSE). Gaussian models are selected for the minimums and centers, while exponential and spherical models are chosen for the maximums and the ranges, respectively.
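The SSE-based model selection described above can be sketched with `scipy.optimize.curve_fit`. The following is a minimal illustration under stated assumptions. The three models are written without a nugget, with ϕ as a scale parameter. The fit is unweighted least squares, which may differ from the fitting criterion actually used for Figure 4. The model and function names are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

# Candidate models without a nugget (an assumption); phi is a scale parameter.
def exponential(h, sill, phi):
    return sill * (1.0 - np.exp(-h / phi))

def gaussian(h, sill, phi):
    return sill * (1.0 - np.exp(-(h / phi) ** 2))

def spherical(h, sill, phi):
    r = np.minimum(h / phi, 1.0)  # flat at the sill beyond h = phi
    return sill * (1.5 * r - 0.5 * r ** 3)

MODELS = {"exponential": exponential, "gaussian": gaussian, "spherical": spherical}

def select_variogram_model(lags, gamma_hat):
    """Fit each model by unweighted least squares; return the name with the smallest SSE."""
    lags = np.asarray(lags, dtype=float)
    gamma_hat = np.asarray(gamma_hat, dtype=float)
    p0 = [gamma_hat.max(), lags.max() / 3.0]  # rough starting values
    fits = {}
    for name, model in MODELS.items():
        try:
            params, _ = curve_fit(model, lags, gamma_hat, p0=p0,
                                  bounds=([0.0, 1e-6], [np.inf, np.inf]))
        except RuntimeError:
            continue  # no convergence: leave this model out of the comparison
        sse = float(np.sum((gamma_hat - model(lags, *params)) ** 2))
        fits[name] = (params, sse)
    best = min(fits, key=lambda name: fits[name][1])
    return best, fits
```

The inputs would be the binned lags and semivariance estimates of one interval feature (minimum, maximum, center, or range). The selection is repeated separately for each feature.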
Figure 3.
Densities of lower bound, upper bound, center, and range of temperature interval-valued data.
Figure 4.
Empirical variograms for the minimum (top-left), maximum (top-right), center (bottom-left), and the range (bottom-right) temperature with their corresponding theoretical variogram fits.
The parameter estimates for the models are shown in Table 2. The sill estimates differ considerably across the four features, which implies that the interval features have different spatial variation. The minimum temperature has a considerably smaller sill than the maximum temperature, indicating that the lower bound of the interval is much less spatially variable than the upper bound. The centers and minimums, both fitted with the Gaussian variogram model, have similar range estimates but dissimilar sill estimates. The maximums have the largest sill and range estimates, while the ranges have the smallest.

We perform leave-one-out cross-validation using the kriging methodologies proposed in Section 2.2 and calculate the four RMSE measures, as shown in Table 3. CRK has the smallest upper bound, combined, and range RMSE, while CK and MMK have the smallest lower bound RMSE. The RMSEs of CK and MMK are nearly identical. Overall, the results are largely consistent with those of the simulation study. Because of several outliers in the maximum values, the lower bound RMSE is considerably smaller than the upper bound, combined, and range RMSE.

Figure 5 shows the interval features predicted over the study area using CRK. Because temperature is closely related to longitude and latitude, the minimum, maximum, and center all show lower values in the north and higher values in the south, although each feature changes with a different gradient. In contrast, the range is relatively uniform across the state, with higher values along the Gulf Coast and two hot spots in central Texas. The northern region has small variability in the minimum, maximum, and center, which results in small ranges.
Table 2.
Spatial parameter estimates for interval features.
| SIVD feature | Sill | Range ϕ |
|---|---|---|
| Min | 97.279 | 3.595 |
| Max | 332.739 | 8.384 |
| Center | 172.354 | 3.494 |
| Range | 78.743 | 0.979 |
Table 3.
Lower (L), upper (U), combined (C), and range (R) cross-validated RMSE for the CK, MMK, and CRK methods applied to the temperature SIVD.
| Measure | CK | MMK | CRK |
|---|---|---|---|
| L | 2.7456 | 2.7456 | 2.9406 |
| U | 8.6318 | 8.6313 | 8.1647 |
| C | 9.0580 | 9.0574 | 8.6781 |
| R | 8.4473 | 8.4457 | 7.6081 |
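The leave-one-out cross-validation behind Table 3 can be sketched as a loop that drops one station, predicts its interval from the rest, and then scores all stations. This is a minimal illustration assuming NumPy. Two points are assumptions: the semivariograms are held fixed across folds rather than re-estimated in each fold, since the paper does not say which was done, and the range is taken to be the full interval length. The helper names and example semivariograms are hypothetical. Any of the three predictors can be plugged in through the `predict` argument; CRK is shown.

```python
import numpy as np

def ok_weights(coords, s0, gamma):
    """Ordinary kriging weights from the bordered system of Eq. (3)."""
    n = len(coords)
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = gamma(dist)
    A[n, n] = 0.0
    b = np.append(gamma(np.linalg.norm(coords - s0, axis=1)), 1.0)
    return np.linalg.solve(A, b)[:n]

def crk_predict(coords, lower, upper, s0, gamma_c, gamma_r):
    """CRK interval prediction at s0 (range = full interval length, assumed)."""
    c_hat = ok_weights(coords, s0, gamma_c) @ ((lower + upper) / 2.0)
    r_hat = ok_weights(coords, s0, gamma_r) @ (upper - lower)
    return c_hat - r_hat / 2.0, c_hat + r_hat / 2.0

def loocv_interval_rmse(coords, lower, upper, predict):
    """Leave each site out in turn, predict it, and return the four RMSEs."""
    coords = np.asarray(coords, dtype=float)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    n = len(lower)
    pred = np.empty((n, 2))
    for k in range(n):
        keep = np.arange(n) != k  # drop site k from the training data
        pred[k] = predict(coords[keep], lower[keep], upper[keep], coords[k])
    eL, eU = pred[:, 0] - lower, pred[:, 1] - upper
    eR = (pred[:, 1] - pred[:, 0]) - (upper - lower)
    rmse = lambda e: float(np.sqrt(np.mean(e ** 2)))
    return {"L": rmse(eL), "U": rmse(eU),
            "C": float(np.sqrt(np.mean(eL ** 2 + eU ** 2))), "R": rmse(eR)}

# Hypothetical semivariograms for the center and range processes
gamma_c = lambda h: 5.0 * (1.0 - np.exp(-h / 3.0))
gamma_r = lambda h: 1.0 * (1.0 - np.exp(-h / 1.0))
```

A call such as `loocv_interval_rmse(coords, lower, upper, lambda c, l, u, s0: crk_predict(c, l, u, s0, gamma_c, gamma_r))` returns a dictionary with one entry per row of Table 3.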
Figure 5.
Spatial prediction of minimum (top-left), maximum (top-right), center (bottom-left), and range (bottom-right) temperatures.
Figure 6 shows a three-dimensional representation of the spatial prediction of SIVD, built from the minimum and maximum temperature surfaces predicted by CRK. This novel mapping technique visualizes the spatial interpolation of SIVD over the entire study area while preserving the interval nature of the data. The figure highlights that the center increases steadily as latitude decreases, while the range changes abruptly because of a few spikes in the maximum.
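The two-surface map can be approximated with Matplotlib's 3D axes. The sketch below is a minimal illustration that assumes the predicted bounds have already been computed on a regular longitude/latitude grid. The function name, colormaps, and axis labels are illustrative choices, and it is not the code used to produce Figure 6.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend; remove for interactive sessions
import matplotlib.pyplot as plt

def plot_interval_surfaces(grid_x, grid_y, pred_lower, pred_upper):
    """Draw predicted lower and upper bound surfaces in one 3D axes.

    pred_lower and pred_upper have shape (len(grid_y), len(grid_x)),
    matching np.meshgrid's default 'xy' indexing.
    """
    X, Y = np.meshgrid(grid_x, grid_y)
    fig = plt.figure(figsize=(7, 5))
    ax = fig.add_subplot(projection="3d")
    ax.plot_surface(X, Y, pred_lower, cmap="Blues", alpha=0.8)  # lower bound
    ax.plot_surface(X, Y, pred_upper, cmap="Reds", alpha=0.8)  # upper bound
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    ax.set_zlabel("Temperature")
    return fig, ax
```

The vertical gap between the two surfaces at any grid node is the predicted range. This lets one figure convey both the level and the width of the interval.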
Figure 6.
Map of spatial predictions of temperature intervals using the CRK method. The upper and lower surfaces correspond to the predicted upper and lower bounds, respectively.
5. Conclusion and discussion
In this paper, we proposed geostatistical methodologies for the spatial interpolation of interval-valued data. The simulation study showed that the proposed CRK method outperforms the other methods under all prediction evaluation criteria, while CK generally shows the worst prediction performance. Consistent with the simulation study, the results of our application to daily temperature SIVD show that CRK has the best overall prediction performance, followed by MMK. This suggests that incorporating the interval variation, through the range, into spatial analysis plays an important role. Furthermore, we observed that aggregating data collected over a time window into a single value, such as the center in CK, leads to substantial information loss and poor spatial prediction. MMK outperforms CK, which indicates that the interval bounds provide more information for prediction than a single feature. However, because the interval bounds are assumed to be independent, MMK also fails to account efficiently for the internal variability. In addition, the independence assumption makes it possible for the predicted lower bound to exceed the predicted upper bound.
The proposed methods are developed under assumptions that may be too restrictive for many real-world problems. First, intrinsic stationarity is implicitly assumed by the use of ordinary kriging. However, real-world spatial data often exhibit a trend over space, which is commonly modeled using spatial coordinates or external variables. In our application, temperature is closely related to altitude [27], wind speed [13], and proximity to the coast [22]. Modeling the mean with those variables could help capture large-scale variation in temperature and improve prediction performance. Second, we assume that the interval features are unrelated. Jointly modeling the interval features could improve statistical inference by accounting for their relationship, and could also address the potential ordering problem discussed earlier. Relaxing these assumptions would make the proposed methods more flexible. Lastly, the underlying spatial process is often assumed to be Gaussian, an assumption that might be overly restrictive in applications. Modeling the process with more flexible distributions would be of interest in future work.
Symbolic data can take several forms, such as lists, intervals, histograms, and multi-valued data. In this paper, we limit our focus to a single format, interval-valued data, which is the most common form in applications. Future work could develop statistical methodologies for other forms of spatial symbolic data, such as spatial histograms, spatial lists, and spatial multi-valued data.
Several visualization tools are available for geostatistical data. However, visualizing geostatistical analysis of SIVD with these tools is not straightforward because of the interval nature of the data. To this end, we displayed SIVD in two- and three-dimensional spaces in this paper. We anticipate further development of novel visualization methods for the statistical analysis of SIVD.
Disclosure statement
No potential conflict of interest was reported by the author(s).
References
1. Alizadeh S., Brandt M.W., and Diebold F.X., Range-based estimation of stochastic volatility models, J. Finance 57 (2002), pp. 1047–1091.
2. Beranger B., Lin H., and Sisson S.A., New models for symbolic data analysis, preprint (2018). Available at arXiv:1809.03659.
3. Beyaztas U., Shang H.L., and Abdel-Salam A.-S.G., Functional linear models for interval-valued data, Commun. Statist. Simul. Comput. (2020), pp. 1–20.
4. Billard L., Symbolic data analysis: what is it?, in Compstat 2006-Proceedings in Computational Statistics, Springer, Berlin, 2006, pp. 261–269.
5. Billard L., Dependencies and variation components of symbolic interval-valued data, in Selected Contributions in Data Analysis and Classification, Springer, Berlin, 2007, pp. 3–12.
6. Billard L., Brief overview of symbolic data and analytic issues, Stat. Anal. Data Min.: ASA Data Sci. J. 4 (2011), pp. 149–156.
7. Billard L. and Diday E., Regression analysis for interval-valued data, in Data Analysis, Classification, and Related Methods, Springer, Berlin, 2000, pp. 369–374.
8. Billard L. and Diday E., From the statistics of data to the statistics of knowledge: Symbolic data analysis, J. Am. Stat. Assoc. 98 (2003), pp. 470–487.
9. Brida J.G. and Punzo L.F., Symbolic time series analysis and dynamic regimes, Struct. Change Econ. Dyn. 14 (2003), pp. 159–183.
10. Bui D.T., Panahi M., Shahabi H., Singh V.P., Shirzadi A., Chapi K., Khosravi K., Chen W., Panahi S., Li S., and Ahmad B.B., Novel hybrid evolutionary algorithms for spatial prediction of floods, Sci. Rep. 8 (2018), pp. 1–14.
11. Chen W., Tang H., and Zhao H., Diurnal, weekly and monthly spatial variations of air pollutants and air quality of Beijing, Atmos. Environ. 119 (2015), pp. 21–34.
12. Chou R.Y., Forecasting financial volatilities with extreme values: The conditional autoregressive range (CARR) model, J. Money Credit Bank. 37 (2005), pp. 561–582.
13. Dai A., Trenberth K.E., and Karl T.R., Effects of clouds, soil moisture, precipitation, and water vapor on diurnal temperature range, J. Clim. 12 (1999), pp. 2451–2473.
14. de A Lima Neto E. and de AT de Carvalho F., Centre and range method for fitting a linear regression model to symbolic interval data, Comput. Stat. Data Anal. 52 (2008), pp. 1500–1515.
15. de A Lima Neto E. and de AT de Carvalho F., Constrained linear regression models for symbolic interval-valued variables, Comput. Stat. Data Anal. 54 (2010), pp. 333–347.
16. Fernandes M., de Sá Mota B., and Rocha G., A multivariate conditional autoregressive range model, Econ. Lett. 86 (2005), pp. 435–440.
17. Handcock M.S. and Wallis J.R., An approach to statistical spatial-temporal modeling of meteorological fields, J. Am. Stat. Assoc. 89 (1994), pp. 368–378.
18. Hargreaves G.H. and Samani Z.A., Reference crop evapotranspiration from temperature, Appl. Eng. Agric. 1 (1985), pp. 96–99.
19. Huang T., A constrained spatial autoregressive model for interval-valued data, preprint (2022). Available at arXiv:2210.15869.
20. Isaak D.J., Luce C.H., Rieman B.E., Nagel D.E., Peterson E.E., Horan D.L., Parkes S., and Chandler G.L., Effects of climate change and wildfire on stream temperatures and salmonid thermal habitat in a mountain river network, Ecol. Appl. 20 (2010), pp. 1350–1371.
21. Jerrett M., Burnett R.T., Ma R., Pope III C.A., Krewski D., Newbold K.B., Thurston G., Shi Y., Finkelstein N., Calle E.E., and Thun M.J., Spatial analysis of air pollution and mortality in Los Angeles, Epidemiology 16 (2005), pp. 727–736.
22. Kim S.-O., Yun J.-I., Chung U.-R., and Hwang K.-H., A geospatial evaluation of potential sea effects on observed air temperature, Korean J. Agric. For. Meteorol. 12 (2010), pp. 217–224.
23. Le-Rademacher J. and Billard L., Symbolic covariance principal component analysis and visualization for interval-valued data, J. Comput. Graph. Stat. 21 (2012), pp. 413–432.
24. Maia A.L.S., de AT de Carvalho F., and Ludermir T.B., Forecasting models for interval-valued time series, Neurocomputing 71 (2008), pp. 3344–3352.
25. Melillo J.M., Richmond T.T., and Yohe G., Climate change impacts in the United States, Third National Climate Assessment 52 (2014), pp. 150–174.
26. Quayle R.G. and Diaz H.F., Heating degree day data applied to residential heating energy consumption, J. Appl. Meteorol. 19 (1980), pp. 241–246.
27. Revadekar J.V., Hameed S., Collins D., Manton M., Sheikh M., Borgaonkar H.P., Kothawale D.R., Adnan M., Ahmed A.U., Ashraf J., Baidya S., Islam N., Jayasinghearachchi D., Manzoor N., Premalal K.H.M.S., and Shreshta M.L., Impact of altitude and latitude on changes in temperature extremes over South Asia during 1971–2000, Int. J. Climatol. 33 (2013), pp. 199–209.
28. Schabenberger O. and Gotway C.A., Statistical Methods for Spatial Data Analysis, CRC Press, Boca Raton, FL, 2005.
29. Sifuna P.M., Ouma C., Atieli H., Owuoth J., Onyango D., Andagalu B., and Cowden J., Spatial epidemiology of tuberculosis in the high-burden counties of Kisumu and Siaya, western Kenya, 2012–2015, Int. J. Tuberc. Lung Dis. 23 (2019), pp. 363–370.
30. Sit M.A., Koylu C., and Demir I., Identifying disaster-related tweets and their semantic, spatial and temporal context using deep learning, natural language processing and spatial analysis: A case study of Hurricane Irma, Int. J. Digit. Earth 12 (2019), pp. 1205–1229.
31. Transportation Research Board and National Research Council, Potential impacts of climate change on U.S. transportation: Special report 290, The National Academies Press, Washington, DC, 2008.
32. Verde R., Clustering methods in symbolic data analysis, in Classification, Clustering, and Data Mining Applications, Springer, Berlin, 2004, pp. 299–317.
33. Xin J. and Zazueta F., Technology trends in ICT–towards data-driven, farmer-centered and knowledge-based hybrid cloud architectures for smart farming, Agric. Eng. Int.: CIGR J. 18 (2016), pp. 275–279.
34. Xiong T., Bao Y., and Hu Z., Multiple-output support vector regression with a firefly algorithm for interval-valued stock price index forecasting, Knowl. Based Syst. 55 (2014), pp. 87–100.
35. Xiong T., Li C., and Bao Y., Interval-valued time series forecasting using a novel hybrid HOLTI and MSVR model, Econ. Model. 60 (2017), pp. 11–23.
36. Xu W., Symbolic data analysis: interval-valued data regression, PhD thesis, University of Georgia, 2010.






