Abstract
Applying interval-valued data and methods, researchers have made solid accomplishments in information processing and uncertainty management. Although interval-valued statistics and probability are available for interval-valued data, current inferential decision making schemes mostly rely on point-valued statistical and probabilistic measures. To enable direct applications of these point-valued schemes to interval-valued datasets, we present point-valued variational statistics, probability, and entropy for interval-valued datasets. Related algorithms are reported with illustrative examples.
Keywords: Interval-valued dataset, Point-valued variational statistics, Probability, Information entropy
Introduction
Why Do We Study Interval-Valued Datasets?
Statistical and probabilistic measures play a very important role in processing data and managing uncertainty. In the literature, these measures are mostly point-valued and applied to point-valued datasets. While a point-valued datum intends, in theory, to record a snapshot of an event instantaneously, it is often imprecise in the real world due to systematic and random errors. Applying interval-valued data to encapsulate variations and uncertainty, researchers have developed interval methods for knowledge processing. With data aggregation strategies [1, 5, 21], and others, we are able to reduce large point-valued datasets into smaller interval-valued ones for efficient data management and processing. By doing so, researchers are able to focus more on qualitative properties and ignore insignificant quantitative differences.
Studying interval-valued data, Gioia and Lauro developed interval-valued statistics [4] in 2005. Lodwick and Jamison discussed interval-valued probability [17] in the analysis of problems containing a mixture of possibilistic, probabilistic, and interval uncertainty in 2008. Billard and Diday reported regression analysis of interval-valued data in [2]. Huynh et al. established a justification of decision making under interval uncertainty [13]. Works on applications of interval-valued data in knowledge processing include [3, 8, 16, 19, 20, 22], and many more. Applying interval-valued data to stock market forecasting, Hu and He first reported astonishing quality improvements in [9]. Specifically, compared against the commonly used point-valued confidence interval predictions, the interval approaches increased the average accuracy ratio of annual stock market forecasts from 12.6% to 64.19%, and reduced the absolute mean error from 72.35% to 5.17% [9]. Additional results on stock market forecasts, reported in [6, 7, 10] and others, have verified the advantages of using interval-valued data. The paper [12], published in the same volume as this one, further validates the advantages from the perspective of information theory.
Using interval-valued data can significantly improve efficiency and effectiveness in information processing and uncertainty management. Therefore, we need to study interval-valued datasets.
The Objective of this Study
As a matter of fact, the powerful inferential decision making schemes in the current literature mostly use point-valued statistical and probabilistic measures, not interval-valued ones [4, 17]. To enable direct applications of these schemes and theories to the analysis of interval-valued datasets, we need to supply point-valued statistics and probability for interval-valued datasets. Therefore, the primary objective of this work is to establish and to calculate such point-valued measures for interval-valued datasets.
To make this paper easy to read, it includes brief introductions to the necessary background. It also provides easy-to-follow illustrative examples for the novel concepts and algorithms, in addition to pseudo-code. Numerical results of these examples were obtained with a recent version of Python 3. However, readers may use any preferred general purpose programming language to verify the results.
Basic Concepts and Notations
Prior to our discussion, let us first clarify some basic concepts and notations related to intervals in this paper. An interval is a connected subset of $\mathbb{R}$. We denote an interval-valued object with a boldfaced letter to distinguish it from a point-valued one. We further specify the greatest lower bound and least upper bound of an interval object with an underline and an overline of the same letter, not boldfaced, respectively. For example, while $a$ is a real, the boldfaced letter $\mathbf{a}$ denotes an interval with greatest lower bound $\underline{a}$ and least upper bound $\overline{a}$. That is, $\mathbf{a} = [\underline{a}, \overline{a}]$. The absolute value of $\mathbf{a}$, defined as $|\mathbf{a}| = \overline{a} - \underline{a}$, is also called the length (or norm) of $\mathbf{a}$. This is the greatest distance between any two numbers in $\mathbf{a}$.
The midpoint and radius of an interval $\mathbf{a}$ are defined as $\mathrm{mid}(\mathbf{a}) = (\underline{a} + \overline{a})/2$ and $\mathrm{rad}(\mathbf{a}) = (\overline{a} - \underline{a})/2$, respectively. Because the midpoint and radius of an interval $\mathbf{a}$ are point-valued, we simply denote them as $\mathrm{mid}(a)$ and $\mathrm{rad}(a)$ without boldfacing the letter $a$. We call $[\underline{a}, \overline{a}]$ the endpoint (or min-max) representation of $\mathbf{a}$. We can specify an interval $\mathbf{a}$ with $\mathrm{mid}(a)$ and $\mathrm{rad}(a)$ too. This is because $\underline{a} = \mathrm{mid}(a) - \mathrm{rad}(a)$ and $\overline{a} = \mathrm{mid}(a) + \mathrm{rad}(a)$. In the rest of this paper, we use both the min-max and mid-rad representations for an interval-valued object.
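The two representations convert into each other mechanically. The following Python sketch (the helper names are ours, not from the paper) models an interval as a (lo, hi) tuple:

```python
# Minimal helpers for the min-max and mid-rad representations of an interval.
# An interval a = [lo, hi] is modeled as a (lo, hi) tuple of floats.

def mid(a):
    """Midpoint of interval a = (lo, hi): (lo + hi) / 2."""
    return (a[0] + a[1]) / 2

def rad(a):
    """Radius of interval a = (lo, hi): (hi - lo) / 2."""
    return (a[1] - a[0]) / 2

def from_mid_rad(m, r):
    """Recover the min-max form: [m - r, m + r]."""
    return (m - r, m + r)

a = (2.0, 3.0)
assert mid(a) == 2.5 and rad(a) == 0.5
assert from_mid_rad(mid(a), rad(a)) == a
```

Either representation determines the other, so later sketches use whichever form is more convenient.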
While we use a boldfaced lowercase letter to indicate an interval, we denote an interval-valued dataset, i.e., a collection of real intervals, with a boldfaced uppercase letter. For instance, $\mathbf{X} = \{\mathbf{x}_i = [\underline{x}_i, \overline{x}_i] : 1 \le i \le n\}$ is an interval-valued dataset. The sets $\underline{X} = \{\underline{x}_i\}$ and $\overline{X} = \{\overline{x}_i\}$ are the left- and right-end sets of $\mathbf{X}$, respectively. Although items in a set are not ordered, the $\underline{x}_i$ and $\overline{x}_i$ are related to the same interval $\mathbf{x}_i$. For convenience, we denote both $\underline{X}$ and $\overline{X}$ as ordered tuples. They are the left- and right-endpoints of $\mathbf{X}$. That is, $\underline{X} = (\underline{x}_1, \underline{x}_2, \ldots, \underline{x}_n)$ and $\overline{X} = (\overline{x}_1, \overline{x}_2, \ldots, \overline{x}_n)$. Similarly, the midpoint and radius of $\mathbf{X}$ are point-valued tuples. They are $\mathrm{mid}(X) = (\mathrm{mid}(x_1), \ldots, \mathrm{mid}(x_n))$ and $\mathrm{rad}(X) = (\mathrm{rad}(x_1), \ldots, \mathrm{rad}(x_n))$, respectively.
Example 1
Provided an interval-valued sample dataset $\mathbf{X}_0 = \{[2, 3], [2.5, 7], \ldots\}$, its left-endpoint is $\underline{X_0} = (2, 2.5, \ldots)$, and its right-endpoint is $\overline{X_0} = (3, 7, \ldots)$. The midpoint of $\mathbf{X}_0$ is $\mathrm{mid}(X_0) = (2.5, 4.75, \ldots)$, and the radius is $\mathrm{rad}(X_0) = (0.5, 2.25, \ldots)$.
We use this sample dataset $\mathbf{X}_0$ in the rest of this paper to illustrate concepts and algorithms, for its simplicity.
In the rest of this paper, we discuss statistics of an interval-valued dataset in Sect. 2; define point-valued probability distributions for an interval-valued dataset in Sect. 3; introduce point-valued information entropy in Sect. 4; and summarize the main results and future work in Sect. 5.
Descriptive Statistics of an Interval-Valued Dataset
We introduce positional statistics for an interval-valued dataset first, and then discuss its point-valued variance and standard deviation.
Positional Statistics of an Interval-Valued Dataset X
The left- and right-endpoints, midpoint, and radius $\underline{X}$, $\overline{X}$, $\mathrm{mid}(X)$, and $\mathrm{rad}(X)$ are among the positional statistics of an interval-valued dataset $\mathbf{X}$, as presented in Example 1. The mean of $\mathbf{X}$, denoted as $\mathbf{E}(\mathbf{X})$, is the arithmetic average of $\mathbf{X}$. Because $\mathbf{a} + \mathbf{b} = [\underline{a} + \underline{b},\ \overline{a} + \overline{b}]$ in interval arithmetic [18], we have

$$\mathbf{E}(\mathbf{X}) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{x}_i = \left[\frac{1}{n}\sum_{i=1}^{n}\underline{x}_i,\ \frac{1}{n}\sum_{i=1}^{n}\overline{x}_i\right]. \qquad (1)$$
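Equation (1) can be evaluated on the endpoints directly, without any interval library. A minimal Python sketch, using a small hypothetical sample (not the paper's dataset):

```python
def interval_mean(X):
    """Mean of an interval-valued dataset per Eq. (1):
    the interval of the averaged left and right endpoints."""
    n = len(X)
    lo = sum(a for a, _ in X) / n
    hi = sum(b for _, b in X) / n
    return (lo, hi)

# Hypothetical sample data, for illustration only:
X = [(2.0, 3.0), (2.5, 7.0), (4.0, 6.0), (5.0, 8.0)]
print(interval_mean(X))  # (3.375, 6.0)
```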
We now define a few more observational statistics for $\mathbf{X}$.
Definition 1
Let $\mathbf{X} = \{\mathbf{x}_i : 1 \le i \le n\}$ be an interval-valued dataset. Then:
1. The envelope of $\mathbf{X}$ is the interval $\mathrm{env}(\mathbf{X}) = [\min_i \underline{x}_i,\ \max_i \overline{x}_i]$;
2. The core of $\mathbf{X}$ is the interval $\mathrm{core}(\mathbf{X}) = \bigcap_{1 \le i \le n} \mathbf{x}_i$; and
3. The mode of $\mathbf{X}$ is a tuple, $\mathrm{mode}(\mathbf{X}) = (\mathbf{x}_{mode}, k)$, where $\mathbf{x}_{mode} = \bigcap_{i \in I}\mathbf{x}_i \ne \emptyset$, $I$ is a cardinality-$k$ subset of $\{1, 2, \ldots, n\}$, and for any $J \subseteq \{1, 2, \ldots, n\}$, if $\bigcap_{j \in J}\mathbf{x}_j \ne \emptyset$ then $|J| \le k$.
In other words, $I$ is a subset of $\{1, 2, \ldots, n\}$, and $\mathbf{x}_{mode}$ is a subset of every $\mathbf{x}_i$ with $i \in I$. Furthermore, $\mathrm{mode}(\mathbf{X})$ is an ordered tuple, in which $\mathbf{x}_{mode}$ is the non-empty intersection of the $\mathbf{x}_i$ for all $i \in I$, such that the cardinality of $I$ is the greatest. For a given $\mathbf{X}$, its mode may not be unique, because there may be multiple cardinality-$k$ subsets of $\{1, 2, \ldots, n\}$ satisfying the nonempty intersection requirement.
Corollary 1
Let $\mathbf{X} = \{\mathbf{x}_i : 1 \le i \le n\}$ be an interval-valued dataset. Then:
1. For all $i$, $\mathbf{x}_i \subseteq \mathrm{env}(\mathbf{X})$;
2. The core of $\mathbf{X}$ is not empty if and only if $\max_i \underline{x}_i \le \min_i \overline{x}_i$; and
3. The mode of $\mathbf{X}$ is $(\mathrm{core}(\mathbf{X}), n)$ if and only if $\mathrm{core}(\mathbf{X}) \ne \emptyset$.
Corollary 1 is straightforward.
Instead of providing a proof, we provide the mean, envelope, core, and mode of the sample dataset $\mathbf{X}_0$. In addition to its endpoints, midpoint, and radius presented in Example 1, we have its mean $\mathbf{E}(\mathbf{X}_0)$ from (1); its envelope $\mathrm{env}(\mathbf{X}_0)$; an empty core, because the greatest left endpoint of $\mathbf{X}_0$ is greater than its least right endpoint; and $\mathrm{mode}(\mathbf{X}_0) = ([2.5, 3], 4)$. Figure 1 illustrates the sample dataset $\mathbf{X}_0$. From it, one may visualize $\mathrm{env}(\mathbf{X}_0)$ and $\mathrm{mode}(\mathbf{X}_0)$ by imagining a vertical line, like the y-axis, continuously moving from left to right. The first and last points at which the line touches any $\mathbf{x}_i$ determine the envelope $\mathrm{env}(\mathbf{X}_0)$. The line touches at most four intervals, for all $x$ in $[2.5, 3]$. Hence, the mode is $([2.5, 3], 4)$.
Fig. 1. The sample interval-valued dataset $\mathbf{X}_0$.
While finding the envelope, core, and mean of $\mathbf{X}$ is straightforward, determining the mode of $\mathbf{X}$ involves the $2n$ numbers in $\underline{X}$ and $\overline{X}$, which divide $\mathrm{env}(\mathbf{X})$ into $2n - 1$ sub-intervals in general (though some of them may be degenerated as points). Each of these $2n - 1$ sub-intervals can be a candidate for the nonempty intersection part of the mode. Any $\mathbf{x}_i \in \mathbf{X}$ may cover some of these sub-intervals (candidates) consecutively. For each of these candidates, we accumulate its occurrences in each $\mathbf{x}_i$. The mode(s) of $\mathbf{X}$ is (are) the candidate(s) with the (same) highest occurrence. As a special case, if $\mathrm{core}(\mathbf{X})$ is not empty, then $\mathrm{mode}(\mathbf{X}) = (\mathrm{core}(\mathbf{X}), n)$. We summarize the above as an algorithm.
Algorithm 1 is $O(n^2)$. This is because, for each interval $\mathbf{x}_i$, updating the counts of the up to $2n - 1$ candidates it covers takes $O(n)$.
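The candidate-sweeping idea behind Algorithm 1 can be sketched in Python as follows (the function name and the sample data are ours, hypothetical; the candidates are the sub-intervals between consecutive sorted endpoints):

```python
def mode(X):
    """Mode of an interval-valued dataset per Definition 1: sweep the
    sub-intervals induced by the 2n endpoints and keep those covered by
    the most intervals. Returns (list_of_candidate_intervals, k)."""
    pts = sorted({p for ab in X for p in ab})
    best, best_k = [], 0
    # Each pair of consecutive endpoints is a candidate intersection.
    for lo, hi in zip(pts, pts[1:]):
        k = sum(1 for a, b in X if a <= lo and hi <= b)
        if k > best_k:
            best, best_k = [(lo, hi)], k
        elif k == best_k:
            best.append((lo, hi))
    return best, best_k

# Hypothetical sample data, for illustration only:
X = [(2.0, 3.0), (2.5, 7.0), (4.0, 6.0), (5.0, 8.0)]
print(mode(X))  # ([(5.0, 6.0)], 3)
```

Adjacent winning candidates sharing the same count could be merged into a single interval; this sketch reports them separately.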
Point-Valued Variational Statistics of an Interval-Valued Dataset
In the literature, the variance of a point-valued dataset X is defined as
![]() |
2 |
in which, the term
is the distance between
and
, which is the mean of X.
To use (2) to define a variance for an interval-valued $\mathbf{X}$, we need a notion of point-valued distance between two intervals, $\mathbf{x}_i$ and the interval $\mathbf{E}(\mathbf{X})$. Can we simply use $|\mathbf{a} - \mathbf{b}|$, the absolute value of the difference between two intervals $\mathbf{a}$ and $\mathbf{b}$, as their distance? Unfortunately, it does not work.
In interval arithmetic [18], the difference between two intervals $\mathbf{a}$ and $\mathbf{b}$ is defined as follows:

$$\mathbf{a} - \mathbf{b} = [\underline{a} - \overline{b},\ \overline{a} - \underline{b}]. \qquad (3)$$

Equation (3) ensures $0 \in \mathbf{a} - \mathbf{a}$. However, it also implies that the largest absolute value in $\mathbf{a} - \mathbf{b}$, i.e., $\max\{|\underline{a} - \overline{b}|, |\overline{a} - \underline{b}|\}$, is the maximum distance between a point in $\mathbf{a}$ and a point in $\mathbf{b}$.
Mathematically, a distance between two nonempty sets $A$ and $B$ is usually defined as the minimum distance between $a \in A$ and $b \in B$, not the maximum. Hence, we need to define a notion of distance between two intervals.
Definition 2
Let $\mathbf{a}$ and $\mathbf{b}$ be two nonempty intervals. The distance between $\mathbf{a}$ and $\mathbf{b}$ is defined as

$$\mathrm{dist}(\mathbf{a}, \mathbf{b}) = |\mathrm{mid}(a) - \mathrm{mid}(b)| + |\mathrm{rad}(a) - \mathrm{rad}(b)|. \qquad (4)$$

Definition 2 satisfies all mathematical requirements for a distance: $\mathrm{dist}(\mathbf{a}, \mathbf{b}) = 0$ if and only if $\mathbf{a} = \mathbf{b}$; $\mathrm{dist}(\mathbf{a}, \mathbf{b}) = \mathrm{dist}(\mathbf{b}, \mathbf{a})$; and for any nonempty intervals $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$, $\mathrm{dist}(\mathbf{a}, \mathbf{b}) \le \mathrm{dist}(\mathbf{a}, \mathbf{c}) + \mathrm{dist}(\mathbf{c}, \mathbf{b})$. Definition 2 is in fact an extension of the distance between two reals, because the radius of a real is always zero and the midpoint of a real is itself.
Replacing $|x_i - E(X)|$ in Equation (2) with $\mathrm{dist}(\mathbf{x}_i, \mathbf{E}(\mathbf{X}))$ defined in (4), we have the point-valued variance of $\mathbf{X}$ as follows:

$$\sigma^2(\mathbf{X}) = \frac{1}{n}\sum_{i=1}^{n}\left(|\mathrm{mid}(x_i) - \mathrm{mid}(E(\mathbf{X}))| + |\mathrm{rad}(x_i) - \mathrm{rad}(E(\mathbf{X}))|\right)^2.$$

The expression above has three terms. All of them involve $\mathrm{mid}(x_i) - \mathrm{mid}(E(\mathbf{X}))$ and $\mathrm{rad}(x_i) - \mathrm{rad}(E(\mathbf{X}))$. Since $\mathrm{mid}(E(\mathbf{X})) = E(\mathrm{mid}(X))$ and $\mathrm{rad}(E(\mathbf{X})) = E(\mathrm{rad}(X))$, the first term in the expression above is $\sigma^2(\mathrm{mid}(X))$ according to (2). Similarly, the second term is $\sigma^2(\mathrm{rad}(X))$.
The third term is related to the absolute covariance between $\mathrm{mid}(X)$ and $\mathrm{rad}(X)$. Let $u_i = |\mathrm{mid}(x_i) - E(\mathrm{mid}(X))|$ and $v_i = |\mathrm{rad}(x_i) - E(\mathrm{rad}(X))|$; then we can rewrite the term as $\frac{2}{n}\sum_{i=1}^{n} u_i v_i$.
Summarizing the discussion above, we have the point-valued variance for an interval-valued dataset $\mathbf{X}$ as follows.
Definition 3
Let $\mathbf{X} = \{\mathbf{x}_i : 1 \le i \le n\}$ be an interval-valued dataset. Then the point-valued variance of $\mathbf{X}$ is

$$\sigma^2(\mathbf{X}) = \sigma^2(\mathrm{mid}(X)) + \sigma^2(\mathrm{rad}(X)) + \frac{2}{n}\sum_{i=1}^{n} u_i v_i. \qquad (5)$$

Because midpoints and radii of interval-valued objects are point-valued, the variance defined in (5) is also point-valued. Hence, we have the point-valued standard deviation of $\mathbf{X}$ as usual:

$$\sigma(\mathbf{X}) = \sqrt{\sigma^2(\mathbf{X})}. \qquad (6)$$

In evaluating (5) and (6), one does not need interval computing at all. For the sample dataset $\mathbf{X}_0$, we can obtain its point-valued variance and standard deviation directly from $\mathrm{mid}(X_0)$ and $\mathrm{rad}(X_0)$.
It is worthwhile to note that Eq. (5) is an extension of (2) and applicable to point-valued datasets too. This is because, for every $x_i$ in a point-valued $X$, $\mathrm{mid}(x_i) = x_i$ and $\mathrm{rad}(x_i) = 0$ always. Hence, $\sigma^2(\mathbf{X}) = \sigma^2(X)$ for a point-valued $X$.
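A direct evaluation of (5) and (6) needs only the midpoints and radii. A Python sketch with a hypothetical sample (not the paper's dataset):

```python
def point_variance(X):
    """Point-valued variance per Eq. (5): variance of the midpoints,
    plus variance of the radii, plus twice the mean product of the
    absolute midpoint and radius deviations."""
    n = len(X)
    mids = [(a + b) / 2 for a, b in X]
    rads = [(b - a) / 2 for a, b in X]
    m_bar = sum(mids) / n
    r_bar = sum(rads) / n
    u = [abs(m - m_bar) for m in mids]
    v = [abs(r - r_bar) for r in rads]
    return (sum(ui * ui for ui in u) / n          # sigma^2(mid(X))
            + sum(vi * vi for vi in v) / n        # sigma^2(rad(X))
            + 2 * sum(ui * vi for ui, vi in zip(u, v)) / n)

# Hypothetical sample data, for illustration only:
X = [(2.0, 3.0), (2.5, 7.0), (4.0, 6.0), (5.0, 8.0)]
sigma2 = point_variance(X)
sigma = sigma2 ** 0.5  # standard deviation, Eq. (6)
print(sigma2)  # 3.59765625
```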
Probability Distributions of an Interval-Valued Population
An interval-valued dataset $\mathbf{X}$ can be viewed as a sample of an interval-valued population. In this section, we study practical ways to find probability distributions for an interval-valued dataset $\mathbf{X}$. Our discussion addresses two different cases: one assumes distribution information for each $\mathbf{x}_i \in \mathbf{X}$; the other does not.
On the Probability Distribution of X with Distribution Information for Each $\mathbf{x}_i$
Our discussion involves the concept of a probability distribution over an interval. Let us very briefly review the literature first.
A function $f(x)$ is a probability density function (pdf) of a random variable $x$ on the interval $[a, b]$ if and only if $f(x) \ge 0$ and $\int_a^b f(x)\,dx = 1$. Well-known pdfs in the literature include the uniform distribution: $f(x) = \frac{1}{b-a}$ for $x \in [a, b]$; the normal distribution: $f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$; and the beta distribution: $f(x) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}$, where $x \in [0, 1]$, both parameters $\alpha$ and $\beta$ are positive, and $\Gamma$ is the gamma function. There are software tools available to fit point-valued sample data, i.e., to computationally determine the parameter values in a chosen type of distribution. For instance, the Python scipy.stats module can find the optimal $\mu$ and $\sigma$ to fit a point-valued dataset with a normal distribution, and/or $\alpha$ and $\beta$ with a beta distribution.
It is safe to assume the availability of a pdf for each $\mathbf{x}_i \in \mathbf{X}$, both theoretically and computationally. In practice, an interval $\mathbf{x}_i$ is often obtained through aggregating observed points. For instance, in [9] and [11], min-max and confidence intervals are applied to aggregate points into intervals, respectively. If an interval is provided directly, one can always pick points from the interval and fit these points with a selected probability distribution computationally. Hereafter, we denote the pdf of $\mathbf{x}_i$ as $f_i(x)$.
We now define a notion of a pdf for an interval-valued dataset $\mathbf{X}$.
Definition 4
A function $f(x)$ is called a probability density function of an interval-valued dataset $\mathbf{X}$ if and only if $f(x)$ satisfies all of the conditions:

$$f(x) \ge 0, \quad f(x) = 0 \text{ for } x \notin \mathrm{env}(\mathbf{X}), \quad \text{and} \quad \int_{\mathrm{env}(\mathbf{X})} f(x)\,dx = 1. \qquad (7)$$

The theorem below provides a practical way to calculate a pdf for $\mathbf{X}$.
Theorem 1
Let $\mathbf{X} = \{\mathbf{x}_i : 1 \le i \le n\}$ be an interval-valued dataset, and let $f_i(x)$ be the pdf of $\mathbf{x}_i$, provided $f_i(x) = 0$ for $x \notin \mathbf{x}_i$. Then,

$$f(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x) \qquad (8)$$

is a pdf of $\mathbf{X}$.
Proof
Because $\int_{\mathbf{x}_i} f_i(x)\,dx = 1$ and $f_i(x) = 0$ outside $\mathbf{x}_i$, we have $\int_{\mathrm{env}(\mathbf{X})} f(x)\,dx = \frac{1}{n}\sum_{i=1}^{n}\int_{\mathbf{x}_i} f_i(x)\,dx = 1$. In addition, because $f_i(x) \ge 0$ for all $i$, we have $f(x) \ge 0$. Equation (7) is satisfied. Hence, the $f(x)$ is a pdf of $\mathbf{X}$. □
Equation (8) actually provides a practical way of calculating the pdf of $\mathbf{X}$. Provided an $f_i(x)$ for each $\mathbf{x}_i \in \mathbf{X}$, we have the algorithm in pseudo-code below:
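Under the uniform distribution assumption used in Example 2 below, the accumulation of Eq. (8) can be sketched in Python. Each piece between consecutive sorted endpoints carries a constant density, so the result is a stair function (the function name and the sample data are ours, hypothetical; degenerate intervals are assumed away):

```python
def stair_pdf(X):
    """Eq. (8) with a uniform distribution on each interval:
    f(x) = (1/n) * sum of the uniform densities 1/|x_i| over the x_i
    containing x. Returns (piece, constant_density) pairs."""
    n = len(X)
    pts = sorted({p for ab in X for p in ab})
    pieces = []
    for lo, hi in zip(pts, pts[1:]):
        m = (lo + hi) / 2  # any point strictly inside the piece
        density = sum(1 / (b - a) for a, b in X if a <= m <= b) / n
        pieces.append(((lo, hi), density))
    return pieces

# Hypothetical sample data, for illustration only:
X = [(2.0, 3.0), (2.5, 7.0), (4.0, 6.0), (5.0, 8.0)]
pieces = stair_pdf(X)
# The stair function integrates to 1 over the envelope, as Eq. (7) requires:
total = sum(d * (hi - lo) for (lo, hi), d in pieces)
```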
Example 2
Find a pdf for the sample dataset $\mathbf{X}_0 = \{[2, 3], [2.5, 7], \ldots\}$. For simplicity, we assume a uniform distribution for each of the $\mathbf{x}_i$'s, i.e.,

$$f_i(x) = \begin{cases} \frac{1}{|\mathbf{x}_i|} & \text{if } x \in \mathbf{x}_i, \\ 0 & \text{otherwise.} \end{cases}$$

Applying Algorithm 2, we obtain a pdf $f(x)$ of $\mathbf{X}_0$ as in (9).
The $f(x)$ in the example is a stair function. This is because of the uniform distribution assumption on each $\mathbf{x}_i$.
Here are a few additional notes on finding a pdf for $\mathbf{X}$ with Algorithm 2.
If assuming a uniform distribution, how do we handle an $\mathbf{x}_i$ such that $\underline{x}_i = \overline{x}_i$? First of all, an interval element $\mathbf{x}_i$ is usually not degenerated as a constant. Even if there is an $i$ such that $\underline{x}_i = \overline{x}_i$, we can always assign an arbitrary non-negative $f_i$ value at that point. This does not impact the calculation of probability in integrating the $f(x)$ function.
Algorithm 2 assumes $x \in \mathrm{env}(\mathbf{X})$. If that is not the case, the $2n$ numbers in $\underline{X}$ and $\overline{X}$ divide $(-\infty, \infty)$ into $2n + 1$ sub-intervals: the two unbounded ones outside $\mathrm{env}(\mathbf{X})$ together with the $2n - 1$ sub-intervals in $\mathrm{env}(\mathbf{X})$. Therefore, the accumulation loop in Algorithm 2 should run through all of the $2n + 1$ sub-intervals, and then normalize the counts by dividing by $n$.
Another implicit assumption of Theorem 1 is that all $\mathbf{x}_i \in \mathbf{X}$ are equally weighted. However, that is not necessary. If needed, one may place a positive weight $w_i$ on each of the $\mathbf{x}_i$'s, as stated in Corollary 2.
Corollary 2
Let $\mathbf{X} = \{\mathbf{x}_i : 1 \le i \le n\}$ be an interval-valued dataset and $f_i(x)$ be the pdf of $\mathbf{x}_i$; then the function

$$f(x) = \frac{1}{\sum_{i=1}^{n} w_i}\sum_{i=1}^{n} w_i f_i(x), \quad w_i > 0, \qquad (10)$$

is a pdf of $\mathbf{X}$.
A proof of Corollary 2 is straightforward too. We have successfully applied the Corollary in computationally studying the stock market [12].
Probability Distribution of an Interval-Valued X Without Distribution Information for Any $\mathbf{x}_i$
It is not necessary to assume a probability distribution for each $\mathbf{x}_i \in \mathbf{X}$ to find a pdf of $\mathbf{X}$. An interval $\mathbf{x}$ is determined by its midpoint and radius. Let $u = \mathrm{mid}(\mathbf{x})$ and $v = \mathrm{rad}(\mathbf{x})$ be two point-valued random variables. Then, the joint pdf of $(u, v)$ is a non-negative function $f(u, v)$ such that $\iint f(u, v)\,du\,dv = 1$. If we assume a normal distribution for both $u$ and $v$, then $f(u, v)$ is a bivariate normal distribution [25]. The pdf of a bivariate normal distribution is:

$$f(u, v) = \frac{1}{2\pi\sigma_u\sigma_v\sqrt{1-\rho^2}}\,\exp\!\left(-\frac{z}{2(1-\rho^2)}\right), \qquad (11)$$

where $z = \frac{(u-\mu_u)^2}{\sigma_u^2} - \frac{2\rho(u-\mu_u)(v-\mu_v)}{\sigma_u\sigma_v} + \frac{(v-\mu_v)^2}{\sigma_v^2}$, and $\rho$ is the normalized correlation between $u$ and $v$, i.e., the ratio of their covariance and the product of $\sigma_u$ and $\sigma_v$. Applying the pdf, we are able to estimate the probability over a region $R$ of the $(u, v)$-plane as

$$P(R) = \iint_R f(u, v)\,du\,dv. \qquad (12)$$
To calculate the probability of an interval $\mathbf{x}$, whose midpoint and radius are $u = \mathrm{mid}(\mathbf{x})$ and $v = \mathrm{rad}(\mathbf{x})$, we need a marginal pdf for either $u$ or $v$. If we fix $u = \mathrm{mid}(\mathbf{x})$, then the marginal pdf of $v$ follows a single-variable normal distribution. Thus,

$$f(v \mid u) = \frac{1}{\sigma_{v|u}\sqrt{2\pi}}\,\exp\!\left(-\frac{(v-\mu_{v|u})^2}{2\sigma_{v|u}^2}\right), \qquad (13)$$

with $\mu_{v|u} = \mu_v + \rho\frac{\sigma_v}{\sigma_u}(u - \mu_u)$ and $\sigma_{v|u} = \sigma_v\sqrt{1-\rho^2}$, and the probability of $\mathbf{x}$ is obtained by integrating (13) over the radii of interest, e.g.,

$$P(\mathbf{x}) \approx \int_{0}^{\mathrm{rad}(\mathbf{x})} f(v \mid u = \mathrm{mid}(\mathbf{x}))\,dv. \qquad (14)$$
An interval-valued dataset $\mathbf{X}$ provides us its $\mathrm{mid}(X)$ and $\mathrm{rad}(X)$. They are point-valued sample sets of $u$ and $v$, respectively. Their means, standard deviations, and covariance can be calculated as usual to estimate the $\mu_u$, $\mu_v$, $\sigma_u$, $\sigma_v$, and $\rho$ in (11). For instance, from the sample $\mathbf{X}_0$, we can obtain its $\mu_u$, $\mu_v$, $\sigma_u$, $\sigma_v$, and $\rho$ from $\mathrm{mid}(X_0)$ and $\mathrm{rad}(X_0)$ this way. Furthermore, using $u = \mathrm{mid}(\mathbf{x})$ and these estimates in (13), we can estimate the probability of an arbitrary interval $\mathbf{x}$ with (14).
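The parameter estimation and Eqs. (13)-(14) can be sketched in Python with only the standard library. The integration limits in prob_interval follow our reading of (14) (from 0 up to rad(x)); all names and sample data below are ours, hypothetical:

```python
from math import erf, sqrt

def fit_mid_rad(X):
    """Estimate mu_u, mu_v, sigma_u, sigma_v, rho of Eq. (11) from the
    midpoint and radius samples of an interval-valued dataset."""
    n = len(X)
    u = [(a + b) / 2 for a, b in X]   # midpoints
    v = [(b - a) / 2 for a, b in X]   # radii
    mu_u, mu_v = sum(u) / n, sum(v) / n
    s_u = sqrt(sum((x - mu_u) ** 2 for x in u) / n)
    s_v = sqrt(sum((x - mu_v) ** 2 for x in v) / n)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / n
    return mu_u, mu_v, s_u, s_v, cov / (s_u * s_v)

def prob_interval(x, params):
    """Estimate P(x) per Eqs. (13)-(14): the conditional normal of v
    given u = mid(x), integrated from 0 to rad(x) via the normal CDF."""
    mu_u, mu_v, s_u, s_v, rho = params
    m, r = (x[0] + x[1]) / 2, (x[1] - x[0]) / 2
    mu_c = mu_v + rho * (s_v / s_u) * (m - mu_u)   # conditional mean
    s_c = s_v * sqrt(1 - rho ** 2)                 # conditional std
    cdf = lambda t: 0.5 * (1 + erf((t - mu_c) / (s_c * sqrt(2))))
    return cdf(r) - cdf(0.0)

# Hypothetical sample data, for illustration only:
X = [(2.0, 3.0), (2.5, 7.0), (4.0, 6.0), (5.0, 8.0)]
print(prob_interval((2.0, 3.0), fit_mid_rad(X)))
```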
So far, we have established practical ways to calculate a point-valued variance, standard deviation, and probability distribution for an interval-valued dataset $\mathbf{X}$. With them, we are able to directly apply commonly available inferential decision making schemes to interval-valued datasets.
Information Entropy of Interval-Valued Datasets
While it is beyond the scope of this paper to discuss specific applications of inferential statistics on an interval-valued dataset, we are interested in measuring the amount of information in an interval-valued dataset. Information entropy is the average rate at which information is produced by a stochastic source of data [24]. Shannon introduced the concept of entropy in his seminal paper "A Mathematical Theory of Communication" [23]. The measure of information entropy associated with the possible data values is:

$$H = -\sum_{j} P(p_j)\,\log_2 P(p_j), \qquad (15)$$

where $P(p_j)$ is the probability of the $j$-th possible value $p_j$.
An interval-valued dataset $\mathbf{X} = \{\mathbf{x}_i : 1 \le i \le n\}$ divides the real axis into at most $2n + 1$ sub-intervals. Using $P$ to denote the partition and $p_j$ to specify its $j$-th element, we have $\bigcup_j p_j = (-\infty, \infty)$. As illustrated in Example 2, we can apply Algorithm 2 to find the pdf $f(x)$ of $\mathbf{X}$. Then, the probability of each $p_j$ is available as $P(p_j) = \int_{p_j} f(x)\,dx$. Hence, we can apply (15) to calculate the entropy of an interval-valued dataset $\mathbf{X}$. For the reader's convenience, we summarize the steps of finding the entropy of $\mathbf{X}$ as an algorithm below.
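Combining Algorithm 2's stair pdf with Eq. (15) gives a sketch of Algorithm 3 under the uniform distribution assumption (the function name and sample data are ours, hypothetical):

```python
from math import log2

def entropy(X):
    """Entropy of an interval-valued dataset per Eq. (15), assuming a
    uniform distribution on each interval (as in Example 2): partition
    the envelope at the 2n endpoints, take each piece's probability
    from the stair pdf, and accumulate -p * log2(p)."""
    n = len(X)
    pts = sorted({p for ab in X for p in ab})
    h = 0.0
    for lo, hi in zip(pts, pts[1:]):
        m = (lo + hi) / 2
        density = sum(1 / (b - a) for a, b in X if a <= m <= b) / n
        p = density * (hi - lo)   # probability of this sub-interval
        if p > 0:
            h -= p * log2(p)
    return h

# Hypothetical sample data, for illustration only:
X = [(2.0, 3.0), (2.5, 7.0), (4.0, 6.0), (5.0, 8.0)]
print(entropy(X))
```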
The example below finds the entropy of the sample dataset $\mathbf{X}_0$ under the same uniform distribution assumption as in Example 2.
Example 3
Equation (9) in Example 2 provides the pdf of $\mathbf{X}_0$. Applying it, we obtain the probability $P(p_j)$ of each sub-interval $p_j$ in the partition, as in (16). The entropy of $\mathbf{X}_0$ then follows from (15).
Algorithm 3 provides a much needed tool for studying the point-valued information entropy of an interval-valued dataset. Applying it, we have investigated the entropies of the real-world financial dataset used in the studies of stock market forecasts [6, 7] and [9], from the perspective of information theory. The results are reported in [12]. They not only reveal the deep reason for the significant quality improvements reported before, but also validate the concepts and algorithms presented in this paper as a successful application.
Summary and Future Work
Recent advances have shown that using interval-valued data can significantly improve the quality and efficiency of information processing and uncertainty management. This work establishes much needed concepts of point-valued variational statistics, probability, and entropy for interval-valued datasets. Furthermore, this paper contains practical algorithms to find these point-valued measures. It provides additional theoretical foundations for applying point-valued methods in analyzing interval-valued datasets.
These point-valued measures enable us to directly apply currently available, powerful point-valued statistical, probabilistic, and information-theoretic results to interval-valued datasets. Applying these measures in various applications is a high priority of our future work. In fact, using this work as the theoretical foundation, we have successfully analyzed the entropies of the real-world financial dataset related to the stock market forecasting mentioned in the introduction; the obtained results are reported in [12], published in the same volume as this paper. On the theoretical side, future work includes extending the concepts in this paper from one-dimensional to multi-dimensional interval-valued datasets.
Contributor Information
Marie-Jeanne Lesot, Email: marie-jeanne.lesot@lip6.fr.
Susana Vieira, Email: susana.vieira@tecnico.ulisboa.pt.
Marek Z. Reformat, Email: marek.reformat@ualberta.ca
João Paulo Carvalho, Email: joao.carvalho@inesc-id.pt.
Anna Wilbik, Email: a.m.wilbik@tue.nl.
Bernadette Bouchon-Meunier, Email: bernadette.bouchon-meunier@lip6.fr.
Ronald R. Yager, Email: yager@panix.com
Chenyi Hu, Email: chu@uca.edu.
References
- 1.Bentkowska U. New types of aggregation functions for interval-valued fuzzy setting and preservation of pos-B and nec-B-transitivity in decision making problems. Inf. Sci. 2018;424(C):385–399. doi: 10.1016/j.ins.2017.10.025. [DOI] [Google Scholar]
- 2.Billard L, Diday E. Regression analysis for interval-valued data. In: Kiers HAL, Rasson JP, Groenen PJF, Schader M, editors. Data Analysis, Classification, and Related Methods. Heidelberg: Springer; 2000. [Google Scholar]
- 3.Dai J, Wang W, Mi J. Uncertainty measurement for interval-valued information systems. Inf. Sci. 2013;251:63–78. doi: 10.1016/j.ins.2013.06.047. [DOI] [Google Scholar]
- 4.Gioia F, Lauro C. Basic statistical methods for interval data. Statistica Applicata. 2005;17(1):75–104. [Google Scholar]
- 5.Grabisch M, Marichal J, Mesiar R, Pap E. Aggregation Functions. New York: Cambridge University Press; 2009. [Google Scholar]
- 6.He L, Hu C. Midpoint method and accuracy of variability forecasting. J. Empir. Econ. 2009;38:705–715. doi: 10.1007/s00181-009-0286-6. [DOI] [Google Scholar]
- 7.He L, Hu C. Impacts of interval computing on stock market forecasting. J. Comput. Econ. 2009;33(3):263–276. doi: 10.1007/s10614-008-9159-x. [DOI] [Google Scholar]
- 8.Hu C, et al. Knowledge Processing with Interval and Soft Computing. London: Springer; 2008. [Google Scholar]
- 9.Hu C, He L. An application of interval methods to stock market forecasting. J. Reliable Comput. 2007;13:423–434. doi: 10.1007/s11155-007-9039-4. [DOI] [Google Scholar]
- 10.Hu, C.: Using interval function approximation to estimate uncertainty. In: Interval/Probabilistic Uncertainty and Non-Classical Logics, pp. 341–352 (2008). 10.1007/978-3-540-77664-2_26
- 11.Hu C. A note on probabilistic confidence of the stock market ILS interval forecasts. J. Risk Finance. 2010;11(4):410–415. doi: 10.1108/15265941011071539. [DOI] [Google Scholar]
- 12.Hu, C., and Hu, Z.: A computational study on the entropy of interval-valued datasets from the stock market. In: Lesot, M.-J., et al. (eds.) The Proceedings of the 18th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems (IPMU 2020), IPMU 2020, CCIS, vol. 1239, pp. 422–435. Springer (2020)
- 13.Huynh, V., Nakamori, Y., Hu, C., Kreinovich, V.: On decision making under interval uncertainty: a new justification of Hurwicz optimism-pessimism approach and its use in group decision making. In: 39th International Symposium on Multiple-Valued Logic, pp. 214–220 (2009)
- 14.IEEE Standard for Interval Arithmetic. IEEE Standards Association (2015). https://standards.ieee.org/standard/1788-2015.html
- 15.IEEE Standard for Interval Arithmetic (Simplified). IEEE Standards Association (2018). https://standards.ieee.org/standard/1788_1-2017.html
- 16.de Korvin A, Hu C, Chen P. Generating and applying rules for interval valued fuzzy observations. In: Yang ZR, Yin H, Everson RM, editors. Intelligent Data Engineering and Automated Learning – IDEAL 2004; Heidelberg: Springer; 2004. pp. 279–284. [Google Scholar]
- 17.Lodwick W-A, Jamison K-D. Interval-valued probability in the analysis of problems containing a mixture of possibilistic, probabilistic, and interval uncertainty. Fuzzy Sets Syst. 2008;159(21):2845–2858. doi: 10.1016/j.fss.2008.03.013. [DOI] [Google Scholar]
- 18.Moore RE. Methods and Applications of Interval Analysis. Philadelphia: SIAM Studies in Applied Mathematics; 1979. [Google Scholar]
- 19.Marupally, P., Paruchuri, V., Hu, C.: Bandwidth variability prediction with rolling interval least squares (RILS). In: Proceedings of the 50th ACM SE Conference, Tuscaloosa, AL, USA, 29–31 March 2012, pp. 209–213. ACM (2012). 10.1145/2184512.2184562
- 20.Nordin, B., Hu, C., Chen, B., Sheng, V.S.: Interval-valued centroids in K-means algorithms. In: Proceedings of the 11th IEEE International Conference on Machine Learning and Applications (ICMLA), Boca Raton, FL, USA, pp. 478–481. IEEE (2012). 10.1109/ICMLA.2012.87
- 21.Pękala B. Uncertainty Data in Interval-Valued Fuzzy Set Theory: Properties, Algorithms and Applications. 1. Cham: Springer; 2018. [Google Scholar]
- 22.Rhodes, C., Lemon, J., Hu, C.: An interval-radial algorithm for hierarchical clustering analysis. In: 14th IEEE International Conference on Machine Learning and Applications (ICMLA), Miami, FL, USA, pp. 849–856. IEEE (2015)
- 23.Shannon C-E. A mathematical theory of communication. Bell Syst. Tech. J. 1948;27:379–423. doi: 10.1002/j.1538-7305.1948.tb01338.x. [DOI] [Google Scholar]
- 24.Wikipedia: Information entropy. https://en.wikipedia.org/wiki/Entropy_(information_theory)
- 25.Wolfram MathWorld. Bivariate normal distribution. http://mathworld.wolfram.com/BivariateNormalDistribution.html