IEEE Access. 2020 Jun 8;8:110412–110424. doi: 10.1109/ACCESS.2020.3000860

Data Modeling With Polynomial Representations and Autoregressive Time-Series Representations, and Their Connections

Asoke K Nandi
PMCID: PMC8043497  PMID: 34192105

Abstract

Two data modelling techniques, polynomial representation and time-series representation, are explored in this paper to establish their connections and differences. All theoretical studies are based on uniformly sampled data in the absence of noise. This paper proves that all data from an underlying polynomial model of finite degree $q$ can be represented perfectly by an autoregressive time-series model of order $q$ and a constant term $\mu$ as in equation (2). Furthermore, all polynomials of degree $q$ are shown to give rise to the same set of time-series coefficients of specific forms, with the only possible difference being in the constant term $\mu$. It is also demonstrated that time-series with either non-integer coefficients or integer coefficients not of the aforementioned specific forms represent polynomials of infinite degree. Six numerical explorations, with both generated data and real data, including the UK data and US data on the current Covid-19 incidence, are presented to support the theoretical findings. It is shown that all polynomials of degree $q$ can be represented by an all-pole filter with $q$ repeated roots (or poles) at $z = +1$. Theoretically, all noise-free data representable by a finite order all-pole filter, whether they come from finite degree or infinite degree polynomials, can be described exactly by a finite order AR time-series; if the values of the polynomial coefficients are not of special interest in a data modelling task, one may use time-series representations instead.

Keywords: Data models, polynomials, autoregressive processes, time-series, signal representation, Covid-19

I. Introduction

Interest in data science has been growing extremely fast in the twenty-first century. As well as attracting interest from many different subject areas, data science is being integrated into a diverse range of industries and agencies (e.g., health, transport, energy, government, society, etc.). Strictly, a time-series refers to a series of data points ordered in time. Very commonly, a time-series represents data points that are equally spaced in time. Of course, the analytics that are created for time-series data can generally be applied to a sequence of data that are equally spaced in space (e.g., images) or in some other domain. There are many types of time-series models, including autoregressive models.

Although there are many types of time-series models, an earlier and alternative way to model data is polynomial regression. Polynomial regression models are generally fitted with the Least-squares method to obtain estimated values of the polynomial coefficients. In 1805 Legendre published the Least-squares method [1], and Gauss published it in 1809 and later in 1823 [2]. In 1815 Gergonne wrote a paper, originally in French, on “The application of the method of least squares to the interpolation of sequences” [3]; an English translation by Stigler is available [4]. In the last 120 or so years, polynomial regression has contributed greatly to the development of regression analysis [5]–[7].

Although there are other ways to model data, the focus of this paper is on polynomial representation and autoregressive time-series representation. There has been a lot of research in time-series data representation [8]–[12]. For example, the main goal of time-series analysis in econometrics, geophysics, meteorology, quantitative finance, seismology, and statistics is prediction or forecasting [13]–[20]. On the other hand, it is used for signal detection and estimation in communication engineering, control engineering, and signal processing [21]–[28]. It is also used for clustering, classification, and prediction or forecasting in data mining, machine learning, and pattern recognition [29]–[34]. Mathematical modelling and time-series analysis are fundamental to many fields; a couple of very recent examples can be found in [35], [36].

In polynomial representations, the observed data are a function of time (or some other variable). This function, except for the case of a constant or a straight line, represents a non-linear relationship between the time (or some other variable) and the observed data, even though it is linear in the parameters. On the other hand, in autoregressive (AR) time-series representation, each observed datum is a linear function of some of the earlier data, and thus the model is linear in both data and parameters. Although both are used for data modelling, there are some fundamental differences. Hence, this paper explores many questions around polynomial and autoregressive representations with a view to establishing their connections and differences. Two of these questions are:

  1) Can all finite degree polynomials be expressed as finite order time-series? If the answer is affirmative, what is the underlying relationship?

  2) Can all finite order autoregressive time-series be represented as finite degree polynomials?

This study is in the context of real-valued and uniformly sampled noise-free data. The paper presents the following original results:

  1) All polynomials of degree 1 (linear), of degree 2 (quadratic), and of degree 3 (cubic) can be represented as autoregressive time-series of order 1, order 2, and order 3 respectively, each with a constant. This is illustrated in section II.

  2) All polynomials of degree 3 can be represented by AR time-series with the same set of coefficient values but possibly with a different value for the constant term. This observation is also true for polynomials of degree 1 and of degree 2. This is presented in section II.

  3) All polynomials of finite degree $q$ can be represented as AR time-series of order $q$ and a constant. This can be found in section III.

  4) All polynomials of degree $q$ can be represented by AR time-series with the same set of coefficient values but possibly with a different value for the constant term. This is demonstrated in section III.

  5) The corresponding time-series coefficients are integers and of specific forms, which are derived in section III.

  6) Some numerical explorations from several sources of both generated data and real data, including some current Covid-19 incidence data from the UK and the US, are presented in section IV.

  7) Whilst all finite degree polynomials can be represented by finite order AR time-series, the converse is not true. There are infinitely many AR time-series of finite order that cannot be represented by finite degree polynomials. Furthermore, all finite order AR time-series with either non-integer coefficients or integer coefficients not of the aforementioned specific forms represent polynomials of infinite degree. This is shown in section V.

  8) Section VI shows that all polynomials of degree $q$ can be represented by an all-pole filter with $q$ repeated roots (or poles) at $z = +1$. Thus, any noise-free data representable by a finite order all-pole filter, whether they come from finite degree or infinite degree polynomials, can be described exactly by a finite order AR time-series.

II. Method – Small Degree Polynomial

A set of uniformly sampled real-valued data points in discrete time may be represented by a polynomial or a time-series. A polynomial of degree $N$ in continuous time, with coefficients $a_i$, can take the following form

$$ y(t) = \sum_{i=0}^{N} a_i\, t^{i} . $$

For uniformly sampled discrete time, the continuous time, $t$, is represented as $n\Delta$, where $n$ is an integer and $\Delta$ is the sampling period. In this scenario, the above equation can be rewritten as

$$ y(n\Delta) = \sum_{i=0}^{N} a_i\, (n\Delta)^{i} . \tag{1} $$

On the other hand, an autoregressive time-series model of order $q$, AR($q$), with coefficients $b_i$ and constant term $\mu$, can be written as

$$ y(n) = \sum_{i=1}^{q} b_i\, y(n-i) + \mu \tag{2} $$

and may be used to represent the set of uniformly sampled data points in discrete time.

A. Linear Polynomial

In this subsection, an exploration of data representation by a linear polynomial and an AR time-series is carried out. For any linear polynomial, the degree has the value of 1 in equation (1). Writing $a_0$ and $a_1$ for the polynomial coefficients and $\Delta$ for the sampling period, it is easy to show from equation (1) that $y(n\Delta) - y((n-1)\Delta) = a_1\Delta$. By removing $\Delta$ from indices, this can be written as $y(n) = y(n-1) + a_1\Delta$. Comparing this with equation (2) for AR($q$), with coefficients $b_i$ and constant $\mu$, it is clear that $q = 1$, $b_1 = 1$, and $\mu = a_1\Delta$.

Therefore, the following can be concluded:

  • Every linear polynomial, i.e., of degree 1, can be perfectly represented by an AR(1) time-series.

  • Every linear polynomial will have the same value of the coefficient in time-series, i.e., $b_1 = 1$.

  • The constant term in the time-series is given by $\mu = a_1\Delta$.

  • This implies that every linear polynomial with different values of $a_0$ but the same value of $a_1$ will have the identical AR(1) representation, i.e., with the same values of $b_1$ and $\mu$.

B. Quadratic Polynomial

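In this subsection, an exploration of data representation by a quadratic polynomial and an AR time-series is carried out. For any quadratic polynomial, the degree has the value of 2 in equation (1). Thus, with polynomial coefficients $a_0$, $a_1$, $a_2$ and sampling period $\Delta$, it follows from equation (1) that

$$ y(n\Delta) = a_0 + a_1 n\Delta + a_2 n^{2}\Delta^{2}, \tag{3} $$
$$ y((n-1)\Delta) = a_0 + a_1 (n-1)\Delta + a_2 (n-1)^{2}\Delta^{2}, \tag{4} $$
$$ y((n-2)\Delta) = a_0 + a_1 (n-2)\Delta + a_2 (n-2)^{2}\Delta^{2}. \tag{5} $$

Using equations (3) and (4), one can write

$$ y(n\Delta) - y((n-1)\Delta) = a_1\Delta + a_2\Delta^{2}(2n-1), \tag{6} $$

and, using equations (4) and (5), one can write

$$ y((n-1)\Delta) - y((n-2)\Delta) = a_1\Delta + a_2\Delta^{2}(2n-3). \tag{7} $$

Now, using equations (6) and (7), one finds

$$ y(n\Delta) - 2y((n-1)\Delta) + y((n-2)\Delta) = 2a_2\Delta^{2}. \tag{8} $$

Therefore,

$$ y(n\Delta) = 2y((n-1)\Delta) - y((n-2)\Delta) + 2a_2\Delta^{2}. \tag{9} $$

By removing $\Delta$ from indices, equation (9) can be written as $y(n) = 2y(n-1) - y(n-2) + 2a_2\Delta^{2}$. Comparing this with equation (2) for AR($q$), with coefficients $b_i$ and constant $\mu$, it is clear that $q = 2$, $b_1 = 2$ and $b_2 = -1$, and $\mu = 2a_2\Delta^{2}$.

Therefore, the following can be concluded:

  • Every quadratic polynomial, i.e., of degree 2, can be perfectly represented by an AR(2) time-series.

  • Every quadratic polynomial will have the same coefficient values in time-series, i.e., $b_1 = 2$ and $b_2 = -1$.

  • The constant term in the time-series is given by $\mu = 2a_2\Delta^{2}$.

  • This implies that every quadratic polynomial with different values of $a_0$ and $a_1$ but the same value of $a_2$ will have the identical AR(2) representation, i.e., with the same values of $b_1$, $b_2$, and $\mu$.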

C. Cubic Polynomial

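In this subsection, an exploration of data representation by a cubic polynomial and an AR time-series is carried out. For any cubic polynomial, the degree has the value of 3 in equation (1). Thus, with polynomial coefficients $a_0, \ldots, a_3$ and sampling period $\Delta$, it follows from equation (1) that

$$ y(n\Delta) = a_0 + a_1 n\Delta + a_2 n^{2}\Delta^{2} + a_3 n^{3}\Delta^{3}, \tag{10} $$
$$ y((n-1)\Delta) = a_0 + a_1 (n-1)\Delta + a_2 (n-1)^{2}\Delta^{2} + a_3 (n-1)^{3}\Delta^{3}, \tag{11} $$
$$ y((n-2)\Delta) = a_0 + a_1 (n-2)\Delta + a_2 (n-2)^{2}\Delta^{2} + a_3 (n-2)^{3}\Delta^{3}, \tag{12} $$
$$ y((n-3)\Delta) = a_0 + a_1 (n-3)\Delta + a_2 (n-3)^{2}\Delta^{2} + a_3 (n-3)^{3}\Delta^{3}. \tag{13} $$

Using equations (10) and (11), one can obtain

$$ y(n\Delta) - y((n-1)\Delta) = a_1\Delta + a_2\Delta^{2}(2n-1) + a_3\Delta^{3}(3n^{2}-3n+1), \tag{14} $$

and, using equations (11) and (12), one can obtain

$$ y((n-1)\Delta) - y((n-2)\Delta) = a_1\Delta + a_2\Delta^{2}(2n-3) + a_3\Delta^{3}(3n^{2}-9n+7). \tag{15} $$

Now, using equations (14) and (15), one obtains

$$ y(n\Delta) - 2y((n-1)\Delta) + y((n-2)\Delta) = 2a_2\Delta^{2} + 6a_3\Delta^{3}(n-1). \tag{16} $$

Using equations (12) and (13), one can write

$$ y((n-2)\Delta) - y((n-3)\Delta) = a_1\Delta + a_2\Delta^{2}(2n-5) + a_3\Delta^{3}(3n^{2}-15n+19). \tag{17} $$

Now, using equations (15) and (17), one can write

$$ y((n-1)\Delta) - 2y((n-2)\Delta) + y((n-3)\Delta) = 2a_2\Delta^{2} + 6a_3\Delta^{3}(n-2). \tag{18} $$

Thus,

$$ 2a_2\Delta^{2} = y((n-1)\Delta) - 2y((n-2)\Delta) + y((n-3)\Delta) - 6a_3\Delta^{3}(n-2). \tag{19} $$

Combining equations (16) and (19), one obtains

$$ y(n\Delta) - 2y((n-1)\Delta) + y((n-2)\Delta) = y((n-1)\Delta) - 2y((n-2)\Delta) + y((n-3)\Delta) + 6a_3\Delta^{3}. \tag{20} $$

Therefore,

$$ y(n\Delta) = 3y((n-1)\Delta) - 3y((n-2)\Delta) + y((n-3)\Delta) + 6a_3\Delta^{3}. $$

By removing $\Delta$ from indices, this can be written as $y(n) = 3y(n-1) - 3y(n-2) + y(n-3) + 6a_3\Delta^{3}$. This can be described by AR($q$) as in equation (2), with coefficients $b_i$ and constant $\mu$, provided $q = 3$, $b_1 = 3$, $b_2 = -3$, $b_3 = 1$, and $\mu = 6a_3\Delta^{3}$.

Therefore, the following can be concluded:

  • Every cubic polynomial, i.e., of degree 3, can be perfectly represented by an AR(3) time-series.

  • All cubic polynomials will have the same coefficient values in time-series, i.e., $b_1 = 3$, $b_2 = -3$, and $b_3 = 1$.

  • The constant term in the time-series is given by $\mu = 6a_3\Delta^{3}$.

  • This implies that every cubic polynomial with different values of $a_0$, $a_1$, and $a_2$ but the same value of $a_3$ will have the identical AR(3) representation, i.e., with the same values of $b_1$, $b_2$, $b_3$ and $\mu$.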

The summary of the exposition so far is that all polynomials of degree 1 (linear), of degree 2 (quadratic), and of degree 3 (cubic) can be perfectly represented as AR time-series of orders 1, 2, and 3 respectively. Furthermore, for each polynomial degree all the time-series coefficients have predefined values, which are specific integers, while the constant term, $\mu$, has a predefined form that depends on the leading coefficient of the polynomial, the degree of the polynomial, and the sampling period. This and more specific information can be found in Table 1.

TABLE 1. Information for Polynomials of Different Degrees.

Degree of polynomial | AR parameters
                     | $b_1$ | $b_2$ | $b_3$ | $\mu$
1                    | 1     |       |       | $a_1\Delta$
2                    | 2     | −1    |       | $2a_2\Delta^{2}$
3                    | 3     | −3    | 1     | $6a_3\Delta^{3}$
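The rows of Table 1 can be checked numerically. The sketch below is an illustration only, not the paper's own code: the polynomial coefficients and the sampling period are hypothetical values, and exact rational arithmetic is used so that no rounding enters. For each degree, it confirms that the residual y(n) minus the weighted earlier samples is a single constant, which by repeated differencing should equal (degree)! times the leading coefficient times the sampling period raised to the degree.

```python
from fractions import Fraction
from math import factorial

# AR coefficients as listed in Table 1, keyed by polynomial degree.
TABLE_1 = {1: [1], 2: [2, -1], 3: [3, -3, 1]}

def poly_samples(coeffs, delta, count):
    """Exact samples y(n*delta), n = 0..count-1; coeffs[i] multiplies t**i."""
    return [sum(c * (n * delta) ** i for i, c in enumerate(coeffs))
            for n in range(count)]

def ar_residuals(y, b):
    """y[n] - sum_i b_i * y[n-i] for every n that has len(b) earlier samples."""
    q = len(b)
    return [y[n] - sum(b[i] * y[n - 1 - i] for i in range(q))
            for n in range(q, len(y))]

delta = Fraction(1, 2)  # hypothetical sampling period
examples = {            # hypothetical polynomial coefficients a_0, a_1, ...
    1: [Fraction(7), Fraction(-3)],
    2: [Fraction(1), Fraction(4), Fraction(5, 3)],
    3: [Fraction(-2), Fraction(0), Fraction(1, 4), Fraction(2)],
}

constants = {}
for degree, coeffs in examples.items():
    residuals = ar_residuals(poly_samples(coeffs, delta, 12), TABLE_1[degree])
    # One distinct residual means the recursion holds with a fixed constant.
    assert len(set(residuals)) == 1
    constants[degree] = residuals[0]
    expected = factorial(degree) * coeffs[-1] * delta ** degree
    print(degree, constants[degree], expected)
```

Changing the lower-order coefficients in `examples` leaves `constants` unchanged, which mirrors the observation that only the leading coefficient enters the constant term.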

III. Method – Any Finite Degree Polynomial

In section II it has been demonstrated that all polynomials of degree 1 (linear), of degree 2 (quadratic), and of degree 3 (cubic) can be perfectly represented as autoregressive time-series of orders 1, 2, and 3 respectively. In this section the exploration is generalised to polynomials of every finite degree. In section II it was found that, for $q = 1$, 2, and 3, the degree of the polynomial and the order of the corresponding AR time-series are identical. In the following, a discrete-time polynomial of degree $q$, with coefficients $a_j$ and sampling period $\Delta$, of the form below is considered

$$ y(n\Delta) = \sum_{j=0}^{q} a_j\, (n\Delta)^{j} \tag{21} $$

in seeking a corresponding autoregressive time-series model of order $q$, AR($q$).

The time-series in equation (2) can be rewritten as

III.

Now it is conjectured that

III.

for Inline graphic. Using this conjecture and equation (21), the equation (22) can be written as

III.

In the above double summation, it is instructive and revealing to consider different values of Inline graphic separately.

A. Part I

For the particular case of Inline graphic, the right-hand side of equation (24) can be written as Inline graphic. The relation 0.154.6 on page 4 of [37], for Inline graphic and Inline graphic, can be adapted to

A.

Using this relation, for Inline graphic,

A.

Therefore, for the case of Inline graphic, the right-hand side of equation (24) is Inline graphic.

B. Part II

For the case of Inline graphic, the right-hand side of equation (24) can be written as

B.

Using equation (25), for Inline graphic, one can write

B.

Using equations (26) and (28) in equation (27), it is found that the right-hand side of equation (24), for the case of Inline graphic, is equal to Inline graphic.

C. Part III

Now the case of Inline graphic is considered. The right-hand side of equation (24) can be written as

C.

Using equations (26) and (28), in the previous expression for the right-hand side of equation (24), for the case of Inline graphic, the right-hand side is found to be equal to Inline graphic.

Similarly, for each value of Inline graphic up to Inline graphic, it can be shown that the right-hand side of equation (24) is equal to Inline graphic. When Inline graphic, there is a term of the form Inline graphic. According to equation (28), which is valid if the top range of the summation is larger than or equal to the power of Inline graphic plus one, i.e., Inline graphic, this term equates to zero.

However, when Inline graphic, there is a term of the form Inline graphic. For this term, equation (28) is not valid since the top range of the summation, i.e., Inline graphic, is neither larger than nor equal to the power of Inline graphic plus one, i.e., Inline graphic. To deal with the case of Inline graphic, the relation 0.154.4 on page 4 of [37], for Inline graphic and Inline graphic, can be adapted to

C.

Thus,

C.

Thus, for the case of Inline graphic, the right-hand side of equation (24) can be written as

C.

Using equation (26) the first term is Inline graphic. All the terms in the middle are zero by virtue of equation (28). Using equation (30) the last term is found to be Inline graphic, which is equal to Inline graphic.

Adding all the results for Inline graphic, one obtains

C.

Thus,

C.

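Therefore, all noise-free data from uniformly sampled polynomials of finite degree $q$, with leading coefficient $a_q$ and sampling period $\Delta$, can be perfectly represented by an autoregressive time-series model of order $q$ such that

$$ y(n) = \sum_{i=1}^{q} b_i\, y(n-i) + \mu , $$

where

$$ b_i = (-1)^{i+1} \binom{q}{i}, \qquad i = 1, 2, \ldots, q, $$

and

$$ \mu = q!\, a_q\, \Delta^{q} . $$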
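The general result can be exercised for any degree. The sketch below assumes the alternating binomial form of the coefficients, which follows from applying the backward difference q times and reproduces the integer rows of Table 1, and a constant equal to q! times the leading coefficient times the q-th power of the sampling period. The function names and the test polynomial are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from math import comb, factorial

def ar_from_degree(q):
    """Coefficients b_1..b_q with b_i = (-1)**(i+1) * C(q, i)."""
    return [(-1) ** (i + 1) * comb(q, i) for i in range(1, q + 1)]

def ar_constant(leading_coeff, q, delta):
    """Constant term q! * a_q * delta**q."""
    return factorial(q) * leading_coeff * delta ** q

def holds_exactly(coeffs, delta, count=None):
    """True if the AR(q) recursion reproduces every sample of the polynomial."""
    q = len(coeffs) - 1
    count = count or 3 * q + 5
    y = [sum(c * (n * delta) ** j for j, c in enumerate(coeffs))
         for n in range(count)]
    b = ar_from_degree(q)
    mu = ar_constant(coeffs[-1], q, delta)
    return all(
        y[n] == sum(b[i - 1] * y[n - i] for i in range(1, q + 1)) + mu
        for n in range(q, count)
    )

# Hypothetical degree-5 polynomial and sampling period, in exact arithmetic.
quintic = [Fraction(1), Fraction(-2), Fraction(3, 7),
           Fraction(0), Fraction(5), Fraction(-1, 3)]
print(ar_from_degree(5), holds_exactly(quintic, Fraction(3, 4)))
```

Exact fractions are used deliberately: in floating point the same check would need a tolerance, because the alternating binomial weights grow quickly with q and amplify rounding error.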

IV. Experiments

In this section some explorations are carried out for different types of data sources to illustrate a few themes. In reality, all real data have uncertainties; therefore, it is important to study sensitivities to degrees and types of uncertainties. Yet, in these explorations all generated data are error-free. Here the objectives are to underpin some theoretical results and to generate some intuitions from precise data and theoretical results, and not to get distracted into studying effects of noise interference. Two applications to real data, the current Covid-19 data from the UK and the US, are clearly not noise-free but are offered as real examples.

In the first four of these explorations, N data are generated. These are then modelled by polynomials as in equation (21) and time-series as in equation (2). When considering a polynomial of degree Inline graphic, the first Inline graphic data are used to evaluate the Inline graphic coefficients of this polynomial. This works as data are error-free. On the other hand, when considering a time-series of order Inline graphic, the first Inline graphic data are used to evaluate the Inline graphic coefficients of this time-series.
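The fitting procedure for error-free data can be sketched as two small linear solves. The sample counts are assumptions made for this illustration: a degree-p polynomial has p+1 unknowns and is solved from the first p+1 samples, while an AR(q) model with a constant has q+1 unknowns, and forming q+1 equations that each look back q samples needs the first 2q+1 samples. The function names and the quadratic test data are hypothetical.

```python
import numpy as np

def fit_poly_exact(y, p):
    """Polynomial coefficients a_0..a_p from the first p+1 samples (n = 0..p)."""
    n = np.arange(p + 1, dtype=float)
    vandermonde = np.vander(n, p + 1, increasing=True)
    return np.linalg.solve(vandermonde, np.asarray(y[:p + 1], dtype=float))

def fit_ar_exact(y, q):
    """AR coefficients b_1..b_q and constant mu from the first 2q+1 samples."""
    y = np.asarray(y, dtype=float)
    # Row for time n holds [y[n-1], ..., y[n-q], 1]; n runs over q..2q.
    rows = [np.append(y[n - q:n][::-1], 1.0) for n in range(q, 2 * q + 1)]
    solution = np.linalg.solve(np.array(rows), y[q:2 * q + 1])
    return solution[:q], solution[q]

def predict_ar(y_start, b, mu, total):
    """Run the recursion forward from the first len(b) samples."""
    out = [float(v) for v in y_start]
    q = len(b)
    while len(out) < total:
        recent = out[-1:-q - 1:-1]  # y[n-1], y[n-2], ..., y[n-q]
        out.append(float(np.dot(b, recent)) + float(mu))
    return np.array(out)

# Hypothetical noise-free quadratic data with unit sampling period.
y = [1 + 2 * n + 3 * n * n for n in range(10)]
a = fit_poly_exact(y, 2)
b, mu = fit_ar_exact(y, 2)
y_t = predict_ar(y[:2], b, mu, len(y))
print(a, b, mu)
```

If the chosen order exceeds the true polynomial degree, the exact AR system is rank deficient for noise-free data and `np.linalg.solve` may fail or return unstable values; a least-squares solver would be the usual fallback.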

A. Case I

Here Inline graphic data are generated from a polynomial of degree 3,

A.

This is a finite degree polynomial with no steady state. For each value of the degree Inline graphic of the polynomial from Inline graphic, the first Inline graphic data are used to calculate the Inline graphic coefficients of the polynomial. Using these polynomial coefficients, the remaining Inline graphic data values are predicted; these are labelled as Inline graphic for Inline graphic. Similarly, for each value of the time-series order of Inline graphic from Inline graphic, the first Inline graphic data are used to calculate the Inline graphic coefficients of the time-series. Using these coefficients, the remaining Inline graphic data values are predicted; these are labelled as Inline graphic for Inline graphic.

For the same values of Inline graphic and Inline graphic, Inline graphic data values are predicted for polynomial and Inline graphic data values are predicted for time-series representations. To compare prediction errors from polynomial and time-series representations fairly, only those predictions, i.e., Inline graphic data values, common to both representations are used. Mean prediction errors are Inline graphic and Inline graphic for polynomial and time-series respectively. Also, the RMS prediction error (polynomial) is defined as

A.

while the RMS prediction error (time-series) is defined as

A.

The RMS prediction error (polynomial) is depicted in Figure 1a) as a function of the polynomial degree, while the RMS prediction error (time-series) is shown in Figure 1b) as a function of the time-series order. The prediction error at Inline graphic is ($6.6 \times 10^{-13} \pm 2.7 \times 10^{-12}$), while the prediction error at Inline graphic is ($7.2 \times 10^{-10} \pm 1.6 \times 10^{-9}$); both are extremely small. Figure 2a) shows the data versus the time index, while Figure 2b) depicts the prediction errors versus the time index for Inline graphic (polynomial in red) and for Inline graphic (time-series in green). The results confirm that these data from a finite degree polynomial can be equally well described by both polynomial and time-series representations.
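The error measures used in these comparisons can be sketched as follows. This is an assumption-laden illustration: the error is taken as actual minus predicted, the figure after the ± sign is assumed to be the standard deviation of the errors, and the RMS is the square root of the mean squared error, all restricted to the indices that both representations predict. The function name and the sample numbers are hypothetical.

```python
import numpy as np

def prediction_error_summary(actual, predicted, first_common_index):
    """Mean, standard deviation, and RMS of errors from a common start index."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = actual[first_common_index:] - predicted[first_common_index:]
    return {
        "mean": float(errors.mean()),
        "std": float(errors.std()),
        "rms": float(np.sqrt(np.mean(errors ** 2))),
    }

# Hypothetical values: only the last two samples count as common predictions.
summary = prediction_error_summary([1, 2, 3, 4, 5], [1, 2, 3, 5, 3], 3)
print(summary)
```

Restricting both models to the same indices matters because the time-series model consumes more starting samples than the polynomial of the same nominal size, so the two would otherwise be scored on different data.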

FIGURE 1. Data generated from a polynomial. Figure 1a) shows the RMS prediction error (polynomial) as a function of the polynomial degree. Figure 1b) presents the RMS prediction error (time-series) as a function of the time-series order.

FIGURE 2. Data generated from a polynomial. Figure 2a) depicts the data versus the time index. Figure 2b) shows the prediction errors versus the time index for Inline graphic (polynomial in red) and for Inline graphic (time-series in green).

B. Case II

Here Inline graphic data are generated from a sine wave

B.

This represents an infinite degree polynomial and has no steady state, but its values are bounded between −1 and +1. The procedures for calculating the Inline graphic coefficients of the polynomial and calculating the Inline graphic coefficients of the AR time-series are the same as described in Case I earlier. Also, the procedures for calculating the prediction error (polynomial) and the prediction error (time-series) have been described earlier in Case I.

The RMS prediction error (polynomial) is depicted in Figure 3a) as a function of the polynomial degree, while the RMS prediction error (time-series) is shown in Figure 3b) as a function of the time-series order. The error at Inline graphic is (6.9 ± 3.8), while the error at Inline graphic is ($2.5 \times 10^{-18} \pm 1.4 \times 10^{-15}$). Also, the error at Inline graphic is (422 ± 614), while the error at Inline graphic is ($1.3 \times 10^{-16} \pm 1.4 \times 10^{-15}$). Figure 4a) shows the data versus the time index, while Figure 4b) depicts the prediction errors versus the time index for Inline graphic (polynomial in red) and for Inline graphic (time-series in green). The results confirm that these data from a sine wave are extremely well described by a time-series representation of order only 2; there is a theoretical reason for this (see section V for an explanation). Also, this time-series representation is far better than any finite degree polynomial representation.
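One way to see why order 2 suffices for a sampled sine wave is the identity sin(x + h) + sin(x − h) = 2 sin(x) cos(h), which gives y(n) = 2 cos(ωΔ) y(n−1) − y(n−2) with a zero constant. The sketch below checks this numerically; the angular frequency, sampling period, and phase are hypothetical values rather than the ones used in Case II, and the function name is illustrative.

```python
import math

def sine_ar2_coefficients(omega, delta):
    """AR(2) coefficients (b_1, b_2) for samples of sin(omega*n*delta + phase)."""
    return 2.0 * math.cos(omega * delta), -1.0

omega, delta, phase = 0.37, 0.5, 0.2  # hypothetical signal parameters
y = [math.sin(omega * n * delta + phase) for n in range(200)]
b1, b2 = sine_ar2_coefficients(omega, delta)

# Largest deviation from the recursion over all samples with two predecessors.
max_residual = max(abs(y[n] - b1 * y[n - 1] - b2 * y[n - 2])
                   for n in range(2, len(y)))
print(b1, b2, max_residual)
```

The first coefficient is generally not an integer, so this AR(2) model is not of the specific integer form tied to a finite degree polynomial, which is consistent with the sine wave being an infinite degree polynomial.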

FIGURE 3. Data generated from a sine wave. Figure 3a) presents the RMS prediction error (polynomial) as a function of the polynomial degree. Figure 3b) displays the RMS prediction error (time-series) as a function of the time-series order.

FIGURE 4. Data generated from a sine wave. Figure 4a) shows the data versus the time index. Figure 4b) depicts the prediction errors versus the time index for Inline graphic (polynomial in red) and for Inline graphic (time-series in green).

C. Case III

Here Inline graphic data are generated from a non-polynomial

C.

This represents an infinite degree polynomial and has no steady state. The procedures for calculating the Inline graphic coefficients of the polynomial and calculating the Inline graphic coefficients of the time-series are the same as described in Case I earlier. Also, the procedures for calculating the prediction error (polynomial) and the prediction error (time-series) have been described earlier in Case I.

The RMS prediction error (polynomial) is depicted in Figure 5a) as a function of the polynomial degree, while the RMS prediction error (time-series) is shown in Figure 5b) as a function of the time-series order. The prediction error at Inline graphic is ($6.6 \times 10^{6} \pm 4.9 \times 10^{6}$), while the prediction error at Inline graphic is ($-1.2 \times 10^{-8} \pm 8.7 \times 10^{-8}$). Also, the prediction error at Inline graphic is ($7.3 \times 10^{7} \pm 8.9 \times 10^{7}$), while the prediction error at Inline graphic is ($-1.2 \times 10^{-8} \pm 8.7 \times 10^{-8}$). Figure 6a) shows the data versus the time index, while Figure 6b) depicts the prediction errors versus the time index for Inline graphic (polynomial in red) and for Inline graphic (time-series in green). Thus, these data from a non-polynomial are significantly better described by a time-series representation of order only 4; the theoretical reason can be found in section V. Also, the RMS error (time-series) is many orders of magnitude smaller than the RMS error of any finite degree polynomial tried.

FIGURE 5. Data generated from a non-polynomial. Figure 5a) presents the RMS prediction error (polynomial) as a function of the polynomial degree. Figure 5b) displays the RMS prediction error (time-series) as a function of the time-series order.

FIGURE 6. Data generated from a non-polynomial. Figure 6a) shows the data versus the time index. Figure 6b) depicts the prediction errors versus the time index for Inline graphic (polynomial in red) and for Inline graphic (time-series in green).

D. Case IV

Here Inline graphic data are generated from an inverse polynomial

D.

This represents an infinite degree polynomial. It has neither a finite degree polynomial representation nor a finite order time-series representation. The procedures for calculating the Inline graphic coefficients of the polynomial and calculating the Inline graphic coefficients of the time-series are the same as described in Case I earlier. Also, the procedures for calculating the prediction error (polynomial) and the prediction error (time-series) have been described earlier in Case I.

The RMS prediction error (polynomial) is depicted in Figure 7a) as a function of the polynomial degree, while the RMS prediction error (time-series) is shown in Figure 7b) as a function of the time-series order. The prediction error at Inline graphic is (8.2 ± 6.7), while the prediction error at Inline graphic is ($-4.7 \times 10^{-4} \pm 1.7 \times 10^{-4}$). Also, the prediction error at Inline graphic is (476 ± 617), while the prediction error at Inline graphic is ($-1.1 \times 10^{-7} \pm 9.5 \times 10^{-8}$). Figure 8a) shows the data versus the time index, while Figure 8b) depicts the prediction errors versus the time index for Inline graphic (polynomial in red) and for Inline graphic (time-series in green). The results confirm that these data from an inverse polynomial are described better by an AR time-series representation than by a finite degree polynomial representation, by several orders of magnitude in RMS error.

FIGURE 7. Data generated from an inverse polynomial. Figure 7a) displays the RMS prediction error (polynomial) as a function of the polynomial degree. Figure 7b) presents the RMS prediction error (time-series) as a function of the time-series order.

FIGURE 8. Data generated from an inverse polynomial. Figure 8a) shows the data versus the time index. Figure 8b) depicts the prediction errors versus the time index for Inline graphic (polynomial in red) and for Inline graphic (time-series in green).

E. Case V

This is an example of using real data from a current global Covid-19 epidemic as it is unfolding. The dataset represents cumulative daily confirmed cases of Covid-19 infections in the UK. This dataset is publicly available [38]. On 01 April 2020 there were 61 data (i.e., N = 61) covering the period from 31 January 2020 to 31 March 2020. Thus Inline graphic for Inline graphic represents the cumulative daily confirmed cases of Covid-19 infections in the UK.

Of these 61 data, the first 50 are used for estimating the free parameters and the last 11 are used for forecasting. For a polynomial of degree Inline graphic, the first 50 data are used to estimate the Inline graphic coefficients of the polynomial using the Moore-Penrose inverse. By adopting equation (21), the first 50 data can be described by the matrix equation Inline graphic, where

E.

Thus, Inline graphic is a column vector of size Inline graphic, Inline graphic is a column vector of size Inline graphic × 1, and Inline graphic is a matrix of size 50 × Inline graphic. Now,

E.

Using these estimated polynomial coefficients from equation (32), all 61 data are calculated using

E.

where Inline graphic is a matrix of size 61 × Inline graphic and Inline graphic, while Yp is a column vector of size Inline graphic and Inline graphic. Of course, Inline graphic came from the regression but Inline graphic are polynomial predictions for Inline graphic.
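The polynomial fit with the Moore-Penrose inverse can be sketched as below. This is not the paper's code: the data are a synthetic quadratic standing in for the 61 observations, the time index is assumed to run from 1, and the function names are hypothetical. The design matrix holds powers of the time index, the coefficients come from `numpy.linalg.pinv` applied to the first 50 rows, and the fitted model is then evaluated on all 61 indices so that the last 11 values are forecasts.

```python
import numpy as np

def fit_poly_pinv(y_fit, degree):
    """Least-squares polynomial coefficients via the Moore-Penrose inverse."""
    n = np.arange(1, len(y_fit) + 1, dtype=float)
    design = np.vander(n, degree + 1, increasing=True)  # columns 1, n, n^2, ...
    return np.linalg.pinv(design) @ np.asarray(y_fit, dtype=float)

def evaluate_poly(coeffs, total):
    """Evaluate the fitted polynomial at time indices 1..total."""
    n = np.arange(1, total + 1, dtype=float)
    return np.vander(n, len(coeffs), increasing=True) @ coeffs

# Synthetic stand-in for the 61 observations (hypothetical coefficients).
n_all = np.arange(1, 62, dtype=float)
y = 2.0 + 0.5 * n_all + 0.1 * n_all ** 2

coeffs = fit_poly_pinv(y[:50], degree=2)   # fit on the first 50 values
y_p = evaluate_poly(coeffs, 61)            # 50 fitted values, then 11 forecasts
print(coeffs, y_p[50:])
```

For higher degrees the raw powers of the time index span many orders of magnitude, so the design matrix becomes ill-conditioned; rescaling the index to a small interval before fitting is a common precaution.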

Similarly, for a time-series of order Inline graphic, the first 50 data are used to estimate the Inline graphic coefficients of the time-series. Each of these data values depends on the coefficients and on earlier data values. As all data values are error-prone, Total Least Squares, which accounts for errors in both the dependent and the independent variables, is more appropriate than ordinary Least Squares, which accounts for errors in the dependent variable only. Using the Inline graphic estimated coefficients, all 61 data are calculated, which are labelled as Inline graphic. Of course, out of these 61 values, Inline graphic came from the regression and Inline graphic are time-series predictions for Inline graphic.
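A Total Least Squares fit of an AR model with a constant can be sketched with a singular value decomposition. The treatment of the constant here is an assumption of this illustration, not necessarily the paper's procedure: the lagged regressors and the target are centred, the classical TLS solution is read from the right singular vector of the smallest singular value, and the constant is recovered from the column means. The function name and the noise-free quadratic test data are hypothetical.

```python
import numpy as np

def fit_ar_tls(y, q):
    """Total Least Squares estimate of AR(q) coefficients and constant."""
    y = np.asarray(y, dtype=float)
    # Column i holds y[n-1-i] for n = q..len(y)-1; the target holds y[n].
    lagged = np.column_stack([y[q - 1 - i:len(y) - 1 - i] for i in range(q)])
    target = y[q:]
    lag_mean, target_mean = lagged.mean(axis=0), target.mean()
    # Centring removes the constant term, which is then recovered afterwards.
    stacked = np.column_stack([lagged - lag_mean, target - target_mean])
    _, _, vt = np.linalg.svd(stacked, full_matrices=False)
    v = vt[-1]  # right singular vector of the smallest singular value
    if abs(v[-1]) < 1e-12:
        raise ValueError("TLS solution does not exist for these data")
    coeffs = -v[:q] / v[-1]
    mu = target_mean - lag_mean @ coeffs
    return coeffs, mu

# Hypothetical noise-free quadratic data: expect b = [2, -1] and mu = 6.
y = np.array([1 + 2 * n + 3 * n * n for n in range(30)], dtype=float)
b, mu = fit_ar_tls(y, 2)
print(b, mu)
```

On noise-free data this agrees with ordinary least squares; the two differ only when the samples carry errors, which is the situation described for the Covid-19 data.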

It is not known a priori whether the data can be represented by a finite degree polynomial or a finite order time-series. The RMS error at Inline graphic is 5142, while the RMS error at Inline graphic is 539. Clearly, the time-series representation is much more accurate. Also, the RMS error at Inline graphic is 1711, much smaller than at lower Inline graphic values, but it is still much larger than the one from the time-series representation. Figure 9a) depicts all 61 data values (Inline graphic) in blue, all 61 calculated values (yp) in red according to polynomial representation at Inline graphic [the first 50 values are from the fit and the last 11 values are predictions], as well as all 61 calculated values (yt) in green according to autoregressive time-series representation of order 2 [the first 50 values are from the fit and the last 11 values are predictions]. To get a closer look at the predictions, Figure 9b) depicts the last 11 data values (Inline graphic) in blue, the 11 predicted values (yp) in red according to polynomial representation at Inline graphic, as well as the 11 predicted values (yt) in green according to autoregressive time-series representation of order 2.

FIGURE 9. Cumulative daily confirmed cases of Covid-19 infections in the UK, covering the period from 31 January 2020 to 31 March 2020. Figure 9a) depicts all 61 data values (y) in blue, all 61 calculated values (yp) in red according to polynomial representation at Inline graphic [the first 50 values are from the fit and the last 11 values are predictions], as well as all 61 calculated values (yt) in green according to autoregressive time-series representation of order 2 [the first 50 values are from the fit and the last 11 values are predictions]. Figure 9b) presents the last 11 data values (y) in blue, the 11 predicted values (yp) in red according to polynomial representation at Inline graphic, as well as the 11 predicted values (yt) in green according to autoregressive time-series representation of order 2.

To get a better idea of the fit (and not the predictions) Figure 10 plots data values Inline graphic at Inline graphic in blue, the corresponding fitted values (yp) in red according to polynomial representation at Inline graphic, as well as the corresponding fitted values (yt) in green according to autoregressive time-series representation of order 2.

FIGURE 10. Cumulative daily confirmed cases of Covid-19 infections in the UK, covering the period from 31 January 2020 to 31 March 2020. Figure 10 plots data values (y) at Inline graphic in blue, the corresponding fitted values (yp) in red according to polynomial representation at Inline graphic, as well as the corresponding fitted values (yt) in green according to autoregressive time-series representation of order 2.

It is clear that the polynomial representation picks up the trend of the later data values, but it fails completely for the first half of the data values. On the other hand, the autoregressive time-series of order 2 picks up the trend over the whole range of the data values. The results confirm that the UK Covid-19 data are described significantly better (with a smaller RMS error) by an AR time-series of order 2 than by a polynomial of degree 5 or of the other degrees tried.

F. Case VI

This is another example of using real data. The dataset represents cumulative daily confirmed cases of Covid-19 infections in the US. This dataset is publicly available [39]. On 04 April 2020 there were 25 data (i.e., N = 25) covering the period from 10 March 2020 to 03 April 2020. Thus Inline graphic for Inline graphic represents the cumulative daily confirmed cases of Covid-19 infections in the US.

Of these 25 data values, the first 15 are used for estimating the free parameters and the last 10 are used for forecasting. For a polynomial of degree Inline graphic, the first 15 data values are used to estimate the Inline graphic coefficients of the polynomial using the Moore-Penrose inverse, in much the same way as for Case V above. Using these estimated polynomial coefficients, all 25 data values are calculated in a similar manner to Case V. The YP is a column vector of size Inline graphic and Inline graphic. Inline graphic came from the regression, while Inline graphic are polynomial predictions for Inline graphic.
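The fit-then-extrapolate procedure for the polynomial representation can be sketched in plain Python. This is only an illustration under stated assumptions, not the author's code: it solves the least-squares normal equations (which give the same answer as the Moore-Penrose inverse when the design matrix has full column rank), it uses synthetic cubic data in place of the US data, and the names `solve_linear`, `fit_polynomial` and `eval_polynomial` are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_polynomial(ys, degree):
    """Least-squares coefficients c[0..degree] of y_n ~ sum_k c[k] * n**k for n = 1..len(ys)."""
    rows = [[float(n) ** k for k in range(degree + 1)] for n in range(1, len(ys) + 1)]
    size = degree + 1
    # Normal equations: (X^T X) c = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(size)]
    return solve_linear(xtx, xty)


def eval_polynomial(coeffs, n):
    """Evaluate the fitted polynomial at sample index n."""
    return sum(c * n ** k for k, c in enumerate(coeffs))


# Synthetic stand-in for the 25 data values (an assumed cubic, not the US data).
data = [2.0 + 0.5 * n - 0.3 * n ** 2 + 0.1 * n ** 3 for n in range(1, 26)]

coeffs = fit_polynomial(data[:15], degree=3)               # fit on the first 15 values
yp = [eval_polynomial(coeffs, n) for n in range(1, 26)]    # 15 fitted + 10 predicted
rms_pred = (sum((p - y) ** 2 for p, y in zip(yp[15:], data[15:])) / 10) ** 0.5
print(coeffs, rms_pred)
```

For real data the normal equations can become ill-conditioned at higher degrees, so a pseudo-inverse or QR-based solver from a numerical library would normally be preferred.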

Similarly, for a time-series of order Inline graphic, the first 15 data values are used to estimate the Inline graphic coefficients of the time-series. Each of these data values depends on the coefficients and on earlier data values. As all data values are error prone, Total Least Squares, which accounts for errors in both the dependent and independent variables, is more appropriate than ordinary Least Squares, which accounts only for errors in the dependent variables. Using the Inline graphic estimated coefficients, all 25 data values are calculated, i.e., Inline graphic. Of these 25 values, Inline graphic came from the regression and Inline graphic are time-series predictions for Inline graphic.
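A minimal sketch of the time-series counterpart follows. It deliberately swaps in ordinary least squares for the Total Least Squares used in the paper, to stay within the Python standard library, and it runs on synthetic noise-free data generated from an assumed order-2 recursion. The names `fit_ar` and `forecast`, and the coefficient values 1.7, -0.6 and 1.0, are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar(ys, order):
    """Ordinary least-squares fit of y_n = a_1 y_{n-1} + ... + a_p y_{n-p} + mu."""
    # Each row holds the lagged values (most recent first) plus a 1 for the constant term.
    rows = [ys[n - order:n][::-1] + [1.0] for n in range(order, len(ys))]
    targets = ys[order:]
    size = order + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(size)] for i in range(size)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(size)]
    *a, mu = solve_linear(xtx, xty)
    return a, mu


def forecast(history, a, mu, steps):
    """Recursive multi-step forecast: each prediction is fed back as an input."""
    seq = list(history)
    for _ in range(steps):
        seq.append(sum(c * seq[-1 - i] for i, c in enumerate(a)) + mu)
    return seq[len(history):]


# Synthetic stand-in for the 25 data values (assumed recursion, not the US data).
data = [1.0, 2.0]
while len(data) < 25:
    data.append(1.7 * data[-1] - 0.6 * data[-2] + 1.0)

a_hat, mu_hat = fit_ar(data[:15], order=2)              # estimate from the first 15 values
yt_pred = forecast(data[:15], a_hat, mu_hat, steps=10)  # predict the last 10 values
rms_pred = (sum((p - y) ** 2 for p, y in zip(yt_pred, data[15:])) / 10) ** 0.5
print(a_hat, mu_hat, rms_pred)
```

With noisy data the two estimators generally give different coefficients, which is the reason the text prefers Total Least Squares.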

It is not known a priori whether the data can be represented by a finite degree polynomial or a finite order time-series. The RMS error at Inline graphic is 15272, while the RMS error at Inline graphic is 6533. Figure 11a) depicts the 25 data values (Inline graphic) in blue, the 25 calculated values (yp) in red according to the polynomial representation at Inline graphic [the first 15 values are from the fit and the last 10 values are predictions], and the 25 calculated values (yt) in green according to the autoregressive time-series representation of order 3 [the first 15 values are from the fit and the last 10 values are predictions]. To get a closer look at the predictions, Figure 11b) shows the last 10 data values (Inline graphic) in blue, the 10 predicted values (yp) in red according to the polynomial representation at Inline graphic, and the 10 predicted values (yt) in green according to the autoregressive time-series representation of order 3. RMS errors increase for other choices of Inline graphic values. Clearly, the time-series representation is much more accurate.

FIGURE 11.


Cumulative daily confirmed cases of Covid-19 infections in the US, covering the period from 10 March 2020 to 03 April 2020. Figure 11a) depicts the 25 data values (y) in blue, the 25 calculated values (yp) in red according to the polynomial representation at Inline graphic [the first 15 values are from the fit and the last 10 values are predictions], and the 25 calculated values (yt) in green according to the autoregressive time-series representation of order 3 [the first 15 values are from the fit and the last 10 values are predictions]. To get a closer look at the predictions, Figure 11b) shows the last 10 data values (y) in blue, the 10 predicted values (yp) in red from the polynomial representation at Inline graphic, and the 10 predicted values (yt) in green from the autoregressive time-series representation of order 3.

In search of better results with a higher-degree polynomial, the RMS error at Inline graphic was computed and found to be 34692, which is significantly larger than the value from the time-series representation at Inline graphic. Figure 12a) and Figure 12b) for Inline graphic can be described in the same way as Figure 11 for Inline graphic. The results confirm that these US Covid-19 data are significantly better described by an AR time-series of order 3 than by a finite degree polynomial of degree 3 (and others).

FIGURE 12.


Cumulative daily confirmed cases of Covid-19 infections in the US, covering the period from 10 March 2020 to 03 April 2020. Figure 12a) and Figure 12b) are for Inline graphic; otherwise, they follow the description of Figure 11.

Table 2 provides a summary of these six cases. Data from a polynomial of finite degree can be represented equally well by either a finite degree polynomial or a finite order time-series with specific integer coefficients, while data from other sources are represented significantly better by time-series representations. In many cases, finite order time-series can theoretically represent data from infinite degree polynomials extremely well. Therefore, whenever knowledge of the polynomial coefficients is not necessary in an application, one may choose to use a time-series representation.

TABLE 2. Summary of Six Cases.

| Case | Data source | Number of data | Degree of polynomial | Polynomial representation | Time-series representation | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Case I | Polynomial | 35 | 3 | At Inline graphic, RMS prediction error of 10^-12. | At Inline graphic, RMS prediction error of 10^-9. | Both offer very low prediction errors. |
| Case II | Sine wave | 35 | Inline graphic | At Inline graphic, RMS prediction error of 3.8. | At Inline graphic, RMS prediction error of 10^-15. | Time-series offers significantly better prediction. |
| Case III | Non-polynomial | 35 | Inline graphic | At Inline graphic, RMS prediction error of 106. | At Inline graphic, RMS prediction error of 10^-8. | Time-series offers significantly better prediction. |
| Case IV | Inverse polynomial | 35 | Inline graphic | At Inline graphic, RMS prediction error of 6.7. | At Inline graphic, RMS prediction error of 10^-8. | Time-series offers significantly better prediction. |
| Case V | UK Covid-19 data | 61 | Unknown | At Inline graphic, RMS prediction error of 1711. | At Inline graphic, RMS prediction error of 539. | Time-series offers much better prediction. |
| Case VI | US Covid-19 data | 25 | Unknown | At Inline graphic, RMS prediction error of 15272. | At Inline graphic, RMS prediction error of 6532. | Time-series offers much better prediction. |

V. Time-Series with Other Coefficients

It has been demonstrated in Sections II and III that all data from polynomials of finite degree Inline graphic can be perfectly represented by a time-series of order Inline graphic, if Inline graphic is not zero. The coefficients of such time-series are always integers of a specified form. What time-series with other forms of coefficients (either non-integers or integers of different forms) represent is demonstrated below.
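The finite-degree result recalled above can be checked numerically. The sketch below assumes that the specified integer form is the alternating binomial pattern, which follows from the standard fact that the q-th finite difference of a degree-q polynomial is the constant q! times the leading coefficient. The paper's own equations are not reproduced in this extract, so the notation and the example cubic are assumptions.

```python
from math import comb, factorial


def ar_coefficients(q):
    """Assumed integer form: a_k = (-1)**(k + 1) * C(q, k) for k = 1..q."""
    return [(-1) ** (k + 1) * comb(q, k) for k in range(1, q + 1)]


def cubic(n):
    """Hypothetical degree-3 polynomial with leading coefficient 3."""
    return 4 - 2 * n + 5 * n ** 2 + 3 * n ** 3


q = 3
a = ar_coefficients(q)        # coefficients of y_{n-1}, y_{n-2}, y_{n-3}
mu = factorial(q) * 3         # constant term: q! times the leading coefficient

ys = [cubic(n) for n in range(1, 36)]   # 35 samples at n = 1..35

# Regenerate the whole sequence from the first q values using only the recursion.
recon = ys[:q]
for n in range(q, len(ys)):
    recon.append(sum(a[k] * recon[n - 1 - k] for k in range(q)) + mu)

print(a, mu, recon == ys)
```

Changing the lower-order coefficients of `cubic` leaves `a` untouched, and changing the leading coefficient only alters `mu`.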

Equation (2) is called non-homogeneous if Inline graphic in equation (2) is not zero [40], [41]. Then equation (2) can be combined with its equivalent form

V.

to obtain (by replacing Inline graphic)

V.

with Inline graphic and Inline graphic. This is a homogeneous equation. The corresponding characteristic polynomial has Inline graphic roots, i.e., Inline graphic. When these roots are distinct,

V.

On the other hand, when there are repeated roots, the solution is different. For only two repeated roots, e.g., Inline graphic,

V.

Each of the two solutions in equations (36) and (37) describes a polynomial of infinite degree. Hence, finite order time-series with other forms of coefficients (either non-integers or integers of different forms) represent polynomials of infinite degree.
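The passage from a non-homogeneous recursion to a homogeneous one of one order higher can be illustrated as follows. The sketch subtracts the recursion at index n-1 from the recursion at index n, which cancels the constant term. The coefficient values (2, 3), the constant 5, the starting values and the helper name `homogeneous_coefficients` are hypothetical, and the closed form at the end is the distinct-root solution worked out by hand for this particular example.

```python
def homogeneous_coefficients(a):
    """Map y_n = sum_i a[i]*y_{n-1-i} + mu to y_n = sum_i b[i]*y_{n-1-i} with len(b) = len(a) + 1."""
    b = [0] * (len(a) + 1)
    b[0] = 1                     # y_{n-1} moved to the right-hand side
    for i, coeff in enumerate(a):
        b[i] += coeff            # term from the recursion at index n
        b[i + 1] -= coeff        # term from the recursion at index n - 1
    return b


a, mu = [2, 3], 5                # integer coefficients that are not of the binomial form
ys = [1, 2]
while len(ys) < 20:
    ys.append(sum(c * ys[-1 - i] for i, c in enumerate(a)) + mu)

b = homogeneous_coefficients(a)  # no constant term is needed any more

# Regenerate the same data from len(b) starting values with the homogeneous recursion.
hom = ys[:len(b)]
while len(hom) < len(ys):
    hom.append(sum(c * hom[-1 - i] for i, c in enumerate(b)))

# The homogeneous characteristic polynomial r^3 - 3r^2 - r + 3 has distinct roots 1, 3, -1,
# so the data are a sum of powers of the roots (index n starts at 0 here).
closed = [-1.25 * 1 ** n + 1.375 * 3 ** n + 0.875 * (-1) ** n for n in range(len(ys))]

print(b, hom == ys)
```

The extra root at 1 always appears, because the coefficients returned by `homogeneous_coefficients` sum to 1.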

A. Example I

In Case II above, Inline graphic data were generated from a sine wave

A.

Fitting Inline graphic to the first 4 data values, it was found that Inline graphic and Inline graphic. These give rise to the characteristic polynomial of Inline graphic. The two roots are given by Inline graphic and Inline graphic. As these two roots are distinct, the solution is given by Inline graphic. Since Inline graphic, Inline graphic. Also, since Inline graphic, Inline graphic. So,

A.

This demonstrates how a time-series of order 2 can perfectly represent this sine wave, which is a polynomial of infinite degree.
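The same steps can be reproduced numerically for a generic sine wave. The exact sine wave of Case II is not recoverable from this extract, so the angular frequency 0.3 and the sampling at n = 1..35 below are assumptions. The sketch solves two equations in the two unknown coefficients from the first 4 data values and then inspects the roots of the characteristic polynomial.

```python
import cmath
import math

omega = 0.3                                   # hypothetical angular frequency
ys = [math.sin(omega * n) for n in range(1, 36)]

# Two equations from the first four values:
#   y3 = a1*y2 + a2*y1
#   y4 = a1*y3 + a2*y2
y1, y2, y3, y4 = ys[:4]
det = y2 * y2 - y1 * y3
a1 = (y3 * y2 - y1 * y4) / det
a2 = (y2 * y4 - y3 * y3) / det

# Roots of the characteristic polynomial r^2 - a1*r - a2 = 0.
disc = cmath.sqrt(a1 * a1 + 4 * a2)
root_plus = (a1 + disc) / 2
root_minus = (a1 - disc) / 2

# Regenerate all 35 values from the first two using the order-2 recursion.
recon = ys[:2]
while len(recon) < len(ys):
    recon.append(a1 * recon[-1] + a2 * recon[-2])
max_err = max(abs(u - v) for u, v in zip(recon, ys))

print(a1, a2, abs(root_plus), cmath.phase(root_plus), max_err)
```

For a pure sine wave the fitted values are expected to be close to 2 cos(omega) and -1, with the two roots forming a complex conjugate pair on the unit circle.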

B. Example II

Here Inline graphic data are generated from a non-polynomial

B.

Fitting Inline graphic to the first 8 data values, it is found that Inline graphic, and Inline graphic. These give rise to the characteristic polynomial of Inline graphic. The four roots are given by Inline graphic and Inline graphic. As three of these roots are repeated, the solution is given by Inline graphic. Using the first 8 data values, it can be shown that Inline graphic, and Inline graphic. Therefore, Inline graphic.

This is yet another example of how a finite order time-series (in this case of order 4) can perfectly represent this polynomial of infinite degree.

VI. All-Pole Filters and Polynomials

In this section a connection between polynomials and all-pole filters is demonstrated.

A. Polynomials and All-Pole Filters

It is well known that AR time-series models can be realised with all-pole filters. It has already been proven in Section III that all polynomials of finite degree Inline graphic can be represented by AR time-series of order Inline graphic (as in equation (2)). Using the z-transform, equation (2) can be written as

A.

and it has been proven in Section III that

A.

for Inline graphic. So, the denominator polynomial can now be written as

A.

Therefore, all polynomials of finite degree Inline graphic map onto Inline graphic on the z-plane through their Inline graphic repeated roots.
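As a hedged restatement in assumed notation (the symbols $q$, $a_k$ and $A(z)$ are chosen here for illustration, because the displayed equations are not reproduced in this extract), the step from binomial-form coefficients to repeated poles at $z = +1$ follows from the binomial theorem:

```latex
\begin{align}
A(z) &= 1 - \sum_{k=1}^{q} a_k z^{-k},
        \qquad a_k = (-1)^{k+1}\binom{q}{k}, \quad k = 1, \dots, q \\
     &= \sum_{k=0}^{q} (-1)^{k}\binom{q}{k} z^{-k} \\
     &= \left(1 - z^{-1}\right)^{q}
\end{align}
```

Under these assumptions the denominator $A(z)$ vanishes only at $z = 1$, with multiplicity $q$, regardless of the lower-order coefficients of the polynomial.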

B. Other Roots on the Unit Circle

All roots on the unit circle away from Inline graphic and Inline graphic are complex. For a real-valued time-series, complex roots come in complex conjugate pairs. Consider just one such pair for illustration, i.e., Inline graphic and Inline graphic. Thus, Inline graphic. Since Inline graphic is real-valued, either Inline graphic and Inline graphic, or Inline graphic and Inline graphic. Therefore, each pair of complex conjugate roots represents either a cosine or a sine, which can be described by an AR time-series of order 2 instead of a polynomial of infinite degree. The corresponding time-series coefficients are Inline graphic and Inline graphic.

C. Other Complex Conjugate Roots Not on the Unit Circle

Again, for a real-valued time-series, complex roots come in complex conjugate pairs. Consider just one pair for illustration, i.e., Inline graphic and Inline graphic, with Inline graphic. In this case, Inline graphic. Since Inline graphic is real-valued, either Inline graphic and Inline graphic, or Inline graphic and Inline graphic. Therefore, each pair of complex conjugate roots represents either a damped cosine or a damped sine, which can be described by an AR time-series of order 2 instead of a polynomial of infinite degree. The corresponding time-series coefficients are Inline graphic and Inline graphic.
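A short numerical check of the conjugate-pair case is sketched below. It assumes the standard order-2 coefficients for a pole pair of radius r and angle theta, namely 2 r cos(theta) and -r^2. The paper's own expressions are not recoverable from this extract, so this form, the values 0.9 and 0.4, and the helper names are assumptions.

```python
import math


def conjugate_pair_coefficients(r, theta):
    """Assumed AR(2) coefficients for poles r*exp(+i*theta) and r*exp(-i*theta)."""
    return 2.0 * r * math.cos(theta), -r * r


def run_recursion(y0, y1, a1, a2, count):
    """Generate count values of y_n = a1*y_{n-1} + a2*y_{n-2} from two starting values."""
    seq = [y0, y1]
    while len(seq) < count:
        seq.append(a1 * seq[-1] + a2 * seq[-2])
    return seq


def max_gap(r, theta, wave, count=30):
    """Largest difference between r**n * wave(theta*n) and its AR(2) regeneration."""
    a1, a2 = conjugate_pair_coefficients(r, theta)
    exact = [r ** n * wave(theta * n) for n in range(count)]
    model = run_recursion(exact[0], exact[1], a1, a2, count)
    return max(abs(u - v) for u, v in zip(exact, model))


damped_cos_gap = max_gap(0.9, 0.4, math.cos)   # damped cosine, radius below 1
damped_sin_gap = max_gap(0.9, 0.4, math.sin)   # damped sine, radius below 1
undamped_gap = max_gap(1.0, 0.4, math.cos)     # radius 1: poles on the unit circle
print(damped_cos_gap, damped_sin_gap, undamped_gap)
```

Setting the radius to 1 recovers the undamped cosine or sine, so the same two-coefficient recursion covers roots on the unit circle as a special case.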

D. Real Roots Between −1 and +1

Let Inline graphic be the three distinct roots of the denominator polynomial. Then Inline graphic. This can be described by an AR time-series of order 3 rather than a polynomial of infinite degree.

On the other hand, let Inline graphic be the three repeated roots of the denominator polynomial, i.e., Inline graphic. In that case, Inline graphic. This is another example of a finite order AR time-series representing data that requires a polynomial of infinite degree.
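Both situations, three distinct real roots and one root repeated three times, can be checked with a small sketch. The root values and the weights in the closed forms are hypothetical, and the helper `ar_from_roots` simply expands the product of factors (x - r_i) to read off the order-3 coefficients.

```python
def ar_from_roots(roots):
    """Expand prod (x - r_i) and return a_1..a_p of y_n = a_1*y_{n-1} + ... + a_p*y_{n-p}."""
    poly = [1.0]                              # coefficients, highest power first
    for r in roots:
        poly = [u - r * v for u, v in zip(poly + [0.0], [0.0] + poly)]
    return [-c for c in poly[1:]]


def gap(roots, closed_form, count=30):
    """Largest difference between a closed-form sequence and its AR regeneration."""
    a = ar_from_roots(roots)
    exact = [closed_form(n) for n in range(count)]
    model = exact[:len(a)]
    while len(model) < count:
        model.append(sum(c * model[-1 - i] for i, c in enumerate(a)))
    return max(abs(u - v) for u, v in zip(exact, model))


# Three distinct real roots between -1 and +1: a weighted sum of powers of the roots.
distinct_gap = gap([0.5, -0.25, 0.75],
                   lambda n: 2 * 0.5 ** n - 3 * (-0.25) ** n + 0.75 ** n)

# One root repeated three times: a quadratic in n multiplying the power of the root.
repeated_gap = gap([0.5, 0.5, 0.5],
                   lambda n: (1 + 2 * n + 3 * n * n) * 0.5 ** n)

a_repeated = ar_from_roots([0.5, 0.5, 0.5])
print(a_repeated, distinct_gap, repeated_gap)
```

In both cases three coefficients regenerate the sequence, although neither closed form is a polynomial of finite degree in n.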

The two lessons are:

  1) All polynomials of degree Inline graphic can be represented by an all-pole filter with Inline graphic repeated roots (or poles) at Inline graphic.

  2) Data representable by finite order all-pole filters, whether they are from finite degree or infinite degree polynomials, can be described by a finite order AR time-series.

VII. Conclusion

Two of the data modelling techniques are polynomial representation and time-series representation. In this paper, all theoretical studies to explore their connections and differences have been based on uniformly sampled data in the absence of errors. It has been proven that all data from an underlying polynomial model of finite degree Inline graphic as in equation (21) can be represented perfectly by either a polynomial of degree Inline graphic or an autoregressive time-series of order Inline graphic and a constant term. It has also been proven that all polynomials of degree Inline graphic can be described by the same set of time-series coefficients, with the only possible difference being in the constant term Inline graphic as in equation (2). These time-series coefficients are integers of a specific form. It was also demonstrated that time-series with either non-integer coefficients or integer coefficients not of the specific form represent polynomials of infinite degree. Explorations, in four cases with generated data and in two cases with real data, demonstrated that, while finite degree polynomial and finite order time-series representations are equally good for data following finite degree polynomial forms, finite order autoregressive time-series representations offer significant advantages in modelling data from other sources. All polynomials of degree Inline graphic can be represented by an all-pole filter with Inline graphic repeated roots (or poles) at Inline graphic. Theoretically, all data representable by a finite order all-pole filter, whether they come from finite degree or infinite degree polynomials, can be described by a finite order AR time-series. If the values of polynomial coefficients are not necessary in an application, one may choose to use finite order time-series representations, as they are more general than finite degree polynomial representations.

Acknowledgment

The author acknowledges Dr C Liu for formatting the manuscript and Dr D A Nandi for supplying the US Covid-19 data.

Biography


Asoke K. Nandi (Fellow, IEEE) received the Ph.D. degree in physics from the University of Cambridge (Trinity College), Cambridge, U.K.

He held academic positions at several universities, including the University of Oxford, U.K., Imperial College London, U.K., the University of Strathclyde, U.K., and the University of Liverpool, U.K., as well as the Finland Distinguished Professorship with Jyvaskyla University, Finland. In 2013, he moved to Brunel University London, U.K., to become the Chair and Head of electronics and computer engineering. He is a Distinguished Visiting Professor at Xi'an Jiaotong University, China, and an Adjunct Professor with the University of Calgary, Canada. In 1983, he co-discovered the three fundamental particles known as W+, W−, and Z0 (with the UA1 Team at CERN), providing the evidence for the unification of the electromagnetic and weak forces, for which the Nobel Committee for Physics awarded the prize to his two team leaders for their decisive contributions, in 1984. He has made many fundamental, theoretical, and algorithmic contributions to many aspects of signal processing and machine learning. He has much expertise in Big and Heterogeneous Data. He has authored over 600 technical publications, including 240 journal articles as well as five books, entitled Automatic Modulation Recognition of Communications Signals (Springer, 1996), Blind Estimation Using Higher-Order Statistics (Springer, 1999), Automatic Modulation Classification: Principles, Algorithms, and Applications (Wiley, 2015), Integrative Cluster Analysis in Bioinformatics (Wiley, 2015), and Condition Monitoring With Vibration Signals: Compressive Sampling and Learning Algorithms for Rotating Machines (Wiley, 2020). The h-index of his publications is 75 (Google Scholar) and his ERDOS number is 2. His current research interests include signal processing and machine learning, with applications to communications, image segmentations, and biomedical data.

Prof. Nandi is a Fellow of the Royal Academy of Engineering, U.K., as well as seven other institutions. He was an IEEE EMBS Distinguished Lecturer from 2018 to 2019. Among the many awards, he received the Mountbatten Premium, the Division Award of the Electronics and Communications Division, Institution of Electrical Engineers, U.K., in 1998, the Water Arbitration Prize of the Institution of Mechanical Engineers, U.K., in 1999, the Glory of Bengal Award for his outstanding achievements in scientific research, in 2010, and the Institute of Electrical and Electronics Engineers (USA) Heinrich Hertz Award, in 2012.

References

  • [1] Legendre A. M., Nouvelles Méthodes Pour la Détermination Des Orbites Des Comètes (Sur la Méthode Des Moindres Quarrés). Paris, France: Chez Firmin DIDOT, 1805.
  • [2] Gauss C. F., Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Gottingen, Germany: Henricus Dieterich, 1823.
  • [3] Gergonne J. D., “The application of the method of least squares to the interpolation of sequences,” Historia Math., vol. 1, pp. 439–447, Nov. 1974, doi: 10.1016/0315-0860(74)90034-2.
  • [4] Stigler S. M., “Gergonne’s 1815 paper on the design and analysis of polynomial regression experiments,” Historia Math., vol. 1, no. 4, pp. 431–439, 1974, doi: 10.1016/0315-0860(74)90033-0.
  • [5] Yule G. U., “On the theory of correlation,” J. Roy. Stat. Soc., vol. 60, no. 4, pp. 812–854, 1897, doi: 10.2307/2979746.
  • [6] Pearson K., Yule G. U., Blanchard N., and Lee A., “The law of ancestral heredity,” Biometrika, vol. 2, no. 2, pp. 211–236, 1903, doi: 10.2307/2331683.
  • [7] Fisher R. A., “The goodness of fit of regression formulae, and the distribution of regression coefficients,” J. Roy. Stat. Soc., vol. 85, no. 4, pp. 597–612, 1922, doi: 10.2307/2341124.
  • [8] Wiener N., Extrapolation, Interpolation, and Smoothing of Stationary Time Series. Cambridge, MA, USA: MIT Press, 1949.
  • [9] Box G. and Jenkins G., Time Series Analysis: Forecasting and Control, Revised Edition. Oakland, CA, USA: Holden-Day, 1976.
  • [10] Hamilton J., Time Series Analysis. Princeton, NJ, USA: Princeton Univ. Press, 1994.
  • [11] Gershenfeld N., The Nature of Mathematical Modeling. Cambridge, U.K.: Cambridge Univ. Press, 2000.
  • [12] Woodward W. A., Gray H. L., and Elliott A. C., Applied Time Series Analysis. Boca Raton, FL, USA: CRC Press, 2012.
  • [13] Mandel J., The Statistical Analysis of Experimental Data. New York, NY, USA: Interscience, 1964.
  • [14] Falk M. et al., A First Course on Time Series Analysis: Examples With SAS. Accessed: Jun. 9, 2020. [Online]. Available: https://www.uni-wuerzburg.de/fileadmin/10040800/user_upload/time_series/the_book/2011-March-01-times.pdf
  • [15] Tsay R. S., Financial Time Series (Wiley StatsRef: Statistics Reference Online). Hoboken, NJ, USA: Wiley, 2014, pp. 1–23.
  • [16] Goutte C., Toft P., Rostrup E., Nielsen F. Å., and Hansen L. K., “On clustering fMRI time series,” NeuroImage, vol. 9, no. 3, pp. 298–310, 1999.
  • [17] Mormann F., Andrzejak R. G., Elger C. E., and Lehnertz K., “Seizure prediction: The long and winding road,” Brain, vol. 130, no. 2, pp. 314–333, 2006.
  • [18] Craddock J. M., “The analysis of meteorological time series for use in forecasting,” Statistician, vol. 15, no. 2, p. 167, 1965.
  • [19] Enders W., Applied Econometric Times Series, 4th ed. Hoboken, NJ, USA: Wiley, 2015.
  • [20] Nandi A. K., Roberts D. J., and Nandi A. K., “Prediction paradigm involving time series applied to total blood issues data from England,” Transfusion, vol. 60, no. 3, pp. 535–543, Mar. 2020.
  • [21] Makhoul J., “Linear prediction: A tutorial review,” Proc. IEEE, vol. 63, no. 4, pp. 561–580, Apr. 1975.
  • [22] Hayes M. H., Statistical Digital Signal Processing and Modeling. Hoboken, NJ, USA: Wiley, 1996.
  • [23] Haykin S. O., Adaptive Filter Theory, 5th ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2013.
  • [24] Rappaport T. S., Wireless Communications: Principles and Practice, vol. 2. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
  • [25] Carter G. C., “Coherence and time delay estimation,” Proc. IEEE, vol. 75, no. 2, pp. 236–255, Feb. 1987.
  • [26] Zarzoso V. and Nandi A. K., “Noninvasive fetal electrocardiogram extraction: Blind separation versus adaptive noise cancellation,” IEEE Trans. Biomed. Eng., vol. 48, no. 1, pp. 12–18, 2001.
  • [27] Varotsos P., Sarlis N. V., and Skordas E. S., Natural Time Analysis: The New View of Time: Precursory Seismic Electric Signals, Earthquakes and Other Complex Time Series. Secaucus, NJ, USA: Springer, 2011.
  • [28] Sakoe H. and Chiba S., “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-26, no. 1, pp. 43–49, Feb. 1978.
  • [29] Liao T. W., “Clustering of time series data—A survey,” Pattern Recognit., vol. 38, no. 11, pp. 1857–1874, Nov. 2005, doi: 10.1016/j.patcog.2005.01.025.
  • [30] Aghabozorgi S., Shirkhorshidi A. S., and Wah T. Y., “Time-series clustering—A decade review,” Inf. Syst., vol. 53, pp. 16–38, Oct. 2015, doi: 10.1016/j.is.2015.04.007.
  • [31] Keogh E. and Kasetty S., “On the need for time series data mining benchmarks: A survey and empirical demonstration,” Data Mining Knowl. Discovery, vol. 7, pp. 349–371, Oct. 2003, doi: 10.1023/A:1024988512476.
  • [32] Fahim M. and Sillitti A., “Anomaly detection, analysis and prediction techniques in IoT environment: A systematic literature review,” IEEE Access, vol. 7, pp. 81664–81681, 2019, doi: 10.1109/ACCESS.2019.2921912.
  • [33] Ali M., Alqahtani A., Jones M. W., and Xie X., “Clustering and classification for time series data in visual analytics: A survey,” IEEE Access, vol. 7, pp. 181314–181338, 2019, doi: 10.1109/ACCESS.2019.2958551.
  • [34] BuHamra S., Smaoui N., and Gabr M., “The Box–Jenkins analysis and neural networks: Prediction and time series modelling,” Appl. Math. Model., vol. 27, no. 10, pp. 805–815, Oct. 2003.
  • [35] Wang X. and Wang C., “Time series data cleaning: A survey,” IEEE Access, vol. 8, pp. 1866–1881, 2020, doi: 10.1109/ACCESS.2019.2962152.
  • [36] Zhong L., Mu L., Li J., Wang J., Yin Z., and Liu D., “Early prediction of the 2019 novel coronavirus outbreak in the mainland China based on simple mathematical model,” IEEE Access, vol. 8, pp. 51761–51769, 2020, doi: 10.1109/ACCESS.2020.2979599.
  • [37] Gradshteyn I. S. and Ryzhik I. M., Tables of Integrals, Series, and Products, vol. 10, Jeffrey A., Ed. New York, NY, USA: Academic, 1980.
  • [38] UK Covid-19 Data. Accessed: Apr. 1, 2020. [Online]. Available: https://www.arcgis.com/home/item.html?id=e5fd11150d274bebaaf8fe2a7a2bda11
  • [39] US Covid-19 Data. Accessed: Apr. 4, 2020. [Online]. Available: https://www.kaggle.com/c/covid19-global-forecasting-week-3/data and https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
  • [40] Levy H. and Lessman F., Finite Difference Equations. New York, NY, USA: Dover, 1992.
  • [41] Kelly W. G. and Peterson A. C., Difference Equations: An Introduction With Applications. Amsterdam, The Netherlands: Elsevier, 2000.

Articles from IEEE Access are provided here courtesy of the Institute of Electrical and Electronics Engineers
