Abstract
Two data modelling techniques, polynomial representation and time-series representation, are explored in this paper to establish their connections and differences. All theoretical studies are based on uniformly sampled data in the absence of noise. This paper proves that all data from an underlying polynomial model of finite degree $N$ can be represented perfectly by an autoregressive time-series model of order $N$ and a constant term $c$, as in equation (2). Furthermore, all polynomials of degree $N$ are shown to give rise to the same set of time-series coefficients of specific forms, with the only possible difference being in the constant term $c$. It is also demonstrated that time-series with either non-integer coefficients or integer coefficients not of the aforementioned specific forms represent polynomials of infinite degree. Six numerical explorations, with both generated data and real data, including UK and US data on the current Covid-19 incidence, are presented to support the theoretical findings. It is shown that all polynomials of degree $N$ can be represented by an all-pole filter with $N+1$ repeated roots (or poles) at $z = 1$. Theoretically, all noise-free data representable by a finite order all-pole filter, whether they come from finite degree or infinite degree polynomials, can be described exactly by a finite order AR time-series; if the values of the polynomial coefficients are not of special interest in a data modelling task, time-series representations may be used instead.
Keywords: Data models, polynomials, autoregressive processes, time-series, signal representation, Covid-19
I. Introduction
Interest in data science has grown rapidly in the twenty-first century. As well as attracting interest from many different subject areas, data science is being integrated into a diverse range of industries and agencies (e.g., health, transport, energy, government, and society). Strictly, a time-series is a series of data points ordered in time. Very commonly, a time-series consists of data points equally separated in time. The analytics created for time-series data can generally also be applied to sequences of data that are equally separated in space (e.g., images) or in some other domain. There are many types of time-series models, including autoregressive models.
An alternative, and historically earlier, way to model data is polynomial regression. Polynomial regression models are generally fitted with the Least-squares method to obtain estimated values of the polynomial coefficients. Legendre published the Least-squares method in 1805 [1], and Gauss published it in 1809 and again in 1823 [2]. In 1815 Gergonne wrote a paper on "The application of the method of least squares to the interpolation of sequences" [3]; reference [4] is Stigler's English translation of this paper, which was originally written in French. Over the last 120 or so years, polynomial regression has contributed greatly to the development of regression analysis [5]–[7].
Although there are other ways to model data, this paper focuses on polynomial representation and autoregressive time-series representation. There has been extensive research on time-series data representation [8]–[12]. For example, the main goal of time-series analysis in econometrics, geophysics, meteorology, quantitative finance, seismology, and statistics is prediction or forecasting [13]–[20]. In communication engineering, control engineering, and signal processing, by contrast, it is used for signal detection and estimation [21]–[28]. It is also used for clustering, classification, and prediction or forecasting in data mining, machine learning, and pattern recognition [29]–[34]. Mathematical modelling and time-series analysis are fundamental to many fields; a couple of very recent examples can be found in [35], [36].
In polynomial representations, the observed data are a function of time (or some other variable). Except for a constant or a straight line, this function represents a non-linear relationship between time (or the other variable) and the observed data, even though the model is linear in its parameters. In autoregressive (AR) time-series representations, on the other hand, each observed value is a linear function of some of the earlier values, so the model is linear in both the data and the parameters. Although both are used for data modelling, there are some fundamental differences between them. Hence, this paper explores several questions around polynomial and autoregressive representations with a view to establishing their connections and differences. Two of these questions are:
1) Can all finite degree polynomials be expressed as finite order time-series? If the answer is affirmative, what is the underlying relationship?
2) Can all finite order autoregressive time-series be represented as finite degree polynomials?
This study is in the context of real-valued and uniformly sampled noise-free data. The paper presents the following original results:
1) All polynomials of degree 1 (linear), of degree 2 (quadratic), and of degree 3 (cubic) can be represented as autoregressive time-series of order 1, order 2, and order 3 respectively, each with a constant term. This is illustrated in section II.
2) All polynomials of degree 3 can be represented by AR time-series with the same set of coefficient values, possibly with a different value for the constant term. This observation also holds for polynomials of degree 1 and of degree 2. This is presented in section II.
3) All polynomials of finite degree $N$ can be represented as AR time-series of order $N$ with a constant term. This can be found in section III.
4) All polynomials of degree $N$ can be represented by AR time-series with one set of coefficient values, possibly with a different value for the constant term. This is demonstrated in section III.
5) The corresponding time-series coefficients are integers of specific forms, which are derived in section III.
6) Numerical explorations with both generated data and real data, including current Covid-19 incidence data from the UK and the US, are presented in section IV.
7) Whilst all finite degree polynomials can be represented by finite order AR time-series, the converse is not true. There are infinitely many finite order AR time-series that cannot be represented by finite degree polynomials. Furthermore, all finite order AR time-series with either non-integer coefficients or integer coefficients not of the aforementioned specific forms represent polynomials of infinite degree. This is shown in section V.
8) Section VI shows that all polynomials of degree $N$ can be represented by an all-pole filter with $N+1$ repeated roots (or poles) at $z = 1$. Thus, any noise-free data representable by a finite order all-pole filter, whether they come from finite degree or infinite degree polynomials, can be described exactly by a finite order AR time-series.
II. Method – Small Degree Polynomial
Given a set of uniformly sampled real-valued data points in discrete time, these may be represented by a polynomial or a time-series. A polynomial of degree $N$ in continuous time can take the following form
$$y(t) = \sum_{n=0}^{N} a_n t^n.$$
For uniformly sampled discrete time, the continuous time, $t$, is represented as $t = kT$, where $k$ is an integer and $T$ is the sampling period. In this scenario, the above equation can be rewritten as
$$y_{kT} = \sum_{n=0}^{N} a_n (kT)^n. \tag{1}$$
On the other hand, an autoregressive time-series model of order $q$, AR($q$), can be written as
$$y_k = \sum_{i=1}^{q} b_i\, y_{k-i} + c \tag{2}$$
and may be used to represent the set of uniformly sampled data points in discrete time.
A. Linear Polynomial
In this subsection, data representation by a linear polynomial and by an AR time-series is explored. For any linear polynomial, $N = 1$ in equation (1). It follows directly from equation (1) that $y_{kT} = y_{(k-1)T} + a_1 T$. By removing $T$ from the indices, this can be written as $y_k = y_{k-1} + a_1 T$. Comparing this with equation (2) for AR($q$), it is clear that $q = 1$, $b_1 = 1$, and $c = a_1 T$.
Therefore, the following can be concluded:
• Every linear polynomial, i.e., of degree 1, can be perfectly represented by an AR(1) time-series.
• Every linear polynomial has the same time-series coefficient, i.e., $b_1 = 1$.
• The constant term in the time-series is given by $c = a_1 T$.
• This implies that all linear polynomials with different values of $a_0$ but the same value of $a_1$ have the identical AR(1) representation, i.e., the same values of $b_1$ and $c$.
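The AR(1) relation above can be checked numerically. The following is a minimal sketch, not code from the paper: it uses Python's `fractions` module for exact arithmetic, and the values of $a_0$, $a_1$ and $T$ are hypothetical. It runs the recursion of equation (2) with $b_1 = 1$ and $c = a_1 T$, seeded with $y_0 = a_0$, and compares the result with equation (1) for several intercepts $a_0$.

```python
from fractions import Fraction


def linear_poly(a0, a1, T, k):
    """Equation (1) with N = 1: y_{kT} = a0 + a1 * (k * T)."""
    return a0 + a1 * (k * T)


def ar1_series(y0, c, num):
    """Equation (2) with q = 1 and b_1 = 1: y_k = y_{k-1} + c, starting from y0."""
    ys = [y0]
    for _ in range(1, num):
        ys.append(ys[-1] + c)
    return ys


# Hypothetical example values (not taken from the paper).
T = Fraction(1, 4)
a1 = Fraction(3)
c = a1 * T  # the same constant term for every intercept a0
for a0 in (Fraction(0), Fraction(-7, 2), Fraction(10)):
    direct = [linear_poly(a0, a1, T, k) for k in range(20)]
    recursive = ar1_series(direct[0], c, 20)
    assert direct == recursive  # exact match, no rounding involved
print("AR(1) with b_1 = 1 and c = a_1*T reproduced all three lines")
```

Exact rational arithmetic is used so that equality, rather than closeness, can be asserted.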
B. Quadratic Polynomial
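In this subsection, data representation by a quadratic polynomial and by an AR time-series is explored. For any quadratic polynomial, $N = 2$ in equation (1). Thus, it follows from equation (1) that
$$y_{kT} = a_0 + a_1 kT + a_2 (kT)^2, \tag{3}$$
$$y_{(k-1)T} = a_0 + a_1 (k-1)T + a_2 \big((k-1)T\big)^2, \tag{4}$$
$$y_{(k-2)T} = a_0 + a_1 (k-2)T + a_2 \big((k-2)T\big)^2. \tag{5}$$
Using equations (3) and (4), one can write
$$y_{kT} - y_{(k-1)T} = a_1 T + a_2 T^2 (2k-1), \tag{6}$$
and, using equations (4) and (5), one can write
$$y_{(k-1)T} - y_{(k-2)T} = a_1 T + a_2 T^2 (2k-3). \tag{7}$$
Now, using equations (6) and (7), one finds
$$y_{kT} - 2y_{(k-1)T} + y_{(k-2)T} = 2a_2 T^2. \tag{8}$$
Therefore,
$$y_{kT} = 2y_{(k-1)T} - y_{(k-2)T} + 2a_2 T^2. \tag{9}$$
By removing $T$ from the indices, equation (9) can be written as $y_k = 2y_{k-1} - y_{k-2} + 2a_2 T^2$. Comparing this with equation (2) for AR($q$), it is clear that $q = 2$, $b_1 = 2$, $b_2 = -1$, and $c = 2a_2 T^2$.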
Therefore, the following can be concluded:
• Every quadratic polynomial, i.e., of degree 2, can be perfectly represented by an AR(2) time-series.
• Every quadratic polynomial has the same coefficient values in the time-series, i.e., $b_1 = 2$ and $b_2 = -1$.
• The constant term in the time-series is given by $c = 2a_2 T^2$.
• This implies that all quadratic polynomials with different values of $a_0$ and $a_1$ but the same value of $a_2$ have the identical AR(2) representation, i.e., the same values of $b_1$, $b_2$, and $c$.
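As a worked illustration with hypothetical values $a_0 = 1$, $a_1 = 2$, $a_2 = 3$ and $T = 0.5$ (chosen here for arithmetic convenience, not taken from the paper), equation (1) gives, after removing $T$ from the indices,
$$y_0 = 1,\quad y_1 = 1 + 2(0.5) + 3(0.5)^2 = 2.75,\quad y_2 = 1 + 2(1) + 3(1)^2 = 6,\quad y_3 = 1 + 2(1.5) + 3(1.5)^2 = 10.75.$$
With $c = 2a_2T^2 = 2(3)(0.5)^2 = 1.5$, the AR(2) recursion reproduces the same values:
$$y_2 = 2y_1 - y_0 + c = 5.5 - 1 + 1.5 = 6, \qquad y_3 = 2y_2 - y_1 + c = 12 - 2.75 + 1.5 = 10.75.$$
Changing $a_0$ or $a_1$ changes the data values but leaves $b_1 = 2$, $b_2 = -1$ and $c = 1.5$ unchanged.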
C. Cubic Polynomial
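In this subsection, data representation by a cubic polynomial and by an AR time-series is explored. For any cubic polynomial, $N = 3$ in equation (1). Thus, it follows from equation (1) that
$$y_{kT} = a_0 + a_1 kT + a_2 (kT)^2 + a_3 (kT)^3, \tag{10}$$
$$y_{(k-1)T} = a_0 + a_1 (k-1)T + a_2 \big((k-1)T\big)^2 + a_3 \big((k-1)T\big)^3, \tag{11}$$
$$y_{(k-2)T} = a_0 + a_1 (k-2)T + a_2 \big((k-2)T\big)^2 + a_3 \big((k-2)T\big)^3, \tag{12}$$
$$y_{(k-3)T} = a_0 + a_1 (k-3)T + a_2 \big((k-3)T\big)^2 + a_3 \big((k-3)T\big)^3. \tag{13}$$
Using equations (10) and (11), one can obtain
$$y_{kT} - y_{(k-1)T} = a_1 T + a_2 T^2 (2k-1) + a_3 T^3 (3k^2 - 3k + 1), \tag{14}$$
and, using equations (11) and (12), one can obtain
$$y_{(k-1)T} - y_{(k-2)T} = a_1 T + a_2 T^2 (2k-3) + a_3 T^3 (3k^2 - 9k + 7). \tag{15}$$
Now, using equations (14) and (15), one obtains
$$y_{kT} - 2y_{(k-1)T} + y_{(k-2)T} = 2a_2 T^2 + a_3 T^3 (6k - 6). \tag{16}$$
Using equations (12) and (13), one can write
$$y_{(k-2)T} - y_{(k-3)T} = a_1 T + a_2 T^2 (2k-5) + a_3 T^3 (3k^2 - 15k + 19). \tag{17}$$
Now, using equations (15) and (17), one can write
$$y_{(k-1)T} - 2y_{(k-2)T} + y_{(k-3)T} = 2a_2 T^2 + a_3 T^3 (6k - 12). \tag{18}$$
Thus,
$$y_{(k-1)T} - 2y_{(k-2)T} + y_{(k-3)T} + 6a_3 T^3 = 2a_2 T^2 + a_3 T^3 (6k - 6). \tag{19}$$
Combining equations (16) and (19), one obtains
$$y_{kT} - 2y_{(k-1)T} + y_{(k-2)T} = y_{(k-1)T} - 2y_{(k-2)T} + y_{(k-3)T} + 6a_3 T^3. \tag{20}$$
Therefore,
$$y_{kT} = 3y_{(k-1)T} - 3y_{(k-2)T} + y_{(k-3)T} + 6a_3 T^3.$$
By removing $T$ from the indices, this can be written as $y_k = 3y_{k-1} - 3y_{k-2} + y_{k-3} + 6a_3 T^3$. This can be described by AR($q$) with $q = 3$, provided $b_1 = 3$, $b_2 = -3$, $b_3 = 1$, and $c = 6a_3 T^3$.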
Therefore, the following can be concluded:
• Every cubic polynomial, i.e., of degree 3, can be perfectly represented by an AR(3) time-series.
• All cubic polynomials have the same coefficient values in the time-series, i.e., $b_1 = 3$, $b_2 = -3$, and $b_3 = 1$.
• The constant term in the time-series is given by $c = 6a_3 T^3$.
• This implies that all cubic polynomials with different values of $a_0$, $a_1$, and $a_2$ but the same value of $a_3$ have the identical AR(3) representation, i.e., the same values of $b_1$, $b_2$, $b_3$, and $c$.
In summary, all polynomials of degree 1 (linear), of degree 2 (quadratic), and of degree 3 (cubic) can be perfectly represented as AR time-series of orders 1, 2, and 3 respectively. Furthermore, for each polynomial degree, all the time-series coefficients have predefined values that are specific integers, while the constant term, $c$, has a predefined form that depends on the leading coefficient of the polynomial, the degree of the polynomial, and the sampling period. These details, and more, are summarised in Table 1.
TABLE 1. Information for Polynomials of Different Degrees.
| Degree of polynomial, $N$ | AR parameter $b_1$ | AR parameter $b_2$ | AR parameter $b_3$ | Constant term $c$ |
|---|---|---|---|---|
| 1 | 1 | | | $a_1 T$ |
| 2 | 2 | −1 | | $2a_2 T^2$ |
| 3 | 3 | −3 | 1 | $6a_3 T^3$ |
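The Table 1 entries can also be recovered directly from polynomial samples by solving equation (2) for $b_1, \ldots, b_N$ and $c$. The sketch below is illustrative only: it assumes NumPy, uses hypothetical lower-order coefficients drawn at random with $a_N = 2$ fixed, a hypothetical $T = 0.5$, and takes the first $2N+1$ samples, which give exactly $N+1$ equations in the $N+1$ unknowns.

```python
import numpy as np


def fit_ar_exact(y, q):
    """Solve equation (2) exactly for [b_1, ..., b_q, c] from the first 2q+1 samples.

    Row for time index k (k = q..2q): [y_{k-1}, ..., y_{k-q}, 1] @ [b_1..b_q, c] = y_k.
    """
    rows = [[y[k - i] for i in range(1, q + 1)] + [1.0] for k in range(q, 2 * q + 1)]
    rhs = [y[k] for k in range(q, 2 * q + 1)]
    return np.linalg.solve(np.array(rows), np.array(rhs))


rng = np.random.default_rng(0)
T = 0.5
for N in (1, 2, 3):
    coeffs = rng.integers(-5, 6, size=N + 1).astype(float)  # hypothetical a_0..a_N
    coeffs[N] = 2.0  # fix a_N so that c can be compared with Table 1
    k = np.arange(2 * N + 1)
    y = np.polynomial.polynomial.polyval(k * T, coeffs)  # equation (1)
    print(N, np.round(fit_ar_exact(y, N), 10))
```

Whatever lower-order coefficients are drawn, the fitted $b_i$ should match Table 1, and with $a_N = 2$ and $T = 0.5$ the constant should be $c = 1$ for $N = 1$ and $N = 2$ and $c = 1.5$ for $N = 3$.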
III. Method – Any Finite Degree Polynomial
In section II it has been demonstrated that all polynomials of degree 1 (linear), of degree 2 (quadratic), and of degree 3 (cubic) can be perfectly represented as autoregressive time-series of orders 1, 2, and 3 respectively. In this section the exploration is generalised to polynomials of every finite degree. In section II it was found that, for $N = 1, 2$ and $3$, the degree of the polynomial and the order of the corresponding AR time-series are identical. In the following, a discrete-time polynomial of degree $N$ of the form
$$y_{kT} = \sum_{n=0}^{N} a_n (kT)^n \tag{21}$$
is considered in seeking a corresponding autoregressive time-series model of order $N$, AR($N$).
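The time-series in equation (2), with $q = N$, can be rewritten as
$$y_{kT} - \sum_{i=1}^{N} b_i\, y_{(k-i)T} = c. \tag{22}$$
Now it is conjectured that
$$b_i = (-1)^{i+1} \binom{N}{i} \tag{23}$$
for $i = 1, 2, \ldots, N$. Using this conjecture and equation (21), equation (22) can be written as
$$c = \sum_{i=0}^{N} (-1)^i \binom{N}{i} \sum_{n=0}^{N} a_n \big((k-i)T\big)^n. \tag{24}$$
In the above double summation, it is instructive and revealing to consider different values of $n$ separately.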
A. Part I
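For the particular case of $n = 0$, the right-hand side of equation (24) can be written as $a_0 \sum_{i=0}^{N} (-1)^i \binom{N}{i}$. The relation 0.154.6 on page 4 of [37], specialised to the present summation, can be adapted to
$$\sum_{i=0}^{N} (-1)^i \binom{N}{i}\, i^{m} = 0, \qquad 0 \le m \le N-1, \tag{25}$$
with the convention $0^0 = 1$. Using this relation, for $m = 0$,
$$\sum_{i=0}^{N} (-1)^i \binom{N}{i} = 0. \tag{26}$$
Therefore, for the case of $n = 0$, the right-hand side of equation (24) is $0$.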
B. Part II
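For the case of $n = 1$, the right-hand side of equation (24) can be written as
$$a_1 T \sum_{i=0}^{N} (-1)^i \binom{N}{i} (k-i) = a_1 T k \sum_{i=0}^{N} (-1)^i \binom{N}{i} - a_1 T \sum_{i=0}^{N} (-1)^i \binom{N}{i}\, i. \tag{27}$$
Using equation (25), for $1 \le m \le N-1$ (i.e., $N \ge m+1$), one can write
$$\sum_{i=0}^{N} (-1)^i \binom{N}{i}\, i^{m} = 0. \tag{28}$$
Using equations (26) and (28) in equation (27), it is found that the right-hand side of equation (24), for the case of $n = 1$, is equal to $0$.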
C. Part III
Now the case of $n = 2$ is considered. The right-hand side of equation (24) can be written as
$$a_2 T^2 \sum_{i=0}^{N} (-1)^i \binom{N}{i} (k-i)^2 = a_2 T^2 \left[ k^2 \sum_{i=0}^{N} (-1)^i \binom{N}{i} - 2k \sum_{i=0}^{N} (-1)^i \binom{N}{i}\, i + \sum_{i=0}^{N} (-1)^i \binom{N}{i}\, i^2 \right].$$
Using equations (26) and (28) in the previous expression, the right-hand side of equation (24), for the case of $n = 2$, is found to be equal to $0$.
Similarly, for each value of $n$ up to $N-1$, it can be shown that the right-hand side of equation (24) is equal to $0$. When $n = N-1$, there is a term of the form $\sum_{i=0}^{N} (-1)^i \binom{N}{i}\, i^{N-1}$. According to equation (28), which is valid if the top range of the summation is greater than or equal to the power of $i$ plus one, i.e., $N \ge (N-1) + 1$, this term equates to zero.
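However, when $n = N$, there is a term of the form $\sum_{i=0}^{N} (-1)^i \binom{N}{i}\, i^{N}$. For this term, equation (28) is not valid, since the top range of the summation, $N$, is smaller than the power of $i$ plus one, $N+1$. To deal with the case of $n = N$, the relation 0.154.4 on page 4 of [37], specialised to the present summation, can be adapted to
$$\sum_{i=0}^{N} (-1)^i \binom{N}{i}\, i^{N} = (-1)^N N!. \tag{29}$$
Thus,
$$\sum_{i=0}^{N} (-1)^i \binom{N}{i} (-i)^{N} = (-1)^N \sum_{i=0}^{N} (-1)^i \binom{N}{i}\, i^{N} = N!. \tag{30}$$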
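The alternating binomial sums used in Parts I to III can be checked with exact integer arithmetic. This is a small sketch using only the Python standard library (`math.comb` and `math.factorial`); the range $N = 1, \ldots, 10$ is an arbitrary choice for illustration.

```python
from math import comb, factorial


def alt_binomial_power_sum(N, m):
    """sum_{i=0}^{N} (-1)^i * C(N, i) * i**m, with exact integers.

    Python evaluates 0**0 as 1, which is the convention needed for m = 0.
    """
    return sum((-1) ** i * comb(N, i) * i ** m for i in range(N + 1))


for N in range(1, 11):
    # Equations (25), (26) and (28): zero whenever the power m is at most N - 1.
    assert all(alt_binomial_power_sum(N, m) == 0 for m in range(N))
    # Equation (29): the power m = N gives (-1)^N N!.
    assert alt_binomial_power_sum(N, N) == (-1) ** N * factorial(N)
    # Equation (30): using (-i)**N instead of i**N gives N!.
    assert sum((-1) ** i * comb(N, i) * (-i) ** N for i in range(N + 1)) == factorial(N)
print("identities hold for N = 1..10")
```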
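Thus, for the case of $n = N$, the right-hand side of equation (24) can be written as
$$a_N T^N \sum_{i=0}^{N} (-1)^i \binom{N}{i} (k-i)^N = a_N T^N \left[ k^N \sum_{i=0}^{N} (-1)^i \binom{N}{i} + \sum_{j=1}^{N-1} \binom{N}{j} k^{N-j} \sum_{i=0}^{N} (-1)^i \binom{N}{i} (-i)^{j} + \sum_{i=0}^{N} (-1)^i \binom{N}{i} (-i)^{N} \right].$$
Using equation (26), the first term is $0$. All the terms in the middle are zero by virtue of equation (28), since $(-i)^j = (-1)^j i^j$ with $1 \le j \le N-1$. Using equation (30), the last term is found to be $N!\,a_N T^N$.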
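Adding all the results for $n = 0, 1, \ldots, N$, one obtains
$$\sum_{i=0}^{N} (-1)^i \binom{N}{i}\, y_{(k-i)T} = N!\, a_N T^N, \tag{31}$$
which confirms the conjecture in equation (23), with $c = N!\,a_N T^N$. Thus,
$$y_{kT} = \sum_{i=1}^{N} (-1)^{i+1} \binom{N}{i}\, y_{(k-i)T} + N!\, a_N T^N.$$
Therefore, all noise-free data from uniformly sampled polynomials of finite degree $N$ can be perfectly represented by an autoregressive time-series model of order $N$ such that
$$y_k = \sum_{i=1}^{N} b_i\, y_{k-i} + c,$$
where
$$b_i = (-1)^{i+1} \binom{N}{i}, \qquad i = 1, 2, \ldots, N,$$
and
$$c = N!\, a_N T^N.$$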
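The general result can be exercised on polynomials of higher degree than in section II. The following is a sketch, not the paper's code: `ar_from_polynomial` and `poly_value` are hypothetical helper names, the sampling period $T = 1/3$ and the random rational coefficients are arbitrary, and `fractions.Fraction` is used so that every recursion step must match the polynomial exactly.

```python
import random
from fractions import Fraction
from math import comb, factorial


def ar_from_polynomial(coeffs, T):
    """Return (b, c) for y_{kT} = sum_n coeffs[n] * (kT)**n.

    b_i = (-1)**(i+1) * C(N, i) for i = 1..N and c = N! * a_N * T**N.
    """
    N = len(coeffs) - 1
    b = [(-1) ** (i + 1) * comb(N, i) for i in range(1, N + 1)]
    c = factorial(N) * coeffs[N] * T ** N
    return b, c


def poly_value(coeffs, t):
    """Evaluate sum_n coeffs[n] * t**n."""
    return sum(a * t ** n for n, a in enumerate(coeffs))


random.seed(1)
T = Fraction(1, 3)  # hypothetical sampling period
for N in range(1, 8):
    coeffs = [Fraction(random.randint(-9, 9), random.randint(1, 5)) for _ in range(N + 1)]
    b, c = ar_from_polynomial(coeffs, T)
    y = [poly_value(coeffs, k * T) for k in range(3 * N + 5)]
    for k in range(N, len(y)):
        predicted = sum(b[i - 1] * y[k - i] for i in range(1, N + 1)) + c
        assert predicted == y[k]  # exact equality in rational arithmetic
print("equation (2) with these b_i and c reproduced every sampled polynomial")
```

If a random draw makes $a_N = 0$, the data are of lower degree, $c$ becomes $0$, and the recursion still holds, because the $N$-th difference of a polynomial of degree below $N$ vanishes.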
IV. Experiments
In this section, explorations are carried out for different types of data sources to illustrate a few themes. In reality, all real data have uncertainties; therefore, it is important to study sensitivities to the degrees and types of uncertainty. Nevertheless, all generated data in these explorations are error-free. The objectives here are to underpin some theoretical results and to build intuition from precise data and theoretical results, rather than to study the effects of noise interference. The two applications to real data, the current Covid-19 data from the UK and the US, are clearly not noise-free but are offered as real examples.
In the first four of these explorations, 35 data are generated. These are then modelled by polynomials as in equation (21) and by time-series as in equation (2). When considering a polynomial of degree $n$, the first $n+1$ data are used to evaluate the $n+1$ coefficients of this polynomial; this works because the data are error-free. On the other hand, when considering a time-series of order $q$, the first $2q+1$ data are used to evaluate the $q+1$ coefficients of this time-series, since each of the $q+1$ equations also needs $q$ earlier values.
A. Case I
Here 35 data are generated from a polynomial of degree 3,
![]() |
This is a finite degree polynomial with no steady state. For each value of the degree $n$ of the polynomial, the first $n+1$ data are used to calculate the $n+1$ coefficients of the polynomial. Using these polynomial coefficients, the remaining $35-(n+1)$ data values are predicted; these are labelled as $y_p(k)$ for $k = n+2, \ldots, 35$. Similarly, for each value of the time-series order $q$, the first $2q+1$ data are used to calculate the $q+1$ coefficients of the time-series. Using these coefficients, the remaining $35-(2q+1)$ data values are predicted; these are labelled as $y_t(k)$ for $k = 2q+2, \ldots, 35$.
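For the same values of $n$ and $q$, $35-(n+1)$ data values are predicted for the polynomial representation and $35-(2q+1)$ data values for the time-series representation. To compare prediction errors from the two representations fairly, only the predictions common to both representations, i.e., the $M = 35-(2q+1)$ data values for $k = 2q+2, \ldots, 35$, are used. The mean prediction errors are $\bar{e}_p = \frac{1}{M}\sum_{k} \big(y_k - y_p(k)\big)$ and $\bar{e}_t = \frac{1}{M}\sum_{k} \big(y_k - y_t(k)\big)$ for the polynomial and time-series respectively. Also, the RMS prediction error (polynomial) is defined as
$$\mathrm{RMS}_p = \sqrt{\frac{1}{M}\sum_{k} \big(y_k - y_p(k)\big)^2},$$
while the RMS prediction error (time-series) is defined as
$$\mathrm{RMS}_t = \sqrt{\frac{1}{M}\sum_{k} \big(y_k - y_t(k)\big)^2},$$
where all sums run over the $M$ common prediction indices.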
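The Case I procedure can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' code: it assumes NumPy, a hypothetical cubic (the paper's polynomial is not reproduced) with hypothetical $T = 0.5$, exact fits from the first $n+1$ and $2q+1$ samples, and one-step-ahead time-series predictions that use the observed earlier values. Python indices are 0-based, so index 7 corresponds to $k = 2q+2 = 8$ for $q = 3$.

```python
import numpy as np


def poly_predict(y, t, n):
    """Interpolate a degree-n polynomial through the first n+1 samples, evaluate it everywhere."""
    V = np.vander(t[: n + 1], n + 1, increasing=True)  # columns t**0 .. t**n
    a = np.linalg.solve(V, y[: n + 1])                  # exact fit: a_0 .. a_n
    return np.vander(t, n + 1, increasing=True) @ a


def ar_predict(y, q):
    """Fit AR(q) plus constant exactly from the first 2q+1 samples, then predict one step ahead."""
    A = np.array([[y[k - i] for i in range(1, q + 1)] + [1.0] for k in range(q, 2 * q + 1)])
    params = np.linalg.solve(A, y[q : 2 * q + 1])       # b_1 .. b_q, c
    pred = np.full(len(y), np.nan)
    for k in range(2 * q + 1, len(y)):
        pred[k] = params[:q] @ y[k - q : k][::-1] + params[q]  # uses observed y_{k-1}..y_{k-q}
    return pred


def error_stats(y, y_hat, start):
    """Mean and RMS of the prediction errors y_k - y_hat_k over indices >= start."""
    e = y[start:] - y_hat[start:]
    return float(np.mean(e)), float(np.sqrt(np.mean(e**2)))


T = 0.5
t = T * np.arange(35)
y = 2.0 - 1.5 * t + 0.3 * t**2 + 0.8 * t**3  # hypothetical cubic, not the paper's polynomial
n = q = 3
start = 2 * q + 1  # first (0-based) index predicted by both representations when n = q
print("polynomial (mean, RMS):", error_stats(y, poly_predict(y, t, n), start))
print("time-series (mean, RMS):", error_stats(y, ar_predict(y, q), start))
```

Both representations should give errors at round-off level here, mirroring the behaviour reported for Case I.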
The RMS prediction error (polynomial) is depicted in Figure 1a) as a function of $n$, while the RMS prediction error (time-series) is shown in Figure 1b) as a function of $q$. The prediction error at $n = 3$ is ($6.6\times10^{-13} \pm 2.7\times10^{-12}$), while the prediction error at $q = 3$ is ($7.2\times10^{-10} \pm 1.6\times10^{-9}$); both are extremely small. Figure 2a) shows the data versus the time index, while Figure 2b) depicts the prediction errors versus the time index for $n = 3$ (polynomial in red) and $q = 3$ (time-series in green). The results confirm that these data from a finite degree polynomial can be described equally well by both polynomial and time-series representations.
FIGURE 1.
Data generated from a polynomial. Figure 1a) shows the RMS prediction error (polynomial) as a function of $n$. Figure 1b) presents the RMS prediction error (time-series) as a function of $q$.
FIGURE 2.
Data generated from a polynomial. Figure 2a) depicts the data versus the time index. Figure 2b) shows the prediction errors versus the time index for $n = 3$ (polynomial in red) and for $q = 3$ (time-series in green).
B. Case II
Here 35 data are generated from a sine wave
![]() |
This represents an infinite degree polynomial and has no steady state, but its values are bounded between −1 and +1. The procedures for calculating the $n+1$ coefficients of the polynomial and the $q+1$ coefficients of the AR time-series, and for calculating the prediction errors (polynomial and time-series), are the same as described in Case I.
The RMS prediction error (polynomial) is depicted in Figure 3a) as a function of $n$, while the RMS prediction error (time-series) is shown in Figure 3b) as a function of $q$. The error at
is ($6.9 \pm 3.8$), while the error at
is ($2.5\times10^{-18} \pm 1.4\times10^{-15}$). Also, the error at
is ($422 \pm 614$), while the error at
is ($1.3\times10^{-16} \pm 1.4\times10^{-15}$). Figure 4a) shows the data versus the time index, while Figure 4b) depicts the prediction errors versus the time index for
(polynomial in red) and at
(time-series in green). The results confirm that these data from a sine wave are extremely well described by a time-series representation of order only 2; there is a theoretical reason for this (see section V for an explanation). Also, this time-series representation is far better than any of the finite degree polynomial representations considered.
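Why order 2 suffices for a sampled sine wave can be seen from the identity $\sin(x+h) + \sin(x-h) = 2\cos(h)\sin(x)$ with $h = \omega T$, which gives $y_k = 2\cos(\omega T)\,y_{k-1} - y_{k-2}$ with $c = 0$. The sketch below assumes NumPy and hypothetical values of $\omega$, $\phi$ and $T$; the paper's sine wave parameters are not reproduced.

```python
import numpy as np

# Hypothetical sine-wave parameters; the paper's exact signal is not reproduced here.
omega, phi, T = 2 * np.pi * 0.07, 0.4, 1.0
k = np.arange(35)
y = np.sin(omega * k * T + phi)

# sin(x + h) + sin(x - h) = 2 cos(h) sin(x) with h = omega * T gives an exact AR(2), c = 0.
b1, b2, c = 2 * np.cos(omega * T), -1.0, 0.0
pred = b1 * y[1:-1] + b2 * y[:-2] + c  # one-step-ahead predictions of y[2:]
max_err = np.max(np.abs(pred - y[2:]))
print(f"b1 = {b1:.6f}, b2 = {b2}, max |error| = {max_err:.1e}")
```

The coefficient $b_1 = 2\cos(\omega T)$ is in general not an integer, consistent with the claim that such time-series do not correspond to finite degree polynomials.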
FIGURE 3.
Data generated from a sine wave. Figure 3a) presents the RMS prediction error (polynomial) as a function of $n$. Figure 3b) displays the RMS prediction error (time-series) as a function of $q$.
FIGURE 4.
Data generated from a sine wave, Figure 4a) shows the data versus the time index. Figure 4b) depicts the prediction errors versus the time index for (
(polynomial in red) and for
(time-series in green).
C. Case III
Here 35 data are generated from a non-polynomial
![]() |
This represents an infinite degree polynomial and has no steady state. The procedures for calculating the $n+1$ coefficients of the polynomial and the $q+1$ coefficients of the time-series, and for calculating the prediction errors (polynomial and time-series), are the same as described in Case I.
The RMS prediction error (polynomial) is depicted in Figure 5a) as a function of $n$, while the RMS prediction error (time-series) is shown in Figure 5b) as a function of $q$. The prediction error at
is ($6.6\times10^{6} \pm 4.9\times10^{6}$), while the prediction error at
is ($-1.2\times10^{-8} \pm 8.7\times10^{-8}$). Also, the prediction error at
is ($7.3\times10^{7} \pm 8.9\times10^{7}$), while the prediction error at
is ($-1.2\times10^{-8} \pm 8.7\times10^{-8}$). Figure 6a) shows the data versus the time index, while Figure 6b) depicts the prediction errors versus the time index for
(polynomial in red) and at
(time-series in green). Thus, these data from a non-polynomial are described significantly better by a time-series representation of order only 4; the theoretical reason can be found in section V. Also, the RMS prediction error of the time-series representation is many orders of magnitude smaller than that of any of the finite degree polynomials considered.
FIGURE 5.
Data generated from a non-polynomial. Figure 5a) presents the RMS prediction error (polynomial) as a function of $n$. Figure 5b) displays the RMS prediction error (time-series) as a function of $q$.
FIGURE 6.
Data generated from a non-polynomial, Figure 6a) shows the data versus the time index. Figure 6b) depicts the prediction errors versus the time index for (
(polynomial in red) and for
(time-series in green).
D. Case IV
Here 35 data are generated from an inverse polynomial
![]() |
This represents an infinite degree polynomial. It has neither a finite degree polynomial representation nor a finite order time-series representation. The procedures for calculating the $n+1$ coefficients of the polynomial and the $q+1$ coefficients of the time-series, and for calculating the prediction errors (polynomial and time-series), are the same as described in Case I.
The RMS prediction error (polynomial) is depicted in Figure 7a) as a function of $n$, while the RMS prediction error (time-series) is shown in Figure 7b) as a function of $q$. The prediction error at
is ($8.2 \pm 6.7$), while the prediction error at
is ($-4.7\times10^{-4} \pm 1.7\times10^{-4}$). Also, the prediction error at
is ($476 \pm 617$), while the prediction error at
is ($-1.1\times10^{-7} \pm 9.5\times10^{-8}$). Figure 8a) shows the data versus the time index, while Figure 8b) depicts the prediction errors versus the time index for
(polynomial in red) and at
(time-series in green). The results confirm that these data from an inverse polynomial are described significantly better, by several orders of magnitude in RMS, by an AR time-series representation than by a finite degree polynomial representation.
FIGURE 7.
Data generated from an inverse polynomial. Figure 7a) displays the RMS prediction error (polynomial) as a function of $n$. Figure 7b) presents the RMS prediction error (time-series) as a function of $q$.
FIGURE 8.
Data generated from an inverse polynomial, Figure 8a) shows the data versus the time index. Figure 8b) depicts the prediction errors versus the time index for (
(polynomial in red) and for
(time-series in green).
E. Case V
This is an example of using real data from the current global Covid-19 epidemic as it is unfolding. The dataset represents cumulative daily confirmed cases of Covid-19 infections in the UK and is publicly available [38]. On 01 April 2020 there were 61 data (i.e., N = 61) covering the period from 31 January 2020 to 31 March 2020. Thus $y_k$ for $k = 1, 2, \ldots, 61$ represents the cumulative daily confirmed cases of Covid-19 infections in the UK.
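Of these 61 data, the first 50 data are used for estimating the free parameters and the last 11 data are used for forecasting. For a polynomial of degree $n$, the first 50 data are used to estimate the $n+1$ coefficients of the polynomial using the Moore-Penrose inverse. By adopting equation (21), the first 50 data can be described by the matrix equation $\mathbf{Y} = \mathbf{X}\mathbf{a}$, where
$$\mathbf{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_{50} \end{bmatrix}, \quad \mathbf{a} = \begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & T & \cdots & T^n \\ 1 & 2T & \cdots & (2T)^n \\ \vdots & \vdots & \ddots & \vdots \\ 1 & 50T & \cdots & (50T)^n \end{bmatrix}.$$
Thus, $\mathbf{Y}$ is a column vector of size $50 \times 1$, $\mathbf{a}$ is a column vector of size $(n+1) \times 1$, and $\mathbf{X}$ is a matrix of size $50 \times (n+1)$. Now,
$$\hat{\mathbf{a}} = \mathbf{X}^{+}\mathbf{Y} = \left(\mathbf{X}^{\mathsf T}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathsf T}\mathbf{Y}, \tag{32}$$
where the last form holds when $\mathbf{X}$ has full column rank. Using these estimated polynomial coefficients from equation (32), all 61 data are calculated using
$$\mathbf{Y}_p = \mathbf{X}_p \hat{\mathbf{a}},$$
where $\mathbf{X}_p$ is a matrix of size $61 \times (n+1)$ with entries $(kT)^j$ for $k = 1, \ldots, 61$ and $j = 0, \ldots, n$, while $\mathbf{Y}_p$ is a column vector of size $61 \times 1$ with elements $y_p(k)$. Of course, $y_p(1), \ldots, y_p(50)$ came from the regression, but $y_p(51), \ldots, y_p(61)$ are polynomial predictions for $k = 51, \ldots, 61$.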
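The Moore-Penrose fit of equation (32) can be sketched with NumPy's `numpy.linalg.pinv`. This is a minimal illustration, not the authors' code: `fit_poly_pinv` and `eval_poly` are hypothetical helper names, $T = 1$ is assumed, and the series below is a synthetic exponential-like sequence, not the UK data of [38].

```python
import numpy as np


def fit_poly_pinv(y_fit, n, T=1.0):
    """Least-squares coefficients a_hat = pinv(X) @ Y, with X[k-1, j] = (k*T)**j, k = 1..len(y_fit)."""
    k = np.arange(1, len(y_fit) + 1)
    X = np.vander(k * T, n + 1, increasing=True)
    return np.linalg.pinv(X) @ np.asarray(y_fit, dtype=float)


def eval_poly(a_hat, num, T=1.0):
    """Y_p = X_p @ a_hat for k = 1..num; rows beyond the fitted range are forecasts."""
    k = np.arange(1, num + 1)
    return np.vander(k * T, len(a_hat), increasing=True) @ a_hat


# Hypothetical, exponential-like cumulative series (NOT the UK data of [38]).
k = np.arange(1, 62)
y = np.round(50 * np.exp(0.12 * k))
a_hat = fit_poly_pinv(y[:50], n=5)  # fit on the first 50 values
y_p = eval_poly(a_hat, 61)          # 50 fitted values followed by 11 forecasts
rms_forecast = np.sqrt(np.mean((y[50:] - y_p[50:]) ** 2))
print(a_hat.shape, round(float(rms_forecast), 1))
```

`pinv` is computed via the SVD, so it also behaves sensibly when the Vandermonde matrix is badly conditioned, where forming $(\mathbf{X}^{\mathsf T}\mathbf{X})^{-1}$ explicitly would lose accuracy.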
Similarly, for a time-series of order $q$, the first 50 data are used to estimate the $q+1$ coefficients of the time-series. Each of these data values depends on the coefficients and on earlier data values. As all data values are error prone, Total Least Squares, which accounts for errors in both the dependent and independent variables, is more appropriate than ordinary Least Squares, which accounts only for errors in the dependent variable. Using the $q+1$ estimated coefficients, all 61 data are calculated; these are labelled as $y_t(k)$. Of course, out of these 61 values, those for $k \le 50$ came from the regression and $y_t(51), \ldots, y_t(61)$ are time-series predictions for $k = 51, \ldots, 61$.
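A basic Total Least Squares estimate of the AR($q$) coefficients and constant can be obtained from the singular value decomposition of the augmented matrix $[\mathbf{A} \mid \mathbf{b}]$. This sketch assumes NumPy and classical TLS (the right singular vector belonging to the smallest singular value); as a simplification, the constant column of ones is treated like the noisy columns, which a mixed LS-TLS formulation would avoid. The helper names `fit_ar_tls` and `ar_one_step` are hypothetical, and one-step-ahead prediction from observed earlier values is an assumption.

```python
import numpy as np


def fit_ar_tls(y_fit, q):
    """Classical TLS estimate of [b_1, ..., b_q, c] in equation (2).

    Rows of A are [y_{k-1}, ..., y_{k-q}, 1] with targets y_k; the solution comes
    from the right singular vector of [A | b] with the smallest singular value.
    """
    y_fit = np.asarray(y_fit, dtype=float)
    A = np.array([[y_fit[k - i] for i in range(1, q + 1)] + [1.0] for k in range(q, len(y_fit))])
    b = y_fit[q:]
    _, _, Vt = np.linalg.svd(np.column_stack([A, b]))
    v = Vt[-1]
    if abs(v[-1]) < 1e-12:
        raise ValueError("TLS solution does not exist (last component of v is ~0)")
    return -v[:-1] / v[-1]


def ar_one_step(y, params, q, start):
    """One-step-ahead predictions y_t(k) for k >= start, using observed earlier values."""
    y = np.asarray(y, dtype=float)
    pred = np.full(len(y), np.nan)
    for k in range(max(start, q), len(y)):
        pred[k] = np.dot(params[:q], y[k - q : k][::-1]) + params[q]
    return pred


# Hypothetical noise-free quadratic data with T = 1; Table 1 gives b_1 = 2, b_2 = -1, c = 2*a_2 = 0.4.
kk = np.arange(1, 51, dtype=float)
y_demo = 4 + 1.5 * kk + 0.2 * kk**2
print(np.round(fit_ar_tls(y_demo, 2), 6))
```

For noise-free data the augmented matrix is rank-deficient and TLS recovers the exact coefficients; with noisy data such as the Covid-19 series, it gives the errors-in-variables estimate described above.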
It is not known a priori whether the data can be represented by a finite degree polynomial or a finite order time-series. The RMS error at
is 5142, while the RMS error at $q = 2$ is 539. Clearly, the time-series representation is much more accurate. Also, the RMS error at $n = 5$ is 1711, much smaller than at lower values of $n$, but still much larger than that of the time-series representation. Figure 9a) depicts all 61 data values ($y$) in blue, all 61 calculated values ($y_p$) in red according to the polynomial representation at $n = 5$ [the first 50 values are from the fit and the last 11 values are predictions], and all 61 calculated values ($y_t$) in green according to the autoregressive time-series representation of order 2 [the first 50 values are from the fit and the last 11 values are predictions]. To give a closer look at the predictions, Figure 9b) depicts the last 11 data values ($y$) in blue, the 11 predicted values ($y_p$) in red according to the polynomial representation at $n = 5$, and the 11 predicted values ($y_t$) in green according to the autoregressive time-series representation of order 2.
FIGURE 9.
Cumulative daily confirmed cases of Covid-19 infections in the UK, covering the period from 31 January 2020 to 31 March 2020. Figure 9a) depicts all 61 data values ($y$) in blue, all 61 calculated values ($y_p$) in red according to the polynomial representation at $n = 5$ [the first 50 values are from the fit and the last 11 values are predictions], and all 61 calculated values ($y_t$) in green according to the autoregressive time-series representation of order 2 [the first 50 values are from the fit and the last 11 values are predictions]. Figure 9b) presents the last 11 data values ($y$) in blue, the 11 predicted values ($y_p$) in red according to the polynomial representation at $n = 5$, and the 11 predicted values ($y_t$) in green according to the autoregressive time-series representation of order 2.
To get a better idea of the fit (rather than the predictions), Figure 10 plots the 50 data values used for the fit in blue, the corresponding fitted values ($y_p$) in red according to the polynomial representation at $n = 5$, and the corresponding fitted values ($y_t$) in green according to the autoregressive time-series representation of order 2.
FIGURE 10.
Cumulative daily confirmed cases of Covid-19 infections in the UK, covering the period from 31 January 2020 to 31 March 2020. Figure 10 plots the 50 data values ($y$) used for the fit in blue, the corresponding fitted values ($y_p$) in red according to the polynomial representation at $n = 5$, and the corresponding fitted values ($y_t$) in green according to the autoregressive time-series representation of order 2.
The polynomial representation picks up the trend of the later data values, but it fails completely for the first half of the data values. This autoregressive time-series of order 2, on the other hand, picks up the trend over the whole range of the data values. The results confirm that the UK Covid-19 data are described significantly better (with lower RMS error) by an AR time-series of order 2 than by a polynomial of degree 5 (or of the other degrees tested).
F. Case VI
This is another example of using real data. The dataset represents cumulative daily confirmed cases of Covid-19 infections in the US and is publicly available [39]. On 04 April 2020 there were 25 data (i.e., N = 25) covering the period from 10 March 2020 to 03 April 2020. Thus $y_k$ for $k = 1, 2, \ldots, 25$ represents the cumulative daily confirmed cases of Covid-19 infections in the US.
Of these 25 data, the first 15 data are used for estimating the free parameters and the last 10 data are used for forecasting. For a polynomial of degree $n$, the first 15 data are used to estimate the $n+1$ coefficients of the polynomial using the Moore-Penrose inverse, in much the same way as for Case V above. Using these estimated polynomial coefficients, all 25 data are calculated in a similar manner to Case V. Here $\mathbf{Y}_p$ is a column vector of size $25 \times 1$ with elements $y_p(k)$; $y_p(1), \ldots, y_p(15)$ came from the regression, but $y_p(16), \ldots, y_p(25)$ are polynomial predictions for $k = 16, \ldots, 25$.
Similarly, for a time-series of order $q$, the first 15 data are used to estimate the $q+1$ coefficients of the time-series, again using Total Least Squares because all data values, including the earlier values on which each data value depends, are error prone. Using the $q+1$ estimated coefficients, all 25 data are calculated, i.e., $y_t(k)$. Of course, out of these 25 values, those for $k \le 15$ came from the regression and $y_t(16), \ldots, y_t(25)$ are time-series predictions for $k = 16, \ldots, 25$.
It is not known a priori whether the data can be represented by a finite degree polynomial or a finite order time-series. The RMS error at
is 15272, while the RMS error at
is 6533. Figure 11a) depicts the 25 data values (
) in blue, 25 calculated values (yp) in red according to polynomial representation at
[the first 15 values are from the fit and the last 10 values are predictions], as well as 25 calculated values (yt) in green according to autoregressive time-series representation of order 2 [the first 15 values are from the fit and the last 10 values are predictions]. To get a closer look at the predictions, Figure 11b) shows the last 10 data values (
) in blue, the 10 predicted values (yp) in red according to polynomial representation at
, as well as the 10 predicted values (yt) in green according to autoregressive time-series representation of order 3. RMS errors increase for other choices of
values. Clearly, the time-series representation is much more accurate.
FIGURE 11.
Cumulative daily confirmed cases of Covid-19 infections in the US, covering the period from 10 March 2020 to 03 April 2020, Figure 11a) depicts the 25 data values (y) in blue, 25 calculated values (yp) in red according to polynomial representation at
[the first 15 values are from the fit and the last 10 values are predictions], as well as 25 calculated values (yt) in green according to autoregressive time-series representation of order 2 [the first 15 values are from the fit and the last 10 values are predictions]. To get a closer look at the predictions, Figure 11b) shows the last 10 data values (y) in blue, the 10 predicted values (yp) in red from polynomial representation at
, as well as the 10 predicted values (yt) in green from autoregressive time-series representation of order 3.
Looking for better results with a higher degree of polynomial, the RMS error at
is found to be 34692, which is significantly larger than the value from time-series representation at
. Figure 12a) and Figure 12b) for
can be similarly described as Figure 11 for
. The results confirm that these US Covid-19 data are significantly better described by an AR time-series of order 3 than a finite degree polynomial of degree 3 (and others).
FIGURE 12.
Cumulative daily confirmed cases of Covid-19 infections in the US, covering the period from 10 March 2020 to 03 April 2020. Figure 12a) and Figure 12b) are for
; otherwise, they can be similarly described as in Figure 11, except for a different value of (
).
Table 2 provides a summary of these six cases. Data from a polynomial of finite degree can be represented equally well by a finite degree polynomial and by a finite order time-series with specific integer coefficients, while data from the other sources are represented significantly better by time-series representations. In many cases, finite order time-series can theoretically represent data from infinite degree polynomials extremely well. Therefore, whenever knowledge of the polynomial coefficients is not necessary in an application, one may choose to use a time-series representation.
TABLE 2. Summary of Six Cases.
| Data source | Number of data | Degree of polynomial | Polynomial representation | Time-series representation | Comments |
|---|---|---|---|---|---|
| Case I Polynomial | 35 | 3 | At , RMS prediction error of 10^−12 | At , RMS prediction error of 10^−9 | Both offer very low prediction errors. |
| Case II Sine wave | 35 | ∞ | At , RMS prediction error of 3.8 | At , RMS prediction error of 10^−15 | Time-series offers significantly better prediction. |
| Case III Non-polynomial | 35 | ∞ | At , RMS prediction error of 106 | At , RMS prediction error of 10^−8 | Time-series offers significantly better prediction. |
| Case IV Inverse polynomial | 35 | ∞ | At , RMS prediction error of 6.7 | At , RMS prediction error of 10^−8 | Time-series offers significantly better prediction. |
| Case V UK Covid-19 data | 61 | Unknown | At , RMS prediction error of 1711 | At , RMS prediction error of 539 | Time-series offers much better prediction. |
| Case VI US Covid-19 data | 25 | Unknown | At , RMS prediction error of 15272 | At , RMS prediction error of 6532 | Time-series offers much better prediction. |
V. Time-Series with Other Coefficients
It has been demonstrated in Sections II and III that all data from polynomials of finite degree
can be perfectly represented by a time-series of order
, if
is not zero. The coefficients of such a time-series are always integers of a specific form. The following demonstrates what time-series with other forms of coefficients (either non-integers or integers of different forms) represent.
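For concreteness, the integer coefficients can be checked numerically. The sketch assumes equation (2) has the form y(n) = b_1 y(n−1) + ... + b_p y(n−p) + c with unit sample spacing, writing p for the degree; this is an assumption about notation, not a quotation of the paper. Under that assumption, the p-th backward difference of a degree-p polynomial with leading coefficient a_p is the constant p!·a_p, which gives b_k = (−1)^(k+1)·C(p, k) for every such polynomial and c = p!·a_p. The helper names and example polynomials are hypothetical.

```python
from math import comb, factorial
import numpy as np

def poly_ar_coefficients(p):
    """b_1..b_p = (-1)^(k+1) C(p, k): identical for every polynomial of degree p."""
    return [(-1) ** (k + 1) * comb(p, k) for k in range(1, p + 1)]

def poly_ar_constant(leading_coeff, p):
    """c = p! * a_p: the only term that depends on the particular polynomial."""
    return factorial(p) * leading_coeff

p = 3
# Two hypothetical cubics, coefficients listed as a_0..a_3.
for coeffs in ([2.0, -1.0, 0.5, 0.25], [-4.0, 0.0, 3.0, -0.5]):
    y = np.polynomial.Polynomial(coeffs)(np.arange(20))
    b = poly_ar_coefficients(p)                 # [3, -3, 1] for both cubics
    c = poly_ar_constant(coeffs[-1], p)         # 6 * a_3: 1.5, then -3.0
    recon = [sum(b[k - 1] * y[m - k] for k in range(1, p + 1)) + c
             for m in range(p, len(y))]
    print(b, c, np.allclose(recon, y[p:]))      # recursion reproduces every sample from n = p
```

Only the constant changes between the two cubics, which is the behaviour the text describes for polynomials of the same degree.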
Equation (2) is called non-homogeneous if
in equation (2) is not zero [40], [41]. In that case, equation (2) can be combined with its equivalent form
![]() |
to obtain (by replacing
)
![]() |
with
and
. This is a homogeneous equation. The corresponding characteristic polynomial has
roots, i.e.,
. When these roots are distinct,
![]() |
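The combination step and the distinct-root solution can be written out explicitly. In the block below, equation (2) is assumed to read y(n) = Σ b_k y(n−k) + c for k = 1, ..., p; the shifted copy is subtracted to eliminate c, and r_i and A_i denote the roots and amplitudes. These symbols are chosen for illustration and may differ from those used in equations (36) and (37).

```latex
\begin{aligned}
y(n) - y(n-1) &= \sum_{k=1}^{p} b_k \bigl[\, y(n-k) - y(n-k-1) \,\bigr]
  \qquad \text{(the constant } c \text{ cancels)} \\
\Longrightarrow\quad y(n) &= (1+b_1)\, y(n-1) + \sum_{k=2}^{p} (b_k - b_{k-1})\, y(n-k) - b_p\, y(n-p-1), \\
z^{p+1} - (1+b_1) z^{p} - \sum_{k=2}^{p} (b_k - b_{k-1}) z^{p+1-k} + b_p
  &= (z-1) \Bigl( z^{p} - \sum_{k=1}^{p} b_k z^{p-k} \Bigr), \\
y(n) &= \sum_{i=1}^{p+1} A_i\, r_i^{\,n} \qquad \text{for distinct roots } r_1, \dots, r_{p+1}.
\end{aligned}
```

The factor (z − 1) shows that eliminating the constant term always contributes a root at z = 1 to the homogeneous equation; the remaining p roots are those of the original AR polynomial.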
On the other hand, when there are repeated roots, the solution takes a different form. When only two of the roots are repeated, e.g.,
,
![]() |
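Each of the two solutions in equations (36) and (37) describes a polynomial of infinite degree: any non-zero root other than 1 that carries a non-zero coefficient contributes a geometric term whose power-series expansion in the time index never terminates. Hence, finite order time-series with other forms of coefficients (either non-integers or integers of different forms) represent polynomials of infinite degree.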
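The conversion can be checked numerically for a non-integer example. This sketch assumes equation (2) has the form y(n) = b_1 y(n−1) + ... + b_p y(n−p) + c and uses hypothetical values b = [0.5, 0.3] and c = 2 with hypothetical helper names. It differences the recursion into a homogeneous AR(p + 1), confirms that the characteristic roots include z = 1, and recovers the distinct-root closed form Σ A_i r_i^n from the first p + 1 samples.

```python
import numpy as np

def to_homogeneous(b):
    """Differencing y(n) = sum_k b_k y(n-k) + c against its shifted copy removes c,
    giving a homogeneous AR(p+1) with coefficients h_1..h_{p+1}."""
    b = np.asarray(b, dtype=float)
    p = len(b)
    h = np.zeros(p + 1)
    h[0] = 1.0 + b[0]            # h_1 = 1 + b_1
    h[1:p] = b[1:] - b[:-1]      # h_k = b_k - b_{k-1}, k = 2..p
    h[p] = -b[-1]                # h_{p+1} = -b_p
    return h

def simulate(b, c, y0, length):
    """Generate data from y(n) = sum_k b_k y(n-k) + c, starting from initial values y0."""
    y = list(y0)
    while len(y) < length:
        y.append(sum(bk * y[-1 - k] for k, bk in enumerate(b)) + c)
    return np.array(y)

b, c = [0.5, 0.3], 2.0                     # hypothetical non-integer AR(2) with a constant
y = simulate(b, c, [1.0, -1.0], 30)
h = to_homogeneous(b)                      # approximately [1.5, -0.2, -0.3]
q = len(h)                                 # homogeneous order p + 1 = 3

# 1) The homogeneous AR(p+1) reproduces the data with no constant term.
hom = np.array([h @ y[m - np.arange(1, q + 1)] for m in range(q, len(y))])
print(np.allclose(hom, y[q:]))

# 2) Its characteristic roots: those of z^2 - 0.5 z - 0.3, plus z = 1.
roots = np.roots(np.r_[1.0, -h]).astype(complex)
print(np.sort_complex(roots))

# 3) Distinct roots: y(n) = sum_i A_i r_i^n, with A_i fitted to the first p+1 samples.
n = np.arange(len(y))
V = roots[None, :] ** n[:q, None]
A = np.linalg.solve(V, y[:q].astype(complex))
closed = (roots[None, :] ** n[:, None]) @ A
print(np.allclose(closed.real, y), np.allclose(closed.imag, 0.0))
```

The two roots other than 1 (about 0.852 and −0.352) are the ones whose geometric terms make the underlying polynomial of infinite degree.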
A. Example I
In Case II above,
data were generated from a sine wave
![]() |
Fitting
to the first 4 data values, it was found that
and
. These give rise to the characteristic polynomial of
. The two roots are given by
and
. As these two roots are distinct, the solution is given by
. Since
,
. Also, since
,
. So,
![]() |
This demonstrates how a time-series of order 2 can perfectly represent this sine wave, which is a polynomial of infinite degree.
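The steps of Example I can be reproduced numerically. This sketch assumes a sine wave sin(ωn + φ) with hypothetical values ω = 0.4 and φ = 0.3 (not the Case II parameters) and fits a homogeneous AR(2), without a constant term, to the first 4 samples by solving the two resulting equations exactly. Under these assumptions the fitted coefficients should equal 2cos ω and −1, and the two roots should lie on the unit circle at angles ±ω.

```python
import numpy as np

omega, phi = 0.4, 0.3                 # hypothetical frequency (rad/sample) and phase
n = np.arange(35)
y = np.sin(omega * n + phi)

# Fit y(n) = b1 y(n-1) + b2 y(n-2) exactly from the first 4 samples (equations at n = 2, 3).
X = np.array([[y[1], y[0]],
              [y[2], y[1]]])
b1, b2 = np.linalg.solve(X, y[2:4])
print(b1, 2 * np.cos(omega), b2)      # b1 matches 2 cos(omega); b2 matches -1

# Characteristic polynomial z^2 - b1 z - b2: roots exp(+/- j omega) on the unit circle.
roots = np.roots([1.0, -b1, -b2])
print(np.abs(roots), np.abs(np.angle(roots)))

# Run the AR(2) recursion from the first two samples and compare with all 35 values.
rec = list(y[:2])
for k in range(2, len(y)):
    rec.append(b1 * rec[-1] + b2 * rec[-2])
print(np.max(np.abs(np.array(rec) - y)))
```

The determinant of the 2 × 2 system equals sin²ω, so the exact fit exists whenever ω is not a multiple of π.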
B. Example II
Here
data are generated from a non-polynomial
![]() |
Fitting
to the first 8 data values, it is found that
, and
. These give rise to the characteristic polynomial of
. The four roots are given by
and
. As three of these roots are repeated, the solution is given by
. Using the first 8 data values, it can be shown that
, and
. Therefore,
.
This is yet another example of how a finite order time-series (here of order 4) can perfectly represent a polynomial of infinite degree.
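Example II can be mimicked in the same way. The sequence below is a hypothetical stand-in, not the paper's Example II data: y(n) = (1 + n + 0.5n²)·0.9^n + 2·0.5^n, which has a triple root at 0.9 and a single root at 0.5. A homogeneous AR(4) is fitted exactly to the first 8 samples, its roots are computed, and the repeated-root amplitudes are then recovered by least squares on the same 8 samples.

```python
import numpy as np

r, s = 0.9, 0.5                                   # hypothetical triple root and single root
n = np.arange(35)
y = (1 + n + 0.5 * n**2) * r**n + 2 * s**n        # hypothetical non-polynomial sequence

# Fit a homogeneous AR(4) exactly to the first 8 samples (equations for n = 4..7).
X = np.array([y[k - 4:k][::-1] for k in range(4, 8)])   # rows: y(n-1), ..., y(n-4)
b = np.linalg.solve(X, y[4:8])

# Expected coefficients from the characteristic polynomial (z - r)^3 (z - s).
expected = -np.poly([r, r, r, s])[1:]
print(b, expected)

# Numerical roots: three cluster around r, one sits at s.
roots = np.roots(np.r_[1.0, -b])
cluster = roots[np.abs(roots - r) < 0.05]
single = roots[np.abs(roots - r) >= 0.05]
r_est, s_est = cluster.real.mean(), single.real[0]

# Repeated-root solution y(n) = (A + B n + C n^2) r^n + D s^n, amplitudes from 8 samples.
basis = np.column_stack([r_est**n, n * r_est**n, n**2 * r_est**n, s_est**n])
amps, *_ = np.linalg.lstsq(basis[:8], y[:8], rcond=None)
print(amps)                                       # close to [1, 1, 0.5, 2]
print(np.max(np.abs(basis @ amps - y)))
```

A coefficient error of size δ moves each copy of a triple root by roughly δ^(1/3), so the sketch averages the three clustered roots, whose sum is fixed much more accurately by the coefficients.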
VI. All-Pole Filters and Polynomials
This section demonstrates a connection between polynomials and all-pole filters.
A. Polynomials and All-Pole Filters
It is well known that AR time-series models can be realised with all-pole filters. It has already been proven in Section III that all polynomials of finite degree
can be represented by AR time-series of order
(as in equation (2)). Using the z-transform, equation (2) can be written as
![]() |
and it has been proven in Section III that
![]() |
for
. Hence, the denominator polynomial can be written as
![]() |
Therefore, every polynomial of finite degree
maps onto
in the z-plane through the
repeated roots of its denominator polynomial.
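The mapping onto z = 1 can be checked for a specific degree. The sketch assumes the AR part of equation (2) has the denominator A(z) = 1 − b_1 z^(−1) − ... − b_p z^(−p) with b_k = (−1)^(k+1)·C(p, k), so that A(z) = (1 − z^(−1))^p. It implements a plain direct-form all-pole filter in NumPy (helper names are hypothetical; `scipy.signal.lfilter([1.0], a, x)` would serve the same purpose) and drives it with a constant input c from a zero initial state.

```python
from math import comb
import numpy as np

def denominator(p):
    """[1, -b_1, ..., -b_p] for b_k = (-1)^(k+1) C(p, k): the coefficients of (1 - z^-1)^p."""
    return np.array([(-1) ** k * comb(p, k) for k in range(p + 1)], dtype=float)

def all_pole(a, x):
    """Direct-form all-pole filter 1/A(z) with a[0] = 1 and zero initial state:
    y(n) = x(n) - a_1 y(n-1) - ... - a_p y(n-p)."""
    y = np.zeros(len(x))
    for m in range(len(x)):
        acc = x[m]
        for k in range(1, len(a)):
            if m - k >= 0:
                acc -= a[k] * y[m - k]
        y[m] = acc
    return y

p = 3
a = denominator(p)                               # [1, -3, 3, -1]
print(np.allclose(a, np.poly(np.ones(p))))       # z^p A(z) = (z - 1)^p: p poles at z = 1
print(np.roots(a))                               # all near 1; repeated roots are numerically sensitive

# A constant input c has z-transform c / (1 - z^-1): one more pole at z = 1.
# The zero-state output is c * C(n + p, p), a polynomial of degree p in n.
c = 2.0
length = 12
y = all_pole(a, np.full(length, c))
print(np.allclose(y, [c * comb(m + p, p) for m in range(length)]))
```

Other initial values yield other degree-p polynomials with the same leading coefficient c/p!, consistent with the constant term being the only polynomial-specific quantity.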
B. Other Roots on the Unit Circle
All roots on the unit circle away from
and
are complex. For a real-valued time-series, complex roots come in complex conjugate pairs. Consider just one such pair for illustration, i.e.,
and
. Thus,
. Since
is real-valued, either
and
, or
and
. Therefore, each pair of complex conjugate roots represents either a cosine or a sine, which can be described by an AR time-series of order 2 instead of by a polynomial of infinite degree. The corresponding time-series coefficients are
and
.
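Writing the pair as e^(±jθ) and the AR model as y(n) = b_1 y(n−1) + b_2 y(n−2), with characteristic polynomial z² − b_1 z − b_2 (notation assumed here), the coefficients follow from expanding the product of the two root factors, and a sum-to-product identity confirms that the cosine and the sine both satisfy the resulting recursion:

```latex
\begin{aligned}
(z - e^{j\theta})(z - e^{-j\theta}) &= z^{2} - 2\cos\theta\, z + 1
  \quad\Longrightarrow\quad b_1 = 2\cos\theta, \;\; b_2 = -1, \\
\cos(\theta n) + \cos\bigl(\theta(n-2)\bigr) &= 2\cos\theta \cos\bigl(\theta(n-1)\bigr), \\
\sin(\theta n) + \sin\bigl(\theta(n-2)\bigr) &= 2\cos\theta \sin\bigl(\theta(n-1)\bigr).
\end{aligned}
```

Rearranged, each identity reads y(n) = 2cos θ · y(n−1) − y(n−2), so any real combination of the cosine and the sine obeys the same AR(2).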
C. Other Complex Conjugate Roots Not on the Unit Circle
Again, for a real-valued time-series, complex roots come in complex conjugate pairs; consider just one pair for illustration, i.e.,
and
, with
. In this case,
. Since
is real-valued, either
and
, or
and
. Therefore, each pair of complex conjugate roots represents either a damped cosine or a damped sine, which can be described by an AR time-series of order 2 instead of by a polynomial of infinite degree. The corresponding time-series coefficients are
and
.
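A numerical check of the damped case, assuming the AR(2) form y(n) = b_1 y(n−1) + b_2 y(n−2) and hypothetical values ρ = 0.95 and θ = 0.6: poles ρe^(±jθ) give b_1 = 2ρ cos θ and b_2 = −ρ², and both ρ^n cos(θn) and ρ^n sin(θn) satisfy that recursion. Setting ρ = 1 recovers the unit-circle coefficients.

```python
import numpy as np

def ar2_from_pole_pair(rho, theta):
    """(b1, b2) such that z^2 - b1 z - b2 = (z - rho e^{j theta})(z - rho e^{-j theta})."""
    return 2.0 * rho * np.cos(theta), -rho ** 2

rho, theta = 0.95, 0.6                          # hypothetical modulus (< 1) and angle
b1, b2 = ar2_from_pole_pair(rho, theta)
n = np.arange(40)
for y in (rho ** n * np.cos(theta * n), rho ** n * np.sin(theta * n)):
    resid = y[2:] - (b1 * y[1:-1] + b2 * y[:-2])
    print(np.max(np.abs(resid)))                # rounding level: both obey the same AR(2)

roots = np.roots([1.0, -b1, -b2])
print(np.abs(roots), np.angle(roots))           # moduli rho, angles of +/- theta

# rho = 1 gives the undamped unit-circle case: b1 = 2 cos(theta), b2 = -1.
print(ar2_from_pole_pair(1.0, theta))
```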
D. Real Roots Between −1 and +1
Let
be the three distinct roots of the denominator polynomial. Then
. This can be described by an AR time-series of order 3 rather than a polynomial of infinite degree.
On the other hand, let
be three repeated roots of the denominator polynomial, i.e.,
. In that case,
. This is another example of a finite order AR time-series representing data that would otherwise require a polynomial of infinite degree.
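Both real-root cases can be verified directly. The sketch builds AR(3) coefficients from the roots with `numpy.poly`, assuming the form y(n) = b_1 y(n−1) + b_2 y(n−2) + b_3 y(n−3), then checks that a sum of three geometric sequences with distinct hypothetical roots 0.9, −0.4 and 0.2, and a sequence (A + Bn + Cn²)·0.7^n with a triple root, each satisfy their recursion to rounding error. The helper names and amplitudes are hypothetical.

```python
import numpy as np

def ar_from_roots(roots):
    """b_1..b_q such that z^q - b_1 z^(q-1) - ... - b_q = prod_i (z - r_i)."""
    return -np.poly(roots)[1:]

def max_residual(y, b):
    """Largest |y(m) - sum_k b_k y(m-k)| over all m where the recursion applies."""
    q = len(b)
    return max(abs(y[m] - sum(b[k] * y[m - 1 - k] for k in range(q)))
               for m in range(q, len(y)))

n = np.arange(30)

# Three distinct real roots in (-1, 1): y(n) = A1 r1^n + A2 r2^n + A3 r3^n.
r = [0.9, -0.4, 0.2]                                   # hypothetical roots
y_distinct = 1.0 * r[0] ** n + 2.0 * r[1] ** n - 0.5 * r[2] ** n
print(max_residual(y_distinct, ar_from_roots(r)))      # rounding level

# One real root r0 of multiplicity three: y(n) = (A + B n + C n^2) r0^n.
r0 = 0.7
b_rep = ar_from_roots([r0] * 3)                        # equals [3 r0, -3 r0^2, r0^3]
y_repeated = (1.0 - 0.3 * n + 0.05 * n ** 2) * r0 ** n
print(b_rep, max_residual(y_repeated, b_rep))
```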
The two lessons are:
1) All polynomials of degree
can be represented by an all-pole filter with
repeated roots (or poles) at
.
2) Data representable by finite order all-pole filters, whether they are from finite degree or infinite degree polynomials, can be described by a finite order AR time-series.
VII. Conclusion
Polynomial representation and time-series representation are two data modelling techniques. In this paper, all theoretical studies of their connections and differences have been based on uniformly sampled data in the absence of noise. It has been proven that all data from an underlying polynomial model of finite degree
as in equation (21) can be represented perfectly by either a polynomial of degree
or an autoregressive time-series of order
and a constant term. Also, it has been proven that all polynomials of degree
can be described by the same set of time-series coefficients with the only possible difference being in the constant term
as in equation (2). These time-series coefficients are integers of a specific form. It was also demonstrated that time-series with either non-integer coefficients or integer coefficients not of this specific form represent polynomials of infinite degree. Numerical explorations in four cases with generated data and two cases with real data demonstrated that, while finite degree polynomial and finite order time-series representations are equally good for data that follow finite degree polynomial forms, finite order autoregressive time-series representations offer significant advantages in modelling data from other sources. All polynomials of degree
can be represented by an all-pole filter with
repeated roots (or poles) at
. Theoretically, all data representable by a finite order all-pole filter, whether they come from finite degree or infinite degree polynomials, can be described by a finite order AR time-series. If the values of the polynomial coefficients are not needed in an application, one may choose finite order time-series representations, as they are more general than finite degree polynomial representations.
Acknowledgment
The author acknowledges Dr C Liu for formatting the manuscript and Dr D A Nandi for supplying the US Covid-19 data.
Biography

Asoke K. Nandi (Fellow, IEEE) received the Ph.D. degree in physics from the University of Cambridge (Trinity College), Cambridge, U.K.
He held academic positions at several universities, including the University of Oxford, U.K., Imperial College London, U.K., the University of Strathclyde, U.K., and the University of Liverpool, U.K., as well as the Finland Distinguished Professorship with Jyvaskyla University, Finland. In 2013, he moved to Brunel University London, U.K., to become the Chair and Head of electronics and computer engineering. He is a Distinguished Visiting Professor at Xi'an Jiaotong University, China, and an Adjunct Professor with the University of Calgary, Canada. In 1983, he co-discovered the three fundamental particles known as W+, W−, and Z0 (with the UA1 Team at CERN), providing the evidence for the unification of the electromagnetic and weak forces, for which, in 1984, the Nobel Committee for Physics awarded the prize to his two team leaders for their decisive contributions. He has made many fundamental, theoretical, and algorithmic contributions to many aspects of signal processing and machine learning. He has extensive expertise in big and heterogeneous data. He has authored over 600 technical publications, including 240 journal articles as well as five books, entitled Automatic Modulation Recognition of Communications Signals (Springer, 1996), Blind Estimation Using Higher-Order Statistics (Springer, 1999), Automatic Modulation Classification: Principles, Algorithms, and Applications (Wiley, 2015), Integrative Cluster Analysis in Bioinformatics (Wiley, 2015), and Condition Monitoring With Vibration Signals: Compressive Sampling and Learning Algorithms for Rotating Machines (Wiley, 2020). The h-index of his publications is 75 (Google Scholar) and his Erdős number is 2. His current research interests include signal processing and machine learning, with applications to communications, image segmentation, and biomedical data.
Prof. Nandi is a Fellow of the Royal Academy of Engineering, U.K., as well as of seven other institutions. He was an IEEE EMBS Distinguished Lecturer from 2018 to 2019. Among his many awards, he received the Mountbatten Premium, the Division Award of the Electronics and Communications Division, Institution of Electrical Engineers, U.K., in 1998, the Water Arbitration Prize of the Institution of Mechanical Engineers, U.K., in 1999, the Glory of Bengal Award for his outstanding achievements in scientific research, in 2010, and the Institute of Electrical and Electronics Engineers (USA) Heinrich Hertz Award, in 2012.
References
- [1] Legendre A. M., Nouvelles Méthodes Pour la Détermination Des Orbites Des Comètes (Sur la Méthode Des Moindres Quarrés). Paris, France: Chez Firmin DIDOT, 1805.
- [2] Gauss C. F., Theoria Combinationis Observationum Erroribus Minimis Obnoxiae. Gottingen, Germany: Henricus Dieterich, 1823.
- [3] Gergonne J. D., “The application of the method of least squares to the interpolation of sequences,” Historia Math., vol. 1, pp. 439–447, Nov. 1974, doi: 10.1016/0315-0860(74)90034-2.
- [4] Stigler S. M., “Gergonne’s 1815 paper on the design and analysis of polynomial regression experiments,” Historia Math., vol. 1, no. 4, pp. 431–439, 1974, doi: 10.1016/0315-0860(74)90033-0.
- [5] Yule G. U., “On the theory of correlation,” J. Roy. Stat. Soc., vol. 60, no. 4, pp. 812–854, 1897, doi: 10.2307/2979746.
- [6] Pearson K., Yule G. U., Blanchard N., and Lee A., “The law of ancestral heredity,” Biometrika, vol. 2, no. 2, pp. 211–236, 1903, doi: 10.2307/2331683.
- [7] Fisher R. A., “The goodness of fit of regression formulae, and the distribution of regression coefficients,” J. Roy. Stat. Soc., vol. 85, no. 4, pp. 597–612, 1922, doi: 10.2307/2341124.
- [8] Wiener N., Extrapolation, Interpolation, and Smoothing of Stationary Time Series. Cambridge, MA, USA: MIT Press, 1949.
- [9] Box G. and Jenkins G., Time Series Analysis: Forecasting and Control, Revised Edition. Oakland, CA, USA: Holden-Day, 1976.
- [10] Hamilton J., Time Series Analysis. Princeton, NJ, USA: Princeton Univ. Press, 1994.
- [11] Gershenfeld N., The Nature of Mathematical Modeling. Cambridge, U.K.: Cambridge Univ. Press, 2000.
- [12] Woodward W. A., Gray H. L., and Elliott A. C., Applied Time Series Analysis. Boca Raton, FL, USA: CRC Press, 2012.
- [13] Mandel J., The Statistical Analysis of Experimental Data. New York, NY, USA: Interscience, 1964.
- [14] Falk M. et al., A First Course on Time Series Analysis: Examples With SAS. Accessed: Jun. 9, 2020. [Online]. Available: https://www.uni-wuerzburg.de/fileadmin/10040800/user_upload/time_series/the_book/2011-March-01-times.pdf
- [15] Tsay R. S., Financial Time Series (Wiley StatsRef: Statistics Reference Online). Hoboken, NJ, USA: Wiley, 2014, pp. 1–23.
- [16] Goutte C., Toft P., Rostrup E., Nielsen F. Å., and Hansen L. K., “On clustering fMRI time series,” NeuroImage, vol. 9, no. 3, pp. 298–310, 1999.
- [17] Mormann F., Andrzejak R. G., Elger C. E., and Lehnertz K., “Seizure prediction: The long and winding road,” Brain, vol. 130, no. 2, pp. 314–333, 2006.
- [18] Craddock J. M., “The analysis of meteorological time series for use in forecasting,” Statistician, vol. 15, no. 2, p. 167, 1965.
- [19] Enders W., Applied Econometric Times Series, 4th ed. Hoboken, NJ, USA: Wiley, 2015.
- [20] Nandi A. K., Roberts D. J., and Nandi A. K., “Prediction paradigm involving time series applied to total blood issues data from England,” Transfusion, vol. 60, no. 3, pp. 535–543, Mar. 2020.
- [21] Makhoul J., “Linear prediction: A tutorial review,” Proc. IEEE, vol. 63, no. 4, pp. 561–580, Apr. 1975.
- [22] Hayes M. H., Statistical Digital Signal Processing and Modeling. Hoboken, NJ, USA: Wiley, 1996.
- [23] Haykin S. O., Adaptive Filter Theory, 5th ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2013.
- [24] Rappaport T. S., Wireless Communications: Principles and Practice, vol. 2. Upper Saddle River, NJ, USA: Prentice-Hall, 1996.
- [25] Carter G. C., “Coherence and time delay estimation,” Proc. IEEE, vol. 75, no. 2, pp. 236–255, Feb. 1987.
- [26] Zarzoso V. and Nandi A. K., “Noninvasive fetal electrocardiogram extraction: Blind separation versus adaptive noise cancellation,” IEEE Trans. Biomed. Eng., vol. 48, no. 1, pp. 12–18, 2001.
- [27] Varotsos P., Sarlis N. V., and Skordas E. S., Natural Time Analysis: The New View of Time: Precursory Seismic Electric Signals, Earthquakes and Other Complex Time Series. Secaucus, NJ, USA: Springer, 2011.
- [28] Sakoe H. and Chiba S., “Dynamic programming algorithm optimization for spoken word recognition,” IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-26, no. 1, pp. 43–49, Feb. 1978.
- [29] Liao T. W., “Clustering of time series data—A survey,” Pattern Recognit., vol. 38, no. 11, pp. 1857–1874, Nov. 2005, doi: 10.1016/j.patcog.2005.01.025.
- [30] Aghabozorgi S., Shirkhorshidi A. S., and Wah T. Y., “Time-series clustering—A decade review,” Inf. Syst., vol. 53, pp. 16–38, Oct. 2015, doi: 10.1016/j.is.2015.04.007.
- [31] Keogh E. and Kasetty S., “On the need for time series data mining benchmarks: A survey and empirical demonstration,” Data Mining Knowl. Discovery, vol. 7, pp. 349–371, Oct. 2003, doi: 10.1023/A:1024988512476.
- [32] Fahim M. and Sillitti A., “Anomaly detection, analysis and prediction techniques in IoT environment: A systematic literature review,” IEEE Access, vol. 7, pp. 81664–81681, 2019, doi: 10.1109/ACCESS.2019.2921912.
- [33] Ali M., Alqahtani A., Jones M. W., and Xie X., “Clustering and classification for time series data in visual analytics: A survey,” IEEE Access, vol. 7, pp. 181314–181338, 2019, doi: 10.1109/ACCESS.2019.2958551.
- [34] BuHamra S., Smaoui N., and Gabr M., “The Box–Jenkins analysis and neural networks: Prediction and time series modelling,” Appl. Math. Model., vol. 27, no. 10, pp. 805–815, Oct. 2003.
- [35] Wang X. and Wang C., “Time series data cleaning: A survey,” IEEE Access, vol. 8, pp. 1866–1881, 2020, doi: 10.1109/ACCESS.2019.2962152.
- [36] Zhong L., Mu L., Li J., Wang J., Yin Z., and Liu D., “Early prediction of the 2019 novel coronavirus outbreak in the mainland China based on simple mathematical model,” IEEE Access, vol. 8, pp. 51761–51769, 2020, doi: 10.1109/ACCESS.2020.2979599.
- [37] Gradshteyn I. S. and Ryzhik I. M., Tables of Integrals, Series, and Products, vol. 10, Jeffrey A., Ed. New York, NY, USA: Academic, 1980.
- [38] UK Covid-19 Data. Accessed: Apr. 1, 2020. [Online]. Available: https://www.arcgis.com/home/item.html?id=e5fd11150d274bebaaf8fe2a7a2bda11
- [39] US Covid-19 Data. Accessed: Apr. 4, 2020. [Online]. Available: https://www.kaggle.com/c/covid19-global-forecasting-week-3/data and https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
- [40] Levy H. and Lessman F., Finite Difference Equations. New York, NY, USA: Dover, 1992.
- [41] Kelly W. G. and Peterson A. C., Difference Equations: An Introduction With Applications. Amsterdam, The Netherlands: Elsevier, 2000.