Wind field reconstruction with adaptive random Fourier features

Jonas Kiessling; Emanuel Ström; Raúl Tempone

doi:10.1098/rspa.2021.0236

2021 Nov 17;477(2255):20210236. doi: 10.1098/rspa.2021.0236


PMCID: PMC8596000 PMID: 35153592

Abstract

We investigate the use of spatial interpolation methods for reconstructing the horizontal near-surface wind field given a sparse set of measurements. In particular, random Fourier features is compared with a set of benchmark methods including kriging and inverse distance weighting. Random Fourier features is a linear model $β (x) = \sum_{k = 1}^{K} β_{k} e^{i ω_{k} x}$ approximating the velocity field, with randomly sampled frequencies $ω_{k}$ and amplitudes $β_{k}$ trained to minimize a loss function. We include a physically motivated divergence penalty $| \nabla \cdot β (x) |^{2}$ , as well as a penalty on the Sobolev norm of $β$ . We derive a bound on the generalization error and a sampling density that minimizes the bound. We then devise an adaptive Metropolis–Hastings algorithm for sampling the frequencies of the optimal distribution. In our experiments, our random Fourier features model outperforms the benchmark models.

Keywords: random Fourier features, Metropolis algorithm, spatial interpolation, machine learning, wind field reconstruction, flow field estimation

1. Introduction

An integral part of wind farm planning and weather prediction is access to high-quality wind data [1,2]. However, meteorological stations are often heterogeneously distributed with many gaps, and interpolation techniques have to be employed in order to increase spatial resolution [3]. For example, a crucial step in building wind farms is site prospecting, in which national or state-level measurements are used to estimate the expected aggregate yearly or monthly energy output [1]. A common approach is to approximate the probability distribution of the wind speed over time with some parametric model, apply spatial interpolation to the parameters, calculate the energy output as a function of wind speed and then take the expected value. The work [4] lists approximately 200 papers written between 1940 and 2008 that focus on parametric models for time series of wind speed measurements. More modern approaches that directly interpolate wind speed also exist [3].

In this work, we focus on interpolation of the near-surface north-south and east-west horizontal wind components for short, 10-min time intervals. This effectively results in a reconstruction of the entire wind vector field, hence the name wind field reconstruction. Due to high spatial variability and dependence on both local and upstream terrain features, interpolation of wind data over hourly or minute-wise intervals is considered a particularly hard problem [3]. However, there are reasons why this can be useful, such as predicting the propagation of forest fires and pollutants, modelling the movement of flying animals and insects [5], or formulating initial conditions for atmospheric simulations and numerical weather prediction models [2,6]. In such applications, spatial interpolation is used to transfer measurements from meteorological stations to a fine grid or mesh.

There is relatively little research published on wind field reconstruction by the geological community [3]. Recent studies on machine learning interpolation methods like neural networks, support vector machines and random forest in other geological applications such as air temperatures [7], snow depth [8], mineral concentrations [9] and wind speed [4,10,11] have shown promising results. These types of models trade interpretability for power and flexibility and allow for efficient inclusion of covariates such as elevation, terrain slope, concavity and roughness [7]. This raises the question of whether machine learning-based interpolation works for wind field reconstruction as well. In a 2018 study by Reinhardt & Samimi [3], the established method of regression kriging was shown to reliably outperform some neural networks and support vector machines in horizontal wind field reconstruction. The authors note that the performance of machine learning methods was considerably worse for near surface winds, and argue that the models are unable to capture the complex behaviour and high variability that arises from interactions between wind and terrain.

Since the 2018 study by Reinhardt & Samimi, there have been significant advancements in the related field of fluid flow reconstruction. In particular, approaches based on generalizations of principal component decomposition have been trained to interpolate sparse measurements directly to high-resolution grids with very high accuracy [12–14]. The difference between these methods and traditional interpolation is that the training data consist of measurements from multiple times. However, the resolution of these methods can only be as high as that of the training data. High-resolution data are sometimes not available, or might take considerable time and effort to simulate. In such cases, simpler interpolation models can arguably be a valid alternative. The definition of an interpolation model varies depending on the context. The definition used in our work draws from [9], and is synonymous with regression.

In this paper, we compare a novel machine learning method known as random Fourier features with a selection of popular and successful interpolation techniques. The model fits a Fourier series $β (x) = \sum_{k} β_{k} e^{i ω_{k} x}$ to the data. Instead of traditional greedy optimization methods such as stochastic gradient descent, our random Fourier features model explores the frequency domain using an adaptive Metropolis algorithm inspired by [15]. In each step, the Fourier coefficients $β_{k}$ are optimized with respect to a loss function. The work [16] proposes a technique for interpolation of incompressible flow by incorporating zero divergence as a constraint in the optimization. We propose an alternative approach where the incompressibility takes the form of a regularization term.

The method presented in this report only considers data from a fixed time point and does not model the wind flow through time. The method can be extended to model the flow, for instance with the principal component technique explained in [13,17]. The article is structured as follows. In §2, a mathematical formulation of the wind field reconstruction problem is given. The data are also presented, along with the error estimates used to evaluate the models. Section 3 contains a short introduction to the interpolation models. The results are presented in §4 and discussed in §5.

2. Problem formulation

Let $Ω \subset R^{2}$ denote a geographical region. For the purpose of this report, $Ω$ is the set of points contained within the Swedish borders. We define the horizontal wind field $u : Ω \times [0, \infty) \to R^{2}$ that maps every point in space $x = (x, y) \in Ω$ and time $t \geq 0$ to a velocity vector $u (x, t) = (u (x, t), v (x, t)) \in R^{2}$ . In mesoscale atmospheric models, a set of physical assumptions involving quantities like the wind field, temperature and pressure are used to pose a set of differential equations that describe the time evolution of the system. Given an initial state at the time $t = 0$ and a set of boundary conditions, it is possible to simulate and forecast the wind field $u$ using these equations. The spatial interpolation methods considered in this work do not take previous or future times into account. Thus, only a subset of the physical assumptions about the system can effectively be enforced. Some of the presented models will incorporate the assumption of divergence-free flow, which is well established in mesoscale atmospheric models [6]. It can be written as a differential equation $\nabla \cdot u = 0$ , where $\nabla \cdot u = \partial u / \partial x + \partial v / \partial y$ is the divergence of the wind field $u$ . The above notation as well as the notation presented in §2a,b will be used extensively throughout our work.

(a) . Data

The data were obtained from the Swedish Meteorological and Hydrological Institute. They contain a set of wind observations during the entirety of 2018, from $N = 171$ weather stations scattered across Sweden as shown in figure 1. Each measurement is collected 10 m above ground. The positions of the stations are given by the elevation (the height above sea level) as well as the latitude and longitude. The measurements are 10-min averages, collected once every hour. They consist of two components: the wind speed, which is the magnitude of the horizontal component of the wind vector, and the wind direction, which is the angle from which the wind comes, measured clockwise relative to north ( $90^{\circ}$ is east, $180^{\circ}$ is south and so on). Both measurements are rounded to varying degrees of precision depending on the station [18]. The stations are not active at all times. In fact, there are occasional hours with as few as one station reporting measurements. The wind measurements are highly autocorrelated over time, as demonstrated by the autocorrelation plots of the east-west and north-south wind components in figures 2 and 3.

The data were processed before training. First, the latitude–longitude pairs were transformed to Cartesian coordinates $x = (x, y)$ where $x$ is the eastward-measured distance and $y$ is the northward-measured distance, as shown in figure 1. This was done using the SWEREF 99 TM¹ map projection. Second, the wind measurements were transformed from polar to Cartesian coordinates $u = (u, v)$ forming a horizontal wind vector, where $u$ corresponds to the velocity component along the $x$ -axis and $v$ corresponds to the velocity component along the $y$ -axis. Lastly, all wind measurements from the month of September were removed. The September measurements exhibited a root mean square wind speed of $5.5\ \mathrm{m\,s^{-1}}$ as opposed to $4.5\ \mathrm{m\,s^{-1}}$ for the remaining months, meaning that the error metrics were disproportionately affected by September. Furthermore, the wind field exhibited low spatial variability during this period. These highly uniform, low-variance fields are not the main interest of the study. The removal of the September data did not affect the method comparison significantly.
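The polar-to-Cartesian step can be illustrated with a minimal sketch. It assumes the meteorological convention stated above (the direction is where the wind comes from, measured clockwise from north) and ignores any small rotation between true north and the grid north of the map projection. The function name `wind_components` is hypothetical.

```python
import math

def wind_components(speed, direction_deg):
    """Convert wind speed and meteorological direction to (u, v).

    direction_deg is the angle the wind blows FROM, clockwise from north,
    so the velocity vector points in the opposite direction.
    """
    theta = math.radians(direction_deg)
    u = -speed * math.sin(theta)  # eastward component (x-axis)
    v = -speed * math.cos(theta)  # northward component (y-axis)
    return u, v

# A 5 m/s wind coming from the east (90 degrees) blows towards the west.
u, v = wind_components(5.0, 90.0)
print(round(u, 6), round(v, 6))
```

Under this convention a wind from the north gives $v < 0$ and a wind from the east gives $u < 0$.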

Figure 1. — The `SWEREF 99 TM` orthographic projection of Sweden. The weather stations are marked with circles, coloured according to their elevation $z$ . The coordinate system is drawn in the bottom left. The centre of the map is at $62^{\circ} 40^{'} 6^{″} N$ , $17^{\circ} 55^{'} 19^{″} E$ . (Online version in colour.)

Figure 2. — Autocorrelation of the east-west component $u$ of the wind $u$ for the available weather stations, up to a lag of 300 hours. Each red line represents the autocorrelation of one weather station. (Online version in colour.)

Figure 3. — Autocorrelation of the north-south component $v$ of the wind $u$ for the available weather stations, up to a lag of 300 hours. Each red line represents the autocorrelation of one weather station. (Online version in colour.)

(b) . Spatial interpolation models

For the purpose of this report, the definition of interpolation models is restricted to approximations of functions $u : Ω \to R^{2}$ . The range is two-dimensional since we are modelling the horizontal wind vector field, and the region $Ω$ is Sweden. In general, interpolation models are also used for other quantities such as temperature, pressure and population density. Here, we define a spatial interpolation model as a map $f$ from a set of velocity measurements $D = {(x_{n}, u_{n}) : x_{n} \in Ω, n = 1, 2, \dots, N}$ to a vector field $f_{D} : Ω \to R^{2}$ . The process of evaluating a model $f$ on $D$ is called training. Usually, the training is done by minimizing a loss function. The above notation will be used throughout our report. Note that there is an important distinction between $f$ and $f_{D}$ . The function $f$ represents a model, which is trained on data $D$ and produces a vector field $f_{D}$ that approximates $u$ . Any time a symbol is indexed by the letter $D$ or variations of it such as $D_{t}$ , it symbolizes an interpolation model trained on that particular dataset. A parametrized set ${f^{θ} : θ \in Θ}$ of interpolation models $f^{θ}$ is called an interpolation family, and the parameters $θ$ are called hyper parameters. For example, the $K$ -nearest neighbours models can be viewed as an interpolation family with hyper parameter $K$ . The process of finding the best model $f^{θ}$ in an interpolation family with respect to some quality of fit is called hyper optimization.

(c) . Quality of fit

Given an interpolation model $f$ and observations ${D_{t}}_{t \in T}$ at different times $t$ in a time span $T$ , the quality of fit $Q (f)$ is defined as the expected square loss with respect to the distribution $ρ (d x, d t)$ of the data in space and time $Ω \times T$ :

Q (f) = E [| | f_{D_{t}} (x) - u (x, t) | |^{2}] = \int_{T} \int_{Ω} | | f_{D_{t}} (x) - u (x, t) | |^{2} ρ (d x, d t) .

2.1

We only have access to a limited set of data, both in space and time. Hence, calculating the exact value of $Q$ is not possible. Furthermore, using all available data to compute $Q$ is too computationally expensive. Instead, we use a Monte Carlo sampling average in time and a cross-validation scheme in space. Let $t_{1}, t_{2}, \dots, t_{N_{T}}$ be all times in $T$ for which measurements exist, that is, every full hour of 2018 except those in September. Next, we create a set $K$ of indices by sampling randomly from the integers 1 to $N_{T}$ with replacement. Then, define a set ${t_{k}}_{k \in K}$ of time samples and let $D_{k} = {(x_{n}, u_{k n}) : n = 1, 2, \dots, N_{k}}$ be the set of observations made at the time $t_{k}$ . As stated in §2a, the weather stations are not always active, so the number of measurements $N_{k}$ at the time $t_{k}$ varies. Define a random partition of the weather stations into $M$ disjoint sets of equal size $D_{k}^{m}, m = 1, 2, \dots, M$ , and denote $D_{k}^{- m} = D_{k} ∖ D_{k}^{m}$ . The fixed time expectations $Q_{t} (f) = E [| | f_{D_{t}} (x) - u (x, t) | |^{2} ∣ t]$ of the quality of fit at the sampled times $t_{k}$ are approximated using cross-validation:

Q_{t_{k}} (f) \approx {\tilde{Q}}_{t_{k}} (f) := \frac{1}{| D_{k} |} \sum_{m = 1}^{M} \sum_{(x, u) \in D_{k}^{m}} | | f_{D_{k}^{- m}} (x) - u | |^{2} .

2.2

The above formula can be summarized by the following procedure. For every split $m$ , train the model $f$ on the data $D_{k}^{- m}$ and calculate the square errors on the remaining points $D_{k}^{m}$ . Then, take the mean of all squared errors. We found that fivefold cross-validation ( $M = 5$ ) struck a good balance between computation time and accuracy. Averaging ${\tilde{Q}}_{t_{k}} (f)$ over the samples in $K$ yields a Monte Carlo sample estimate of $Q (f)$ :

Q (f) \approx \tilde{Q} (f) := \frac{1}{| K |} \sum_{k \in K} {\tilde{Q}}_{t_{k}} (f),

2.3

also called the variance unexplained. An alternative measure $E$ is obtained from normalizing $Q (f)$ by the expected square wind speed $E [| | u (x, t) | |^{2}]$ , denoted $Q (0)$ :

E (f) = \frac{Q (f)}{Q (0)} \approx \frac{\tilde{Q} (f)}{\tilde{Q} (0)} =: \tilde{E} (f) .

2.4

This measure is referred to as the fraction of variance unexplained. If the wind field $u$ has zero mean, the number $1 - E (f)$ is referred to as $R^{2}$ , or the coefficient of determination. We estimate the variance of $\tilde{Q}$ as

Var (\tilde{Q}) = \frac{Var (Q_{t})}{| K |} and Var (Q_{t}) \approx \frac{1}{| K |} \sum_{k \in K} {({\tilde{Q}}_{t_{k}} - \tilde{Q})}^{2} .

2.5
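The estimators (2.2) to (2.5) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: a model is represented as a function that takes training data and returns the trained field $f_{D}$, `nearest_neighbour` is a hypothetical stand-in model, `snapshots` is an assumed list of per-time arrays `(X, U)`, and `np.array_split` produces folds of nearly (not exactly) equal size.

```python
import numpy as np

def nearest_neighbour(X_train, U_train):
    """Hypothetical stand-in model f; returns the trained field f_D."""
    def f_D(X):
        d2 = ((X[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
        return U_train[d2.argmin(axis=1)]
    return f_D

def cv_error(model, X, U, M=5, rng=None):
    """Cross-validated estimate (2.2) of Q_t at one sampled time."""
    rng = np.random.default_rng() if rng is None else rng
    folds = np.array_split(rng.permutation(len(X)), M)
    total = 0.0
    for test_idx in folds:
        train = np.ones(len(X), dtype=bool)
        train[test_idx] = False
        f_D = model(X[train], U[train])          # train on D_k^{-m}
        total += ((f_D(X[test_idx]) - U[test_idx]) ** 2).sum()
    return total / len(X)                        # divide by |D_k|

def quality_of_fit(model, snapshots, M=5, seed=0):
    """Monte Carlo estimates (2.3)-(2.5) over the sampled times."""
    rng = np.random.default_rng(seed)
    q_t = np.array([cv_error(model, X, U, M, rng) for X, U in snapshots])
    # Q_t(0): mean square wind speed at each sampled time
    q0_t = np.array([(U ** 2).sum() / len(U) for _, U in snapshots])
    Q = q_t.mean()                               # variance unexplained (2.3)
    E = Q / q0_t.mean()                          # fraction unexplained (2.4)
    var_Q = q_t.var() / len(q_t)                 # variance estimate (2.5)
    return Q, E, var_Q
```

Two standard deviations, `2 * np.sqrt(var_Q)`, then give the approximate half-width of the confidence interval for $Q$.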

In §4, we show that the variance unexplained approximately follows a normal distribution. Consequently, two standard deviations of $\tilde{Q}$ constitute an approximate 95% confidence interval for $Q$ . Note that the confidence bound for $Q$ is not sufficient when comparing models, since the errors can be highly correlated. Given two models $f, g$ we instead define the difference $Δ Q (f, g) := Q (f) - Q (g)$ and estimate it as $Δ \tilde{Q} (f, g) := \tilde{Q} (f) - \tilde{Q} (g)$ . The variance of $Δ \tilde{Q} (f, g)$ is estimated similarly to (2.5):

Var (Δ \tilde{Q} (f, g)) = \frac{Var (Δ Q_{t} (f, g))}{| K |}, Var (Δ Q_{t} (f, g)) \approx \frac{1}{| K |} \sum_{k \in K} {(Δ {\tilde{Q}}_{t_{k}} (f, g) - Δ \tilde{Q} (f, g))}^{2} .

2.6
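A short sketch shows why the paired estimate (2.6) is the relevant one when model errors are correlated. The helper name `paired_difference` and the numbers in the example are hypothetical.

```python
import numpy as np

def paired_difference(q_f, q_g):
    """Estimate Delta Q(f, g) and a two-standard-deviation half-width.

    q_f and q_g hold the per-time cross-validation errors of two models,
    evaluated at the SAME sampled times t_k.
    """
    d = np.asarray(q_f) - np.asarray(q_g)
    delta = d.mean()
    var_delta = d.var() / len(d)      # variance estimate (2.6)
    return delta, 2.0 * np.sqrt(var_delta)

# Errors that vary strongly between times but move together across models:
base = np.array([1.0, 4.0, 2.0, 8.0])
delta, half_width = paired_difference(base + 0.5, base)
print(delta, half_width)
```

In this toy case each model's own error varies widely over time, yet the difference is constant, so the paired interval is far narrower than the individual ones.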

The sample size $| K |$ was adjusted to obtain 95% confidence intervals within 1% of the mean square wind speed on the 2018 wind dataset, for both the variance unexplained $Q (f)$ and difference in variance unexplained $Δ Q (f, g)$ . Due to high autocorrelation in the wind demonstrated in figures 2 and 3 as well as the presence of seasonality and trends, these confidence intervals do not generalize to longer time intervals than 2018. The quality of fit was used for both hyper parameter optimization and validation of the models, but with different data samples to avoid overfitting.

3. Models

(a) . Fourier models

In this section, we introduce two Fourier series-based models. At the end of the section, we arrive at the random Fourier features model, which is the main focus of our report. In §3b, we present some well established and high-performing interpolation models. We use these as benchmark models when assessing the interpolation quality of the random Fourier features.

(i) . Fourier series

The Fourier series-based model takes the form

β (x) = ℜ {\sum_{k = 1}^{K} {\hat{β}}_{k} e^{i ω_{k} \cdot x}}, ω_{k} \in R^{2}, {\hat{β}}_{k} \in C^{2}, k = 1, 2, \dots, K,

3.1

where $K$ is the number of terms, $ω_{k} \cdot x$ denotes the scalar product between $ω_{k}$ and $x$ and $ℜ {z}$ denotes the real part of the complex-valued vector $z$ . The Fourier series generalizes to arbitrary dimensions of the input $x$ , but in this report we chose $x = (x, y)$ to be just the horizontal coordinates. With slight abuse of notation, we let ${\hat{β}}_{k}$ and $ω_{k}$ denote two-dimensional vectors in $C^{2}$ and $R^{2}$ respectively. Furthermore, $ω = (ω_{1}, ω_{2}, \dots, ω_{K})$ and $\hat{β} = ({\hat{β}}_{1}, {\hat{β}}_{2}, \dots, {\hat{β}}_{K})$ . In the Fourier series-based model, $ω$ is held fixed, and the parameters $\hat{β}$ are estimated by optimizing with respect to the expectation of a loss function $ℓ : R^{2} \times R^{2} \to R^{+}$ :

min_{\hat{β}} E_{\tilde{ρ}} [ℓ (β (x), u)],

3.2

where the expectation is taken over the joint density $\tilde{ρ}$ of the data $x, u$ . For our application, $\tilde{ρ}$ is unknown. Therefore, the expected loss is replaced by a Monte Carlo sample estimate of the loss, called the empirical loss. Given a dataset $D = {(x_{n}, u_{n})}_{n = 1}^{N}$ , the empirical loss is expressed as

\frac{1}{N} \sum_{n = 1}^{N} ℓ (β (x_{n}), u_{n}) \approx E_{\tilde{ρ}} [ℓ (β (x), u)] .

3.3

For this work, we chose the following loss function:

ℓ (β (x), u) = | | β (x) - u | |^{2} + λ | | β | |_{S (2, 2)}^{2} + η | | \nabla \cdot β | |_{L^{2}}^{2} .

3.4

The expression $| | β | |_{S (2, 2)}^{2}$ denotes the second-order squared Sobolev-norm of $β$ [19]:

| | β | |_{S (2, 2)}^{2} = | | β | |^{2} + r (| | \partial_{x} β | |^{2} + | | \partial_{y} β | |^{2}) + r^{2} (| | \partial_{x x} β | |^{2} + 2 | | \partial_{x} \partial_{y} β | |^{2} + | | \partial_{y y} β | |^{2}),

3.5

where $r > 0$ is a hyper parameter. By orthogonality of the Fourier features, the Sobolev norm can be simplified to

| | β | |_{S (2, 2)}^{2} = \sum_{k = 1}^{K} (r^{2} | | ω_{k} | |^{4} + r | | ω_{k} | |^{2} + 1) | | {\hat{β}}_{k} | |^{2} .

3.6

The hyper parameter $r$ was added to allow for more flexibility in the penalty, and is equivalent to rescaling the input variable before applying the Sobolev norm. The expression $| | \nabla \cdot β | |_{L^{2}}^{2}$ is the squared $L^{2}$ -norm of the divergence of $β$ , which can be rewritten as

| | \nabla \cdot β | |_{L^{2}}^{2} = \sum_{k = 1}^{K} | ω_{k} \cdot {\hat{β}}_{k} |^{2} .

3.7

The intended effect of the Sobolev norm is to dampen high frequencies, and the divergence penalty is supposed to simulate incompressible flow. The empirical loss using the loss function as defined in (3.4) is

\frac{1}{N} \sum_{n = 1}^{N} | | β (x_{n}) - u_{n} | |^{2} + λ | | β | |_{S (2, 2)}^{2} + η | | \nabla \cdot β | |_{L^{2}}^{2} .

3.8

Here, $λ$ and $η$ are hyper parameters. In order to determine a suitable choice for $ω$ , assume the standard regression setting $u = f (x) + ϵ$ , where $ϵ$ is zero mean and independent of $x$ . Since the spatial region $Ω$ is bounded, $f$ can be extended periodically over $R^{2}$ . That is, the relation $f (x + m τ_{x}, y + n τ_{y}) = f (x, y)$ is imposed on $f$ for all $x = (x, y) \in R^{2}$ , whole numbers $m, n$ and some two-dimensional period $τ = (τ_{x}, τ_{y})$ . This means that $f$ can be written as a Fourier series. Thus, $f$ can in theory be approximated arbitrarily well by $β$ as the number of terms $K$ tends to infinity. However, since there is only a limited amount of data and $β$ contains a limited number of terms, the choice of $ω$ determines how well $β$ can approximate $f$ . In the Fourier series-based model, we settle on choosing $ω$ as a square grid with side length $2 M + 1$ , centred at the origin in the frequency domain:

ω = {(π \frac{m}{τ_{x}}, π \frac{n}{τ_{y}}) : - M \leq m \leq M, - M \leq n \leq M} .

3.9

As such, the Fourier series-based model is a spatial interpolation family with the hyper parameters $λ$ , $η$ , $r$ , $M$ and $τ$ . We found that $M = 10$ struck a balance between computational cost and accuracy. The remaining hyper parameters $λ$ , $η$ , $τ$ and $r$ were chosen as explained in the next section. Minimizing the expression in (3.8) amounts to solving a system of linear equations with respect to the elements of $\hat{β}$ .
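The linear system behind (3.8) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the complex amplitudes are split into real and imaginary parts, so that $β (x) = \sum_{k} a_{k} \cos (ω_{k} \cdot x) - b_{k} \sin (ω_{k} \cdot x)$ with $a_{k} = ℜ {\hat{β}}_{k}$ and $b_{k}$ the imaginary part, the penalties are taken directly from (3.6) and (3.7), and all function names are hypothetical. The divergence penalty couples the two velocity components, so both are solved for jointly.

```python
import numpy as np

def frequency_grid(M, tau):
    """Square grid (3.9): (pi*m/tau_x, pi*n/tau_y) for -M <= m, n <= M."""
    m = np.arange(-M, M + 1)
    mm, nn = np.meshgrid(m, m, indexing="ij")
    return np.column_stack([np.pi * mm.ravel() / tau[0],
                            np.pi * nn.ravel() / tau[1]])      # (K, 2)

def features(X, omega):
    """Real features [cos(w_k.x), -sin(w_k.x)], shape (N, 2K)."""
    phase = X @ omega.T
    return np.hstack([np.cos(phase), -np.sin(phase)])

def fit_fourier(X, U, omega, lam, eta, r):
    """Minimize the empirical loss (3.8); returns coefficients of shape (2K, 2)."""
    N, P = X.shape[0], 2 * omega.shape[0]
    Phi = features(X, omega)
    w = np.vstack([omega, omega])             # frequency of each real feature
    w2 = (w ** 2).sum(axis=1)
    s = r ** 2 * w2 ** 2 + r * w2 + 1.0       # Sobolev weights from (3.6)
    G = Phi.T @ Phi / N
    H = np.zeros((2 * P, 2 * P))              # unknowns: [u-coeffs, v-coeffs]
    H[:P, :P] = G + np.diag(lam * s + eta * w[:, 0] ** 2)
    H[P:, P:] = G + np.diag(lam * s + eta * w[:, 1] ** 2)
    # divergence penalty (3.7) couples the u and v coefficients
    H[:P, P:] = H[P:, :P] = np.diag(eta * w[:, 0] * w[:, 1])
    rhs = np.concatenate([Phi.T @ U[:, 0], Phi.T @ U[:, 1]]) / N
    return np.linalg.solve(H, rhs).reshape(2, P).T

def predict(X, omega, C):
    """Evaluate the fitted field beta(x) at the rows of X."""
    return features(X, omega) @ C

def divergence_penalty(omega, C):
    """Sum over k of |w_k . beta_k|^2, as in (3.7)."""
    w = np.vstack([omega, omega])
    return float(((w * C).sum(axis=1) ** 2).sum())
```

On the symmetric grid (3.9) the real features are redundant, since $ω$ and $- ω$ give the same cosine, so the system is only well posed for $λ > 0$.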

(ii) . Random Fourier features

Instead of optimizing with respect to $\hat{β} = ({\hat{β}}_{1}, {\hat{β}}_{2}, \dots, {\hat{β}}_{K})$ , random Fourier features aims at solving the harder problem of also optimizing with respect to the Fourier frequencies $ω = (ω_{1}, ω_{2}, \dots, ω_{K})$ :

min_{\hat{β}, ω} E [ℓ (β (x), u)] .

3.10

The random Fourier features is an example of a neural network with one hidden layer and trigonometric activation function [15]. Here, $ω$ are the weights connecting the inputs $x$ to the hidden layer, and $\hat{β}$ are the weights connecting the nodes in the hidden layer to the output layer. What distinguishes random Fourier features is the training algorithm. The training of neural networks is traditionally done using some greedy method such as stochastic gradient descent, but in random Fourier features, the frequencies $ω$ are assumed to be randomly sampled according to some distribution $ρ$ . The task of optimizing frequencies is thus changed to optimizing $ρ$ . Note that the distribution can be deterministic, so no approximation error is introduced by this. Next, the optimization of $\hat{β}$ is moved inside the frequency expectation

min_{\hat{β}, ρ} E [ℓ (β (x), u)] \leq min_{ρ} E [min_{\hat{β}} E [ℓ (β (x), u) ∣ ω]] .

3.11

This reduces the inner minimization problem to linear regression with Fourier features. Using the quadratic loss function defined in (3.4), the coefficients $\hat{β}$ can be found using a standard matrix inversion.

The key approximation of random Fourier features is to assume that the elements of $ω$ are independent and identically distributed according to a density $ρ$ . That is, $ρ (ω) = \prod_{k = 1}^{K} ρ (ω_{k})$ . The optimal $ρ$ is approximated by finding an analytical minimizer to an upper bound of (3.10). We assume the same standard regression setting $u = f (x) + ϵ$ as presented in the previous section, where $ϵ \in R^{2}$ is zero mean and independent of $x$ , and $f$ is a periodic function. Referring to proposition A.1 found in the appendix, we can then put an upper bound

\begin{aligned} min_{ρ} E [min_{\hat{β}} E [ℓ (β (x), u) ∣ ω]] \\ \leq \frac{1 + λ {\bar{C}}_{1} + η {\bar{C}}_{2}}{{(2 π)}^{2} K} \sqrt{E [\frac{| | \hat{f} (ω) | |^{4}}{ρ {(ω)}^{4}}]} - \frac{1}{K} E [| | f (x) | |^{2}] + E [| | ϵ | |^{2}], \end{aligned}

3.12

on the expected loss, where ${\bar{C}}_{1}$ and ${\bar{C}}_{2}$ are positive constants determined by the Sobolev and divergence loss functions, respectively, and $ω$ is a random variable distributed according to $ρ$ , representing an arbitrary element of $ω$ . The proof assumes that the distribution $ρ$ is discrete and has bounded moments up to a degree determined by the order of the derivatives used in the regularization. Furthermore, we show that this upper bound is minimized by choosing $ρ (ω) \propto | | \hat{f} (ω) | |$ , where $\hat{f} (ω)$ are the Fourier coefficients for the Fourier series expansion of $f$ .

We make two important comments about the limitations of this method. First, note that we are only minimizing the bound of the iterated expectation in (3.12). Second, the iterated expectation is also only a bound for the true loss function (3.11). Despite these limitations, random Fourier features has in some examples been shown to outperform traditional methods of optimizing (3.10) like stochastic gradient descent, possibly due to its ability to efficiently learn high-frequency details early on in the training [15].

Since the target distribution $ρ (ω) \propto | | \hat{f} (ω) | |$ is not known a priori, this distribution has to be approximated somehow. The authors of [15] present an adaptive Metropolis algorithm for sampling from $ρ$ in a related setting. The main differences are that $f$ is $L^{2}$ integrable on the entirety of $R^{2}$ and has a one-dimensional range. Drawing from the work in [15], we devise an adaptive Metropolis algorithm for sampling the frequencies $ω$ from $| | \hat{f} | |$ . This algorithm is described in detail below (algorithm 1).
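A minimal sketch of an adaptive Metropolis frequency sampler of this kind is given here. It follows the general structure of the scheme in [15] (random-walk proposals, with frequency $k$ accepted according to the ratio of amplitude norms raised to the power $γ$) and should not be read as a reproduction of algorithm 1. As a loudly stated simplification, the amplitudes are fitted with a plain ridge penalty `lam` instead of the Sobolev and divergence penalties of (3.8); all function names are hypothetical.

```python
import numpy as np

def features(X, omega):
    """Real features [cos(w_k.x), -sin(w_k.x)], shape (N, 2K)."""
    phase = X @ omega.T
    return np.hstack([np.cos(phase), -np.sin(phase)])

def fit_amplitudes(X, U, omega, lam):
    """Ridge least squares for the amplitudes at fixed frequencies.

    Assumption: plain ridge penalty, standing in for the penalties in (3.8).
    """
    Phi = features(X, omega)
    H = Phi.T @ Phi / len(X) + lam * np.eye(Phi.shape[1])
    C = np.linalg.solve(H, Phi.T @ U / len(X))           # (2K, 2)
    K = omega.shape[0]
    # norm of beta_k over real/imaginary parts of both vector components
    amp = np.sqrt((C[:K] ** 2).sum(axis=1) + (C[K:] ** 2).sum(axis=1))
    return C, amp

def adaptive_metropolis(X, U, K, B, sigma, gamma, lam, seed=0):
    """Run B random-walk Metropolis steps on K frequencies in R^2."""
    rng = np.random.default_rng(seed)
    omega = np.zeros((K, 2))                             # start at zero frequency
    C, amp = fit_amplitudes(X, U, omega, lam)
    for _ in range(B):
        proposal = omega + sigma * rng.standard_normal((K, 2))
        _, amp_new = fit_amplitudes(X, U, proposal, lam)
        # accept frequency k with probability min(1, (amp_new_k / amp_k)^gamma)
        accept = amp_new ** gamma > rng.uniform(size=K) * amp ** gamma
        omega[accept] = proposal[accept]
        C, amp = fit_amplitudes(X, U, omega, lam)        # refit after the update
    return omega, C
```

The comparison is written without a division so that zero amplitudes in the current state do not cause errors; such frequencies are simply replaced whenever the proposal has a nonzero amplitude.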


The hyper parameters of the random Fourier features consist of the number of Fourier frequencies $K$ , the periodicity $τ$ , the regularization parameters $η, r$ and $λ$ (these are also found in the Fourier series model), as well as three new parameters that appear in the Metropolis sampling algorithm: the number of steps $B$ , the standard deviation $σ$ of the random walk and a parameter $γ$ that adjusts the acceptance probability of each step. Similarly to the work [16], the period $τ$ was chosen equal in both cardinal directions, about two times the north-south length of the region of interest: $τ_{x} = τ_{y} = 4000\ \mathrm{km}$ . The regularization parameter $r$ was set to $1 / τ$ in order to prevent the second derivative from dominating the Sobolev regularization term. The number of frequencies $K$ was fixed to $400$ and the number of steps $B$ was fixed to $500$ . Given a $d$ -dimensional feature space and a Gaussian target distribution, [15] show that for a fixed computational cost, the upper bound (3.12) as $K, N \to \infty$ is approximately minimized by $γ = 3 d - 2$ . Furthermore, the classical result from [20] shows that the optimal variance of the proposal kernel for a general random walk Metropolis–Hastings algorithm is $σ^{2} \approx {2.4}^{2} / d$ .

The regularization parameters $λ$ and $η$ were obtained by running a grid search to minimize the variance unexplained in (2.4), while keeping $τ$ , $r$ , $γ$ and $σ$ fixed as specified above. Since the values for $γ$ and $σ$ are only optimal in the limit, a second grid search was done in a neighbourhood of the limit values for $γ$ and $σ$ , while keeping $τ$ , $r$ , $λ$ and $η$ fixed. The number of steps was adjusted to ensure convergence of the Metropolis algorithm to a stationary distribution, while also keeping the computation time low. Note that the random Fourier features algorithm is not guaranteed to find the optimal frequencies. There is ongoing research in this area, and an iterative greedy method is explored in [17].

(b) . Benchmarking models

The following interpolation models were used for benchmarking: nearest neighbours, inverse distance weighting (IDW), kriging, random forest, neural networks and a Fourier series-based model introduced in the previous section. This section will serve as a short introduction to each of the methods as well as motivation as to why they are relevant. We use the same notation $D = {(x_{n}, u_{n}) \in Ω \times R^{2} : n = 1, 2, \dots, N}$ for the measurements as in §2c.

(i) . IDW

The IDW methods ${f^{p} : p \geq 0}$ form an interpolation family whose members are evaluated at a point $x$ by taking a component-wise weighted average of the horizontal wind vector data in $D$ . Specifically, the weights $α (x_{n}, x)$ for a model $f^{p}$ from the IDW family are proportional to $1 / d {(x_{n}, x)}^{p}$ where $d : Ω \times Ω \to [0, \infty)$ is a distance on $Ω$ . That is,

f_{D}^{p} (x) = \frac{\sum_{n = 1}^{N} α (x_{n}, x) u_{n}}{\sum_{n = 1}^{N} α (x_{n}, x)}, where α (x_{n}, x) = \frac{1}{d {(x_{n}, x)}^{p}} .

3.13

The singularities at $x = x_{n}, n = 1, 2, \dots, N$ are removed by setting $f_{D}^{p} (x_{n}) = u_{n}$ . The hyper parameter $p$ adjusts the amount of influence each data point has over its immediate surroundings. Letting $p = 0$ will result in all points weighing equally everywhere, i.e. taking the arithmetic mean of the data. Letting $p \to \infty$ will result in the nearest neighbour method. Usually, $p$ is chosen somewhere in between. A common shortcoming of IDW is that the interpolated values are bounded by the maximum and minimum values of the data and therefore IDW fails to predict unobserved extreme points. The main benefits are interpretability and relatively short training time. In this report, two values of $p$ were tested, namely $p = 2$ and $p = \infty$ (i.e. nearest neighbours). Furthermore, we used the horizontal distance between points, ignoring elevation.
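Equation (3.13), together with the treatment of the singularities and the limit $p = \infty$ , can be sketched as below. This is a minimal illustration that assumes Euclidean horizontal distance; the function name `idw` is hypothetical.

```python
import numpy as np

def idw(X_train, U_train, p=2.0):
    """Train an IDW model f^p on D; returns the field f_D^p."""
    def f_D(X):
        # pairwise horizontal distances, shape (n_query, N)
        d = np.linalg.norm(X[:, None, :] - X_train[None, :, :], axis=2)
        nearest = d.argmin(axis=1)
        if np.isinf(p):                       # p = infinity: nearest neighbour
            return U_train[nearest]
        with np.errstate(divide="ignore"):
            w = 1.0 / d ** p                  # weights alpha(x_n, x)
        out = np.empty((len(X), U_train.shape[1]))
        # remove the singularities at x = x_n by returning u_n there
        exact = (d.min(axis=1) == 0.0) & (p > 0)
        out[exact] = U_train[nearest[exact]]
        out[~exact] = (w[~exact] @ U_train) / w[~exact].sum(axis=1, keepdims=True)
        return out
    return f_D

X_tr = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
U_tr = np.array([[1.0, 0.0], [3.0, 0.0], [5.0, 6.0]])
print(idw(X_tr, U_tr, p=2.0)(np.array([[0.5, 0.0]])))
```

Because the prediction is a convex combination of the observed vectors, each component stays within the range of the data, which is the boundedness property noted above.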

(ii) . Kriging

Kriging is a statistical approach to spatial interpolation [21]. The true velocity $u$ is assumed to satisfy the equality

u (x) = μ (x) + ϵ_{x},

3.14

where $μ : Ω \to R^{2}$ is a deterministic function and $ϵ_{x}$ is a constant-mean, stochastic process over $x$ . Although it is possible to model correlations between the vector components of $u$ , we treat the two as independent, and run separate kriging pipelines for each component. For the remainder of this subsection, we therefore assume that the residual $ϵ$ is in $R$ . The key idea in kriging is to endow the residuals with a specific translation-invariant structure. Namely, it is assumed that the variance of the difference $ϵ_{x} - ϵ_{x^{'}}$ between two arbitrary residuals measured at points $x$ and $x^{'}$ depends only on the distance $| | x - x^{'} | |$ through a function called the variogram:

E [{(ϵ_{x} - ϵ_{x^{'}})}^{2}] = γ (| | x - x^{'} | |) .

3.15

The process of training a kriging model consists of two steps. First, a deterministic model is used to estimate the mean $μ$ from the data, resulting in some trained model $μ_{D}$ . For each $(x_{n}, u_{n}) \in D$ , the residuals $ϵ_{x_{n}}$ are estimated as $ϵ_{x_{n}} \approx u_{n} - μ_{D} (x_{n})$ . The second training step then consists of fitting a variogram $γ_{D}$ to the residuals $ϵ_{x_{n}}$ . Evaluating the model on a specific point $x$ can then be formulated as solving a minimization problem involving the variogram. Namely, kriging seeks a linear combination $\sum_{n} ω_{n} ϵ_{x_{n}}$ such that the weights $ω$ minimize the expected square error $E [{(ϵ_{x} - \sum_{n} ω_{n} ϵ_{x_{n}})}^{2}]$ . In order to get an unbiased estimate, the minimization is often subjected to a constraint $\sum_{n} ω_{n} = 1$ . Solving the constrained minimization problem results in a linear system of $N + 1$ equations

\sum_{m = 1}^{N} γ_{D} (| | x_{n} - x_{m} | |) ω_{m} + λ = γ_{D} (| | x_{n} - x | |), n = 1, 2, \dots, N

3.16

and

\sum_{m = 1}^{N} ω_{m} = 1,

3.17

where $λ$ is the Lagrange multiplier, an artificial variable that is brought in to enforce the constraint (3.17). Solving this system of equations produces an $N \times 1$ vector of weights $ω_{n} (x), n = 1, 2, \dots, N$ , which can be used to provide the final estimate $f_{D}$ of $u$ :

f_{D} (x) = μ_{D} (x) + \sum_{n = 1}^{N} ϵ_{x_{n}} ω_{n} (x) .

3.18
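As a rough illustration of the system (3.16)–(3.17), the following sketch assembles and solves it for three points with a linear variogram $γ (h) = h$ . The function names, the small Gaussian elimination helper, the points and the residual values are all assumptions made for this example; none of them is taken from pykrige or from the code accompanying this paper.

```python
import math


def linear_variogram(h, slope=1.0):
    """Linear variogram gamma(h) = slope * h (illustrative choice)."""
    return slope * h


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def kriging_weights(points, query, variogram=linear_variogram):
    """Return the weights omega_n(x) and the Lagrange multiplier lambda."""
    n = len(points)
    # Rows 1..N: sum_m gamma(|x_n - x_m|) omega_m + lambda = gamma(|x_n - x|)
    a = [[variogram(math.dist(p, q)) for q in points] + [1.0] for p in points]
    # Row N+1: the unbiasedness constraint sum_m omega_m = 1
    a.append([1.0] * n + [0.0])
    b = [variogram(math.dist(p, query)) for p in points] + [1.0]
    solution = solve_linear_system(a, b)
    return solution[:n], solution[n]


# Hypothetical station locations and residuals, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
residuals = [1.0, -0.5, 0.25]
weights, multiplier = kriging_weights(points, (0.2, 0.3))
residual_estimate = sum(w * e for w, e in zip(weights, residuals))
```

Adding `residual_estimate` to the fitted mean at the query point gives the prediction (3.18). Because $γ (0) = 0$ , querying at a station returns that station's residual exactly.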

A key strength of kriging is that the statistical model allows for point-wise error estimation. Furthermore, kriging can be combined with virtually any other unbiased deterministic interpolation model by interpreting it as the mean $μ (x)$ . However, when the statistical assumptions do not hold, kriging can perform poorly. Machine learning methods such as random forests have been successful in beating kriging for various spatial interpolation tasks, see for example [7,9]. The authors in [9] argue that even though kriging might be redundant in terms of accuracy, it remains a valuable tool for understanding data, exactly because of its statistical properties and interpretability. We used a version of kriging called Universal kriging, which differs from kriging in that a linear regression approximation of the mean $μ (x)$ and the residuals $ϵ_{x}$ are fitted to the data simultaneously, resulting in a joint system of equations for all the weights. The Python package pykrige was used to implement Universal kriging with a linear variogram and a piece-wise linear mean.
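The variogram-fitting step can be sketched in a similar spirit. The example below fits the slope of a linear variogram $γ (h) = a h$ by least squares through the origin, using the convention of (3.15) without a factor of one half. The helper names and the data are hypothetical, and the fitting procedure inside pykrige may differ (for instance by binning the distances first).

```python
import math
from itertools import combinations


def variogram_cloud(points, residuals):
    """Return (distance, squared residual difference) for every pair of points."""
    cloud = []
    for (p, e_p), (q, e_q) in combinations(zip(points, residuals), 2):
        cloud.append((math.dist(p, q), (e_p - e_q) ** 2))
    return cloud


def fit_linear_variogram_slope(cloud):
    """Least-squares slope a of gamma(h) = a * h, constrained through the origin."""
    numerator = sum(h * d for h, d in cloud)
    denominator = sum(h * h for h, _ in cloud)
    return numerator / denominator


# Placeholder locations and residuals, for illustration only.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
residuals = [0.0, 1.0, 2.0]
cloud = variogram_cloud(points, residuals)
slope = fit_linear_variogram_slope(cloud)
```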

(iii) . Random forest

Random forests are constructed by averaging an ensemble of regression trees. A regression tree is a function that recursively splits the domain of interest into smaller domains, based on a user-specified criterion. Each domain is then assigned a constant value by minimizing some objective function on the training data [22]. In a random forest, each tree is trained on a random sample of the data $D$ and each split in the tree is chosen by randomly selecting one feature out of the input features and picking a split that minimizes the mean square error [23]. Random forests have been used successfully in spatial interpolation problems, for example to predict temperatures on and around Kilimanjaro, Tanzania [7] as well as mineral concentrations [9]. The main drawback of random forests is lack of interpretability.

In this report, we used a random forest with $200$ trees with mean square loss for splitting, and unlimited tree depth. The forest was implemented in Python using the package scikit-learn. Furthermore, the random forest was trained on a polynomial feature map $ϕ$ of the horizontal coordinates $x = (x, y)$ and the elevation $z$ . That is, the trained model $f_{D}$ is a composition $h_{D} \circ ϕ$ of the random forest $h_{D}$ and the feature map $ϕ$ , which is fixed. The feature map increases expressivity of the random forest, which can improve performance if the data are not too noisy or sparse. There are many possible ways to construct feature maps. We found that a combination of polynomial features $x^{p_{1}} y^{p_{2}} z^{p_{3}}$ with a total order $p_{1} + p_{2} + p_{3}$ of $\leq 3$ struck a good balance between expressivity and generalizability.
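A feature map of this kind can be sketched in a few lines. The version below enumerates all monomials $x^{p_{1}} y^{p_{2}} z^{p_{3}}$ with total order between 1 and 3; leaving out the constant term is an assumption of this example, as are the function names. The resulting feature vectors would then be passed to the random forest in place of the raw coordinates.

```python
from itertools import product


def exponent_triples(max_order=3):
    """All (p1, p2, p3) with 1 <= p1 + p2 + p3 <= max_order."""
    return [p for p in product(range(max_order + 1), repeat=3)
            if 1 <= sum(p) <= max_order]


def feature_map(x, y, z, max_order=3):
    """Polynomial features x^p1 * y^p2 * z^p3 of the coordinates (x, y, z)."""
    return [x ** p1 * y ** p2 * z ** p3
            for p1, p2, p3 in exponent_triples(max_order)]


# Example coordinates chosen arbitrarily for illustration.
features = feature_map(2.0, 3.0, 5.0)
```

With three input coordinates and total order at most 3, this yields 19 features per data point.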

(iv) . Feedforward neural network

Feedforward neural networks have been used extensively in different areas of applied mathematics. To read more about neural networks, see for example [24]. In this report, only a specific family of feedforward neural networks was considered. Namely, the networks are characterized by three fully connected hidden layers, a constant number of $n$ nodes in each layer and the ReLU activation function. The input layer consists of the three spatial coordinates $x, y$ and $z$ . The weights were optimized using the Adam algorithm [25], with respect to the $L_{2}$ -regularized loss. The network was implemented in TensorFlow, and hyper parameter optimization of the number of nodes and regularization parameter was done with a grid search on the variance unexplained.

(v) . Weighted linear combination of models

Given a set of $n$ interpolation models $\{f^{1}, f^{2}, \dots, f^{n}\}$ and a dataset $D_{t}$ , we can improve on the individual models by forming a weighted average

f_{D_{t}} = α_{1} f_{D_{t}}^{1} + α_{2} f_{D_{t}}^{2} + \dots + α_{n} f_{D_{t}}^{n},

3.19

where $α_{1}, α_{2}, \dots, α_{n}$ are the weights. The weights are regarded as hyper parameters. If the number of models $n$ is not too high, there is little risk of overfitting, and the hyper parameters can simply be fitted directly by minimizing the quality of fit measure. In §4, we use this method to combine the random forest and random Fourier features models.
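For two models, fitting the weights in (3.19) by least squares reduces to a $2 \times 2$ system of normal equations. The sketch below solves it in closed form; the function name and the synthetic predictions are assumptions for illustration, and the weights are left unconstrained (they are not forced to sum to one).

```python
def fit_two_model_weights(pred_a, pred_b, targets):
    """Least-squares weights (alpha_a, alpha_b) for alpha_a * pred_a + alpha_b * pred_b."""
    saa = sum(a * a for a in pred_a)
    sbb = sum(b * b for b in pred_b)
    sab = sum(a * b for a, b in zip(pred_a, pred_b))
    sat = sum(a * t for a, t in zip(pred_a, targets))
    sbt = sum(b * t for b, t in zip(pred_b, targets))
    det = saa * sbb - sab * sab
    if abs(det) < 1e-12:
        raise ValueError("model predictions are (nearly) collinear")
    # Cramer's rule on the normal equations.
    alpha_a = (sat * sbb - sab * sbt) / det
    alpha_b = (saa * sbt - sab * sat) / det
    return alpha_a, alpha_b


# Synthetic predictions of two hypothetical models and synthetic targets.
pred_a = [1.0, 2.0, 3.0, 4.0]
pred_b = [2.0, 1.0, 0.0, 3.0]
targets = [0.3 * a + 0.7 * b for a, b in zip(pred_a, pred_b)]
alpha_a, alpha_b = fit_two_model_weights(pred_a, pred_b, targets)
```

For vector-valued predictions, the same sums can be accumulated over both velocity components.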

4. Results

We begin the results section by establishing a suitable choice for the number of time samples $| K |$ , as discussed in §2c. We are looking to satisfy two main conditions. First, $\tilde{Q}$ needs to be approximately normally distributed. As seen from figure 4, the central limit theorem seems to hold for estimates of $E$ with sample sizes $| K | > 50$ . Under normality of $Q$ and $E$ , two standard deviations of $E$ and $Q$ correspond to 95% confidence intervals for $E$ and $Q$ , respectively. Second, $| K |$ needs to be sufficiently large for the error to be reasonably small. We chose $| K | = 500$ . Tables 1 and 2 show that this sample size achieves the goal of $1 \%$ relative error with 95% confidence for both $Q$ and $Δ Q$ that we formulated in §2c. The only exception is the nearest neighbours model, for which the confidence interval for the error estimate is 2%. This is still sufficient accuracy for our purpose.

Figure 4. — A histogram of 5000 samples of the variance unexplained $\tilde{Q} (f) = \frac{1}{| K |} \sum_{k \in K} {\tilde{Q}}_{t_{k}} (f)$ with $| K | = 50$ , bootstrapped from a total of 500 samples of ${\tilde{Q}}_{t_{k}}$ . The model $f$ is the Fourier series, and the black line is a fitted normal distribution.
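The bootstrap behind figure 4 and the two-standard-deviation intervals can be sketched as follows. The per-time scores here are synthetic placeholders rather than the values of ${\tilde{Q}}_{t_{k}}$ from our data, and the interval is computed as two standard errors of the mean, which is our reading of the estimate and an assumption of this example.

```python
import math
import random
import statistics


def two_sigma_interval(samples):
    """Mean and half-width of an approximate 95% interval (two standard errors)."""
    mean = statistics.fmean(samples)
    std_error = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, 2.0 * std_error


def bootstrap_means(samples, subsample_size=50, n_draws=5000, seed=0):
    """Means of n_draws subsamples drawn with replacement from samples."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(samples, k=subsample_size))
            for _ in range(n_draws)]


# Synthetic stand-in for 500 per-time values of the variance unexplained.
rng = random.Random(1)
per_time_scores = [rng.uniform(0.2, 0.6) for _ in range(500)]
mean, half_width = two_sigma_interval(per_time_scores)
means = bootstrap_means(per_time_scores)
```

A histogram of `means` is what would be compared with a fitted normal density to judge whether the central limit theorem is a reasonable approximation at the chosen subsample size.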

Table 1.

Quality of fit measurements $Q$ and $E$ with 95% confidence intervals for a number of different models. The dimension is indicated at the top row.

interpolation model	$\tilde{E} [1]$	$\tilde{Q} [m^{2} s^{- 2}]$
nearest neighbours	$0.628 \pm 0.022$	$11.145 \pm 0.386$
inverse distance weighting	$0.407 \pm 0.014$	$7.220 \pm 0.250$
Universal kriging	$0.388 \pm 0.013$	$6.887 \pm 0.235$
random forest (RF)	$0.386 \pm 0.013$	$6.862 \pm 0.238$
neural network	$0.381 \pm 0.013$	$6.762 \pm 0.225$
Fourier series	$0.380 \pm 0.013$	$6.740 \pm 0.226$
random Fourier features (FF)	$0.370 \pm 0.012$	$6.569 \pm 0.220$
FF and RF average	$0.357 \pm 0.012$	$6.333 \pm 0.212$


Table 2.

Difference in quality of fit $Δ Q$ and $Δ E$ with 95% confidence intervals for the benchmark models relative to the random Fourier features model (FF). The dimension is indicated at the top row, in square brackets. A positive number means that the given model is worse than random Fourier features.

interpolation model	$Δ \tilde{E} [1]$	$Δ \tilde{Q} [m^{2} s^{- 2}]$
nearest neighbours	$0.258 \pm 0.010$	$4.576 \pm 0.183$
inverse distance weighting	$0.037 \pm 0.003$	$0.651 \pm 0.049$
Universal kriging	$0.018 \pm 0.002$	$0.318 \pm 0.033$
random forest	$0.017 \pm 0.003$	$0.293 \pm 0.060$
neural network	$0.011 \pm 0.003$	$0.192 \pm 0.056$
Fourier series	$0.010 \pm 0.001$	$0.171 \pm 0.017$
FF and RF average	$- 0.013 \pm 0.002$	$- 0.236 \pm 0.032$


The quality of fit measurements reported in table 1 were all obtained using this sample size, and the reported uncertainty corresponds to two standard deviations, estimated according to (2.5). As the table shows, the confidence bounds vary slightly depending on the model. We observe that the distribution is narrower for better performing models. The same sample size was also used for the differences $Δ Q$ between the quality of fit of the benchmarking models and the random Fourier features model listed in table 2.

Figure 5 shows examples of the reconstructed wind field for the Fourier series, random Fourier features and Universal kriging interpolation models.² Figure 6 shows hourly means of the variance unexplained for different seasons throughout 2018.

Figure 6. — Hourly variance unexplained at different seasons, for a selection of the investigated interpolation models. The remaining models were left out to prevent clutter. The measurement is aggregated over all stations and plotted on a log-scale to improve visibility. Note that September is not in this data. The unexplained variance of the zero model corresponds to the mean square wind speed. (Online version in colour.)

The hyper parameter grid searches for the random Fourier features model are shown in figures 7 and 8. The optimal hyper parameters for the random Fourier features model were $0.01$ for the Sobolev regularization constant $λ$ , $0.001$ for the divergence penalty $η$ , $1.4$ for the exponent $γ$ and $2.25$ for the step size $σ$ in the proposal kernel in the adaptive Metropolis algorithm (algorithm 1). Running the Metropolis algorithm for $B = 500$ steps struck a good balance between convergence and computation time. Figure 9 shows an example of the frequency distribution that the algorithm samples from, and figure 10 shows the magnitude of the Fourier coefficients for the Fourier series model as a comparison.

Figure 7. — Fraction of variance unexplained (2.4) in the random Fourier features method, as a function of the Sobolev penalty $λ$ and divergence penalty $η$ explained in (3.4), with $τ = 4 \times 10^{7}$ , $σ = 2.25$ , $γ = 1.25$ . (Online version in colour.)

Figure 8. — Fraction of variance unexplained (2.4) as a function of the exponent $γ$ and step size $σ$ in the adaptive Metropolis algorithm, with $τ = 4 \times 10^{7}$ , $η = 0.001$ , $λ = 0.01$ . (Online version in colour.)

Figure 9. — Sampling density $ρ (ω) = ρ (ω_{x}, ω_{y})$ of the random Fourier features algorithm for measurement data from 10.00, 22 January 2018. Algorithm 1 was run with $B = 1000$ , collecting the Fourier frequencies at each step into a histogram. The profile plots show marginal distributions for the latitude frequencies $ω_{x}$ (top) and longitude frequencies $ω_{y}$ (right). (Online version in colour.)

Figure 10. — Heat map showing the magnitudes $| | \hat{β} | |$ of the Fourier terms $\hat{β} e^{i ω \cdot x}$ as a function of the frequencies $ω$ in the Fourier series model, for measurement data from 10.00, 22 January 2018. The support is a square grid with width 41, centred at the origin. The profile plots are analogous to figure 9. (Online version in colour.)

We omit the hyper parameter optimization results for the benchmarking models, which were all obtained using the same type of grid search. The weights for the linear combination of the random forest and random Fourier features were chosen to minimize loss on the training data. We obtained a weight of $0.49$ for the random forest, and $0.51$ for random Fourier features.

5. Discussion

Table 1 shows that the model consisting of an average between the random forest and random Fourier features model performed the best out of the tested models. Table 2, consisting of the difference between quality of fit of the random Fourier features and the remaining models, indicates that this ordering is statistically significant, since none of the confidence intervals overlap with zero. In particular, the transition from the fixed frequencies in the Fourier series-based model to the randomly sampled frequencies of the random Fourier features results in a significant improvement. The enlarged area in figure 5 shows that the reconstructed wind field is better aligned with unseen data when using random Fourier features. The numerical experiments in [15] indicate that the random Fourier features model outperforms its reference models as the amount of data increases. The improved performance is likely because the high-frequency details can be captured by exploring remote parts of the frequency domain, whereas a fixed grid of frequencies centred at the origin cannot. Figure 9 shows a noticeable mass outside the white square, compared with figure 10, in which the high frequency contributions are truncated.

The success of the random forest and random Fourier features average hints that there might be more potential for improving the results using similar types of mixture models. It is also clear that the random Fourier features model on its own is competitive in comparison with the benchmarking models. A common argument for favouring statistical models such as kriging is easy access to precise error analysis, given that the prerequisite assumptions discussed in §3b(ii) hold. Whether or not this is worth trading in exchange for higher accuracy depends on the situation. The random Fourier features model has the upside of being easy to manipulate once it has been trained. It can be efficiently evaluated, differentiated and integrated for analysis of physical quantities such as divergence, vorticity and energy.
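To illustrate why a trained Fourier model is easy to differentiate, the sketch below evaluates the divergence and vorticity of a model of the form $\sum_{k} {\hat{β}}_{k} e^{i ω_{k} \cdot x}$ analytically, since each derivative only multiplies a term by $i ω_{k, j}$ . The function names, coefficients and frequencies are hypothetical and chosen for illustration.

```python
import cmath


def phase(freq, x):
    """exp(i * omega . x) for a two-dimensional frequency and point."""
    return cmath.exp(1j * (freq[0] * x[0] + freq[1] * x[1]))


def evaluate(coeffs, freqs, x):
    """Both velocity components of sum_k beta_hat_k * exp(i omega_k . x)."""
    u = sum(b[0] * phase(w, x) for b, w in zip(coeffs, freqs))
    v = sum(b[1] * phase(w, x) for b, w in zip(coeffs, freqs))
    return u, v


def divergence(coeffs, freqs, x):
    """d/dx of the first component plus d/dy of the second, term by term."""
    return sum(1j * (w[0] * b[0] + w[1] * b[1]) * phase(w, x)
               for b, w in zip(coeffs, freqs))


def vorticity(coeffs, freqs, x):
    """d/dx of the second component minus d/dy of the first, term by term."""
    return sum(1j * (w[0] * b[1] - w[1] * b[0]) * phase(w, x)
               for b, w in zip(coeffs, freqs))


# Hypothetical frequencies and coefficient vectors.
freqs = [(1, 2), (-3, 1)]
coeffs = [(2.0, -1.0), (0.5 + 0.2j, 1.0)]
x0 = (0.4, 1.1)
div_at_x0 = divergence(coeffs, freqs, x0)
curl_at_x0 = vorticity(coeffs, freqs, x0)
```

A term whose coefficient vector is orthogonal to its frequency, such as the first one above, contributes nothing to the divergence.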

In figure 6, the relative performance of the interpolation models is visualized at different times of day and times of the year. The relative performance of the models is consistent throughout. A source of difficulty in wind field reconstruction is near-surface turbulence. In general, the interpolation error is lower during night-time. For the spring and summer months, this can be explained by the observed decrease in wind speeds during night-time. For autumn and winter, however, wind speeds are consistently high. The atmosphere tends to be more stable at night-time during the winter months [26], which might explain this decrease in error.

The autocorrelation plots in figures 2 and 3 demonstrate a strong autocorrelation in the wind field. Furthermore, figure 6 shows that the wind exhibits seasonal shifts in strength, as well as daily variations. This behaviour is well documented [26]. The models used in [12–14] are trained on multiple time samples whereas the spatial interpolation models used in our work only use the time aspect for hyper parameter optimization. Therefore, extending the spatial interpolation models to take time into account could improve the results. Note that since the data are only collected throughout 1 year, it is not possible to see the effect of inter-annual variability on the modelling error. More careful preprocessing of the data that account for these effects is needed to establish which model is optimal over a longer time interval.

As seen in figure 7, there is little to no change in the quality of fit when setting the divergence penalty to zero. Albeit a compelling idea, it makes sense that penalizing the divergence on a two-dimensional slice of the wind field would not improve the results. The near-surface wind flow is not typically parallel with the ground due to thermally and mechanically induced turbulence in the atmospheric boundary layer [27]. If we were to transition from a two-dimensional to a three-dimensional setting by incorporating elevation as a feature in the random Fourier features, divergence penalization would likely be more useful. More advanced methods can possibly incorporate no-slip or slipping boundary conditions on the ground surface, see [16].

Lastly, we discuss some general areas of improvement that are well documented and known in the environmental sciences community. First, the results may be improved by incorporating weather stations from nearby regions such as the Baltic Sea, Finland, Norway and Denmark. The random Fourier features model is constructed to efficiently find high frequency details in the target function, which makes it suitable when increasing the resolution of the data. The inclusion of additional covariates could also improve the results. Reinhardt & Samimi [3] include geopotential, potential vorticity, relative humidity, vertical velocity and elevation, and note that these covariates improve the results. Furthermore, orographic features like terrain roughness and topographic sheltering are known to affect the wind characteristics in Sweden [26].

6. Conclusion

In this report, we explored the potential for wind field reconstruction with sparse data using interpolation models. We investigated whether a novel interpolation model, the random Fourier features model, is competitive with popular statistical interpolation models such as kriging, as well as modern machine learning methods such as random forests and neural networks. Drawing from the work [15], we derived an upper bound for the mean square error of the random Fourier features model, found a density that minimized this bound and devised an adaptive Metropolis algorithm for sampling from this density. We showed that random Fourier features is competitive with respect to a time-space average of the square error and suggested some future areas of research, such as extending the model to incorporate data over multiple times and including more terrain-specific features. The key takeaway of our work is an improved spatial interpolation model for wind field reconstruction that is able to capture highly irregular near-surface wind features.

Supplementary Material

Click here for additional data file.^{(644.8KB, pdf)}

Acknowledgements

The authors of this report would like to thank Prof. Anders Szepessy for his support and feedback. Furthermore, the work of Dmitry Kabanov, Luis Espath, Andreas Enblom and Magnus Tronstad in data processing and programming was integral in realizing the project. We also thank the two anonymous referees whose comments helped improve the manuscript.

Appendix A

In this section, we derive an upper bound for the minimum of the expected loss $E [ℓ (β (x), u)]$ for the Fourier features model. We assume the standard regression setting where $u = f (x) + ϵ$ and $ϵ \in R^{2}$ is independent of $x$ , and $E [| | ϵ | |^{2}] = σ^{2}$ . Let $\hat{f} (ω)$ denote the Fourier coefficients of $f$ and suppose for simplicity that $f$ is defined on the domain $X = [0, 2 π] \times [0, 2 π]$ . That is, $f$ can be expressed as the Fourier series

f (x) = \frac{1}{2 π} \sum_{ω \in Z^{2}} \hat{f} (ω) e^{i ω \cdot x} .

A 1

For the random Fourier features model, we choose

β (x) = \sum_{k = 1}^{K} {\hat{β}}_{k} e^{i ω_{k} \cdot x},

A 2

where $\hat{β} = ({\hat{β}}_{1}, {\hat{β}}_{2}, \dots, {\hat{β}}_{K})$ are the complex-valued two-dimensional coefficients and $ω = (ω_{1}, ω_{2}, \dots, ω_{K})$ are independent and identically distributed according to a discrete distribution $ρ : Z^{2} \to [0, \infty)$ . The loss function is defined as

ℓ (β (x), u) = | | β (x) - u | |^{2} + λ | | L β | |^{2},

A 3

where

| | L β | |^{2} = \int_{X} \bar{L β} (x) L β (x) d x,

A 4

and $L = \sum_{m = 1}^{M} c_{m} \partial_{1}^{α_{m, 1}} \partial_{2}^{α_{m, 2}}$ is a linear differential operator with derivatives of at most order $d$ (i.e. $α_{m, 1} + α_{m, 2} \leq d$ ). Examples of regularizers that fit this description are the divergence penalty $L β = \nabla \cdot β = (\partial_{1} + \partial_{2}) β$ (3.7), and each separate term of the Sobolev norm (3.5). The distribution $ρ$ is assumed to exist in a family $P$ of discrete distributions

P := \{ρ : Z^{2} \to (0, \infty) \mid ρ (ω) > 0 \text{ and } E [| ω_{i} |^{m}] < C, \ 0 \leq m \leq 4 d, \ i = 1, 2\},

A 5

where $C$ is a positive real number. Thus, $ρ \in P$ is strictly positive and has uniformly bounded moments $\sum_{ω \in Z^{2}} | ω_{i} |^{m} ρ (ω)$ of degree up to $4 d$ . Lastly, we make the assumption that

\frac{| | \hat{f} (ω) | |}{\sum_{ω^{'} \in Z^{2}} | | \hat{f} (ω^{'}) | |} \in P,

A 6

which means that $f (x)$ has to be a member of the Sobolev space $W^{4 d, 2} (X)$ .
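The setting above can be made concrete with a small sketch that normalizes a table of coefficient norms into a discrete density as in (A 6), draws $K$ independent frequencies from it and evaluates a model of the form (A 2). The finite support, the numerical values and all function names are assumptions of this example; in particular, a density in $P$ is strictly positive on all of $Z^{2}$ , which a finite table cannot represent.

```python
import cmath
import math
import random


def normalized_density(f_hat_norms):
    """Turn {omega: ||f_hat(omega)||} into a discrete probability density."""
    total = sum(f_hat_norms.values())
    return {w: v / total for w, v in f_hat_norms.items()}


def sample_frequencies(rho, k, seed=0):
    """Draw k independent frequencies distributed according to rho."""
    rng = random.Random(seed)
    support = list(rho)
    return rng.choices(support, weights=[rho[w] for w in support], k=k)


def evaluate_model(coeffs, freqs, x):
    """Both components of beta(x) = sum_k beta_hat_k * exp(i omega_k . x)."""
    u, v = 0j, 0j
    for (b1, b2), (w1, w2) in zip(coeffs, freqs):
        e = cmath.exp(1j * (w1 * x[0] + w2 * x[1]))
        u += b1 * e
        v += b2 * e
    return u, v


# Hypothetical coefficient norms on a handful of integer frequencies.
f_hat_norms = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.5, (-1, 0): 0.25, (0, -1): 0.25}
rho = normalized_density(f_hat_norms)
freqs = sample_frequencies(rho, k=10)

# A single-term model evaluated at x = (pi, 0).
value = evaluate_model([(1.0, 0.0)], [(1, 0)], (math.pi, 0.0))
```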

Proposition A.1. —

In the above setting, the following holds:

(a)
The minimum of $E [ℓ (β (x), u)]$ with respect to the coefficients $\hat{β}$ can be bounded:
$E [\min_{\hat{β} \in C^{2 K}} E [ℓ (β (x), u) ∣ ω]] \leq \frac{1 + λ \bar{C}}{{(2 π)}^{2} K} \sqrt{E [\frac{| | \hat{f} (ω) | |^{4}}{ρ {(ω)}^{4}}]} + σ^{2} - \frac{1}{K} E [| | f (x) | |^{2}],$ A 7
where $\bar{C} > 0$ .

(b)
Furthermore, this upper bound is minimized by the distribution
$ρ (ω) = \frac{| | \hat{f} (ω) | |}{\sum_{ω^{'} \in Z^{2}} | | \hat{f} (ω^{'}) | |}, ω \in Z^{2} .$ A 8

By replacing $λ \bar{C}$ in (A 7) with $\sum_{s = 1}^{S} λ_{s} {\bar{C}}_{s}$ , proposition A.1 can be generalized to include a linear combination of $S$ regularization functions $\sum_{s = 1}^{S} λ_{s} | | L_{s} β | |^{2}$ .
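Part (b) can be checked numerically on a toy example. The sketch below computes the factor $\sqrt{E [| | \hat{f} (ω) | |^{4} / ρ {(ω)}^{4}]}$ from (A 7) for the density (A 8) and for a uniform density over the same finite set of frequencies. The coefficient norms are made up, and restricting to a finite support is a simplification.

```python
import math


def bound_factor(f_hat_norms, rho):
    """sqrt(E[||f_hat||^4 / rho^4]) for omega ~ rho, i.e. sqrt(sum ||f_hat||^4 / rho^3)."""
    return math.sqrt(sum(f ** 4 / r ** 3 for f, r in zip(f_hat_norms, rho)))


# Hypothetical norms of five Fourier coefficients.
f_hat_norms = [1.0, 0.5, 0.5, 0.25, 0.25]
total = sum(f_hat_norms)

rho_optimal = [f / total for f in f_hat_norms]          # density (A 8)
rho_uniform = [1.0 / len(f_hat_norms)] * len(f_hat_norms)

factor_optimal = bound_factor(f_hat_norms, rho_optimal)
factor_uniform = bound_factor(f_hat_norms, rho_uniform)
```

For the density (A 8), the factor reduces to $(\sum_{ω} | | \hat{f} (ω) | |)^{2}$ , which is smaller than the value obtained with the uniform density in this example.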

Proof. —

We divide the problem into part (a) and part (b) of the proposition.

(a)
Let
${\hat{β}}_{k} = \frac{\hat{f} (ω_{k})}{2 π K ρ (ω_{k})}, k = 1, 2, \dots, K .$ A 9
$β$ is then a function of the random variables $x$ and $ω$ , which we denote $β (x; ω)$ . With this definition, $β$ is unbiased given $x$ :
$E [β (x; ω) | x] = \sum_{ω \in Z^{2}} K \frac{\hat{f} (ω)}{2 π K ρ (ω)} e^{i ω \cdot x} ρ (ω) = \frac{1}{2 π} \sum_{ω \in Z^{2}} \hat{f} (ω) e^{i ω \cdot x} = f (x),$ A 10
where we used that the components of $ω$ are iid. Using independence of the residuals $ϵ$ and the data $x$ , we see that
$\begin{aligned} E [| | β (x; ω) - u | |^{2} ∣ x] \\ = E [| | β (x; ω) - (f (x) + ϵ) | |^{2} ∣ x] = E [| | β (x; ω) - f (x) | |^{2} ∣ x] + σ^{2}, \end{aligned}$ A 11
showing that the expected square error of $β$ as a function of $x$ is the variance of $β$ with respect to the frequencies $ω$ , plus the variance of the noise. The expected value of (A 11) can be simplified further since the frequencies $ω_{k}$ are assumed independent
$E [{| | \sum_{k = 1}^{K} {\hat{β}}_{k} e^{i ω_{k} \cdot x} - f (x) | |}^{2}] \leq \frac{1}{{(2 π)}^{2} K} \sqrt{E [\frac{| | \hat{f} (ω) | |^{4}}{ρ {(ω)}^{4}}]} - \frac{1}{K} E [| | f (x) | |^{2}] .$ A 12
Here, $ω$ denotes an arbitrary frequency with the same distribution as each component $ω_{k}$ of the frequency vector. The last step is due to the Jensen inequality. Next, consider the penalty term containing the linear operator $L$ . Applying $L$ to $β (x)$ is equivalent to multiplying each term ${\hat{β}}_{k j} e^{i ω_{k} \cdot x}, j = 1, 2$ of the Fourier series with $ℓ_{j} (ω_{k}) = \sum_{m = 1}^{M} c_{m} {(i ω_{k, 1})}^{α_{m, 1}} {(i ω_{k, 2})}^{α_{m, 2}}$ , a multivariate polynomial in $ω_{k}$ of degree $d$ . Define $r (ω) = | ℓ_{1} (ω) |^{2} + | ℓ_{2} (ω) |^{2}$ . Note that $r$ has degree $2 d$ . It follows that
$\begin{aligned} E [| | L β | |^{2}] & = \int_{X} E [| L β (x; ω) |^{2} | x] d x \\ = K {(2 π)}^{2} E [| ℓ_{1} (ω_{k}) {\hat{β}}_{k 1} + ℓ_{2} (ω_{k}) {\hat{β}}_{k 2} |^{2}] . \end{aligned}$ A 13
The first equality comes from switching the order of integration, and in the second we use independence of the Fourier frequencies combined with the size ${(2 π)}^{2}$ of the region $X = [0, 2 π] \times [0, 2 π]$ . Applying the Cauchy–Schwarz inequality twice on the last expression of (A 13) results in
$\begin{aligned} K {(2 π)}^{2} E [| ℓ_{1} (ω_{k}) {\hat{β}}_{k 1} + ℓ_{2} (ω_{k}) {\hat{β}}_{k 2} |^{2}] \leq K E [r (ω) | | 2 π {\hat{β}}_{k} | |^{2}] \\ \leq \frac{1}{K} \sqrt{E [r {(ω)}^{2}] E [\frac{| | \hat{f} (ω) | |^{4}}{ρ {(ω)}^{4}}]} . \end{aligned}$ A 14
The multivariate polynomial $r {(ω)}^{2} := \sum_{m = 1}^{M^{'}} r_{m} ω_{1}^{α_{m, 1}} ω_{2}^{α_{m, 2}}$ is of order $4 d$ , i.e. $α_{m, 1} + α_{m, 2} \leq 4 d$ . For each term $m$ , define $γ_{m} := α_{m, 1} / (α_{m, 1} + α_{m, 2}) \in (0, 1)$ . By assumption, $ρ$ lies in $P$ , and therefore, the expectation $E [r {(ω)}^{2}]$ can be uniformly bounded as follows:
$E [r {(ω)}^{2}] = \sum_{m = 1}^{M^{'}} r_{m} E [ω_{1}^{α_{m, 1}} ω_{2}^{α_{m, 2}}] \leq \sum_{m = 1}^{M^{'}} | r_{m} | E [| ω_{1} |^{(α_{m, 1} + α_{m, 2}) γ_{m}} | ω_{2} |^{(α_{m, 1} + α_{m, 2}) (1 - γ_{m})}] .$ A 15
Next, we use Hölder’s inequality, which states that $E [| X Y |] \leq {(E [| X |^{p}])}^{1 / p} {(E [| Y |^{q}])}^{1 / q}$ for any $p, q > 1$ with $1 / p + 1 / q = 1$ , term-wise with $p = \frac{1}{γ_{m}}$ , to continue the bound (A 15):
$\leq \sum_{m = 1}^{M^{'}} | r_{m} | \underbrace{({E [| ω_{1} |^{α_{m, 1} + α_{m, 2}}]}^{γ_{m}} {E [| ω_{2} |^{α_{m, 1} + α_{m, 2}}]}^{1 - γ_{m}})}_{\leq C^{γ_{m}} C^{1 - γ_{m}} = C} \leq C \sum_{m = 1}^{M^{'}} | r_{m} | .$ A 16
The last inequality comes from the bounded moments of the probability distribution $ρ$ . If we define $\bar{C}$ such that $\frac{\bar{C}}{{(2 π)}^{2}} = C \sum_{m = 1}^{M^{'}} | r_{m} |$ and put together (A 11), (A 12) and (A 16), we get
$\begin{aligned} E [\min_{\hat{β}} E [| | β (x; ω) - u | |^{2} + λ | | L β | |^{2} ∣ ω]] \leq \frac{1 + λ \bar{C}}{{(2 π)}^{2} K} \sqrt{E [\frac{| | \hat{f} (ω) | |^{4}}{ρ {(ω)}^{4}}]} - \frac{1}{K} E [| | f (x) | |^{2}] + σ^{2} . \end{aligned}$ A 17

(b)
To derive an optimal choice for $ρ$ , we seek to minimize the expression inside the root in (A 17). We can redefine this problem in terms of a non-normalized function $p$ such that $ρ (ω) = p (ω) / \sum_{Z^{2}} p (ω^{'})$ :
$minimize (\sum_{Z^{2}} \frac{| | \hat{f} (ω) | |^{4}}{p {(ω)}^{3}}) \cdot {(\sum_{Z^{2}} p (ω))}^{3} .$ A 18
First, define a real-valued function $H (ϵ)$ where $ϵ$ is a real number close to zero
$H (ϵ) = (\sum_{Z^{2}} \frac{| | \hat{f} (ω) | |^{4}}{{(p (ω) + ϵ δ (ω))}^{3}}) \cdot {(\sum_{Z^{2}} p (ω) + ϵ δ (ω))}^{3},$ A 19
and $δ$ is a small arbitrary variation of $p$ . Next, seek a solution $p$ to $H^{'} (0) = 0$ . After taking the derivative of (A 19) and simplifying, we are left with the following equation:
$- c_{1} \sum_{Z^{2}} \frac{| | \hat{f} (ω) | |^{4}}{p {(ω)}^{4}} δ (ω) + c_{2} \sum_{Z^{2}} δ (ω) = 0,$ A 20
where $c_{1}$ and $c_{2}$ are constants. The equation can be rewritten as
$\sum_{Z^{2}} (\frac{| | \hat{f} (ω) | |^{4}}{p {(ω)}^{4}} - \frac{c_{2}}{c_{1}}) δ (ω) = 0.$ A 21
Since $δ (ω)$ is arbitrary, the expression inside the sum must be zero, that is $p (ω) = \sqrt[4]{c_{1} / c_{2}} | | \hat{f} (ω) | |$ for all $ω$ . Hence, after normalization, the optimal $ρ$ is $ρ (ω) = | | \hat{f} (ω) | | / \sum_{Z^{2}} | | \hat{f} (ω^{'}) | |$ . Lastly, some straightforward calculations show that the optimization with respect to $ρ$ is a convex optimization problem (that is, $P$ is a convex set and $E [| | \hat{f} (ω) | |^{4} / ρ {(ω)}^{4}]$ is a convex function with respect to $ρ$ ). Hence, the local minimum derived above must also be a global minimum.
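To give a feel for how frequencies on $Z^{2}$ could be drawn from a density proportional to $| | \hat{f} (ω) | |$ without computing the normalizing sum, the following sketch runs a plain random-walk Metropolis sampler with a symmetric, rounded Gaussian proposal. This is not algorithm 1 itself: the target is assumed to be known here, and the Gaussian-shaped target, the function names and the parameter values are chosen only for illustration.

```python
import math
import random


def metropolis_on_lattice(unnormalized_density, n_steps, step_size, seed=0):
    """Random-walk Metropolis on the integer lattice Z^2, started at the origin."""
    rng = random.Random(seed)
    omega = (0, 0)
    density = unnormalized_density(omega)
    samples = []
    for _ in range(n_steps):
        # Rounded Gaussian steps are symmetric about zero, so the proposal
        # ratio cancels in the acceptance probability.
        proposal = tuple(w + round(rng.gauss(0.0, step_size)) for w in omega)
        proposal_density = unnormalized_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        if rng.random() * density < proposal_density:
            omega, density = proposal, proposal_density
        samples.append(omega)
    return samples


def toy_f_hat_norm(omega):
    """Hypothetical stand-in for ||f_hat(omega)||, decaying away from the origin."""
    return math.exp(-0.5 * (omega[0] ** 2 + omega[1] ** 2))


samples = metropolis_on_lattice(toy_f_hat_norm, n_steps=5000, step_size=2.25)
```

Only ratios of the target enter the acceptance step, which is why the normalizing constant in (A 8) never has to be evaluated.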

Footnotes

¹ More information about the SWEREF 99 TM map projection can be found here: www.lantmateriet.se/en/maps-and-geographical-information/gps-geodesi-och-swepos/Referenssystem/Tvadimensionella-system/SWEREF-99-projektioner.

² An animation of the reconstructed wind field for the random Fourier features model can be found at www.youtube.com/watch?v=eOMMJVPn0v8.

Data accessibility

Python code for this paper can be found on Bitbucket via the following url: https://bitbucket.org/the_eye_of_the_breeze/div-fourier-publication/src/master/. It implements interpolation methods discussed in §3 and contains the dataset used for training and evaluation.

Authors' contributions

J.K. designed and coordinated the study, was responsible for the data collection and pre-processing, designed the statistical framework, revised the manuscript and adapted the proof of Proposition A.1 to general regularization terms; E.S. carried out much of the statistical work, adapted the random Fourier algorithm to discrete frequencies and drafted the manuscript; R.T. conceived the study, is responsible for the theoretical motivation of the Fourier models, assisted in the proof of Proposition A.1 and revised the manuscript.

Competing interests

We declare we have no competing interests.

Funding

J.K. and R.T. were partially supported by the KAUST Office of Sponsored Research (OSR) under Award numbers OSR-2019-CRG8-4033 and OSR-2019-CRG8-4033.2, and the Alexander von Humboldt Foundation, through the Alexander von Humboldt Professorship award. E.S. was supported by the KAUST Visiting Student Research Program (VSRP).

References

1. Smith A, Clifton A, Fields M. 2016. Wind plant preconstruction energy estimates: Current practice and opportunities. Technical report, NREL/TP-5000-64735, National Renewable Energy Laboratory, Golden, CO.
2. Bauer P, Thorpe A, Brunet G. 2015. The quiet revolution of numerical weather prediction. Nature 525, 47-55. (doi:10.1038/nature14956)
3. Reinhardt K, Samimi C. 2018. Comparison of different wind data interpolation methods for a region with complex terrain in Central Asia. Clim. Dyn. 51, 3635-3652. (doi:10.1007/s00382-018-4101-y)
4. Carta JA, Ramírez P, Velázquez S. 2009. A review of wind speed probability distributions used in wind energy analysis: case studies in the Canary Islands. Renew. Sustain. Energy Rev. 13, 933-955. (doi:10.1016/j.rser.2008.05.005)
5. Luo W, Taylor MC, Parker SR. 2008. A comparison of spatial interpolation methods to estimate continuous wind speed surfaces using irregularly distributed data from England and Wales. Int. J. Climatol. 28, 947-959. (doi:10.1002/joc.1583)
6. Pielke R. 2003. Mesoscale atmospheric modeling. In Encyclopedia of Physical Science and Technology (ed. RA Meyers), 3rd edn, pp. 383–389. New York, NY: Academic Press.
7. Appelhans T, Mwangomo E, Hardy DR, Hemp A, Nauss T. 2015. Evaluating machine learning approaches for the interpolation of monthly air temperature at Mt. Kilimanjaro, Tanzania. Spatial Stat. 14, 91-113. (doi:10.1016/j.spasta.2015.05.008)
8. Erxleben J, Elder K, Davis R. 2002. Comparison of spatial interpolation methods for estimating snow distribution in the Colorado Rocky Mountains. Hydrol. Processes 16, 3627-3649. (doi:10.1002/hyp.1239)
9. Hengl T, Nussbaum M, Wright MN, Heuvelink GBM, Gräler B. 2018. Random forest as a generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ 6, e5518. (doi:10.7717/peerj.5518)
10. Cellura M, Cirrincione G, Marvuglia A, Miraoui A. 2008. Wind speed spatial estimation for energy planning in Sicily: introduction and statistical analysis. Renew. Energy 33, 1237-1250. (doi:10.1016/j.renene.2007.08.012)
11. Jung C, Schindler D. 2015. Statistical modeling of near-surface wind speed: a case study from Baden-Wuerttemberg (southwest Germany). Austin J. Earth Sci. 2, 1006.
12. Callaham JL, Maeda K, Brunton SL. 2019. Robust flow reconstruction from limited measurements via sparse representation. Phys. Rev. Fluids 4, 103907. (doi:10.1103/PhysRevFluids.4.103907)
13. Erichson NB, Mathelin L, Yao Z, Brunton S, Mahoney M, Kutz JN. 2020. Shallow neural networks for fluid flow reconstruction with limited sensors. Proc. R. Soc. A 476, 20200097.
14. Jin X, Laima S, Chen W-L, Li H. 2020. Time-resolved reconstruction of flow field around a circular cylinder by recurrent neural networks based on non-time-resolved particle image velocimetry measurements. Exp. Fluids 61, 1-23. (doi:10.1007/s00348-019-2836-9)
15. Kammonen A, Kiessling J, Plecháč P, Sandberg M, Szepessy A. 2020. Adaptive random Fourier features with Metropolis sampling. Foundations Data Sci. 2, 309.
16. Tempone RF. 1999. Approximation and interpolation of divergence free flows. Tesis de maestría. Universidad de la República (Uruguay). Facultad de Ingeniería.
17. Espath L, Kabanov D, Kiessling J, Tempone R. 2021. Statistical learning for fluid flows: sparse Fourier divergence-free approximations. Phys. Fluids 33, 097108. (doi:10.1063/5.0064862)
18. The Swedish Meteorological and Hydrological Institute SMHI. Download meteorological observations, wind speed and wind direction. www.smhi.se/data/meteorologi/ladda-ner-meteorologiska-observationer/#param=wind,stations=all. (accessed 30 July 2020).
19. Blanchard P, Bruning E. 2015. Mathematical Methods in Physics, vol. 69. Progress in Mathematical Physics, 2nd edn. Basel, Switzerland: Birkhäuser.
20. Roberts GO, Rosenthal JS. 2001. Optimal scaling for various Metropolis-Hastings algorithms. Stat. Sci. 16, 351-367. (doi:10.1214/ss/1015346320)
21. Li J, Heap AD. 2008. A review of spatial interpolation methods for environmental scientists. Canberra/Australian Capital Territory (ACT), Australia: Geoscience Australia.
22. Hastie T, Tibshirani R, Friedman J. 2009. The elements of statistical learning. Springer Series in Statistics, 2nd edn. New York, NY: Springer.
23. Breiman L. 2001. Random forests. Mach. Learn. 45, 5-32. (doi:10.1023/A:1010933404324)
24. Goodfellow I, Bengio Y, Courville A. 2016. Deep Learning. Cambridge, MA: MIT Press.
25. Kingma DP, Ba J. 2017. Adam: a method for stochastic optimization.
26. Achberger C, Chen D, Alexandersson H. 2006. The surface winds of Sweden during 1999–2000. Int. J. Climatol. 26, 159-178. (doi:10.1002/joc.1254)
27. Holtslag AAM. 2003. Atmospheric turbulence. In Encyclopedia of Physical Science and Technology (ed. RA Meyers), 3rd edn, pp. 707–719. New York, NY: Academic Press.


