Author manuscript; available in PMC: 2022 Sep 7.
Published in final edited form as: IEEE J Sel Top Appl Earth Obs Remote Sens. 2018 Oct 26;11(12):4918–4931. doi: 10.1109/jstars.2018.2875330

Emulation as an Accurate Alternative to Interpolation in Sampling Radiative Transfer Codes

Jorge Vicent 1, Jochem Verrelst 2, Juan Pablo Rivera-Caicedo 3, Neus Sabater 4, Jordi Muñoz-Marí 5, Gustau Camps-Valls 6, José Moreno 7
PMCID: PMC7613334  EMSID: EMS152625  PMID: 36081454

Abstract

Computationally expensive radiative transfer models (RTMs) are widely used to realistically reproduce the light interaction with the earth surface and atmosphere. Because these models take long processing time, the common practice is to first generate a sparse look-up table (LUT) and then make use of interpolation methods to sample the multidimensional LUT input variable space. However, the question arises whether common interpolation methods are the most accurate option. As an alternative to interpolation, this paper proposes to use emulation, i.e., approximating the RTM output by means of statistical learning. Two experiments were conducted to assess the accuracy in delivering spectral outputs using interpolation and emulation: at canopy level, using PROSAIL; and at top-of-atmosphere level, using MODTRAN. Various interpolation (nearest-neighbor, inverse distance weighting, and piece-wise linear) and emulation [Gaussian process regression (GPR), kernel ridge regression, and neural networks] methods were evaluated against a dense reference LUT. In all experiments, the emulation methods clearly produced more accurate output spectra than classical interpolation methods. The GPR emulation performed up to ten times more accurately than the best performing interpolation method, at a speed that is competitive with the faster interpolation methods. It is concluded that emulation can function as a fast and more accurate alternative to commonly used interpolation methods for reconstructing RTM spectral data.

Index Terms: Emulation, interpolation, look-up tables (LUT), machine learning, performance simulators, processing speed, radiative transfer models (RTMs)

I. Introduction

PHYSICALLY-BASED radiative transfer models (RTMs) allow remote sensing scientists to understand the light interactions between water, vegetation, and atmosphere [1]–[3]. RTMs are computer models that describe scattering, absorption, and emission processes in the visible to microwave region [4], [5]. These models are widely used in applications such as inversion of atmospheric and vegetation properties from remotely sensed data (see [6] for a review), generation of artificial scenes as would be observed by a sensor [7]–[9], and sensitivity analysis of RTMs [10]. In the optical domain, a diversity of vegetation, atmosphere, and water RTMs have been continuously improved in accuracy, from simple semiempirical RTMs toward advanced ray tracing RTMs. This evolution has led to an increase in complexity, interpretability, and computational requirements to run the model, which bears implications for practical applications. On the one hand, computationally cheap RTMs are models with relatively few input parameters that enable fast calculations (e.g., [11] and [12] for vegetation and [13], [14] for atmosphere). On the other hand, computationally expensive RTMs are complex physically-based mathematical models with a large number of input variables. In short, the following families of RTMs can be considered as computationally expensive: Monte Carlo ray tracing models (e.g., Raytran [15], FLIGHT [16], and librat [17]), voxel-based models (e.g., DART [18]), and advanced integrated vegetation and atmospheric transfer models that consist of various subroutines (e.g., SimSphere [19], SCOPE [20], and MODTRAN [21]). Despite the higher accuracy of these RTMs in modeling the light-vegetation and atmosphere interactions (see e.g., [15] and [22]), their high computational burden makes them impractical for applications that demand many simulations, and alternatives have to be sought.

In order to overcome this limitation, RTMs are most commonly applied by means of look-up tables (LUTs) [6]. LUTs are prestored RTM output data, so that the RTM computation has to be done only once, prior to the application. Nevertheless, for reasons of memory storage and processing time, LUTs are usually kept to a reasonable size, especially in the case of computationally expensive RTMs. The common approach is then to search through the multidimensional LUT input variable space by means of interpolation techniques. Various interpolation techniques have been developed both for gridded and scattered datasets [23]–[25]. Linear interpolation is the most used approach in both gridded and scattered datasets due to its balance between processing speed and accuracy [26]–[28]. However, the main drawback of linear interpolation in high-dimensional scattered datasets is that the underlying triangulation is computationally expensive and requires a large amount of computer memory.

Emulation of costly codes is an alternative to interpolation, but based on statistical principles. The core idea of an emulator is to extract (or learn) the statistical information from a limited set of simulations of the original deterministic model [29], [30]. Emulators then approximate the original RTM at a tiny fraction of its run time, so they can be readily applied in tedious processing routines [31], [32]. Emulators bring some extra advantages, such as the use of a nongridded input parameter space, which makes them more versatile than several interpolation methods (e.g., piece-wise cubic splines). In different research fields, such as engineering, energy, robotics, and environmental modeling, emulators have already proven to be a more efficient alternative to classical LUT interpolation methods [29], [33]–[38]. However, these studies are generally limited to models with a few output dimensions, low levels of noise, or low degrees of freedom and ill-posedness. Therefore, an important question arises here: are emulators able to compete with interpolation methods in sampling hyperspectral RTM outputs, both in terms of accuracy and processing speed? The problem is thus new and actually challenges the potential capabilities of emulators. This brings us to the main objective of this paper, i.e., to analyze the performance of emulators as an alternative to classical RTM-based LUT interpolation for sampling the LUT parameter space. To do so, two contrasting LUT spectral outputs are examined: one of spectrally smooth top-of-canopy (TOC) reflectance data, and another of sharper (high frequency bands) top-of-atmosphere (TOA) radiance data. We give experimental evidence that emulation generally outperforms interpolation in both computational cost and accuracy, which suggests that emulators might be better suited for RTM-based LUT sampling.

The remainder of this paper is organized as follows. Section II gives a theoretical overview of the analyzed interpolation and emulation methods. Section III presents the materials and methods used to study the performance of interpolation and emulation methods in terms of accuracy and computation time. Section IV presents the results, which are discussed in a broader context in Section V. Section VI concludes this paper.

II. Interpolation and emulation theory

In this section, we first present common interpolation methods (see Section II-A), and then address the emulation theory (see Section II-B).

A. Interpolation

Let us consider a D-dimensional input space $\chi$ from which we sample $x \in \chi \subset \mathbb{R}^D$ and in which a K-dimensional object function $f(x; \lambda) = [f(x; \lambda_1), \ldots, f(x; \lambda_K)] : \mathbb{R}^D \rightarrow \mathbb{R}^K$ is evaluated. In the context of this paper, $\chi$ comprises the D input variables [e.g., leaf area index (LAI), aerosol optical thickness (AOT), view zenith angle (VZA)] that control the behavior of the function $f(x; \lambda)$, i.e., a water, canopy or atmospheric RTM. Here, $\lambda$ represents the wavelengths in the K-dimensional output space.¹ An interpolation, $\hat{f}(x)$, is therefore a technique used to approximate model simulations, $f(x) = \hat{f}(x) + \varepsilon$, based on the numerical analysis of an existing set of nodes, $f_i = f(x_i)$, conforming a precomputed LUT. The concept of interpolation has been widely used in remote sensing applications, including retrieval of biophysical parameters [39] and atmospheric correction algorithms [26], [27]. The following nonexhaustive list gives an overview of commonly used interpolation techniques in remote sensing.

1). Nearest-Neighbor

This is the simplest interpolation method. It finds the LUT node $x_i$ closest to a query point $x_q$ (e.g., by minimizing their Euclidean distance) and assigns that node's output to the query, i.e., $\hat{f}(x_q) = f(x_i)$. This fast method is valid for both gridded and scattered LUTs.
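The nearest-neighbor rule can be illustrated with a minimal Python sketch. The function name `nearest_neighbor` and the toy LUT values are hypothetical and chosen only for illustration; they are not taken from the paper or from the ALG toolbox.

```python
import math

def nearest_neighbor(lut_inputs, lut_outputs, x_query):
    """Return the stored output of the LUT node closest to x_query (Euclidean distance)."""
    closest = min(range(len(lut_inputs)),
                  key=lambda i: math.dist(lut_inputs[i], x_query))
    return lut_outputs[closest]

# Hypothetical toy LUT: two input variables, three-band output spectra.
lut_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
lut_f = [[0.10, 0.20, 0.30], [0.40, 0.50, 0.60], [0.70, 0.80, 0.90]]
spectrum = nearest_neighbor(lut_x, lut_f, (0.9, 0.2))
```

A brute-force search like this costs one distance evaluation per LUT node and query; a spatial index would be the usual choice for large LUTs.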

2). Inverse Distance Weighting (IDW) [40]

Also known as Shepard’s method, this method weights the n LUT nodes closest to the query point $x_q$ by the inverse of a distance metric $d(x_q, x_i) : \chi \times \chi \rightarrow \mathbb{R}^+$ (e.g., the Euclidean distance) [see (1)]

$$\hat{f}(x_q) = \frac{\sum_{i=1}^{n} \omega_i f(x_i)}{\sum_{i=1}^{n} \omega_i} \tag{1}$$

where $\omega_i = d(x_q, x_i)^{-p}$ and p (typically p = 2) is a tuneable parameter known as the power parameter. When p is large, this method produces the same results as the nearest-neighbor interpolation. The method is computationally cheap but it is affected by LUT nodes far from the query point. The modified Shepard’s method [41] aims to reduce the effect of distant grid points by modifying the weights to

$$\omega_i = \left(\frac{R - d(x_q, x_i)}{R\, d(x_q, x_i)}\right)^{p} \tag{2}$$

where R is the maximum Euclidean distance to the n closest LUT nodes.
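As a sketch of how (1) and (2) translate into code, the following Python function implements both weightings. The function name `idw`, its keyword arguments, and the handling of a query that coincides with a node are assumptions made for this illustration, not the ALG implementation.

```python
import math

def idw(lut_inputs, lut_outputs, x_query, n=3, p=2.0, modified=False):
    """Inverse distance weighting of the n closest LUT nodes, as in (1).

    With modified=True the weights follow (2), where R is the largest
    distance among the n selected nodes.
    """
    nearest = sorted((math.dist(x, x_query), i) for i, x in enumerate(lut_inputs))[:n]
    # Assumption: a query that coincides with a node returns that node's output.
    if nearest[0][0] == 0.0:
        return list(lut_outputs[nearest[0][1]])
    radius = nearest[-1][0]
    if modified:
        weights = [((radius - d) / (radius * d)) ** p for d, _ in nearest]
    else:
        weights = [d ** -p for d, _ in nearest]
    total = sum(weights)
    if total == 0.0:  # all selected nodes equidistant under (2): fall back to equal weights
        weights = [1.0] * len(nearest)
        total = float(len(nearest))
    n_bands = len(lut_outputs[0])
    return [sum(w * lut_outputs[i][k] for w, (_, i) in zip(weights, nearest)) / total
            for k in range(n_bands)]

# Hypothetical one-dimensional LUT with single-band outputs.
nodes = [(0.0,), (1.0,), (4.0,)]
values = [[0.0], [2.0], [100.0]]
plain = idw(nodes, values, (0.5,), n=3, p=2.0)
shepard = idw(nodes, values, (0.5,), n=3, p=2.0, modified=True)
```

In this toy case the modified weights of (2) give the farthest of the n nodes a weight of exactly zero, so the distant node no longer influences the result, whereas the plain weights of (1) still let it contribute.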

3). Piece-Wise Linear

This method is commonly used in remote sensing applications due to its balance between computation time and interpolation error [26], [42], [43]. The implementation of the linear interpolation is based on the Quickhull algorithm [44] for triangulations in multidimensional input spaces. For scattered LUT input data, the piece-wise linear interpolation method reduces to finding the Delaunay simplex [45] (e.g., a triangle when D = 2) that encloses a D-dimensional query point $x_q$ [see (3) and Fig. 1]

$$\hat{f}(x_q) = \sum_{j=1}^{D+1} \omega_j f(x_j) \tag{3}$$

where ωj are barycentric coordinates of xq with respect to the D-dimensional simplex (with D + 1 vertices) [46]. Notice that, though similar, the IDW with parameters n = D + 1 and p = 1 is not strictly the same as linear interpolation since IDW uses Euclidean distances instead of the barycentric coordinates in linear interpolation. Since f(x) is a K-dimensional function, the result of the interpolation is also K-dimensional.
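For the D = 2 case, (3) can be sketched in a few lines of Python. The sketch assumes the enclosing triangle is already known; the Delaunay search itself is not shown. The function names are hypothetical.

```python
def barycentric_weights_2d(triangle, x_query):
    """Barycentric coordinates of x_query with respect to a triangle (D = 2)."""
    (x1, y1), (x2, y2), (x3, y3) = triangle
    xq, yq = x_query
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (xq - x3) + (x3 - x2) * (yq - y3)) / det
    w2 = ((y3 - y1) * (xq - x3) + (x1 - x3) * (yq - y3)) / det
    return w1, w2, 1.0 - w1 - w2

def linear_interpolate(triangle, vertex_spectra, x_query):
    """Weighted sum of the D + 1 vertex spectra, as in (3)."""
    weights = barycentric_weights_2d(triangle, x_query)
    n_bands = len(vertex_spectra[0])
    return [sum(w * s[k] for w, s in zip(weights, vertex_spectra))
            for k in range(n_bands)]

# Hypothetical triangle and a single-band "spectrum" f(x, y) = 2x + 3y + 1 at its vertices.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
vertex_spectra = [[1.0], [3.0], [4.0]]
weights = barycentric_weights_2d(triangle, (0.25, 0.25))
estimate = linear_interpolate(triangle, vertex_spectra, (0.25, 0.25))
```

Because the weights are barycentric, a function that is linear in the inputs is reproduced exactly inside the triangle, which is the property that distinguishes this method from IDW with n = D + 1.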

Fig. 1. Schematic representation of a two-dimensional interpolation of a query point xq (white *) after Delaunay triangulation (solid lines) of the scattered LUT nodes Xi (*).


In scattered LUTs, the underlying Delaunay triangulation is computationally expensive in high dimensional input spaces (typically D > 6) and is also limited by its intensive memory consumption [44], [47].

Other existing advanced interpolation methods (e.g., Sibson’s interpolation [46], [48] and piece-wise cubic splines [49]) were not considered in the analysis due to their even more intensive memory consumption in high-dimensional input spaces.

B. Emulation

Emulation is a statistical learning technique used to estimate model simulations when the model under investigation is too computationally costly to be run many times [29]. The concept of developing emulators has already been applied in the last few decades in the climate and environmental modeling communities [19], [50]–[56]. The basic idea is that an emulator uses a limited number of simulator runs, i.e., input-output pairs (corresponding to training samples), to train a machine learning regression algorithm in order to infer the values of the complex simulator output given a yet-unseen input configuration. These training data pairs should ideally cover the multidimensional input parameter space using a space-filling sampling algorithm, e.g., Latin hypercube sampling [57].

As with LUT interpolation, once the emulator is built, it is not necessary to perform any additional runs of the model; the emulator computes the output that is otherwise generated by the simulator [29]. Accordingly, emulators are statistical models that can generalize the input–output relations from a subset of examples generated by RTMs to unseen data. Note that building an emulator is essentially nothing more than building an advanced regression model as typically done for biophysical parameter retrieval applications [see pioneering works using neural networks (NNs) [58], [59] and also more recent statistical methods [6], [60], [61]], but in reversed order: whereas a retrieval model converts input spectral data (e.g., reflectance) into one or more output biophysical variables, an emulator converts input biophysical variables into output spectral data.

When it comes to emulating RTM spectral outputs, however, the challenge lies in delivering a full spectrum, i.e., predicting contiguous spectral bands. This is an additional difficulty compared to traditional interpolation methods or standard emulators that only deliver one output [62]. As a consequence, the machine learning methods should be able to generate multiple outputs in order to reconstruct a full spectral profile. This is not a trivial task. For instance, the full, contiguous spectral profile between 400 and 2500 nm consists of over 2000 bands when binned to 1 nm resolution. Not all regression models are able to deal with high-dimensional outputs; only some of them can produce multioutput models. For instance, with NNs it is possible to train multioutput models. However, training a complex multioutput statistical model with the capability to generate so many output bands would take considerable computational time and would probably run a certain risk of overfitting because of model overrepresentation. A workaround has to be developed that enables the regression algorithms to cope with large spectroscopy datasets. An efficient solution is to take advantage of the so-called curse of spectral redundancy, i.e., the Hughes phenomenon [63]. Since spectroscopy data typically show a great deal of collinearity, such data can be converted to a lower-dimensional space through dimensionality reduction (DR) techniques. Accordingly, spectroscopy data can be converted into components, which are only a fraction of the original number of bands, so that the multioutput problem is reduced to a small number of components that preserve the spectral information content. Afterward, the components are reconstructed to spectral data. In this paper, we first apply a principal component analysis (PCA) [64] to the spectral data in order to reduce it to a given number of features (components). This step greatly reduces the number of dimensions while keeping 99% of the spectral variance. Through DR, the problem is better conditioned, which allows us to train either multioutput or single-output models on this reduced set of components [30]–[32], [55], [65]. As the models are trained to predict the reduced set of components, the final step of the process is to project the predictions back to the original spectral space by applying the inverse PCA.
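The PCA compression and back-projection steps can be sketched with NumPy as follows. The synthetic "spectra" are an assumption made purely for illustration (three smooth shapes mixed with random coefficients), and the helper names are hypothetical; the paper itself relies on the ARTMO Emulator toolbox.

```python
import numpy as np

def fit_pca(spectra, n_components):
    """Fit PCA on an (n_samples, n_bands) array via SVD of the centered data."""
    mean = spectra.mean(axis=0)
    _, singular_values, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, vt[:n_components], explained[:n_components].sum()

def to_components(spectra, mean, basis):
    """Project spectra onto the retained components (the regression targets)."""
    return (spectra - mean) @ basis.T

def to_spectra(scores, mean, basis):
    """Inverse PCA: map component scores back to the full spectral space."""
    return scores @ basis + mean

# Synthetic, strongly collinear spectra on a 1 nm grid (illustration only).
rng = np.random.default_rng(0)
wavelengths = np.linspace(400.0, 2500.0, 2101)
shapes = np.stack([np.sin(wavelengths / 300.0),
                   np.cos(wavelengths / 500.0),
                   wavelengths / 2500.0])
spectra = rng.uniform(size=(200, 3)) @ shapes

mean, basis, explained = fit_pca(spectra, n_components=3)
scores = to_components(spectra, mean, basis)      # shape (200, 3)
reconstructed = to_spectra(scores, mean, basis)   # shape (200, 2101)
```

In an emulator, the regression model would be trained to predict `scores` from the RTM input variables, and `to_spectra` would be applied to its predictions.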

C. Machine Learning Regression Algorithms

Two steps are required to enable approximating an RTM through emulation. The first step involves building a statistically-based representation (i.e., an emulator) of the RTM using statistical learning from a set of training data points derived from runs of the actual model under study (LUT nodes in the context of interpolation). The second step uses the emulator built in the first step to compute the output in the LUT input parameter space that would otherwise have to be generated by the original RTM [29]. Based on the literature review above and earlier emulation evaluation studies [31], [32], [65], the following three machine learning methods can potentially serve as accurate emulators: first, Gaussian process regression (GPR); second, kernel ridge regression (KRR); and third, NNs.

We selected these three methods as representative of the state-of-the-art machine learning families for regression. KRR generalizes linear regression via kernel functions. GPR is essentially the probabilistic version of KRR, and has been widely used for biogeophysical parameter retrieval and emulation [36], [66], [67]. NNs are standard approximation tools in statistics and artificial intelligence, and are currently being revived through the popular adoption of deep learning models [68]. We explore all these techniques for the sake of a complete benchmarking of the standard methods available. These methods are briefly outlined below.

Kernel methods in machine learning owe their name to the use of kernel functions [69]–[71]. These functions quantify similarities between input samples of a dataset. The similarity reproduces a linear dot product (scalar) computed in a possibly higher dimensional feature space, yet without ever computing the data location in the feature space. The following two methods are gaining increasing attention: GPR, which generalizes Gaussian probability distributions in function spaces [72], and KRR, which performs least squares regression in feature spaces [73]. The expressions defining the weights and the predictions obtained by GPR and KRR are the same, but interestingly these expressions are obtained following different approaches. GPR follows a probabilistic approach (see [72]), whereas KRR implements a discriminative approach for regression and function approximation. In both cases, the prediction and the predictive variance of the model for new samples are given by

$$\hat{f}(x_q) = \sum_{i=1}^{n} \alpha_i k(x_i, x_q) \tag{4}$$

$$\mathbb{V}[\hat{f}(x_q)] = k(x_q, x_q) - k_*^{T}(K + \sigma_n^2 I)^{-1} k_* \tag{5}$$

where $k(\cdot, \cdot)$ is a covariance (or kernel) function, $k_*$ is the vector of covariances between the query point, $x_q$, and the n training points, $K$ here denotes the n × n matrix of covariances among the training points, and $\sigma_n^2$ accounts for the noise in the training samples. As one can see, the prediction is obtained as a linear combination of weighted kernel (covariance) functions, with the optimal weights given by $\alpha = (K + \sigma_n^2 I)^{-1} f(x)$. Many different functions can be used as kernels for both GPR and KRR. In this paper, we used a standard Gaussian radial basis function kernel for KRR, which has a single length hyperparameter for all input dimensions, and the automatic relevance determination squared exponential kernel for GPR, which has a separate length hyperparameter for each input dimension. For KRR, these hyperparameters are tuned through standard cross-validation techniques. For GPR, stochastic gradient descent algorithms maximizing the marginal log-likelihood are employed, which allows us to optimize a large number of hyperparameters (compared to KRR) in a computationally effective way.
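A minimal NumPy sketch of (4) and (5) is given below. It assumes fixed, hand-picked hyperparameters (`length_scale`, `noise_var`) rather than the cross-validation or marginal-likelihood tuning described above, and the function names and toy data are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale):
    """Squared-exponential kernel; length_scale is a scalar or one value per input dimension (ARD)."""
    a_scaled, b_scaled = a / length_scale, b / length_scale
    sq_dist = (np.sum(a_scaled**2, axis=1)[:, None]
               + np.sum(b_scaled**2, axis=1)[None, :]
               - 2.0 * a_scaled @ b_scaled.T)
    return np.exp(-0.5 * np.maximum(sq_dist, 0.0))

def fit_weights(x_train, f_train, length_scale, noise_var):
    """Solve (K + sigma_n^2 I) alpha = f for the weights alpha."""
    gram = rbf_kernel(x_train, x_train, length_scale) + noise_var * np.eye(len(x_train))
    return np.linalg.solve(gram, f_train), gram

def predict(x_query, x_train, alpha, gram, length_scale):
    """Predictive mean (4) and variance (5) at the query points."""
    k_star = rbf_kernel(x_train, x_query, length_scale)   # shape (n, n_query)
    mean = k_star.T @ alpha
    prior = np.ones(len(x_query))                          # k(xq, xq) = 1 for this kernel
    variance = prior - np.sum(k_star * np.linalg.solve(gram, k_star), axis=0)
    return mean, variance

# Toy one-dimensional problem standing in for one PCA component (illustration only).
x_train = np.linspace(0.0, 1.0, 20)[:, None]
f_train = np.sin(2.0 * np.pi * x_train)
alpha, gram = fit_weights(x_train, f_train, length_scale=0.2, noise_var=1e-6)
x_query = np.array([[0.33]])
mean, variance = predict(x_query, x_train, alpha, gram, length_scale=0.2)
```

The mean is the part shared by KRR and GPR; the variance in (5) is the additional quantity that the probabilistic GPR view provides.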

NNs are essentially (potentially fully) connected structures of artificial neurons organized in layers [74]. Neurons of different layers are interconnected through the corresponding links (weights). The output on the final layer of the NN, and thus the prediction, is given by

$$\hat{f}(x) = g\left(\sum_{k=1}^{n} w_{jk}^{l}\, x_{k}^{l-1} + b_{k}^{l}\right) \tag{6}$$

where $w_{jk}^{l}$ and $b_{k}^{l}$ are the weights and bias at the lth layer, respectively, $x_{k}^{l-1}$ is the input vector at the (l – 1)th layer, and g is an activation function, which at the output layer and for regression problems could be the identity function. Training an NN implies selecting a structure (number of hidden layers and nodes per layer), initializing the weights, and choosing the shape of the nonlinear activation function, the learning rate, and the regularization parameters to prevent overfitting [75]. The selection of a training algorithm and the loss function both have an impact on the final model. In this paper, we used the standard multilayer perceptron, which is a fully-connected network. We selected just one hidden layer of neurons. We optimized the NN structure using the Levenberg–Marquardt learning algorithm with a squared loss function.
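The forward pass in (6) for a single hidden layer can be sketched as follows. The tanh hidden activation is an assumption (the paper does not name the activation used), the weights are hand-set for illustration, and the Levenberg–Marquardt training step is not shown.

```python
import numpy as np

def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer perceptron: tanh hidden units, identity output activation."""
    hidden = np.tanh(x @ w_hidden + b_hidden)   # (6) applied at the hidden layer
    return hidden @ w_out + b_out               # (6) at the output layer with g = identity

# Hypothetical weights: 2 inputs, 2 hidden neurons, 1 output (e.g., one PCA component).
w_hidden = np.array([[1.0, 0.0], [0.0, 1.0]])
b_hidden = np.zeros(2)
w_out = np.array([[1.0], [1.0]])
b_out = np.array([0.5])

prediction = mlp_forward(np.array([[1.0, -1.0], [0.0, 0.0]]), w_hidden, b_hidden, w_out, b_out)
```

For a multioutput emulator, `w_out` would simply have one column per retained PCA component.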

One can note the similarity between the prediction functions used in the emulators [see (4) and (6)] and those used for interpolation (see Section II-A). In fact, the theoretical relation between both approaches was extensively discussed back in 1970, in the context of splines and GPR, in [76]. Essentially, machine learning emulators perform regression and, hence, are more flexible fitting functions than interpolation, as the solution is not forced to pass through the observed points. On the downside, the emulation approach depends on an adequate estimation of the hyperparameters (i.e., regularization). When a good estimate of the hyperparameters is achieved, emulation should obtain equal or better results than interpolation; otherwise, the regression runs a certain risk of overfitting.

III. Materials and Methods

In this section, we start by giving an overview of the software used to generate the synthetic datasets with which the performance of the interpolation/emulation methods is assessed (see Sections III-A and III-B). We continue by describing these datasets (see Section III-C) and finish by explaining the error metrics used to evaluate the performance of the interpolation/emulation methods (see Section III-D).

A. Automated Radiative Transfer Models Operator (ARTMO) and Atmospheric Look-Up Table Generator (ALG) Toolboxes

This study was conducted within two in-house developed graphical user interface software packages named ARTMO [77] and ALG [78]. Both software packages facilitate the usage of a suite of leaf, canopy and atmosphere RTMs including, among others, PROSAIL (i.e., the leaf model PROSPECT coupled with the canopy model SAIL [79]) and MODTRAN5. As a novelty, the latest ARTMO version (v. 3.24) is coupled with ALG (v. 1.2), which allows generating large multidimensional LUTs of TOA radiance data for Lambertian surfaces.

ARTMO also embodies a set of retrieval toolboxes, and recently an “Emulator toolbox” was added [65]. In the Emulator toolbox, several machine learning regression algorithms (MLRAs) can be trained with RTM-generated LUTs, whereby biophysical variables are used as input in the regression model, and spectral data is generated as output. In addition, ALG includes a function that offers various methods for interpolating gridded and scattered LUTs (i.e., nearest neighbor, piece-wise linear/splines, IDW). The ARTMO and ALG packages are developed in MATLAB and can be freely downloaded from http://ipl.uv.es/artmo/.

B. Description of Simulated Datasets

1). PROSPECT-4

The leaf optical model PROSPECT-4 [11] calculates leaf reflectance and transmittance as a function of four biochemistry and anatomical variables: leaf structure (N), equivalent water thickness (Cw), chlorophyll content (Cab), and dry matter content (Cm). PROSPECT-4 simulates directional reflectance and transmittance over the solar spectrum from 400 to 2500 nm at the fine spectral resolution of 1 nm.

2). SAIL

At the canopy scale, SAIL [12] approximates the RT equation through two direct fluxes (incident solar flux and radiance in the viewing direction) and two diffuse fluxes (upward and downward hemispherical flux) [80]. SAIL input variables consist of LAI, leaf angle distribution (LAD), ratio of diffuse and direct radiation (skyl), soil coefficient (soil coeff.), hot spot and sun-target-sensor geometry, i.e., solar/view zenith angle and relative azimuth angle (SZA, VZA and RAA, respectively). Spectral input consists of leaf reflectance and transmittance spectra and a soil reflectance spectrum. The leaf optical properties can come from a leaf RTM such as PROSPECT, which results in the leaf-canopy model PROSAIL [3]. PROSAIL allows analyzing the impact of leaf biochemical variables on the hemispherical and bidirectional TOC reflectance.

3). MODTRAN5

At the atmosphere scale, MODTRAN5 [21], the moderate resolution transmittance code, is one of the most widely used radiative transfer codes in the atmospheric community due to its accurate simulation of the coupled absorption and scattering effects [81], [82]. MODTRAN solves the RT equation in a multilayered spherically symmetric atmosphere by including the effects of molecular and particulate absorption/emission and scattering, surface reflections and emission, solar/lunar illumination, and spherical refraction.

C. Experimental Setup

Here, we outline the experimental setup for running the interpolation and emulation experiments. For both PROSAIL and MODTRAN RTMs, LUTs were generated by means of Latin hypercube sampling (LHS) within the RTM variable space, with minimum and maximum boundaries as given in Tables I and II. The input variables were chosen given their influence on both the entire spectra (e.g., aerosol optical thickness, Ångström exponent) and on specific absorption bands (e.g., chlorophyll absorption, water vapour, and ozone). An LHS of training data is preferred, as LHS covers the full parameter space and thus, in principle, ensures that the developed emulator/interpolation will be able to reconstruct correct spectral output for any possible combination of input variables. For both the canopy and atmospheric RTMs, three sizes of LUTs were created given the same LUT boundaries: 500, 2000, and 5000. While the densest LUT (5000) was used as a reference LUT to evaluate the performance of the emulation and interpolation algorithms, the first two LUTs (500 and 2000) were simulated to actually run the emulation and interpolation techniques for different LUT sizes. Additionally, the 64 vertices of the input variable space (i.e., the 2^6 combinations in which each of the six input variables takes its minimum or maximum value) were added to these two LUTs. The addition of these vertices enables consistent functioning of all tested interpolation techniques, i.e., it ensures that the input variable space is bounded and no extrapolation is performed.
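The sampling design described above can be sketched in plain Python. This is a basic Latin hypercube written for illustration; the function names are hypothetical and the sampler is not necessarily the one used by ARTMO/ALG. The bounds are the PROSAIL ranges of Table I.

```python
import itertools
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Each variable's range is cut into n_samples strata; every stratum is used exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_samples))
        rng.shuffle(strata)
        columns.append([low + (s + rng.random()) * (high - low) / n_samples
                        for s in strata])
    return [tuple(col[i] for col in columns) for i in range(n_samples)]

def corner_points(bounds):
    """All 2**D combinations of minimum/maximum values of the D input variables."""
    return list(itertools.product(*bounds))

# Table I ranges, in the order N, Cw, Cab, Cm, LAI, LAD.
prosail_bounds = [(1.3, 2.5), (0.002, 0.05), (1.0, 70.0),
                  (0.002, 0.05), (0.1, 7.0), (0.0, 90.0)]
vertices = corner_points(prosail_bounds)
lut_inputs = latin_hypercube(500, prosail_bounds) + vertices
```

With six input variables, the 500-sample LUT therefore grows to 564 input combinations once the vertices are appended.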

Table I. Range of Vegetation Input Variables for the PROSAIL LUTs According to Latin Hypercube Sampling.

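| Model | Variable | Description | Units | Minimum | Maximum |
|---|---|---|---|---|---|
| Leaf variables (PROSPECT-4) | N | Leaf structure index | unitless | 1.3 | 2.5 |
| Leaf variables (PROSPECT-4) | Cw | Leaf water content | [cm] | 0.002 | 0.05 |
| Leaf variables (PROSPECT-4) | Cab | Leaf chlorophyll content | [μg/cm²] | 1 | 70 |
| Leaf variables (PROSPECT-4) | Cm | Leaf dry matter content | [g/cm²] | 0.002 | 0.05 |
| Canopy variables (SAIL) | LAI | Leaf area index | [m²/m²] | 0.1 | 7 |
| Canopy variables (SAIL) | LAD | Leaf angle distribution | [°] | 0 | 90 |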

SAIL fixed variables: hot spot: 0.01; solar zenith angle: 30°; observer zenith angle: 0°; azimuth angle: 0°.

Table II. Range of Atmospheric Input Variables for the MODTRAN LUTs According to Latin Hypercube Sampling.

| Variable | Description | Units | Minimum | Maximum |
|---|---|---|---|---|
| O3C | O3 column concentration | [atm-cm] | 0.2 | 0.45 |
| CWV | Columnar Water Vapour | scale-factor | 1 | 4 |
| AOT | Aerosol Optical Thickness | unitless | 0.05 | 0.4 |
| G | Asymmetry parameter | unitless | 0.65 | 0.99 |
| α | Ångström exponent | unitless | 1 | 2 |
| SSA | Single Scattering Albedo | unitless | 0.75 | 1 |

MODTRAN fixed geometric variables: solar zenith angle: 55°; observer zenith angle: 0°; azimuth angle: 0°. Remaining MODTRAN parameters were set to their default values.

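The MODTRAN LUTs consist of TOA radiance spectra constructed according to (7) under the Lambertian assumption

$$L_{\mathrm{TOA}} = L_0 + \frac{(E_{\mathrm{dir}}\,\mu_s + E_{\mathrm{dif}})(T_{\mathrm{dif}} + T_{\mathrm{dir}})\,\rho}{\pi\,(1 - S\rho)} \tag{7}$$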

where L0 is the path radiance, Edir/dif are the direct/diffuse at-surface solar irradiance, Tdir/dif are the surface-to-sensor direct/diffuse atmospheric transmittance, S is the spherical albedo, μs is the cosine of SZA, and ρ is the Lambertian surface reflectance (in our case we used the conifer trees surface reflectance from ASTER spectral library [83]). The atmospheric transfer functions are derived after applying the MODTRAN interrogation technique described in [26].
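Equation (7) is applied independently at each wavelength, which the following Python sketch makes explicit. The function name and the numerical inputs are placeholders chosen for illustration; they are not MODTRAN outputs.

```python
import math

def toa_radiance(l0, e_dir, e_dif, t_dir, t_dif, s_albedo, mu_s, rho):
    """TOA radiance at one wavelength for a Lambertian surface, as in (7)."""
    surface_irradiance = e_dir * mu_s + e_dif
    total_transmittance = t_dif + t_dir
    return l0 + surface_irradiance * total_transmittance * rho / (math.pi * (1.0 - s_albedo * rho))

# Placeholder (non-physical) values, used only to exercise the formula.
dark_surface = toa_radiance(l0=1.0, e_dir=2.0, e_dif=0.0, t_dir=0.5, t_dif=0.5,
                            s_albedo=0.0, mu_s=1.0, rho=0.0)
bright_surface = toa_radiance(l0=1.0, e_dir=2.0, e_dif=0.0, t_dir=0.5, t_dif=0.5,
                              s_albedo=0.0, mu_s=1.0, rho=0.5)
```

For a black surface (ρ = 0) only the path radiance remains, and the (1 − Sρ) term in the denominator accounts for multiple reflections between surface and atmosphere.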

For the emulation approach, each LUT was used to develop and evaluate the different statistical models.

The role of the number of components has been systematically studied before [32]. The selection of 10 and 20 PCA components (i.e., ~100% explained variance) was found to be an acceptable trade-off between accuracy and processing time. Better reconstruction of the spectral profiles can be achieved with additional components, but at the expense of slower processing times. Further, since emulators only produce an approximation of the original model, it is important to realize that such an approximation introduces a source of uncertainty referred to as “code uncertainty” associated with the emulator [29]. Therefore, validation of the generated model is an important step in order to quantify the emulator’s degree of accuracy. To test the accuracy of the 500- and 2000-LUT emulators, part of the original data is kept aside as a validation dataset. Various training/validation sampling design strategies are possible with the “Emulator toolbox.” Because of the deterministic nature of RTM data, initial tests with cross-validation sampling led to accuracies similar to those of a one-time validation. To speed up the processing time [31], a single data split was therefore applied, using 70% of the samples for training and the remaining 30% for validation.

D. Validation

In order to show the differences between the RTM outputs and the approximation inferred by interpolation or emulation techniques, some goodness-of-fit statistics as a function of wavelength are calculated against the n = 5000 reference LUTs as generated by the RTMs. The root-mean-square error (RMSE) and the normalized RMSE (NRMSE) [%] [see (8) and (9)] are calculated, both per wavelength and then averaged over all wavelengths (λ)

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[f(x_i) - \hat{f}(x_i)\right]^2} \tag{8}$$

$$\mathrm{NRMSE} = 100 \cdot \frac{\mathrm{RMSE}}{f_{\max} - f_{\min}} \tag{9}$$

where fmax and fmin are, respectively, the maximum and minimum values of the n spectra in the reference dataset. A closer inspection will be given to the most interesting results by plotting the histogram of the relative residuals (εi, in absolute terms and expressed in %)

$$\varepsilon_i = 100 \cdot \frac{\left|f(x_i) - \hat{f}(x_i)\right|}{f(x_i)} \tag{10}$$

Specifically, the average relative error and the percentiles 2.5%, 16%, 84%, and 95.5% will be plotted as function of wavelength.
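The error metrics in (8)–(10) can be written down directly. In the sketch below each list holds the values of the n spectra at a single wavelength; the function names and the toy numbers are hypothetical.

```python
import math

def rmse(reference, estimate):
    """Root-mean-square error over the n samples at one wavelength, as in (8)."""
    n = len(reference)
    return math.sqrt(sum((f - fh) ** 2 for f, fh in zip(reference, estimate)) / n)

def nrmse(reference, estimate):
    """RMSE normalized by the range of the reference values, in percent, as in (9)."""
    return 100.0 * rmse(reference, estimate) / (max(reference) - min(reference))

def relative_residuals(reference, estimate):
    """Absolute relative residuals in percent, as in (10)."""
    return [100.0 * abs(f - fh) / f for f, fh in zip(reference, estimate)]

# Toy values at one wavelength (illustration only).
reference = [1.0, 2.0, 3.0]
estimate = [2.0, 3.0, 4.0]
error = rmse(reference, estimate)
normalized = nrmse(reference, estimate)
residuals = relative_residuals(reference, estimate)
```

Averaging `rmse` and `nrmse` over all wavelengths yields the RMSEλ and NRMSEλ summary values reported in the results tables.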

The processing time of executing the emulator/interpolation method on the reference dataset has also been tracked. These calculations were performed on an i7-4710MP CPU at 2.5 GHz with 16 GB of RAM and a 64-bit operating system.

IV. Results

In this section, we will show the results of applying the emulator and interpolation methods on the described canopy and top-of-atmosphere datasets. In Section IV-A, we will show an overview of the performance of interpolation and emulation methods in terms of accuracy and computation time. In Section IV-B, we will inspect in greater detail the error histograms for the best performing interpolation and emulation methods.

A. Interpolation Versus Emulation Comparison

For both PROSAIL and MODTRAN outputs, four scenarios are evaluated: training/interpolating with 500 and 2000 samples, each with the emulators using either 10 or 20 PCA components in the regression algorithm (the number of components does not affect interpolation). All approaches are validated against the reference LUTs of 5000 samples.

Starting with the PROSAIL analysis, validation results and processing times are given in Table III. NRMSE results along the spectral range for the four scenarios are shown in Fig. 2. Inspection of these four graphs suggests the following. The four scenarios show approximately the same patterns, with expectedly higher NRMSE errors in spectral channels with lower reflectance values (e.g., the bottom of the chlorophyll absorption at 680 nm and inside the water absorptions at 1440 nm and 1900 nm). The three emulation methods clearly outperform the three interpolation methods in reproducing LUT reflectance spectra. GPR is best able to reconstruct spectra with high accuracies (i.e., low NRMSE errors). KRR is the second best performing method, while NN still performs better than the interpolation techniques but without a substantial gain in accuracy. Among the interpolation methods, linear and IDW achieve similar accuracy, particularly when using the 2000-sample LUT. Only in the near-infrared plateau (i.e., 720–1300 nm) does linear interpolation obtain the lowest NRMSE errors among the interpolation techniques, similar to those obtained with NN emulation. Results also improve when more input data is involved, i.e., when the statistical models are trained with more samples, or when a denser LUT is used for interpolation. This is clearly noticeable when comparing the results of 2000 samples with those of 500. The decrease in errors is especially pronounced for KRR, but the interpolation methods also lower their errors by a few percent. The superiority of emulation methods can perhaps be better appreciated when considering Table III: GPR trained with a 2000-LUT yielded an RMSEλ on the order of ten times lower than the best interpolation method.

Table III. PROSAIL Interpolation and Emulators Validation Results Against 5000 LUT Reference Dataset (RMSEλ, NRMSEλ) and Processing Time (s: seconds).

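| Method | RMSEλ (500) | RMSEλ (2000) | NRMSEλ % (500) | NRMSEλ % (2000) | CPU s (500) | CPU s (2000) |
|---|---|---|---|---|---|---|
| Interpolation: | | | | | | |
| - Nearest | 0.051 | 0.042 | 11.60 | 9.61 | 0.2 | 0.7 |
| - Linear | 0.042 | 0.030 | 9.99 | 7.05 | 62 | 171 |
| - IDW | 0.039 | 0.032 | 8.86 | 7.44 | 0.5 | 1.2 |
| Emulation 10PCA: | | | | | | |
| - GPR | 0.005 | 0.003 | 1.23 | 0.68 | 0.7 | 2.2 |
| - KRR | 0.015 | 0.007 | 3.56 | 1.67 | 0.1 | 0.2 |
| - NN | 0.024 | 0.022 | 5.33 | 5.00 | 0.2 | 0.2 |
| Emulation 20PCA: | | | | | | |
| - GPR | 0.005 | 0.003 | 1.21 | 0.64 | 0.9 | 4.2 |
| - KRR | 0.015 | 0.007 | 3.54 | 1.67 | 0.3 | 0.3 |
| - NN | 0.021 | 0.022 | 4.74 | 5.04 | 0.2 | 0.2 |

The values in parentheses in the column headers are the LUT training sizes.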

Fig. 2. PROSAIL interpolation versus emulation results. Note that the number of PCA components refers only to the emulator methods since no DR is applied in interpolation.


In principle, PROSAIL emulation results can be further improved when more components are entered into the statistical learning, since more variability is then preserved. However, such improvements were not obvious in our results: when doubling the number of components from 10 to 20, hardly any differences were observed for KRR and GPR. This is especially the case for the 2000-sample training LUT, for which Table III gives the same RMSEλ results. This suggests that about ten components are more than enough to preserve most of the information. NN appears more affected by the number of components in the case of 500 training samples: clear improvements can be observed from 1500 nm onward. Conversely, errors increased in the visible part, which implies that the gain from adding more components is not systematic. When NN was trained with 2000 samples, doubling the number of components had practically no influence.

When processing time is also considered (see Table III), the emulation methods become even more attractive. Although the nearest-neighbor and IDW interpolation methods are very fast (below 1% of the time of the slowest method, linear interpolation), they are not the most accurate: nearest neighbor in particular is the fastest but also the poorest performing. In contrast, the emulation methods are not only accurate but also very fast. GPR produces the output spectra at a speed on the order of these fast interpolation methods. However, GPR is affected by the training size and the number of components, which slows down the processing somewhat. Yet, even for the 2000-sample LUT with 20 components, processing the 5000 output spectra took only a few seconds, i.e., about 2.5% of the time spent by linear interpolation. NN and KRR deliver the spectral output several times faster still, in a fraction of a second, regardless of the training size. Hence, when a tradeoff between accuracy and processing speed is to be made, and given that NN delivers poorer accuracies, KRR becomes an attractive option. In all cases, KRR emulated the 5000 spectra in at most 0.3 s, more than a factor 100 faster than linear interpolation, and with the second-best accuracy.
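The accuracy and speed ratios quoted above can be checked directly from Table III. The short sketch below copies the 2000-sample PROSAIL values from the table; the dictionary keys and variable names are illustrative only.

```python
# Values copied from Table III (PROSAIL, 2000-sample training LUT).
rmse = {"nearest": 0.042, "linear": 0.030, "idw": 0.032,
        "gpr_20pca": 0.003, "krr_20pca": 0.007}
cpu_s = {"nearest": 0.7, "linear": 171.0, "idw": 1.2,
         "gpr_20pca": 4.2, "krr_20pca": 0.3}

interpolators = ("nearest", "linear", "idw")
emulators = ("gpr_20pca", "krr_20pca")

# Most accurate interpolation method according to RMSE.
best_interp = min(interpolators, key=rmse.get)

# How many times lower the emulator RMSE is than the best interpolation RMSE.
accuracy_gain = {m: rmse[best_interp] / rmse[m] for m in emulators}

# Processing time of each method as a percentage of linear interpolation.
time_percent = {m: 100.0 * cpu_s[m] / cpu_s["linear"] for m in cpu_s}

print(best_interp, accuracy_gain, time_percent)
```

With these inputs, linear interpolation is the most accurate interpolator, GPR is about ten times more accurate at roughly 2.5% of the linear interpolation time, and nearest neighbor, IDW, and KRR all stay below 1% of that time.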

The same analysis has been repeated with the MODTRAN LUTs (see Table IV and Fig. 3). NRMSE results are now plotted on a logarithmic scale to better visualize the differences between the various interpolation/emulation methods and the wide range of error values inside and outside the atmospheric absorption bands. Patterns similar to the PROSAIL results are obtained, yet some differences should be remarked. GPR, followed by KRR, is again clearly the top-performing LUT parameter sampling method. Table IV indicates that GPR yielded RMSEλ results about ten times lower than the best-performing interpolation method, and KRR about four to five times lower. NN, however, now performs on the order of the interpolation methods. Regarding the interpolation methods, linear interpolation now systematically outperformed the other two methods, but its errors are still up to about one order of magnitude higher than those of the GPR and KRR emulators. Moving from 500 to 2000 training samples did not lead to large improvements; Table IV suggests that a substantial gain in accuracy was achieved only for KRR. The same holds for adding more components to the emulators: although some small improvements can be obtained with more components, e.g., as is noticeable for GPR, overall the gain in accuracy is modest.

Table IV. MODTRAN Interpolation and Emulators Validation Results Against 5000 LUT Reference Dataset (RMSEλ, NRMSEλ) and Processing time (s: seconds).

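| Method | RMSEλ (500) | RMSEλ (2000) | NRMSEλ % (500) | NRMSEλ % (2000) | CPU s (500) | CPU s (2000) |
|---|---|---|---|---|---|---|
| Interpolation: | | | | | | |
| - Nearest | 1.160 | 0.896 | 7.58 | 5.83 | 0.3 | 0.8 |
| - Linear | 0.386 | 0.265 | 2.84 | 1.98 | 68 | 183 |
| - IDW | 0.837 | 0.630 | 5.47 | 4.12 | 0.5 | 1.3 |
| Emulation 10PCA: | | | | | | |
| - GPR | 0.037 | 0.031 | 0.65 | 0.59 | 0.5 | 2.0 |
| - KRR | 0.100 | 0.054 | 1.01 | 0.73 | 0.1 | 0.2 |
| - NN | 0.488 | 0.406 | 4.13 | 3.62 | 0.1 | 0.1 |
| Emulation 20PCA: | | | | | | |
| - GPR | 0.029 | 0.022 | 0.43 | 0.37 | 1.2 | 4.7 |
| - KRR | 0.094 | 0.051 | 0.84 | 0.56 | 0.1 | 0.2 |
| - NN | 0.487 | 0.466 | 4.83 | 4.40 | 0.1 | 0.1 |

The values in parentheses in the column headers are the LUT training sizes.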

Fig. 3. MODTRAN interpolation versus emulation results. Note that the number of PCA components refers only to the emulator methods since no DR is applied in interpolation.


Regarding processing time, trends similar to those for PROSAIL emerged: all three emulation methods produced the spectral output very fast, with NN and KRR delivering the 5000 MODTRAN-like spectra in a fraction of a second (less than 1% of the time required by linear interpolation). GPR suffered somewhat from adding more samples and components, making it slightly slower for the 2000-sample LUT with 20 components; even so, the output spectra are again produced in a few seconds.

B. Closer Inspection of Best Performing Results

Having observed the general trends of the interpolation and emulation methods, in this section we inspect a few methods in more detail. Specifically, the histograms of the relative residuals (in absolute terms) for the best-performing interpolation and emulation methods, i.e., linear and GPR, are plotted in Figs. 4 and 5. The linear interpolation method is shown as obtained with the 2000-sample LUT, whereas GPR is shown as trained with only the 500-sample LUT and 10 PCA components. Interestingly, although the emulator is not presented in its best configuration, it already achieves a substantial gain in accuracy over the best linear interpolation configuration.

Fig. 4. Histogram statistics of PROSAIL relative residuals (in absolute terms) (%) for 2000-LUT linear interpolation (top) and 500-LUT GPR (bottom). To ease the comparison, the mean residual for linear interpolation is added on top of the GPR residuals (red dashed line).

Fig. 5. Histogram statistics of MODTRAN relative residuals (in absolute terms) (%) for 2000-LUT linear interpolation (top) and 500-LUT GPR (bottom). To ease the comparison, the mean residual for linear interpolation is added on top of the GPR residuals (red dashed line).

To fully appreciate the predictive power of the emulator methods, we start by analyzing the PROSAIL residuals in Fig. 4. The GPR emulator obtains relative errors that, on average, are a factor 5–10 lower than those obtained with linear interpolation (see mean values on the solid line). These error differences between the GPR emulator and linear interpolation are maintained in both the lower and the higher parts of the histogram (see the 2.5% and 84% percentiles, where the reconstruction errors are lower and higher, respectively).

We continue by analyzing the MODTRAN residuals in Fig. 5. As previously observed in the NRMSE values (see Fig. 3 and Table IV), the GPR emulator obtains the lowest reconstruction errors. The residual errors are in this case a factor 2–10 lower than with linear interpolation, for both the lowest and highest errors in the histogram and in most of the spectrum (<1800 nm).
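The residual statistics discussed for Figs. 4 and 5 can be sketched as follows. This is only an illustration under assumptions: the relative residual is taken here as |predicted − reference| / |reference| in percent, statistics are computed per wavelength across samples, and the function names and toy arrays are hypothetical.

```python
import numpy as np

def relative_residuals(reference, predicted):
    """Absolute relative residuals (%) per sample and wavelength (assumed definition)."""
    return 100.0 * np.abs(predicted - reference) / np.abs(reference)

def residual_statistics(reference, predicted, percentiles=(2.5, 84.0)):
    """Mean and selected percentiles of the residuals along the sample axis."""
    res = relative_residuals(reference, predicted)
    stats = {"mean": res.mean(axis=0)}
    for p in percentiles:
        stats[f"p{p}"] = np.percentile(res, p, axis=0)
    return stats

# Toy data: two samples (rows) and two wavelengths (columns).
reference = np.array([[1.0, 2.0], [2.0, 4.0]])
predicted = np.array([[1.1, 2.0], [1.8, 4.4]])

stats = residual_statistics(reference, predicted)
print(stats["mean"])  # per-wavelength mean relative residual in percent
```

Rows are samples and columns are wavelengths, so each statistic is a spectrum that can be plotted against wavelength as in the figures.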

V. Discussion

Advanced RTMs are widely used in various remote sensing applications, such as atmospheric correction (e.g., [84]), inversion of vegetation properties (see [6] for a review), sensitivity analysis [10], and scene generation [7]. Because advanced RTMs require long processing times, LUT interpolation is commonly used to sample the input parameter space and to infer an approximation of the RTM outputs [26]. Although various interpolation techniques are standard practice in remote sensing applications and computer vision, in this paper we challenged these techniques by comparing them against statistical learning methods, i.e., emulation. The basic principle of emulation is to use a sparse LUT to train machine learning methods so that the trained statistical model is able to reproduce the spectral output for unseen parameter combinations. According to this principle, emulation can be considered to function similarly to interpolation, but based on statistical learning.

To ascertain the predictive power of both interpolation and emulation, two experiments were conducted: one for surface reflectance as generated by PROSAIL, and another for TOA radiance data as generated by MODTRAN. For both experiments the interpolation and emulation methods were tested against a common reference dataset of 5000 simulations. Despite the TOA radiance data being much more irregular and spiky (due to narrow atmospheric absorption regions) than surface reflectance, common trends were observed in the majority of cases. They are summarized as follows.

  • 1)

    In all experiments, the three tested emulation methods produced substantially more accurate spectral outputs than the three tested interpolation techniques. In particular, GPR was by far the most accurate emulation method, with errors on the order of 5–10 times lower than the linear interpolation method and at a fraction of its processing time. KRR was the second most accurate emulation method, with reconstruction errors 2–5 times lower than the best-performing interpolation method. GPR is thus clearly the preferred method when both accuracy and fast processing are considered. At the same time, KRR still runs much faster than GPR, producing 5000 spectra in at most 0.3 s, which is on par with or faster than the fastest interpolation methods while reaching superior accuracy.

  • 2)

    Regarding the emulation methods, GPR and KRR not only emulate the output spectra more accurately, but also tend to be more stable than NN (e.g., Fig. 2, in the visible part). Because NN has more parameters to adjust than KRR/GPR, training the NN emulator can be more complicated and thus requires a larger number of training samples to avoid overfitting and to obtain a good prediction. In addition, an increasing number of components can make it harder to adjust the NN parameters. Conversely, for GPR and KRR some small extra gain can be achieved by training with more samples or by adding more PCA components to the regression model. However, this slows down processing, especially in the case of GPR (see also [32] for a discussion on GPR emulation).

  • 3)

    When comparing the PROSAIL emulation results with those of MODTRAN, e.g., in the histogram statistics (Figs. 4 and 5), it is noteworthy that the spectrally smoother TOC reflectance is reconstructed with lower accuracy than the more irregular TOA radiance. Although the parameter space of both LUTs consists of six variables, the discrepancy can be explained by the fact that the six PROSAIL variables exert more influence, in relative terms, on the reflectance data than the six MODTRAN variables do on the TOA radiance data, implying more variability to reconstruct in the reflectance spectra. This has been observed before with a global sensitivity analysis [31].

  • 4)

    Regarding the interpolation methods, although they perform substantially poorer than emulation, linear interpolation is the most accurate one. This is particularly the case when increasing the LUT size. However, it is also the most computationally intensive interpolation method, i.e., on the order of a few minutes for 5000 spectra. This is mostly due to the exponential increase in memory usage in the construction of the implicit Delaunay triangulation [44].
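For orientation, the two cheapest interpolation baselines compared in this study, nearest neighbor and IDW, can be sketched in a few lines of NumPy on scattered LUT nodes. This is an illustrative sketch rather than the implementation used in the experiments: the distance power, the number of neighbors `k`, and all function names are assumptions. Linear interpolation is not shown because it relies on a Delaunay triangulation of the nodes.

```python
import numpy as np

def pairwise_distances(queries, nodes):
    """Euclidean distances between query points and LUT nodes."""
    diff = queries[:, None, :] - nodes[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def nearest_neighbor(nodes, spectra, queries):
    """Return the spectrum stored at the closest LUT node."""
    idx = pairwise_distances(queries, nodes).argmin(axis=1)
    return spectra[idx]

def inverse_distance_weighting(nodes, spectra, queries, power=2.0, k=8):
    """Weighted mean of the k closest spectra, with weights 1 / distance**power."""
    k = min(k, len(nodes))
    dist = pairwise_distances(queries, nodes)
    idx = np.argsort(dist, axis=1)[:, :k]
    d = np.take_along_axis(dist, idx, axis=1)
    # Clip tiny distances so that a query on a node gets (almost) all the weight.
    weights = 1.0 / np.maximum(d, 1e-12) ** power
    weights /= weights.sum(axis=1, keepdims=True)
    return np.einsum("qk,qkb->qb", weights, spectra[idx])

# Toy LUT: 50 scattered nodes in a 3-D input space, 4 spectral bands.
rng = np.random.default_rng(0)
nodes = rng.random((50, 3))
spectra = rng.random((50, 4))
queries = rng.random((5, 3))

print(nearest_neighbor(nodes, spectra, queries).shape)
print(inverse_distance_weighting(nodes, spectra, queries).shape)
```

Both functions only need distances to the stored nodes, which is why they are so much cheaper than an interpolator that must first triangulate the input space.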

The superiority of KRR and especially GPR emulation is most noteworthy. The strength of GPR statistical learning was already demonstrated in previous studies applying machine learning algorithms in various fields [66], [85], [86], yet in fact any machine learning regression algorithm can function as an emulator. Moreover, since the accuracy and run-time of these methods depend on the size of the training LUT and the number of components, they can be further optimized to balance accuracy and processing speed. For instance, it is likely that GPR can still deliver excellent accuracies with a smaller training LUT or fewer components, which would imply faster processing. As addressed before [32], some iterations are required to arrive at an optimized accuracy-speed tradeoff.

Another point to be remarked is that emulation requires the additional effort of a training step and validation of the model. This implies that a sparse LUT is always required to train an emulator, i.e., to find the hyperparameters that optimize its performance. Two issues have an impact on finding these optimal hyperparameters: the method for tuning them and the size of the training dataset. Regarding the first issue, and in the specific case of the implemented GPR, the maximum log-likelihood method was adopted to find the best set of hyperparameters. Nevertheless, alternative procedures are often employed with similar success. Examples include random sampling [87], the Nelder–Mead method (also known as downhill simplex) [88], Bayesian optimization [88]–[91], and many flavors of derivative-free optimization, such as stochastic local search, simulated annealing, or evolutionary computation [92], [93]. Other approaches consider Monte Carlo methods [94]–[96], which search in a portion of the space according to the posterior distribution of the hyperparameters given the observed data. With respect to the training dataset, we found in our examples that about 500 samples drawn with Latin hypercube sampling typically suffice. The training and model validation step requires some additional processing time compared with interpolation. In the case of KRR this is on the order of seconds, but for NN and GPR it can take longer, depending on the complexity of the training setup (number of samples and components). Nevertheless, the training phase has to be done only once; the generated emulator can afterward replace the LUT interpolation in the processing chain. Replacing an LUT interpolation by an emulator may thus lead to processing that is not only more accurate, but likely also faster and less computationally demanding.
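The two ingredients named above, Latin hypercube sampling of the training inputs and hyperparameter selection by maximum (marginal) log-likelihood, can be sketched as follows. This is an illustration under assumptions, not the authors' procedure: a zero-mean GP with an RBF kernel is assumed, the target is a toy scalar function rather than PCA scores of spectra, the noise variance is fixed, and the length scale is picked from a small hypothetical grid instead of a gradient-based optimizer.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """One sample per stratum along every dimension of the unit hypercube."""
    strata = np.stack([rng.permutation(n_samples) for _ in range(n_dims)], axis=1)
    return (strata + rng.random((n_samples, n_dims))) / n_samples

def rbf_kernel(a, b, length_scale):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def log_marginal_likelihood(x, y, length_scale, noise_var):
    """log p(y | x) for a zero-mean GP with RBF kernel and Gaussian noise."""
    n = len(x)
    cov = rbf_kernel(x, x, length_scale) + noise_var * np.eye(n)
    chol = np.linalg.cholesky(cov)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    # -0.5 * log|cov| equals minus the sum of the log-diagonal of the Cholesky factor.
    return float(-0.5 * y @ alpha
                 - np.log(np.diag(chol)).sum()
                 - 0.5 * n * np.log(2.0 * np.pi))

rng = np.random.default_rng(1)
x_train = latin_hypercube(60, 2, rng)
# Hypothetical scalar target with a little noise, centred to match the zero-mean GP.
y_train = (np.sin(2.0 * np.pi * x_train[:, 0]) + 0.5 * x_train[:, 1]
           + 0.01 * rng.standard_normal(60))
y_train = y_train - y_train.mean()

noise_var = 1e-4                            # assumed, matches the toy noise level
candidates = [0.05, 0.1, 0.2, 0.5, 1.0]     # hypothetical grid of length scales
scores = {ls: log_marginal_likelihood(x_train, y_train, ls, noise_var)
          for ls in candidates}
best_length_scale = max(scores, key=scores.get)
print(best_length_scale, scores[best_length_scale])
```

In practice the likelihood would be maximized over all kernel hyperparameters jointly; the grid is used here only to keep the example short and deterministic.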

With respect to interpolation methods, in this paper we focused on exploring the accuracy of the most widely used methods working on scattered LUT data. Of the considered methods, linear interpolation achieves the highest accuracy. However, this method is typically limited to low dimensions (<6) of the input space due to its exponential demands on memory and computation time. Other, more advanced interpolation methods (not studied here) exist for gridded LUT data. Among them, piece-wise cubic spline interpolation [49] can achieve high accuracies, but at the expense of an increase in computation time with respect to linear interpolation. Also, cubic splines require gridded LUTs with at least four points in each dimension (e.g., 4^6 = 4096 LUT samples for six dimensions), which implies an increase in both memory usage and computation time to generate and interpolate the resulting LUTs. The Sibson method is another advanced and accurate interpolation algorithm [?] that can have very fast implementations for low-dimensional spaces (see, e.g., [48]). However, extending Sibson interpolation to high-dimensional input variable spaces (>6) is unlikely to be effective due to its exponential growth in memory consumption. In turn, the emulation approach only requires a small LUT for training, which in principle can be developed for any number of variables. It remains to be studied how adding more variables to the LUT parameter space affects the accuracy of the emulator. This will also depend on the role these variables play in driving the spectral output [32].
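The 4096-sample figure follows from a full grid with four nodes along each of six dimensions. A short sketch (the function name is illustrative) shows how quickly this minimum gridded LUT size grows with the number of input variables:

```python
def gridded_lut_size(points_per_dim, n_dims):
    """Number of nodes in a full grid with the same number of points per dimension."""
    return points_per_dim ** n_dims

# Minimum grid for cubic splines (four points per dimension) in 2 to 10 dimensions.
sizes = {n_dims: gridded_lut_size(4, n_dims) for n_dims in range(2, 11)}
for n_dims, size in sizes.items():
    print(f"{n_dims} dimensions: {size} RTM simulations")
```

Each additional input variable multiplies the number of required RTM simulations by four, whereas a scattered training LUT for an emulator can keep a fixed sample budget.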

Altogether, considering the strengths and weaknesses of both interpolation and emulation, this study leads us to conclude that emulation can become an attractive alternative to interpolation for sampling an LUT parameter space. Based on the results presented here, it is foreseen that relying on emulation rather than interpolation will lead to more accurate, and typically faster, querying of an LUT parameter space. This has consequences for various RTM-based processing applications in which high accuracy and fast run-time are needed, e.g., in scene generation [32], [97], [98], in atmospheric correction procedures, in LUT-based inversion of biophysical parameters [99], [100], in hyperspectral target detection [101], or in instrument performance modeling [102]. Further studies are required to establish whether emulation techniques can always be trusted to be more accurate than interpolation techniques.

VI. Conclusion

Computationally expensive RTMs are commonly used in various remote sensing applications. Because these RTMs require long processing times, the common practice is to generate an LUT once and then use interpolation techniques to sample the LUT parameter space. However, the question arose whether these interpolation techniques are the most accurate. This paper proposed to use emulation, i.e., statistical learning, as an alternative to interpolation. Two experiments were conducted to ascertain the accuracy of both techniques in delivering spectral outputs: one for TOC reflectance as generated by PROSAIL, and another for TOA radiance data as generated by MODTRAN. The interpolation and emulation methods were evaluated against a reference LUT of 5000 simulations, leading to the following results.

  • 1)

    In all experiments, the emulation methods clearly produced more accurate output spectra than the tested interpolation techniques.

  • 2)

    GPR reproduced RTM output spectra up to ten times more accurately than the interpolation methods, at <5% of the processing time of linear interpolation, i.e., in mere seconds. KRR was the second most accurate method, and this emulator is extremely fast: 5000 spectra were produced in a fraction of a second. Hence, KRR offers an attractive tradeoff between accuracy and computational time.

  • 3)

    Regarding emulation, some small additional gain can be achieved by training with more samples or by adding more PCA components to the regression model. However, for GPR this comes at the expense of somewhat slower processing. It is thus concluded that emulation methods offer a better alternative, in both computational cost and accuracy, than traditional interpolation-based methods for sampling an LUT parameter space.

Future work will aim to include GPR emulators as an alternative to the current LUT interpolation methods implemented in the FLEX end-to-end mission simulator [9], [98]. This will likely reduce the computation time for the generation of synthetic scenes, which will extend the current FLEX simulator capabilities to perform sensitivity analysis for various leaf/canopy and atmospheric conditions. In addition, two research lines are currently being explored to improve the accuracy of interpolation/emulation methods. On the one hand, we are optimizing the distribution of LUT nodes in order to reduce the LUT size while increasing the accuracy of interpolation methods [103], [104]. On the other hand, we aim to improve the accuracy of statistical methods for generating multioutput emulators by using advanced machine learning methods, deep learning algorithms, or combinations of emulators, e.g., for different wavelength regions [105]. Both research lines are expected to result in more efficient sampling methods, reducing both the computational burden and the RTM-output reconstruction error.

Biographies

Jorge Vicent received the B.Sc. degree in physics from the University of Valencia, Valencia, Spain, in 2008, the M.Sc. degree in physics from the École polytechnique fédérale de Lausanne, Lausanne, Switzerland, in 2010, and the Ph.D. degree in remote sensing from the University of Valencia, Valencia, Spain, in 2016.

Since November 2017, he has been an R&D Engineer with the Department of Earth Observation, Magellium, Toulouse, France. He is currently involved in developing the Level-2 processing chain for ESA’s FLEX mission. His research interests include the modeling of earth observation satellites, system engineering, radiative transfer modeling, atmospheric correction, and hyperspectral data analysis.

Jochem Verrelst received the M.Sc. degrees in tropical land use and in geo-information science, both in 2005, and the Ph.D. degree in remote sensing in 2010, from Wageningen University, Wageningen, The Netherlands. His dissertation focused on the space-borne spectrodirectional estimation of forest properties.

Since 2010, he has been involved in preparatory activities of FLEX. He is the founder of the ARTMO software package. In 2017, he received an H2020 ERC Starting Grant (#755617) to work on the development of vegetation products based on the synergy of FLEX and Sentinel-3 data. His research interests include retrieval of vegetation properties using airborne and satellite data, canopy radiative transfer modeling and emulation, and hyperspectral data analysis.

Juan Pablo Rivera-Caicedo received the B.Sc. degree in agricultural engineering from the National University of Colombia and the University of Valle, Cali, Colombia, in 2001, the Master’s degree in irrigation engineering from the CEDEX-Centro de Estudios y Experimentación de Obras Públicas, Madrid, Spain, in 2003, and the M.Sc. and Ph.D. degrees in remote sensing from the University of Valencia, Valencia, Spain, in 2011 and 2014, respectively.

Since January 2011, he has been a member of the Laboratory for Earth Observation, Image Processing Laboratory, University of Valencia, Spain. Since 2016, he has been with the Consejo Nacional de Ciencia y Tecnología (CONACYT), México, in the Catedras-Conacyt program. He is currently involved in preparatory activities of the Fluorescence Explorer. His research interests include retrieval of vegetation properties using airborne and satellite data, leaf and canopy radiative transfer modeling, and hyperspectral data analysis.

Neus Sabater received the B.Sc. degree in physics in 2010, and the M.Sc. and Ph.D. degrees in remote sensing in 2018 from the Universitat de València, Valencia, Spain.

Since August 2012, she has been involved in the activities of the Laboratory for Earth Observation at the Image Processing Laboratory, University of Valencia, as a Research Technician. Her main activities during this period were related to the preparatory activities of the FLEX mission. Her research interests include atmospheric correction, atmospheric radiative transfer, meteorology, and hyperspectral RS.

Dr. Sabater was awarded a Ph.D. scholarship from the Spanish Ministry of Economy and Competitiveness, associated with the Ingenio/Seosat Spanish space mission, in 2013, and was also the recipient of the University of Valencia award for the best student record in the M.Sc. of remote sensing (2012–2013).

Jordi Muñoz-Marí (M’11) received the B.Sc. degree in physics in 1993, and the B.Sc. and Ph.D. degrees in electronics engineering in 1996 and 2003, respectively, from the Universitat de València, Valencia, Spain.

He is currently an Associate Professor with the Department of Electronics Engineering, Universitat de València, where he teaches electronic circuits, programmable logical devices, digital electronic systems, and microprocessor electronic systems. His research interests include development of machine learning algorithms for signal and image processing. Please visit http://www.uv.es/jordi/ for more information.

Gustau Camps-Valls (M’04–SM’07–F’18) received the Ph.D. degree in physics from the Universitat de València, Valencia, Spain, in 2002.

He is currently a Full Professor in electrical engineering, and a Coordinator of the Image and Signal Processing Group, Universitat de València. He is interested in the development of machine learning algorithms for geoscience and remote sensing data analysis. He entered the list of highly cited researchers by Thomson Reuters in 2011, holds an h-index of 55, and has authored and coauthored more than 150 journal papers, 200 conference papers, and 4 books on machine learning, remote sensing, and signal processing.

Dr. Camps-Valls was the recipient of the prestigious ERC Consolidator Grant to advance in statistical inference for Earth observation data analysis in 2015. He is an Associate Editor for the IEEE TRANSACTIONS ON SIGNAL PROCESSING, IEEE Signal Processing Letters, and IEEE Geoscience and Remote Sensing Letters. Visit http://isp.uv.es for more information.

José Moreno (M’18) holds a Ph.D. degree in theoretical physics and is currently a Professor of earth physics with the Department of Earth Physics and Thermodynamics, Faculty of Physics, University of València, Valencia, Spain, where he teaches and works on different projects related to remote sensing and space research as the person responsible for the Laboratory for Earth Observation. His main work is related to the modeling and monitoring of land surface processes by using remote sensing techniques. He has been involved in many international projects and research networks, including the preparatory activities and exploitation programmes of several satellite missions (ENVISAT, CHRIS/PROBA, GMES/Sentinels, SEOSAT) and the Fluorescence Explorer (FLEX), ESA’s 8th Earth Explorer mission.

Dr. Moreno was an Associate Editor for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING (1994–2000) and has been a member of the ESA Earth Sciences Advisory Committee (1998–2002), the Space Station Users Panel, and other international advisory committees. He is the Director of the Laboratory for Earth Observation with the Image Processing Laboratory/Scientific Park.

Footnotes

1

For sake of simplicity, the wavelength dependency is omitted in the formulation in this paper, i.e., f(x; λ) ≡ f(x).

Contributor Information

Jorge Vicent, Email: jorge.vicent@uv.es, Image Processing Laboratory, University of Valencia, Valencia 46980, Spain.

Jochem Verrelst, Email: jochem.verrelst@uv.es, Image Processing Laboratory, University of Valencia, Valencia 46980, Spain.

Juan Pablo Rivera-Caicedo, Email: jprivera@conacyt.mx, CONACYT-UAN, Departamento: Secretaria de investigación y posgrado, 63155, Tepic, Mexico.

Neus Sabater, Email: m.neus.sabater@uv.es, Image Processing Laboratory, University of Valencia, Valencia 46980, Spain.

Jordi Muñoz-Marí, Email: jordi.munoz@uv.es, Image Processing Laboratory, University of Valencia, Valencia 46980, Spain.

Gustau Camps-Valls, Email: gustau.camps@uv.es, Image Processing Laboratory, University of Valencia, Valencia 46980, Spain.

José Moreno, Email: jose.moreno@uv.es, Image Processing Laboratory, University of Valencia, Valencia 46980, Spain.

References

  • [1].Mobley C. Light and Water: Radiative Transfer in Natural Waters. Academic; New York, NY, USA: 1994. [Online] http://www.sequoiasci.com/product/hydrolight. [Google Scholar]
  • [2].Zhang Y, Rossow W, Lacis A, Oinas V, Mishchenko M. Calculation of radiative fluxes from the surface to top of atmosphere based on ISCCP and other global data sets: Refinements of the radiative transfer model and the input data. J Geophys Res D, Atmos. 2004;109(19):1–27. [Google Scholar]
  • [3].Jacquemoud S, et al. PROSPECT + SAIL models: A review of use for vegetation characterization. Remote Sens Environ. 2009;113:S56–S66. [Google Scholar]
  • [4].Deiveegan M, Balaji C, Venkateshan S. A polarized microwave radiative transfer model for passive remote sensing. Atmos Res. 2008;88(3/4):277–293. [Google Scholar]
  • [5].Van der Tol C, Berry JA, Campbell PKE, Rascher U. Models of fluorescence and photosynthesis for interpreting measurements of solar-induced chlorophyll fluorescence. J Geophys Res, Biogeosci. 2014;119(12):2312–2327. doi: 10.1002/2014JG002713. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [6].Verrelst J, et al. Optical remote sensing and the retrieval of terrestrial vegetation bio-geophysical properties—A review. ISPRS J Photogrammetry Remote Sens. 2015;108:273–290. [Google Scholar]
  • [7].Verhoef W, Bach H. Simulation of Sentinel-3 images by four-stream surface-atmosphere radiative transfer modeling in the optical and thermal domains. Remote Sens Environ. 2012;120:197–207. [Google Scholar]
  • [8].Meharrar K, Bachari N. Modelling of radiative transfer of natural surfaces in the solar radiation spectrum: Development of a satellite data simulator (SDDS) Int J Remote Sens. 2014;35(4):1199–1216. [Google Scholar]
  • [9].Vicent J, et al. FLEX end-to-end mission performance simulator. IEEE Trans Geosci Remote Sens. 2016 Jul;54(7):4215–4223. [Google Scholar]
  • [10].Verrelst J, Rivera J, Van Der Tol C, Magnani F, Mohammed G, Moreno J. Global sensitivity analysis of the scope model: What drives simulated canopy-leaving sun-induced fluorescence? Remote Sens Environ. 2015;166:8–21. [Google Scholar]
  • [11].Feret JB, et al. PROSPECT-4 and 5: Advances in the leaf optical properties model separating photosynthetic pigments. Remote Sens Environ. 2008;112(6):3030–3043. [Google Scholar]
  • [12].Verhoef W. Light scattering by leaf layers with application to canopy reflectance modeling: The SAIL model. Remote Sens Environ. 1984;16(2):125–141. [Google Scholar]
  • [13].Rahman H, Dedieu G. SMAC: A simplified method for the atmospheric correction of satellite measurements in the solar spectrum. Int J Remote Sens. 1994;15(1):123–143. [Google Scholar]
  • [14].Lipton A, Moncet J-L, Boukabara S-A, Uymin G, Quinn K. Fast and accurate radiative transfer in the microwave with optimum spectral sampling. IEEE Trans Geosc Remote Sens. 2009 Jul;47(7):1909–1917. [Google Scholar]
  • [15].Govaerts YM, Verstraete MM. Raytran: A Monte Carlo raytracing model to compute light scattering in three-dimensional heterogeneous media. IEEE Trans Geosc Remote Sens. 1998 Mar;36(2):493–505. [Google Scholar]
  • [16].North P. Three-dimensional forest light interaction model using a Monte Carlo method. IEEE Trans Geosc Remote Sens. 1996 Jul;34(4):946–956. [Google Scholar]
  • [17].Lewis P. Three-dimensional plant modelling for remote sensing simulation studies using the botanical plant modelling system. Agronomie. 1999;19(3/4):185–210. [Google Scholar]
  • [18].Gastellu-Etchegorry J, Demarez V, Pinel V, Zagolski F. Modeling radiative transfer in heterogeneous 3-D vegetation canopies. Remote Sens Environ. 1996;58(2):131–156. [Google Scholar]
  • [19].Petropoulos G, Wooster M, Carlson T, Kennedy M, Scholze M. A global Bayesian sensitivity analysis of the 1D SimSphere soil vegetation atmospheric transfer (SVAT) model using Gaussian model emulation. Ecol Model. 2009;220(19):2427–2440. [Google Scholar]
  • [20].Van Der Tol C, Verhoef W, Timmermans J, Verhoef A, Su Z. An integrated model of soil-canopy spectral radiances, photosynthesis, fluorescence, temperature and energy balance. Biogeosciences. 2009;6(12):3109–3129. [Google Scholar]
  • [21].Berk A, et al. MODTRAN™5: 2006 update. Proc SPIE. 2006;6233:Art. no. 62331F [Google Scholar]
  • [22].España M, Baret F, Aries F, Chelle M, Andrieu B, Prâvot L. Modeling maize canopy 3d architecture: Application to reflectance simulation. Ecol Model. 1999;122(1/2):25–43. [Google Scholar]
  • [23].Abramowitz M, Stegun I. Applied Mathematics Series. Vol. 55. Nat Bureau Std; Washington, DC, USA: 1964. Handbook of mathematical functions. 25.2. [Google Scholar]
  • [24].Amidror I. Scattered data interpolation methods for electronic imaging systems: A survey. J Electron Imag. 2002;11(2):168–176. [Google Scholar]
  • [25].Ientilucci E, Bajorski P. Stochastic modeling of physically derived signature spaces. J Appl Remote Sens. 2008;2(1):023532 [Google Scholar]
  • [26].Guanter L, Richter R, Kaufmann H. On the application of the MODTRAN4 atmospheric radiative transfer code to optical remote sensing. Int J Remote Sens. 2009;30(6):1407–1424. [Google Scholar]
  • [27].Lyapustin A, Martonchik J, Wang Y, Laszlo I, Korkin S. Multiangle implementation of atmospheric correction (MAIAC): 1. Radiative transfer basis and look-up tables. J Geophys Res Atmospheres. 2011;116(3):D03210 [Google Scholar]
  • [28].Scheck L, Frèrebeau P, Buras-Schnell R, Mayer B. A fast radiative transfer method for the simulation of visible satellite imagery. J Quant Spectrosc Radiat Transf. 2016;175:54–67. [Google Scholar]
  • [29].O’Hagan A. Bayesian analysis of computer code outputs: A tutorial. Rel Eng Syst Saf. 2006;91(10-11):1290–1300. [Google Scholar]
  • [30].Gómez-Dans JL, Lewis PE, Disney M. Efficient emulation of radiative transfer codes using Gaussian processes and application to land surface parameter inferences. Remote Sens. 2016;8(2):119. [Google Scholar]
  • [31].Verrelst J, et al. Emulation of leaf, canopy and atmosphere radiative transfer models for fast global sensitivity analysis. Remote Sens. 2016;8(8):673. [Google Scholar]
  • [32].Verrelst J, Rivera Caicedo J, Muñoz Marí J, Camps-Valls G, Moreno J. SCOPE-based emulators for fast generation of synthetic canopy reflectance and sun-induced fluorescence spectra. Remote Sens. 2017;9(9):927. [Google Scholar]
  • [33].Busby D. Hierarchical adaptive experimental design for Gaussian process emulators. Rel Eng Syst Saf. 2009;94(7):1183–1193. [Google Scholar]
  • [34].Kim Y-J. Comparative study of surrogate models for uncertainty quantification of building energy model: Gaussian process emulator vs. polynomial chaos expansion. Energy Buildings. 2016;133:46–58. [Google Scholar]
  • [35].Razavi S, Tolson BA, Burn DH. Numerical assessment of metamodelling strategies in computationally intensive optimization. Environ Model Softw. 2012;34:67–86. [Google Scholar]
  • [36].Bastos LS, O’Hagan A. Diagnostics for Gaussian process emulators. Technometrics. 2009;51(4):425–438. [Google Scholar]
  • [37].O’Hagan A. Probabilistic uncertainty specification: Overview, elaboration techniques and their application to a mechanistic model of carbon flux. Environ Model Softw. 2012;36:35–48. [Google Scholar]
  • [38].Owen NE, Challenor P, Menon PP, Bennani S. Comparison of surrogate-based uncertainty quantification methods for computationally expensive simulators. SIAM/ASA J Uncertainty Quantification. 2015;5:403–435. [Google Scholar]
  • [39].Gastellu-Etchegorry J, Gascon F, Esteve P. An interpolation procedure for generalizing a look-up table inversion method. Remote Sens Environ. 2003;87(1):55–71. [Google Scholar]
  • [40].Shepard D. Two-dimensional interpolation function for irregularly-spaced data; Proc 23rd Nat Conf; 1968. pp. 517–524. [Google Scholar]
  • [41].Łukaszyk S. A new concept of probability metric and its applications in approximation of scattered data sets. Comput Mech. 2004 Mar;33(4):299–304. [Google Scholar]
  • [42].Cooley T, et al. FLAASH, a MODTRAN4-based atmospheric correction algorithm, its applications and validation; Proc IEEE Int Geosci Remote Sens Symp; 2002. pp. 1414–1418. [Google Scholar]
  • [43].Richter R, Schläpfer D. Geo-atmospheric processing of airborne imaging spectrometry data. Part 2: Atmospheric/topographic correction. Int J Remote Sens. 2002;23(13):2631–2649. [Google Scholar]
  • [44].Barber C, Dobkin D, Huhdanpaa H. The quickhull algorithm for convex hulls. ACM Trans Math Softw. 1996;22(4):469–483. [Google Scholar]
  • [45].Delaunay B. Sur la sphère vide. A la mémoire de Georges Voronoï. Bull de l’Académie Des Sci de l’URSS Classe Des Sci Mathématiques et na. 1934;(6):793–800. [Google Scholar]
  • [46].Sibson R. Interpolating Multivariate Data. Wiley; New York, NY, USA: 1989. A brief description of natural neighbor interpolation; pp. 21–36. ch. 2. [Google Scholar]
  • [47].The MathWorks, Inc. Interpolate N-D scattered data. 2017. 2017a [Online]. Available: https://es.mathworks.com/help/matlab/ref/griddatan.html.
  • [48].Park SW, Linsen L, Kreylos O, Owens JD, Hamann B. Discrete Sibson interpolation. IEEE Trans Vis Comput Graph. 2006 Mar/Apr;12(2):243–253. doi: 10.1109/TVCG.2006.27. [DOI] [PubMed] [Google Scholar]
  • [49].Bartels RH, Beatty JC, Barsky BA. An Introduction to Splines for Use in Computer Graphics and Geometric Modelling. 2nd ed. Morgan Kaufmann; San Francisco, CA, USA: 1998. Hermite and cubic spline interpolation; pp. 9–17. ch. 3. [Google Scholar]
  • [50].Rohmer J, Foerster E. Global sensitivity analysis of large-scale numerical landslide models based on Gaussian-process meta-modeling. Comput Geosci. 2011;37(7):917–927. [Google Scholar]
  • [51].Carnevale C, Finzi G, Guariso G, Pisoni E, Volta M. Surrogate models to compute optimal air quality planning policies at a regional scale. Environ Model Softw. 2012;34:44–50. [Google Scholar]
  • [52].Villa-Vialaneix N, Follador M, Ratto M, Leip A. A comparison of eight metamodeling techniques for the simulation of N2O fluxes and N leaching from corn crops. Environ Model Softw. 2012;34:51–66. [Google Scholar]
  • [53].Castelletti A, Galelli S, Ratto M, Soncini-Sessa R, Young P. A general framework for dynamic emulation modelling in environmental problems. Environ Model Softw. 2012;34:5–18. [Google Scholar]
  • [54].Lee L, et al. The magnitude and causes of uncertainty in global model simulations of cloud condensation nuclei. Atmos Chem Phys. 2013;13(17):8879–8914. [Google Scholar]
  • [55].Bounceur N, Crucifix M, Wilkinson R. Global sensitivity analysis of the climate-vegetation system to astronomical forcing: An emulator-based approach. Earth Syst Dyn Discuss. 2014;5(2):901–943. [Google Scholar]
  • [56].Ireland G, Petropoulos G, Carlson T, Purdy S. Addressing the ability of a land biosphere model to predict key biophysical vegetation characterisation parameters with global sensitivity analysis. Environ Model Softw. 2015;65:94–107. [Google Scholar]
  • [57].McKay M, Beckman R, Conover W. Comparison of three methods for selecting values of input variables in the analysis of output from a computer code. Technometrics. 1979;21(2):239–245. [Google Scholar]
  • [58].Baret F, Clevers J, Steven M. The robustness of canopy gap fraction estimates from red and near-infrared reflectances: A comparison of approaches. Remote Sens Environ. 1995;54(2):141–151. [Google Scholar]
  • [59].Combal B, et al. Retrieval of canopy biophysical variables from bidirectional reflectance using prior information to solve the ill-posed inverse problem. Remote Sens Environ. 2003;84(1):1–15. [Google Scholar]
  • [60].Verrelst J, et al. Machine learning regression algorithms for biophysical parameter retrieval: Opportunities for Sentinel-2 and -3. Remote Sens Environ. 2012;118:127–139. [Google Scholar]
  • [61].Rivera J, Verrelst J, Delegido J, Veroustraete F, Moreno J. On the semi-automatic retrieval of biophysical parameters based on spectral index optimization. Remote Sens. 2014;6(6):4924–4951. [Google Scholar]
  • [62].Hankin RK. Introducing BACCO, an R package for Bayesian analysis of computer code output. J Stat Softw. 2005;14(16):1–21. [Google Scholar]
  • [63].Hughes G. On the mean accuracy of statistical pattern recognizers. IEEE Trans Inf Theory. 1968 Jan;14(1):55–63. [Google Scholar]
  • [64].Wold S, Esbensen K, Geladi P. Principal component analysis. Chemometrics Intell Lab Syst. 1987;2(1-3):37–52. [Google Scholar]
  • [65].Rivera JP, Verrelst J, Gómez-Dans J, Muñoz Marí J, Moreno J, Camps-Valls G. An emulator toolbox to approximate radiative transfer models with statistical learning. Remote Sens. 2015;7(7):9347. [Google Scholar]
  • [66].Conti S, Gosling J, Oakley J, O’Hagan A. Gaussian process emulation of dynamic computer codes. Biometrika. 2009;96(3):663–676. [Google Scholar]
  • [67].Liu F, West M. A dynamic modelling strategy for Bayesian computer model emulation. Bayesian Anal. 2009;4(2):393–412. [Google Scholar]
  • [68].Zhu XX, et al. Deep learning in remote sensing: A comprehensive review and list of resources. IEEE Geosci Remote Sens Mag. 2017 Dec;5(4):8–36. [Google Scholar]
  • [69].Shawe-Taylor J, Cristianini N. Kernel Methods for Pattern Analysis. Cambridge Univ. Press; Cambridge, U.K: 2004. [Google Scholar]
  • [70].Camps-Valls G, Bruzzone L, editors. Kernel Methods for Remote Sensing Data Analysis. Wiley; Chichester, U.K: 2009. Dec, [Google Scholar]
  • [71].Rojo-Álvarez J, Martínez-Ramón M, Muñoz Marí J, Camps-Valls G. Digital Signal Processing With Kernel Methods. Wiley; Chichester, U.K: 2017. Apr, [Google Scholar]
  • [72].Rasmussen CE, Williams CKI. Gaussian Processes for Machine Learning. MIT Press; New York, NY, USA: 2006. [Google Scholar]
  • [73].Suykens J, Vandewalle J. Least squares support vector machine classifiers. Neural Process Lett. 1999;9(3):293–300. [Google Scholar]
  • [74].Haykin S. Neural Networks: A Comprehensive Foundation. 2nd ed. Prentice-Hall; Englewood Cliffs, NJ, USA: 1999. Oct, [Google Scholar]
  • [75].Bishop CM. Neural Networks for Pattern Recognition. Oxford Univ. Press; New York, NY, USA: 1995. [Google Scholar]
  • [76].Kimeldorf GS, Wahba G. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Ann Math Statist. 1970;41(2):495–502. [Google Scholar]
  • [77].Verrelst J, Romijn E, Kooistra L. Mapping vegetation density in a heterogeneous river floodplain ecosystem using pointable CHRIS/PROBA data. Remote Sens. 2012;4(9):2866–2889. [Google Scholar]
  • [78].Vicent J, Sabater N, Verrelst J, Alonso L, Moreno J. Assessment of approximations in aerosol optical properties and vertical distribution into FLEX atmospherically-corrected surface reflectance and retrieved sun-induced fluorescence. Remote Sens. 2017;9(7):675. [Google Scholar]
  • [79].Jacquemoud S, et al. PROSPECT + SAIL models: A review of use for vegetation characterization. Remote Sens Environ. 2009;113:S56–S66. [Google Scholar]
  • [80].Verhoef W, Jia L, Xiao Q, Su Z. Unified optical-thermal four-stream radiative transfer theory for homogeneous vegetation canopies. IEEE Trans Geosci Remote Sens. 2007 Jun;45(6):1808–1822. [Google Scholar]
  • [81].Stamnes K, Tsay S-C, Wiscombe W, Jayaweera K. Numerically stable algorithm for discrete-ordinate-method radiative transfer in multiple scattering and emitting layered media. Appl Opt. 1988 Jun;27(12):2502–2509. doi: 10.1364/AO.27.002502. [DOI] [PubMed] [Google Scholar]
  • [82].Goody R, West R, Chen L, Crisp D. The correlated-k method for radiation calculations in nonhomogeneous atmospheres. J Quant Spectrosc Radiat Transf. 1989;42(6):539–550. [Google Scholar]
  • [83].Baldridge A, Hook S, Grove C, Rivera G. The ASTER spectral library version 2.0. Remote Sens Environ. 2009;113(4):711–715. [Google Scholar]
  • [84].Richter R. A spatially adaptive fast atmospheric correction algorithm. Int J Remote Sens. 1996;17(6):1201–1214. [Google Scholar]
  • [85].O’Hagan A. Bayesian Inference (Kendall’s Advanced Theory of Statistics, Vol. 2B). Arnold; London, U.K: 1994. [Google Scholar]
  • [86].Oakley J, O’Hagan A. Probabilistic sensitivity analysis of complex models: A Bayesian approach. J Roy Statist Soc Series B, Statist Methodol. 2004;66(3):751–769. [Google Scholar]
  • [87].Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012 Feb;13:281–305. [Google Scholar]
  • [88].Nelder JA, Mead R. A simplex method for function minimization. Comput J. 1965;7:308–313. [Google Scholar]
  • [89].Gutmann MU, Corander J. Bayesian optimization for likelihood-free inference of simulator-based statistical models. J Mach Learn Res. 2015;16:4256–4302. [Google Scholar]
  • [90].Mockus J. Bayesian Approach to Global Optimization. Kluwer; Dordrecht, The Netherlands: 1989. [Google Scholar]
  • [91].Snoek J, Larochelle H, Adams RP. Practical Bayesian optimization of machine learning algorithms. Proc Neural Inf Process Syst. 2012:2951–2959. [Google Scholar]
  • [92].Kirkpatrick S, Gelatt CD, Jr, Vecchi MP. Optimization by simulated annealing. Science. 1983 May;220(4598):671–680. doi: 10.1126/science.220.4598.671. [DOI] [PubMed] [Google Scholar]
  • [93].Martino L, Elvira V, Luengo D, Corander J, Louzada F. Orthogonal parallel MCMC methods for sampling and optimization. Digital Signal Process. 2016;58:64–84. [Google Scholar]
  • [94].Robert CP, Casella G. Monte Carlo Statistical Methods. Springer; New York, NY, USA: 2004. [Google Scholar]
  • [95].Martino L, Elvira V. Metropolis Sampling. Wiley; New York, NY, USA: 2017. [Google Scholar]
  • [96].Martino L, Read J. A multi-point Metropolis scheme with generic weight functions. Statist Probability Lett. 2012;82(7):1445–1453. [Google Scholar]
  • [97].Ientilucci E, Brown S. Advances in wide area hyperspectral image simulation. Proc SPIE. 2003;5075:110–121. [Google Scholar]
  • [98].Tenjo C, et al. Design of a generic 3D Scene Generator for passive optical missions and its implementation for the ESA’s FLEX/Sentinel-3 tandem mission. IEEE Trans Geosci Remote Sens. 2017 Mar;55(3):1290–1307. [Google Scholar]
  • [99].Rivera J, Verrelst J, Leonenko G, Moreno J. Multiple cost functions and regularization options for improved retrieval of leaf chlorophyll content and LAI through inversion of the PROSAIL model. Remote Sens. 2013;5(7):3280–3304. [Google Scholar]
  • [100].Verrelst J, Rivera J, Leonenko G, Alonso L, Moreno J. Optimizing LUT-based RTM inversion for semiautomatic mapping of crop biophysical parameters from Sentinel-2 and -3 data: Role of cost functions. IEEE Trans Geosci Remote Sens. 2014 Jan;52(1):257–269. [Google Scholar]
  • [101].Matteoli S, Ientilucci E, Kerekes J. Operational and performance considerations of radiative-transfer modeling in hyperspectral target detection. IEEE Trans Geosci Remote Sens. 2011 Apr;49(4):1343–1355. [Google Scholar]
  • [102].Kerekes J, Baum J. Full-spectrum spectral imaging system analytical model. IEEE Trans Geosci Remote Sens. 2005 Mar;43(3):571–580. [Google Scholar]
  • [103].Martino L, Vicent J, Camps-Valls G. Automatic emulator and optimized look-up table generation for radiative transfer models; Proc Int Geosci Remote Sens Symp; 2017. Jul, pp. 1457–1460. [Google Scholar]
  • [104].Vicent J, et al. Gradient-based automatic look-up table generator for radiative transfer models. IEEE Trans Geosci Remote Sens. 2018:1–9. doi: 10.1109/TGRS.2018.2864517. (preprint) [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [105].Heestermans-Svendsen D, Martino L, Vicent J, Camps-Valls G. Multioutput automatic emulator for radiative transfer models; Proc Int Geosci Remote Sens Symp; 2018. Jul, [Google Scholar]
