Abstract
Gaussian processes (GPs) are widely recognized for their robustness and flexibility across various domains, including machine learning, time series analysis, spatial statistics, and biomedicine. In addition to their common usage in regression tasks, GP kernel parameters are frequently interpreted in various applications. For example, in spatial transcriptomics, estimated kernel parameters are used to identify spatially variable genes, which exhibit significant expression patterns across different tissue locations. However, before these parameters can be meaningfully interpreted, it is essential to establish their identifiability. Existing studies of GP parameter identifiability have focused primarily on Matérn-type kernels, as their spectral densities allow for more established mathematical tools. In many real-world applications, particularly in time series analysis, other kernels such as the squared exponential, periodic, and rational quadratic kernels, as well as their combinations, are also widely used. These kernels share the property of being holomorphic around zero, and their parameter identifiability remains underexplored. In this paper, we bridge this gap by developing a novel theoretical framework for determining kernel parameter identifiability for kernels holomorphic near zero. Our findings enable practitioners to determine which parameters are identifiable in both existing and newly constructed kernels, supporting application-specific interpretation of the identifiable parameters, and highlighting non-identifiable parameters that require careful interpretation.
1. Introduction
Gaussian Processes (GPs) are powerful and flexible tools extensively used across multiple fields, such as machine learning (ML), geospatial and spatiotemporal analysis, biomedicine, finance, and environmental modeling (Rasmussen and Williams, 2006; Banerjee et al., 2014; Cressie and Wikle, 2015). They serve various purposes: as regression or classification methods through GP regression or GP classification; as priors over functions in Bayesian inference (Ghosal and van der Vaart, 2017); for modeling latent distributions via Gaussian Process Latent Variable Models (GPLVM, Lawrence (2003)); and in demonstrating equivalencies to deep neural networks with infinite width (Lee et al., 2018). The flexibility of GPs as universal approximators, their inherent interpretability – especially regarding kernel parameters – and their capability to quantify uncertainty, are among their key advantages.
The kernel function, also known as the covariance function or covariogram, which defines the covariance structure within a GP, is pivotal to their application and effectiveness. Over recent decades, there has been a proliferation of research into developing specialized kernels tailored for specific data types, including time-series, spatial, imaging, and spatiotemporal datasets. Popular choices such as the squared exponential (SE, also known as RBF or Gaussian), rational quadratic (RQ), periodic (Per), and Matérn kernels are frequently employed, often in innovative combinations that enhance model performance (Wang et al., 2018). These combinations involve operations such as summation, multiplication, and spectral mixtures (Duvenaud et al., 2011; 2013; Kronberger and Kommenda, 2013; Wilson and Adams, 2013; Samo and Roberts, 2015; Remes et al., 2017; Cheng et al., 2019; Verma and Engelhardt, 2020), which enable the leveraging of individual kernel strengths to better capture complex data patterns.
Despite the extensive literature on GP theory and its application to regression or prediction tasks, less attention has been paid to the parameter inference, particularly the identifiability and interpretability of kernel parameters. Parameter inference is critical in applications that use estimated parameters in downstream tasks such as model comparison and problem-specific parameter interpretation.
One such application is in the study of spatial transcriptomics, which measures gene expression across different tissue locations to understand cellular and tissue-level biological processes (Marx, 2021). One important task within this field is identifying spatially variable genes (SVGs), that is, genes that show significant changes in expression patterns across spatial locations, among tens of thousands of genes. Svensson et al. (2018) model gene expression as a GP across spatial coordinates using an SE kernel with a nugget: K(x, x′) = σ² exp(−‖x − x′‖²/(2ℓ²)) + τ² 1{x = x′}. The kernel parameter σ² was then interpreted as the magnitude of the spatial effects to identify SVGs, estimated by the Maximum Likelihood Estimator (MLE). Other applications of GP parameter inference to spatial transcriptomics include Weber et al. (2023) and Sun et al. (2020).
Another example where kernel parameter estimates are interpreted is the decomposition of the Mauna Loa CO2 time series data (Tans and Keeling, 2023) into four kernel components in the impactful book Rasmussen and Williams (2006):
k(t, t′) = k₁(t, t′) + k₂(t, t′) + k₃(t, t′) + k₄(t, t′),  (1)
where k₁ is a SE kernel that captures the long-term smooth rising trend; k₂, called the damped periodic kernel, is a multiplication of a SE kernel and a periodic kernel that accounts for seasonal variations; k₃ is a rational quadratic kernel that models medium-term irregularities; and k₄ is a sum of a SE and a white noise kernel that measure correlated and independent noise, respectively. This kernel is also used as an example in the tutorial of the widely used Python package “sklearn.gaussian_process”, with detailed interpretations of all 11 parameters proposed by the authors, which we revisit in Section 3. Although the interpretation seems reasonable, a theoretical understanding with a rigorous proof is missing.
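For concreteness, a four-component kernel of this form can be written out directly. The NumPy sketch below builds the Gram matrix of such a kernel, with the period of the seasonal component fixed at one year as in Equation (1), and checks that it yields a valid covariance matrix; the numeric values of θ₁, …, θ₁₁ are arbitrary illustrative choices, not the fitted values reported in the book.

```python
import numpy as np

# Arbitrary illustrative values for theta_1, ..., theta_11 (index 0 unused).
th = [0, 60, 70, 2, 90, 1.3, 0.7, 1.2, 0.8, 0.2, 1.5, 0.2]

def k_co2(t1, t2):
    d = t1 - t2
    k1 = th[1]**2 * np.exp(-d**2 / (2 * th[2]**2))                 # long-term trend (SE)
    k2 = th[3]**2 * np.exp(-d**2 / (2 * th[4]**2)
                           - 2 * np.sin(np.pi * d)**2 / th[5]**2)  # damped periodic, period 1
    k3 = th[6]**2 * (1 + d**2 / (2 * th[8] * th[7]**2))**(-th[8])  # medium-term (RQ)
    k4 = th[9]**2 * np.exp(-d**2 / (2 * th[10]**2)) \
         + th[11]**2 * (d == 0)                                    # correlated + white noise
    return k1 + k2 + k3 + k4

t = np.linspace(0.0, 45.0, 200)
K = k_co2(t[:, None], t[None, :])
eigmin = np.linalg.eigvalsh(K).min()  # a valid kernel gives a PSD Gram matrix
```

The white noise term keeps the Gram matrix strictly positive definite, so the smallest eigenvalue stays bounded away from zero.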
Although parameter identifiability in a GP model might seem straightforward at first glance, it is a challenging and nuanced problem. In fact, not all parameters in widely used GP kernels are identifiable: if a parameter is not identifiable, consistent estimation and subsequent interpretation are impossible. For example, for the Matérn kernel in dimension d ≤ 3 with spatial variance σ², lengthscale ℓ, and known smoothness parameter ν, Zhang (2004) proved that neither σ² nor ℓ is identifiable or consistently estimable, no matter how sophisticated the estimator is. In fact, the only identifiable parameter in the Matérn kernel, termed the microergodic parameter, is σ²/ℓ^{2ν}. Follow-up studies for a single Matérn kernel include Anderes (2010); Kaufman and Shaby (2013); Li (2022); Li et al. (2023), and Chen et al. (2024) for a linear combination of Matérns with different smoothness. Such negative results raise a natural question: are all parameters used in practice, including those in Equation (1), identifiable so that their interpretations are justified?
As far as we are aware, identifiability of the parameters in Equation (1) has not been proven before. More importantly, there is still a gap in the literature for more complicated kernel combinations like those popularized in ML, especially when the combinations involve periodic kernels. The lack of theoretical examination is partly due to the failure of traditional methods used to study GP parameter identifiability, such as the integral test (Stein, 1999), which requires conditions on the spectral density not met by common kernels like SE, Per, and RQ, and even more so when these kernels are combined. This necessitates the development of new analytic tools to better understand kernel parameter identifiability and interpretability.
Motivated by these observations and challenges, this paper proves a general theorem (Theorem 3.4) that determines all the identifiable functions of the parameters in any family of stationary kernels holomorphic around 0. The result applies to complex combinations of kernels, particularly those common in the ML community, such as the one used in Equation (1). We demonstrate that all parameters in this kernel are identifiable under mild constraints, supporting the interpretation of kernel parameters in Rasmussen and Williams (2006) and the “sklearn.gaussian_process” Python package tutorial. Additionally, we establish a general result that is used to determine the identifiable functions of the parameters for a kernel that is a sum of products of other kernels.
The paper is organized as follows. Section 2 provides a comprehensive background, introduces the necessary notation and concepts, and reviews the relevant literature. Section 3 presents our main theoretical contributions. Section 4 contains simulation studies to support our theories, followed by Section 5 with a discussion of limitations and future work. A brief discussion of the connection between parameter identifiability and prediction is given in Appendix C.
2. Background
This section defines key concepts and notations and summarizes existing literature on GP kernel identifiability and interpretability. We begin with the definition of GPs.
2.1. Gaussian process
Definition 1 (GP).
A stochastic process {f(x) : x ∈ X} is said to follow a GP on a domain X ⊆ ℝ^d with a mean function μ(·) and a positive definite covariance/kernel/covariogram function K(·, ·) if, for all n ≥ 1 and x₁, …, x_n ∈ X, the vector (f(x₁), …, f(x_n)) follows a multivariate normal distribution with mean (μ(x_i))_{i=1}^n and covariance matrix (K(x_i, x_j))_{i,j=1}^n.
For our study, as well as presentation simplicity, we assume μ ≡ 0 without loss of generality (Stein, 1999). In this situation, since the distribution of f is completely determined by K, we sometimes call K a GP, which refers to a GP with mean zero and covariance kernel K.
Throughout this paper, we focus on the infill domain (also known as fixed domain or interpolation), i.e., the domain does not grow with sample size, a situation commonly considered in the literature (Stein, 1999). Next, we introduce the commonly accepted stationarity assumption:
Definition 2 (Stationarity).
K is called stationary if K(x, y) = K(x + h, y + h), ∀ x, y, h.
For a stationary kernel, we can reformulate the kernel as a function on ℝ^d instead of ℝ^d × ℝ^d by setting k(x − y) := K(x, y). Stationarity is a common assumption in the GP literature due to its satisfactory practical performance and its simplicity in both implementation and theoretical analysis. Throughout this paper, we focus only on stationary kernels, and we still denote the simplified kernel by k without causing any confusion.
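As a quick sanity check of the definition, the following snippet verifies numerically that a stationary kernel, here the SE kernel with illustrative parameter values, is invariant under a common shift of its arguments and therefore depends on them only through the lag x − y.

```python
import numpy as np

def K_se(x, y, sigma2=1.0, ell=0.5):
    """SE kernel written as a two-argument covariance K(x, y)."""
    return sigma2 * np.exp(-(x - y)**2 / (2 * ell**2))

def k_se(h, sigma2=1.0, ell=0.5):
    """The same kernel reformulated as a function of the lag h = x - y."""
    return sigma2 * np.exp(-h**2 / (2 * ell**2))

x, y, h = 0.3, 1.1, 2.5
shift_invariant = np.isclose(K_se(x, y), K_se(x + h, y + h))  # stationarity
lag_only = np.isclose(K_se(x, y), k_se(x - y))                # lag dependence
```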
2.2. Kernels
We first note that all kernel functions considered in this paper are continuous functions unless noted otherwise. Then we introduce the following commonly used kernels in Table 1.
Table 1:
Example kernels, parameters, and domain dimension
| Name | k(x) | Parameters | Dimension |
|---|---|---|---|
| SE | σ² exp(−‖x‖²/(2ℓ²)) | σ², ℓ | d ≥ 1 |
| Per | σ² exp(−2 sin²(πx/p)/ℓ²) | σ², ℓ, p | d = 1 |
| RQ | σ² (1 + ‖x‖²/(2αℓ²))^{−α} | σ², ℓ, α | d ≥ 1 |
| Matérn | σ² (2^{1−ν}/Γ(ν)) (√(2ν)‖x‖/ℓ)^ν K_ν(√(2ν)‖x‖/ℓ) | σ², ℓ, ν | d ≥ 1 |
In this table, σ² is called the spatial variance, or partial sill, which measures the point-wise variance; ℓ is called the length scale, which measures the strength of spatial dependency; p is the period parameter; α is called the scale mixture parameter; ν is called the smoothness parameter. Among them, Per is well-defined only when d = 1, i.e., X ⊂ ℝ is a closed interval, while the others are well-defined on ℝ^d for any d ≥ 1.
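The kernels in Table 1 are straightforward to implement. The sketch below writes their one-dimensional versions as functions of the lag (the Matérn kernel is shown only for ν = 1/2, where it reduces to the exponential kernel) and confirms two basic properties: each kernel equals its spatial variance σ² at lag 0, and Per is p-periodic in the lag. All parameter values are illustrative.

```python
import numpy as np

# One-dimensional versions of the kernels in Table 1, as functions of the lag h.
def se(h, s2, ell):
    return s2 * np.exp(-h**2 / (2 * ell**2))

def per(h, s2, ell, p):
    return s2 * np.exp(-2 * np.sin(np.pi * h / p)**2 / ell**2)

def rq(h, s2, ell, alpha):
    return s2 * (1 + h**2 / (2 * alpha * ell**2))**(-alpha)

def matern_half(h, s2, ell):          # Matern with nu = 1/2
    return s2 * np.exp(-np.abs(h) / ell)

s2 = 2.0
vals_at_zero = [se(0.0, s2, 1.0), per(0.0, s2, 1.0, 1.0),
                rq(0.0, s2, 1.0, 0.5), matern_half(0.0, s2, 1.0)]
per_is_periodic = np.isclose(per(0.3, s2, 1.0, 1.0), per(1.3, s2, 1.0, 1.0))
```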
Each individual kernel in the above table captures some unique behavior in the process . However, when the process has complicated structure, a common approach is to combine some of these kernels to create a new one. Such a combination can be simply a sum of products of these kernels, which is guaranteed to be a positive definite function.
The following example extends Equation (1) used in Rasmussen and Williams (2006) to study the Mauna Loa CO2 time series data:
k(t, t′) = θ₁² exp(−(t − t′)²/(2θ₂²)) + θ₃² exp(−(t − t′)²/(2θ₄²) − 2 sin²(π(t − t′)/p)/θ₅²) + θ₆² (1 + (t − t′)²/(2θ₈θ₇²))^{−θ₈} + θ₉² exp(−(t − t′)²/(2θ₁₀²)) + θ₁₁² 1{t = t′},  (2)
where θ = (θ₁, …, θ₁₁, p) is the vector of all parameters in the above kernel.
Note that this kernel is more flexible than the one in Equation (1), which assumes the period p = 1. We adopt this more challenging modification since the period is sometimes unknown in practice, so practitioners have to estimate it from the data.
2.3. Identifiability
The study of the identifiability of GP kernel parameters relies on the notion of equivalence of measures, defined below:
Definition 3 (Equivalence of measures).
Two measures P₁ and P₂ are said to be equivalent if they are absolutely continuous with respect to each other, denoted by P₁ ≡ P₂. That is, for every measurable set A, P₁(A) = 0 if and only if P₂(A) = 0. Two measures are said to be orthogonal, denoted by P₁ ⊥ P₂, if there exists a measurable set A such that P₁(A) = 1 but P₂(A) = 0.
Two GP laws P₁ and P₂ are either equivalent or orthogonal (Feldman, 1958); in the orthogonal case, they assign probability 1 to disjoint sets: P₁(A) = 1 and P₂(Aᶜ) = 1 for some measurable set A. We define the identifiability of GP parameters as follows:
Definition 4 (Microergodicity).
Let {k_θ : θ ∈ Θ} be a family of covariance kernels of a GP. Then a function φ(θ) of θ is said to be microergodic if GP(0, k_{θ₁}) ≡ GP(0, k_{θ₂}) if and only if φ(θ₁) = φ(θ₂).
If φ₁ and φ₂ are both microergodic, then φ₁(θ₁) = φ₁(θ₂) if and only if φ₂(θ₁) = φ₂(θ₂), so φ₁ and φ₂ are related by a bijection. Thus the microergodic function is unique up to a bijective transformation, and it makes sense to speak of ‘the’ microergodic function φ*.
Definition 5 (Identifiability).
Let {k_θ : θ ∈ Θ} be a family of covariance kernels of a GP. A function ψ(θ) of θ is said to be identifiable if GP(0, k_{θ₁}) ≡ GP(0, k_{θ₂}) implies ψ(θ₁) = ψ(θ₂), or equivalently, ψ is a function of the microergodic function φ*. We say that the family is identifiable if θ itself is identifiable.
Note that a consistent estimator of ψ(θ) can exist only when ψ is identifiable – when ψ is not identifiable, say with ψ(θ₁) ≠ ψ(θ₂) while GP(0, k_{θ₁}) ≡ GP(0, k_{θ₂}), it is not possible to find a consistent estimator of ψ, since there is no way to distinguish between data generated from GP(0, k_{θ₁}) and those from GP(0, k_{θ₂}) almost surely (see Stein (1999); Zhang (2004) for more detailed discussion). Thus anything that can be consistently estimated is identifiable. The microergodic function φ* is the maximal identifiable function, so knowing the microergodic function completely solves the identifiability problem for the family of kernels. However, in some cases, it is difficult to fully determine the microergodic function φ*, whereas it is easier to determine that some specific function ψ is identifiable.
To study the identifiability of GP kernel parameters, it suffices to determine when two GPs in the same parametric family are equivalent. However, determining whether two GPs are equivalent is not an easy task, and the methods for doing so depend highly on the form of the kernels. There is a rich literature focusing on identifiability for Matérn kernels, where it has been shown that, for two Matérn kernels with common known smoothness ν,

GP(0, k_{σ₁², ℓ₁}) ≡ GP(0, k_{σ₂², ℓ₂}) if and only if σ₁²/ℓ₁^{2ν} = σ₂²/ℓ₂^{2ν} when d ≤ 3, and if and only if (σ₁², ℓ₁) = (σ₂², ℓ₂) when d ≥ 5.

That is, when the domain dimension is greater than or equal to 5, equivalence forces the parameters to coincide, so all three parameters σ², ℓ, ν are identifiable (Anderes, 2010; Bolin and Kirchner, 2023); when the domain dimension is less than or equal to 3, σ²/ℓ^{2ν} is identifiable (Loh et al., 2021), but neither σ² nor ℓ is (Zhang, 2004). As a result, there is no consistent estimator of σ² or ℓ, but instead, a consistent estimator of σ²/ℓ^{2ν}, called the microergodic parameter, does exist (Kaufman and Shaby, 2013; Loh et al., 2021), namely the MLE. The microergodic function for d = 4 is an open problem.
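The practical meaning of the microergodic parameter can be seen numerically. In the sketch below, which uses ν = 1/2 so that the microergodic parameter is σ²/ℓ, the log-likelihood of a densely sampled path barely distinguishes two parameter pairs sharing the same σ²/ℓ, while a pair with a different σ²/ℓ is clearly separated; the specific parameter values are illustrative.

```python
import numpy as np

def matern_half_cov(x, s2, ell):
    """Matern covariance with nu = 1/2: s2 * exp(-|h| / ell)."""
    h = np.abs(x[:, None] - x[None, :])
    return s2 * np.exp(-h / ell)

def loglik(y, K):
    _, logdet = np.linalg.slogdet(K)
    return -0.5 * (logdet + y @ np.linalg.solve(K, y) + len(y) * np.log(2 * np.pi))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 800)                       # dense sample, fixed domain
y = rng.multivariate_normal(np.zeros(len(x)), matern_half_cov(x, 1.0, 1.0))

ll_true  = loglik(y, matern_half_cov(x, 1.0, 1.0))   # truth: sigma2/ell = 1
ll_equiv = loglik(y, matern_half_cov(x, 2.0, 2.0))   # sigma2/ell = 1 as well
ll_diff  = loglik(y, matern_half_cov(x, 2.0, 1.0))   # sigma2/ell = 2
```

With this seed, the log-likelihood gap to the kernel with a different microergodic parameter is far larger than the gap within the equivalence class, even though both alternatives have the "wrong" σ² and/or ℓ.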
Although the identifiability of Matérn has been understood, the study of other kernels including Per and RQ is much sparser. The key reason is that the tool to study equivalence between Matérn kernels, known as the integral test (Stein, 1999), requires strong conditions on the spectral densities of the kernel, which are not often satisfied by other kernels. The spectral density is defined below:
Definition 6 (Spectral measure).
For a stationary kernel k, its spectral measure, denoted by Λ, is defined through

k(x) = ∫_{ℝ^d} e^{i⟨ω, x⟩} dΛ(ω).

Bochner’s theorem guarantees the existence and uniqueness of Λ. The density of Λ w.r.t. the Lebesgue measure, denoted by λ, if it exists, is called the spectral density.
The condition to use the integral test is that λ(ω)‖ω‖^α stays bounded away from 0 and ∞ as ‖ω‖ → ∞ for some α. That is, the spectral density is required to behave like a power law ‖ω‖^{−α} for some positive α. The spectral density of the Matérn kernel, which is proportional to (2ν/ℓ² + ‖ω‖²)^{−(ν + d/2)} (Rasmussen and Williams, 2006, p. 84), satisfies this condition (note that we use a different Fourier transform convention than Rasmussen and Williams (2006)). However, this condition is not met by SE, Per, or RQ, as their spectra decay very rapidly due to the infinite differentiability of the kernels.
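The contrast in spectral decay is easy to see numerically. The sketch below approximates the spectral densities of the SE and Matérn-1/2 kernels (unit parameters, chosen for illustration) by direct numerical Fourier integration and compares the relative mass remaining at a moderately high frequency: the Matérn spectrum decays polynomially, the SE spectrum far faster than any polynomial.

```python
import numpy as np

def spectral_density(k, omega, half_width=30.0, dx=1e-3):
    """Numerically approximate S(omega) = integral of k(x) * exp(-i*omega*x) dx."""
    x = np.arange(-half_width, half_width, dx)
    return ((k(x) * np.exp(-1j * omega * x)).sum() * dx).real

k_se = lambda x: np.exp(-x**2 / 2)        # SE kernel, ell = 1
k_m12 = lambda x: np.exp(-np.abs(x))      # Matern kernel, nu = 1/2, ell = 1

# Relative spectral mass left at the moderately high frequency omega = 10:
ratio_se = spectral_density(k_se, 10.0) / spectral_density(k_se, 0.0)
ratio_m12 = spectral_density(k_m12, 10.0) / spectral_density(k_m12, 0.0)
```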
Due to the popularity of these kernels in ML, we aim to address these challenges and study the equivalence of GPs with these kernels and their combinations. The next section provides theoretical support for the success of these kernels in terms of identifiability and interpretability.
3. Theory
In this section, we present our main theory regarding equivalence of GPs, as outlined in the previous sections. We first determine the identifiable parameters of the individual kernels used in Equation (2), i.e., SE, Per, and RQ, with some extensions.
Theorem 3.1.
The microergodic functions of five individual kernels, including all four components k₁, k₂, k₃, k₄ of the kernel in Equation (2) and an additional kernel, Cosine, are summarized in Table 2.
Table 2:
Microergodic functions of five kernels
| Name | Parameters | d | Microergodic function |
|---|---|---|---|
| SE | σ², ℓ | ≥ 1 | (σ², ℓ) |
| Per | σ², ℓ, p | 1 | (σ², ℓ, p) |
| Damped Per | σ², ℓ₁, ℓ₂, p | 1 | (σ², ℓ₁, ℓ₂, p) |
| RQ | σ², ℓ, α | ≥ 1 | (σ², ℓ, α) |
| Cosine | σ², p | ≥ 1 | p |
Theorem 3.1 supports the identifiability and interpretability of each kernel parameter in SE, Per and RQ, as discussed in Section 2.2. In addition, we include the cosine kernel, which will be revisited later in this section.
Then we consider the combination of SE, Per, and RQ in Equation (2), an extension of the kernel used in the impactful book Rasmussen and Williams (2006) and in the tutorial of the widely used Python package “sklearn.gaussian_process”.
Theorem 3.2.
All parameters in Equation (2) are identifiable provided that θ₁₀, the length-scale of the SE component modeling the correlated noise, is less than θ₂, the length-scale of the SE component modeling the long-term trend.
Such a constraint is necessary, and not surprising, since otherwise, say, if θ₁₀ = θ₂, then we can merge the two SE components into a single SE: θ₁² exp(−(t − t′)²/(2θ₂²)) + θ₉² exp(−(t − t′)²/(2θ₁₀²)) = (θ₁² + θ₉²) exp(−(t − t′)²/(2θ₂²)), making θ₁² + θ₉² identifiable instead of θ₁ and θ₉ individually. This distinction between the two SE components is also discussed in Section 5.4.3 of Rasmussen and Williams (2006). Excluding this trivial case, all parameters are identifiable. As a consequence, these parameters are interpretable, as discussed in the same section of Rasmussen and Williams (2006). For example, θ₁ measures the amplitude and θ₂ measures the characteristic length-scale of the long-term smooth rising trend; within the seasonal trend, θ₃ gives the magnitude, θ₄ gives the decay time for the periodic component, p gives the period, while θ₅ is the smoothness of the periodic component; for the (small) medium-term irregularities, θ₆ is the magnitude, θ₇ is the typical length-scale, and θ₈ is the shape parameter determining the diffuseness of the length-scales; θ₉ is the magnitude of the correlated noise component, θ₁₀ is its lengthscale, and θ₁₁ is the magnitude of the independent noise component.
Now we would like to answer the following more challenging question with a broader implication: Given a new kernel, how do we determine the microergodic function? Specifically, if we combine a finite number of kernels, such as those in Table 2, by finite multiplication and addition like Equation (2), what is the microergodic function of the resulting kernel? To answer these questions, we need to introduce the following notions first.
Lemma 3.3 (Kernel decomposition).
For any stationary kernel k, it can be uniquely decomposed as k = k_c + k_d, where k_c is a kernel whose spectral measure is continuous (atomless) and k_d is another kernel whose spectral measure is discrete.
A direct consequence is that if k admits a spectral density, then k = k_c; while if k is periodic, then k = k_d. We call k_c the continuous component and k_d the discrete component. Note that this notion is different from that of continuous and discrete functions, and we do assume all kernels are continuous functions themselves; here the continuous versus discrete distinction is at the spectrum level. For example, the Per kernel is continuous as a function, but has a purely discrete spectrum. Moreover, we denote the spectral measure of k_c by Λ_c and the spectral measure of k_d by Λ_d, so that Λ = Λ_c + Λ_d, where Λ is the spectral measure of k.
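The discrete spectrum of the Per kernel can be made concrete: since the kernel is a periodic positive definite function of the lag, its Fourier coefficients over one period are exactly the (nonnegative) atom masses of its spectral measure, sitting at the harmonics 2πm/p. A small NumPy check, with illustrative parameter values:

```python
import numpy as np

# Per kernel (period p) sampled over one full period; the DFT of these
# samples recovers its Fourier coefficients, i.e. the spectral atom masses.
p, ell, n = 1.0, 0.7, 256
x = np.arange(n) / n * p
k = np.exp(-2 * np.sin(np.pi * x / p)**2 / ell**2)

coeffs = np.fft.rfft(k) / n
atoms = coeffs.real          # imaginary parts vanish since k is even in the lag
```

The atom masses are nonnegative (positive definiteness) and decay in the harmonic index, which is the fast-decay behavior referenced in Condition 2 of Theorem 3.4.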
Such a decomposition offers deeper insights to understand different types of kernels. Moreover, to understand the equivalence of GPs, it suffices to understand the equivalence of its continuous component and discrete component separately, given by the following key theorem, which is the main result of the paper:
Theorem 3.4.
Given two kernels k₁ and k₂, each holomorphic on some ball around 0 in ℂ^d, the d-dimensional complex space, GP(0, k₁) ≡ GP(0, k₂) if and only if the following two conditions hold:

1. k_{1,c}(x) = k_{2,c}(x) for every x.

2. There are common atoms {ω_j} and masses {a_j}, {b_j} with Λ_{1,d} = Σ_j a_j δ_{ω_j} and Λ_{2,d} = Σ_j b_j δ_{ω_j}, such that the relative differences |a_j − b_j|/a_j decay sufficiently fast as ‖ω_j‖ → ∞.
Note that k is said to be holomorphic on a ball around 0 in ℂ^d if it has a holomorphic extension k̃ to some ball B around 0 in ℂ^d such that k̃ = k on B ∩ ℝ^d. While being holomorphic on a ball around 0 is a stronger condition than being infinitely differentiable, most infinitely differentiable kernels used in practice, including all those in Table 2, are holomorphic on a ball around 0. Condition 1 means the continuous components of k₁ and k₂ are the same, while Condition 2 means the discrete components of k₁ and k₂ have the same support, and their relative difference, although allowed to be nonzero, should decay fast enough.
Notably, Theorem 3.4 provides a general pipeline to study the identifiability of kernel parameters, summarized in the following theorem:
Theorem 3.5.
Let {k_θ : θ ∈ Θ} be a family of stationary kernels on ℝ^d, each of which is holomorphic on some ball around 0 in ℂ^d. We have the following assertions regarding the microergodic function:

If φ_c(θ) is microergodic for the family of continuous components {k_{θ,c}} and φ_d(θ) is microergodic for the family of discrete components {k_{θ,d}}, then (φ_c(θ), φ_d(θ)) is microergodic for {k_θ}.
- Moreover,
- (a) φ_c(θ) is microergodic for {k_{θ,c}} if and only if, for all θ₁, θ₂ ∈ Θ, k_{θ₁,c} = k_{θ₂,c} everywhere exactly when φ_c(θ₁) = φ_c(θ₂);
- (b) φ_d(θ) is microergodic for {k_{θ,d}} if and only if, for all θ₁, θ₂ ∈ Θ, Condition 2 of Theorem 3.4 holds for Λ_{θ₁,d} and Λ_{θ₂,d} exactly when φ_d(θ₁) = φ_d(θ₂).
That is, in order to find the microergodic function of a parametric family of kernels {k_θ}, it suffices to find the microergodic function φ_c of the continuous component and φ_d of the discrete component separately. Moreover, to find φ_c, it suffices to understand when two continuous components are equal everywhere; to find φ_d, we need to investigate condition 2b about the discrete measures Λ_{θ,d}.
Having established the foundational aspects of kernel identifiability, we now apply our results to determine the microergodic function of various combinations of kernels. Our general strategy is to use Fourier transform identities to compute the spectral measure of the combined kernel (see, for example, Theorem B.6) and then apply Theorem 3.4. These combinations not only illustrate the practical applications of our theoretical findings in Theorem 3.4, but also provide insights into designing new kernels with desired properties. We start with the squared exponential kernel with automatic relevance determination (ARD).
Theorem 3.6.
For the family

{k_{σ², A}(x) = σ² exp(−xᵀ A x / 2) : x ∈ ℝ^d},

where σ² > 0 and A is a positive-definite matrix, the microergodic function is (σ², A).
Next, we study the sum of cosine kernels:
Theorem 3.7.
For the family

{k_θ(x) = Σ_{j=1}^J σ_j² cos(ω_j x)},

where σ_j² > 0 and ω_j ≥ 0, under the natural constraint ω₁ < ω₂ < ⋯ < ω_J, the microergodic function is (ω₁, …, ω_J).
Theorem 3.7 shows that when cosine kernels are combined linearly, their individual frequencies (or periods) remain identifiable, provided they are distinct. This scenario often arises in signal processing, where different periodic components need to be isolated and identified. Notably, the last kernel in Table 2 is a special case of the kernel in Theorem 3.7 with J = 1.
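A minimal illustration of this frequency identifiability: sampling a sum-of-cosines kernel over an integer number of cycles and taking a DFT recovers the frequencies exactly. The two frequencies and their weights below are arbitrary choices for the sake of the example.

```python
import numpy as np

# Kernel k(x) = 1.0*cos(2*pi*f1*x) + 0.5*cos(2*pi*f2*x), sampled over an
# integer number of cycles so the DFT bins align with f1 and f2 exactly.
f1, f2, T, n = 3.0, 7.0, 2.0, 256
x = np.arange(n) * T / n
k = 1.0 * np.cos(2 * np.pi * f1 * x) + 0.5 * np.cos(2 * np.pi * f2 * x)

mag = np.abs(np.fft.rfft(k))
recovered = np.sort(np.argsort(mag)[-2:]) / T   # DFT bin index -> frequency
```

The two largest DFT magnitudes sit at bins 6 and 14, i.e., at 3 and 7 cycles per unit time, matching f1 and f2.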
Next, we study the product of Cosine kernels:
Theorem 3.8.
For the family
where and , under the natural constraint , the microergodic function is . If , then the microergodic function simplifies to .
In Theorem 3.8, when , we do not have identifiability of . For example, for , when and , the values of the microergodic function coincide, that value being . Theorem 3.8 shows that for a product of discrete spectrum kernels that are all a function of the same variable , the parameters of each individual kernel may not be identifiable.
Finally, we explore the sum of periodic kernels as previously discussed.
Theorem 3.9.
Let k_{ℓ,p} denote the periodic kernel with variance parameter 1, length-scale ℓ, and period p. For the family

{k_θ = Σ_{j=1}^J σ_j² k_{ℓ_j, p_j}},

where σ_j² > 0, ℓ_j > 0, and p₁ < p₂ < ⋯ < p_J, the microergodic function is θ = (σ_j², ℓ_j, p_j)_{j=1}^J, that is, all parameters are identifiable.
This result is crucial for scenarios where multiple periodic processes operate at different scales or periods, as often encountered in geospatial, financial, and environmental data analysis.
4. Simulation
In this section, we provide empirical support to our theoretical results on kernel parameter identifiability, presented in Section 3, by investigating the behavior of the maximum likelihood estimators (MLEs) as the sample size increases.
Before moving to the simulation details, we would like to clarify the broader picture of parameter inference for GPs, which involves three steps: first, determining which parameters are identifiable; second, finding a consistent estimator of the identifiable parameters, such as the MLE or other estimators; and third, developing numerical methods to compute these estimators. While the second and third steps are crucial, they fall beyond the scope of this paper, which focuses solely on the first step: a theoretical framework to find all the identifiable parameters. In fact, even for simple kernels like the SE and Matérn kernels, whether the MLE is consistent remains open (Loh and Sun, 2023).
Despite these complexities, we use standard optimization packages commonly applied in the GP literature to find the MLEs. Our simulations are not intended to solve the open problem of MLE consistency or introduce new numerical techniques; rather, they serve to illustrate the theoretical results on identifiability through practical examples.
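As an illustration of this procedure, the sketch below simulates data from an SE kernel with a nugget and maximizes the Gaussian log-likelihood over (σ², ℓ); a coarse grid search stands in for the gradient-based optimizers used in practice, and all numeric values (sample size, noise level, grid ranges) are illustrative choices rather than the settings of our experiments.

```python
import numpy as np

def neg_loglik(params, x, y, tau2=0.01):
    """Negative Gaussian log-likelihood under an SE kernel plus a nugget tau2."""
    s2, ell = params
    h = x[:, None] - x[None, :]
    K = s2 * np.exp(-h**2 / (2 * ell**2)) + tau2 * np.eye(len(x))
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (logdet + y @ np.linalg.solve(K, y) + len(y) * np.log(2 * np.pi))

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 10.0, 200))
h = x[:, None] - x[None, :]
K_true = 1.5 * np.exp(-h**2 / (2 * 0.8**2)) + 0.01 * np.eye(len(x))
y = rng.multivariate_normal(np.zeros(len(x)), K_true)

# Coarse grid search for the MLE of (sigma^2, ell), standing in for the
# gradient-based optimizers used in practice.
grid = [(s2, ell) for s2 in np.linspace(0.5, 3.0, 11)
                  for ell in np.linspace(0.2, 2.0, 10)]
s2_hat, ell_hat = min(grid, key=lambda p: neg_loglik(p, x, y))
```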
We start from individual kernels, followed by the combination in Equation (2).
4.1. Individual kernels
We consider the individual kernels: SE, Damped Per (DPer), Per, RQ, and Cosine. For the cosine kernel, we parameterize in terms of the period p so that k(x) = σ² cos(2πx/p). Input samples are generated by adding a uniform random shift to evenly spaced points in the domain. After generating the outcomes by sampling a GP with the given kernel at the inputs, we added independent Gaussian noise to model measurement errors (see Section D of the appendix for the experiments repeated with a different noise variance). All kernel parameters were estimated by MLE, with 100 replicates for each kernel configuration to assess the convergence of the MLEs. The results are summarized in Figure 1. These boxplots demonstrate that the MLEs of all parameters except σ² in the cosine kernel appear consistent, in agreement with their identifiability proved in Theorem 3.1.
Figure 1:
Simulation results for various kernel types. Each subfigure shows the boxplots of MLEs for the corresponding kernel, with ground truth in horizontal dashed line.
Some of the MLE standard deviations appear to plateau for large sample sizes. One explanation for this is numerical limitations – for our squared exponential simulation, the condition numbers of the covariance matrix of the observations become very large for sample sizes 500, 1000, 2000, and 5000.
The failure of the MLE of σ² in the cosine kernel to converge is in agreement with the fact that σ² is not microergodic. In fact, if we treat the period p as known and let the noise variance decrease to 0, then since the covariance matrix has rank 2 for all sample sizes, it can be shown that the MLE of σ² converges to a non-degenerate random limit rather than to the true value.
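The rank-2 structure is immediate from the product formula cos(ω(x_i − x_j)) = cos(ωx_i)cos(ωx_j) + sin(ωx_i)sin(ωx_j), and is easy to confirm numerically (the period and input grid below are arbitrary):

```python
import numpy as np

# cos(w*(xi - xj)) = cos(w*xi)*cos(w*xj) + sin(w*xi)*sin(w*xj), so the Gram
# matrix of a cosine kernel has rank at most 2 for any number of inputs.
w = 2 * np.pi / 1.5                      # frequency for period p = 1.5
x = np.linspace(0.0, 10.0, 100)
K = np.cos(w * (x[:, None] - x[None, :]))
rank = np.linalg.matrix_rank(K, tol=1e-8)
```

Because the effective number of degrees of freedom never grows with the sample size, adding more observations cannot pin down σ².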
4.2. The combined kernel in Equation (2)
Then, we study the combined kernel, one motivating kernel of this paper, defined in Equation (2). Since the kernel was proposed for forecasting the CO2 level on the Mauna Loa dataset, we set the time interval to be [0, 45], representing a time span of 45 years. Input samples are generated by adding a uniform random shift to evenly spaced points in [0, 45]. All kernel parameters were estimated by MLE, with 100 replicates to assess the convergence of the MLEs. Moreover, to further mimic this dataset, the ground truth parameters and noise variance are set to be the MLEs learned from running the Gaussian process regression module of the scikit-learn Python package. All true parameters to be estimated are given in Table 3.
In Figure 2, we again observe that the MLEs generally are unbiased, but for some parameters, their variance does not strictly decrease with sample size. This is likely due to the relatively large number of parameters (10) compared to the small sample size of 500.
Figure 2:
MLEs of parameters in Equation (2), with ground truth in horizontal dashed line.
5. Discussion
This paper has introduced a novel analytical framework that advances the theory of identifiability of kernel parameters in GPs for a large class of kernels, namely those holomorphic around 0. We have demonstrated that all the parameters in certain combinations of kernels, such as the kernel employed for the Mauna Loa CO2 time series data, are indeed identifiable. This establishes a robust theoretical foundation for selecting or constructing GP kernels and for determining the identifiable functions of the parameters in practical applications.
Looking ahead, several avenues of future research present themselves as particularly promising and interesting. First, while establishing the identifiability of kernel parameters is a critical step, it does not necessarily guarantee the consistency of the MLE. The analysis of MLEs is complicated due to the complex nature of the likelihood function involved, which is often multi-modal and difficult to handle. Second, extending our theoretical framework to encompass non-stationary kernels could enhance the flexibility of GPs in modeling data with evolving trends and dynamics. This area is notably challenging due to the current limitations in mathematical tools available, presenting a largely open problem in the field. Third, another intriguing direction for research involves extending our findings to infinitely differentiable kernels that are not holomorphic near 0, though most infinitely differentiable kernels used in applications are holomorphic near 0.
Supplementary Material
Acknowledgment:
AQ was supported by NIH grant R37 AI029168; DL was supported by NIH grants P30 ES010126, R01 HL149683, R01 HL173044, R01 LM014407, R56 LM013784, UM1 TR004406.
Footnotes
Reproducibility Statement: All code used to produce the results of this paper is provided in Appendix A. Complete proofs of all lemmas and theorems stated in the paper are provided in Appendix B.
Ethics Statement: Our paper does not deal with sensitive experiments, data, or any methods that can be expected to cause harm. We have no conflicts of interest and have no data privacy concerns.
References
- Abramowitz M. and Stegun IA (1964). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, Volume 55 of Applied Mathematics Series. Washington, D.C.: National Bureau of Standards. Reprinted by Dover Publications, 1972.
- Anderes E. (2010). On the consistent separation of scale and variance for Gaussian random fields. The Annals of Statistics.
- Banerjee S, Carlin BP, and Gelfand AE (2014). Hierarchical modeling and analysis for spatial data. CRC Press.
- Bolin D. and Kirchner K. (2023). Equivalence of measures and asymptotically optimal linear prediction for Gaussian random fields with fractional-order covariance operators. Bernoulli 29(2), 1476–1504.
- Chen J, Mu W, Li Y, and Li D. (2024). On the identifiability and interpretability of Gaussian process models. Advances in Neural Information Processing Systems 36.
- Cheng L, Ramchandran S, Vatanen T, Lietzén N, Lahesmaa R, Vehtari A, and Lähdesmäki H. (2019). An additive Gaussian process regression model for interpretable non-parametric analysis of longitudinal data. Nature Communications 10(1), 1798.
- Cressie N. and Wikle CK (2015). Statistics for spatio-temporal data. John Wiley & Sons.
- Duvenaud D, Lloyd J, Grosse R, Tenenbaum J, and Zoubin G. (2013, 17–19 Jun). Structure discovery in nonparametric regression through compositional kernel search. In Dasgupta S. and McAllester D. (Eds.), Proceedings of the 30th International Conference on Machine Learning, Volume 28 of Proceedings of Machine Learning Research, Atlanta, Georgia, USA, pp. 1166–1174. PMLR.
- Duvenaud DK, Nickisch H, and Rasmussen C. (2011). Additive Gaussian processes. Advances in Neural Information Processing Systems 24.
- Feldman J. (1958). Equivalence and perpendicularity of Gaussian processes. Pacific J. Math 8(4), 699–708.
- Ghosal S. and van der Vaart AW (2017). Fundamentals of nonparametric Bayesian inference, Volume 44. Cambridge University Press.
- Ibragimov I. and Rozanov Y. (1978). Conditions for regularity of stationary random processes. In Gaussian Random Processes, pp. 108–143. Springer.
- Kaufman C. and Shaby BA (2013). The role of the range parameter for estimation and prediction in geostatistics. Biometrika 100(2), 473–484.
- Kronberger G. and Kommenda M. (2013). Evolution of covariance functions for Gaussian process regression using genetic programming.
- Lawrence N. (2003). Gaussian process latent variable models for visualisation of high dimensional data. In Thrun S, Saul L, and Schölkopf B (Eds.), Advances in Neural Information Processing Systems, Volume 16. MIT Press.
- Lee J, Bahri Y, Novak R, Schoenholz SS, Pennington J, and Sohl-Dickstein J. (2018). Deep neural networks as Gaussian processes. In International Conference on Learning Representations.
- Li C. (2022). Bayesian fixed-domain asymptotics for covariance parameters in a Gaussian process model. The Annals of Statistics 50(6), 3334–3363.
- Li D, Tang W, and Banerjee S. (2023). Inference for Gaussian processes with Matérn covariogram on compact Riemannian manifolds. Journal of Machine Learning Research 24(101), 1–26.
- Loh W-L and Sun S. (2023). Estimating the parameters of some common Gaussian random fields with nugget under fixed-domain asymptotics. Bernoulli 29(3), 2519–2543.
- Loh W-L, Sun S, and Wen J. (2021). On fixed-domain asymptotics, parameter estimation and isotropic Gaussian random fields with Matérn covariance functions. The Annals of Statistics 49(6), 3127–3152.
- Lukacs E. and Szász O. (1952). On analytic characteristic functions. Pacific J. Math 2(4), 615–625.
- Marx V. (2021). Method of the year: spatially resolved transcriptomics. Nature Methods 18(1), 9–14.
- Rasmussen CE and Williams CKI (2006). Gaussian processes for machine learning. Adaptive Computation and Machine Learning. MIT Press.
- Remes S, Heinonen M, and Kaski S. (2017). Non-stationary spectral kernels. Advances in neural information processing systems 30. [Google Scholar]
- Samo Y-LK and Roberts S. (2015). Generalized spectral kernels. [Google Scholar]
- Stein ML (1999). Interpolation of spatial data: some theory for kriging. Springer Science & Business Media. [Google Scholar]
- Sun S, Zhu J, and Zhou X. (2020). Statistical analysis of spatial expression patterns for spatially resolved transcriptomic studies. Nature methods 17(2), 193–200. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Svensson V, Teichmann SA, and Stegle O. (2018). Spatialde: identification of spatially variable genes. Nature methods 15(5), 343–346. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Tans P. and Keeling R. (2023). Trends in atmospheric carbon dioxide. https://gml.noaa.gov/ccgg/trends/data.html. Accessed: 2023-08-01. [Google Scholar]
- Verma A. and Engelhardt BE (2020). A robust nonlinear low-dimensional manifold for single cell RNA-seq data. BMC bioinformatics 21, 1–15. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Wang J, Yam WK, Fong KL, Cheong SA, and Wong KM (2018). Gaussian process kernels for noisy time series: Application to housing price prediction. In Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, December 13–16, 2018, Proceedings, Part VI 25, pp. 78–89. Springer. [Google Scholar]
- Weber LM, Saha A, Datta A, Hansen KD, and Hicks SC (2023). nnSVG for the scalable identification of spatially variable genes using nearest-neighbor Gaussian processes. Nature communications 14(1), 4059. [Google Scholar]
- Wilson A. and Adams R. (2013). Gaussian process kernels for pattern discovery and extrapolation. In International conference on machine learning, pp. 1067–1075. PMLR. [Google Scholar]
- Zhang H. (2004). Inconsistent estimation and asymptotically equal interpolations in model-based geostatistics. Journal of the American Statistical Association 99(465), 250–261. [Google Scholar]