Journal of Applied Statistics. 2025 Jul 17;53(4):673–709. doi: 10.1080/02664763.2025.2532621

Efficient spline orthogonal basis for representation of density functions

Jana Burkotová a (contact author), Ivana Pavlů a, Hiba Nassar b, Jitka Machalová a, Karel Hron a
PMCID: PMC12985404  PMID: 41836473

Abstract

Probability density functions form a specific class of functional data objects with intrinsic properties of scale invariance and relative scale, characterized by the unit integral constraint. The Bayes spaces methodology respects their specific nature, and the centred log-ratio transformation enables processing such functional data in the standard Lebesgue space of square-integrable functions. As the data representing densities are frequently observed in discrete form, the focus has been on their spline representation. The crucial step in the approximation is therefore to construct a proper spline basis reflecting their specific properties. Since the centred log-ratio transformation maps densities onto a subspace of functions with a zero integral constraint, the standard B-spline basis is no longer suitable. Recently, a new spline basis incorporating this zero integral property, called ZB-splines, was developed. However, this basis is not orthogonal, a property that is beneficial from both the computational and the application point of view. In this paper, we describe an efficient method for constructing an orthogonal ZB-spline basis, called ZB-splinets. The main advantages of the ZB-splinet approach are computational efficiency and locality of the basis supports, which is desirable for data interpretability, e.g. in the context of functional principal component analysis. The proposed approach is demonstrated on two empirical datasets.

Keywords: Spline approximation, orthogonalization, efficiency, probability density function, functional data analysis, Bayes space

2020 Mathematics Subject Classifications: 62G07, 62R10, 65D07, 65D10

1. Introduction

Functional data analysis (FDA) is a statistical framework for processing and analysing data that can be represented as continuous functions. In the context of FDA, probability density functions (PDFs) form a specific class of functional data objects as positive functions with the unit integral representation, or more generally, as scale invariant functional objects [32]. They appear in many applications such as geosciences [27], demographics [10,28], economics [29] and more. However, PDFs are usually not recorded directly in their functional form but as discrete observations. Therefore, the first step in FDA is to fit the discrete data with a proper representation, preferably using a basis expansion [13], and then to proceed with statistical methods of FDA. One of the most popular approaches is to use spline representation.

In order to provide a reliable approximation of PDFs, the basis functions should reflect their specific properties, which is not the case for the standard spline basis. Therefore, we introduce an efficient spline basis suitable for the representation of PDFs. The proposed approach offers several practical advantages:

  • Structural property invariance. The resulting representation reflects the specific properties of PDFs without additional constraints.

  • Computational efficiency. The computational cost of PDF spline representation is significantly reduced compared to existing methods, especially for smoothing large-scale datasets.

  • Local control. A local adjustment modifies the representation only locally and does not influence the whole domain. This is useful for capturing local variations and enables a more straightforward identification of local patterns in functional data.

  • Enhanced data interpretation. In an analysis of PDFs, particularly in dimension reduction methods like Principal Component Analysis, the proposed approach improves the interpretability in applications and highlights the main independent sources of variance, which helps to understand the functional data behaviour.

These advantages follow from the orthogonality and small local support of the proposed basis functions. The local support property (also called locality) means that the corresponding functions are nonzero only over a limited region of the domain (typically an interval) and thus affect only that part of the domain.

The natural properties of PDFs call for a proper framework for their approximation. The Lebesgue space $L^2$, a standard space for FDA methods, does not respect the primarily relative information carried by PDFs. The $L^2$ geometry is therefore inadequate for PDFs because it ignores their constrained nature, leading to operations that can produce invalid or misleading probability distributions. Instead, the so-called Bayes space methodology was introduced in [7,31,32] and used as a common framework for statistical analysis of PDFs. Although data processing and methodological developments are possible directly in the Bayes space $\mathcal{B}^2$ of square-integrable log-functions, one often opts for the equivalent representation of PDFs in the $L^2$ space. The centred log-ratio (clr) transformation [32] is an isometric isomorphism between $\mathcal{B}^2$ and $L_0^2$, the subspace of $L^2$ of functions with zero integral. This one-to-one mapping equips the resulting functions with the zero integral property while maintaining the relative information of the original densities.

To represent the discrete observations of PDFs as functional data objects, it is common to express them through basis expansion. Therefore, the selection of a proper basis is a crucial step in FDA. Some of the most popular approaches involve B-spline basis [5,6], Fourier basis for periodic data [4], or wavelets [25]. B-spline basis is a usual choice for data approximation since it utilizes the advantageous properties of polynomial functions and benefits from the small local support of individual basis functions. In [2,3,20], a new approach to spline basis selection was presented which used a data-driven knot selection to increase the efficiency of the basis in representing the data.

In general, a successful approximation strongly relies on the existence of a suitable class of basis functions reflecting the nature of the observed data. Specifically, for clr-transformed PDFs, an appropriate basis is required to maintain the zero integral property. Such a class of spline functions, called ZB-splines, was first introduced in [18]. However, in many FDA methods, including the widely used functional principal component analysis (FPCA) [13] for dimension reduction, orthogonality of the spline basis is crucial from the methodological point of view, and the basis should also remain interpretable and computationally efficient. Functional principal components can be calculated directly from the spline coefficients when an orthogonal spline basis is used; see [10,25,30]. Moreover, it is desirable that the resulting orthogonal basis preserves most of the beneficial attributes of the original spline basis and does not violate the characteristic local support property. While this is hardly possible with standard orthogonalization algorithms, the proposed ZB-splinets procedure is able to deal with this challenge.

As mentioned, the approximation of discrete data is a crucial step of data preprocessing. Since the underlying PDF is assumed to be smooth, the smoothing spline is used for approximation [6]. Given that the clr transformation establishes a bijective mapping between the spaces $\mathcal{B}^2$ and $L_0^2$, it seems natural to develop smoothing splines in $L_0^2$ while adhering to the zero-integral constraint. Smoothing splines having zero integral were first introduced in [17], where necessary and sufficient conditions for the respective B-spline coefficients were provided. In [18], the idea of constructing a smoothing spline as a linear combination of basis functions having zero integral is presented. For this purpose, ZB-splines, which form a basis of the corresponding space and satisfy the zero integral constraint, were defined.

The orthogonalization of spline bases has been studied in the mathematical literature from both theoretical and application perspectives and can be performed in different ways. In [1,19,26], the orthogonalization of the B-spline basis is carried out through the well-known Gram-Schmidt process. For uniform periodic splines, an orthonormal basis has been considered in [12] and a quasi-orthogonal basis with a nearly diagonal Gram matrix has been proposed in [9]. However, orthogonalization through the Gram-Schmidt process has the disadvantage of losing the local property of B-splines, since the total support increases at every step of the process. Therefore, a modification was proposed in [26] that leads to a smaller total support of the resulting orthogonalized spline basis, as discussed in [14]. Furthermore, a new symmetric orthogonalization procedure for B-splines was introduced in [14], and an associated R package has been developed [15] for this purpose. The process produces a net-like structure of splines called splinets that preserve the locality of B-spline bases; in fact, the total support is only slightly increased compared to the original B-splines.

Although the problem of orthogonalization of B-splines is well-studied, only initial steps in this direction were done in the context of novel ZB-splines for approximation of PDFs. This paper aims to fill this gap. For this purpose, we adapt the splinet approach to ZB-splines and develop a new efficient orthogonal basis ZB-splinets. These splinets form a net of orthonormalized ZB-splines that preserve both the local support property and the zero integral constraint. In this paper, we demonstrate that this method is a more effective orthogonalization approach in contrast to the commonly used Gram-Schmidt method. The orthogonality of the proposed ZB-splinets is crucial for practical statistical analysis of PDFs, such as functional PCA, as it provides a clearer and more interpretable representation, allows for efficient dimensionality reduction and helps to identify the primary source of data variability.

The main contribution of the paper is to provide a construction of an efficient orthogonal basis for representing probability density functions. Specifically, we

  • present how the splinet approach is utilized for orthogonalization of ZB-spline basis;

  • prove the efficiency of the ZB-splinets in terms of the size of its total support and computational complexity of the smoothing spline construction;

  • demonstrate the advantageous properties of ZB-splinets on empirical datasets.

The paper is organized as follows. In Section 2, theoretical aspects of approximation of PDFs are summarized. The idea of ZB-spline approximation in the context of the Bayes spaces framework is introduced, as well as the role of smoothing in the spline representation of PDFs. Section 3 provides the main result of this paper, the construction of a new orthogonal basis for PDF representation and its computational efficiency. The comparison of the proposed orthogonalization approach with standard methods is done in Section 4 by performing functional principal component analysis with empirical demographic and geological datasets. Finally, Section 5 concludes and provides the final outlook.

2. ZB-spline basis

In this section, we present the construction of ZB-spline basis that respects the zero-integral property of clr-transformed PDFs and we utilize this basis for approximation of discretized PDFs with a smoothing spline. To provide a comprehensive understanding, we first offer a brief introduction to the Bayes space framework.

2.1. Notation

For readers' convenience, we summarize the basic notation we use throughout the paper. The Lebesgue space of square-integrable functions over an interval $[a,b]$ is denoted as $L^2([a,b])$, its subspace of functions with zero integral over $[a,b]$ is denoted as $L_0^2([a,b])$ and the Bayes space of square-integrable log-functions as $\mathcal{B}^2([a,b])$. For the inner product and the corresponding norm, the symbols $\langle\cdot,\cdot\rangle$ and $\|\cdot\|$, respectively, are used. The vector space of splines of degree $k\in\mathbb{N}_0=\mathbb{N}\cup\{0\}$ defined on $[a,b]$ with knots $\Delta\lambda=\{\lambda_i\}_{i=0}^{g+1}$,

$$\lambda_0=a<\lambda_1<\dots<\lambda_g<b=\lambda_{g+1},$$

is denoted as $\mathcal{S}_k^{\Delta\lambda}[a,b]$. Its subspace of splines with zero integral over $[a,b]$ has the notation $\mathcal{Z}_k^{\Delta\lambda}[a,b]$. For B-splines of degree $k\in\mathbb{N}_0$, which form a basis of $\mathcal{S}_k^{\Delta\lambda}[a,b]$, we use the standard notation $B_i^{k+1}(x)$, $i=-k,\dots,g$. In the space $\mathcal{Z}_k^{\Delta\lambda}[a,b]$, we define a basis of so-called ZB-splines of degree $k\in\mathbb{N}_0$ denoted as $Z_i^{k+1}(x)$, $i=-k,\dots,g-1$.

2.2. Bayes spaces

Probability density functions can be considered as functional compositions characterized by their scale invariance and relative scale properties. Consequently, the standard space $L^2$ of square-integrable functions is not appropriate for densities. As mentioned in the Introduction, the Bayes spaces methodology was proposed as a unifying framework for analysing PDFs while maintaining their relative scale properties. Accordingly, operations such as summation, scalar multiplication and inner product are formulated in the Bayes space setting so that they reflect the scale invariance of PDFs.

The Bayes space $\mathcal{B}^2([a,b])$ is defined as a space of nonnegative functional compositions with a square-integrable logarithm on a bounded interval $[a,b]$, accompanied by the operations of perturbation and powering, which are defined as

$$(f\oplus g)(x)=\frac{f(x)\,g(x)}{\int_a^b f(y)\,g(y)\,dy},\qquad(\alpha\odot f)(x)=\frac{f(x)^{\alpha}}{\int_a^b f(y)^{\alpha}\,dy},\qquad\alpha\in\mathbb{R},\ x\in[a,b].$$
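To make the two operations concrete, the following minimal Python sketch evaluates perturbation and powering for densities sampled on a grid. The helper names `trapezoid`, `perturb` and `power`, the grid and the two example densities are illustrative assumptions (they are not part of the paper), and integrals are approximated by the trapezoidal rule.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def perturb(f, g, x):
    """Perturbation: pointwise product rescaled to unit integral."""
    h = f * g
    return h / trapezoid(h, x)

def power(alpha, f, x):
    """Powering: pointwise power rescaled to unit integral."""
    h = f ** alpha
    return h / trapezoid(h, x)

# Hypothetical example densities on [0, 1] (assumptions for illustration).
x = np.linspace(0.0, 1.0, 2001)
f = 2.0 * x + 0.5
f = f / trapezoid(f, x)
g = np.exp(-x)
g = g / trapezoid(g, x)
u = np.ones_like(x)            # uniform density: neutral element of perturbation

fg = perturb(f, g, x)          # f perturbed by g
f2 = power(2.0, f, x)          # f powered by 2
print(trapezoid(fg, x), trapezoid(f2, x))
```

In this sketch, perturbing by the uniform density leaves `f` unchanged, and powering by 2 coincides with perturbing `f` by itself, which mirrors the vector-space roles of the two operations.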

Note that the rescaling of the resulting PDFs in perturbation and powering is done merely for the purpose of the usual interpretability with the unit integral constraint; it is not needed for the operations themselves. The Bayes space $\mathcal{B}^2([a,b])$ with the inner product

$$\langle f,g\rangle_{\mathcal{B}}=\frac{1}{2\eta}\int_a^b\int_a^b\ln\frac{f(x)}{f(y)}\,\ln\frac{g(x)}{g(y)}\,dx\,dy,\qquad\eta=b-a,$$

is a separable Hilbert space [32]. Consequently, it possesses a suitable geometric structure for statistical analysis of PDFs. Furthermore, the construction of the $\mathcal{B}^2$ space allows for an unambiguous representation of the original PDFs in the $L^2$ space. This is achieved through the so-called centred log-ratio (clr) transformation, defined for a density $f\in\mathcal{B}^2([a,b])$ as

$$\operatorname{clr}(f)(x)=f_c(x)=\ln f(x)-\frac{1}{\eta}\int_a^b\ln f(y)\,dy,\qquad x\in[a,b].$$

The resulting real functions capture the relative properties of the original PDFs while enabling their standard statistical processing using tools of FDA in $L^2([a,b])$. However, the clr transformation implies an additional zero integral condition on the resulting clr-transformed density function. Indeed,

$$\int_a^b f_c(x)\,dx=\int_a^b\ln f(x)\,dx-\int_a^b\frac{1}{\eta}\int_a^b\ln f(y)\,dy\,dx=0.$$

Therefore, the clr-transformed PDFs are elements of $L_0^2([a,b])$. In order to provide a reasonable spline representation of clr-transformed PDFs, a proper spline basis having the same essential properties as the underlying PDFs is required. We describe the construction of such a basis in the upcoming sections.
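The zero integral property can be checked numerically. The sketch below is a minimal illustration, assuming a density sampled on a grid and trapezoidal integration; the function names `clr` and `clr_inverse` and the example density are assumptions made here for illustration. The inverse mapping (exponentiation followed by rescaling to unit integral) is the standard one and is included only to show that the representation is one-to-one.

```python
import numpy as np

def trapezoid(y, x):
    """Composite trapezoidal rule for samples y on the grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def clr(f, x):
    """Discretised clr: log-density minus its mean value over [a, b]."""
    eta = x[-1] - x[0]
    log_f = np.log(f)
    return log_f - trapezoid(log_f, x) / eta

def clr_inverse(fc, x):
    """Back-transformation: exponentiate and rescale to unit integral."""
    e = np.exp(fc)
    return e / trapezoid(e, x)

# Hypothetical bell-shaped density on [0, 2] (an assumption for illustration).
x = np.linspace(0.0, 2.0, 1001)
f = np.exp(-(x - 1.0) ** 2 / 0.2)
f = f / trapezoid(f, x)

fc = clr(f, x)
integral = trapezoid(fc, x)    # numerically zero
f_back = clr_inverse(fc, x)    # recovers f up to rounding
print(integral)
```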

2.3. B-splines

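In this section, we first provide a brief general introduction to the B-spline representation and the corresponding notation. We denote as $\mathcal{S}_k^{\Delta\lambda}[a,b]$ the vector space of splines of degree $k\in\mathbb{N}_0$, defined on a finite interval $[a,b]$ with the sequence of knots $\Delta\lambda=\{\lambda_i\}_{i=0}^{g+1}$ given by

$$\lambda_0=a<\lambda_1<\dots<\lambda_g<b=\lambda_{g+1}.$$

The space $\mathcal{S}_k^{\Delta\lambda}[a,b]$ has a finite dimension [6]

$$\dim\left(\mathcal{S}_k^{\Delta\lambda}[a,b]\right)=g+k+1,$$

where $g$ is the number of inner knots in $\Delta\lambda$ and $k$ is the degree of splines in $\mathcal{S}_k^{\Delta\lambda}[a,b]$. A convenient choice of basis functions of $\mathcal{S}_k^{\Delta\lambda}[a,b]$ are B-splines [5,6]. B-splines of degree $k\in\mathbb{N}$ (order $k+1$) can be constructed using the recurrence formula

$$B_i^{k+1}(x)=\frac{x-\lambda_i}{\lambda_{i+k}-\lambda_i}\,B_i^{k}(x)+\frac{\lambda_{i+k+1}-x}{\lambda_{i+k+1}-\lambda_{i+1}}\,B_{i+1}^{k}(x),$$

where the B-spline of zero degree is defined as a piecewise constant function

$$B_i^{1}(x)=\begin{cases}1&\text{if }x\in[\lambda_i,\lambda_{i+1}),\\0&\text{otherwise.}\end{cases}$$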

B-splines are advantageous for their structural properties. They are nonnegative functions with small local support, and for $k>0$ they are differentiable up to the order $k-1$ on $[a,b]$. In order to form a B-spline basis of $\mathcal{S}_k^{\Delta\lambda}[a,b]$, additional knots to $\Delta\lambda$ need to be considered. A commonly used choice is to consider coincident additional knots

$$\lambda_{-k}=\dots=\lambda_{-1}=\lambda_0=a,\qquad b=\lambda_{g+1}=\lambda_{g+2}=\dots=\lambda_{g+k+1}.\tag{1}$$

Therefore, the corresponding extended sequence of knots $\Delta\Lambda=\{\lambda_i\}_{i=-k}^{g+k+1}$ is given by

$$\lambda_{-k}=\dots=\lambda_{-1}=\lambda_0=a<\lambda_1<\dots<\lambda_g<b=\lambda_{g+1}=\lambda_{g+2}=\dots=\lambda_{g+k+1}.\tag{2}$$

Then the corresponding B-spline basis consists of B-splines $B_i^{k+1}(x)$, $i=-k,\dots,g$, defined on the extended sequence of knots $\Delta\Lambda$ (2), and every spline $s_k(x)\in\mathcal{S}_k^{\Delta\lambda}[a,b]$ has a unique representation

$$s_k(x)=\sum_{i=-k}^{g}b_i\,B_i^{k+1}(x),$$

which can be written in matrix notation as

$$s_k(x)=\mathbf{B}_{k+1}(x)\,\mathbf{b},$$

where $\mathbf{B}_{k+1}(x)=\left(B_{-k}^{k+1}(x),\dots,B_{g}^{k+1}(x)\right)$ is a vector of B-spline functions and $\mathbf{b}=(b_{-k},\dots,b_g)^{\top}$ is the vector of B-spline coefficients.
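The recurrence and the extended knot sequence (2) translate directly into code. The following Python sketch is a plain, unoptimised illustration under stated assumptions: the function names are made up for this example, the array position `j` corresponds to the index $i=j-k$ of the text, and terms with a zero denominator (arising from coincident knots) are treated as zero, which is the usual convention.

```python
import numpy as np

def extended_knots(a, b, inner, k):
    """Knot sequence (2): k additional coincident knots at each end."""
    return np.concatenate([np.full(k + 1, a),
                           np.asarray(inner, dtype=float),
                           np.full(k + 1, b)])

def bspline(j, order, x, t):
    """B-spline of the given order (degree = order - 1) whose support
    starts at t[j], evaluated by the recurrence; 0/0 is treated as 0."""
    x = np.asarray(x, dtype=float)
    if order == 1:
        return np.where((t[j] <= x) & (x < t[j + 1]), 1.0, 0.0)
    out = np.zeros_like(x)
    d1 = t[j + order - 1] - t[j]
    d2 = t[j + order] - t[j + 1]
    if d1 > 0:
        out += (x - t[j]) / d1 * bspline(j, order - 1, x, t)
    if d2 > 0:
        out += (t[j + order] - x) / d2 * bspline(j + 1, order - 1, x, t)
    return out

def bspline_basis(x, t, k):
    """Collocation matrix with one column per B-spline of degree k."""
    n = len(t) - k - 1                     # equals g + k + 1
    return np.column_stack([bspline(j, k + 1, x, t) for j in range(n)])

# Hypothetical setting: quadratic splines with g = 3 inner knots on [0, 1].
k = 2
inner = [0.25, 0.5, 0.75]
t = extended_knots(0.0, 1.0, inner, k)
x = np.linspace(0.0, 1.0, 200, endpoint=False)   # half-open intervals: stay in [a, b)
B = bspline_basis(x, t, k)
print(B.shape)                                   # (200, g + k + 1)
```

The number of columns equals $g+k+1$, in line with the dimension formula, and the rows sum to one on $[a,b)$ (partition of unity), a standard property of B-splines on such a knot sequence.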

2.4. ZB-splines

For the approximation of clr-transformed density functions with the zero integral constraint, the standard B-spline basis is not well suited because it ignores the zero integral condition of $L_0^2([a,b])$, which then has to be imposed through additional constraints on the spline coefficients, as described in [17]. To overcome this inconvenience, a different basis for splines having the zero integral property was defined in [18]. The proposed basis functions are called ZB-splines. The corresponding ZB-spline basis is defined on a sequence of knots extended with coincident additional knots. This is essential for their zero-integral property on the considered interval $[a,b]$, as discussed further. In what follows, we describe their construction and stress their important properties.

The ZB-spline of degree $k\in\mathbb{N}_0$ is defined as the first derivative of a B-spline,

$$Z_i^{k+1}(x):=\frac{d}{dx}B_i^{k+2}(x).\tag{3}$$

From this definition of ZB-splines and the formula for the derivative of B-splines [6], the relation

$$Z_i^{k+1}(x)=(k+1)\left(\frac{B_i^{k+1}(x)}{\lambda_{i+k+1}-\lambda_i}-\frac{B_{i+1}^{k+1}(x)}{\lambda_{i+k+2}-\lambda_{i+1}}\right),\qquad k\geq 0,$$

follows. In particular, for $k=0$, the ZB-spline is a piecewise constant function

$$Z_i^{1}(x)=\begin{cases}\dfrac{1}{\lambda_{i+1}-\lambda_i}&\text{if }x\in[\lambda_i,\lambda_{i+1}),\\[2ex]-\dfrac{1}{\lambda_{i+2}-\lambda_{i+1}}&\text{if }x\in[\lambda_{i+1},\lambda_{i+2}).\end{cases}$$

A set of linear ZB-splines is depicted in Figure 1 (right) together with the corresponding antecedent set of quadratic B-splines (left). A similar set of quadratic ZB-splines and cubic B-splines is shown in Figure 2.

Figure 1. B-splines of degree 2 (left), the corresponding ZB-splines of degree 1 (right).

Figure 2. B-splines of degree 3 (left), the corresponding ZB-splines (right).

By construction, the functions $Z_i^{k+1}(x)$ retain beneficial properties similar to those of B-splines. ZB-splines are piecewise polynomial functions of degree $k$ (continuous for $k\geq 1$) with the local support

$$\operatorname{supp}Z_i^{k+1}(x)=\operatorname{supp}B_i^{k+2}(x)=[\lambda_i,\lambda_{i+k+2}).$$

Moreover, ZB-splines $Z_i^{k+1}(x)$, $i=-k,\dots,g-1$, obey the crucial zero integral property

$$\int_a^b Z_i^{k+1}(x)\,dx=\int_a^b\frac{d}{dx}B_i^{k+2}(x)\,dx=B_i^{k+2}(b)-B_i^{k+2}(a)=0-0=0.$$

This follows from the property of B-splines defined on an extended sequence of knots (2) with coincident additional knots (1), see [6],

$$B_i^{k+2}(\lambda_i)=0,\quad -k\leq i\leq 0,\qquad B_i^{k+2}(\lambda_{i+k+2})=0,\quad g-k-1\leq i\leq g-1.$$

Therefore, ZB-splines are suitable for the spline representation of clr-transformed density functions. Note that coincident additional knots in the knot sequence are crucial because they guarantee that the integral of ZB-splines over $[a,b]$ is zero. Equidistant or periodic additional knots do not guarantee this property.
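The zero integral property is easy to verify numerically. The sketch below evaluates ZB-splines through the B-spline difference formula and integrates them by Gauss-Legendre quadrature on each knot interval, which is exact for piecewise polynomials of this degree. The function names, the non-equidistant inner knots and the array indexing (position `j` stands for index $i=j-k$) are assumptions made for illustration only.

```python
import numpy as np

def bspline(j, order, x, t):
    """B-spline of the given order starting at t[j]; 0/0 is treated as 0."""
    x = np.asarray(x, dtype=float)
    if order == 1:
        return np.where((t[j] <= x) & (x < t[j + 1]), 1.0, 0.0)
    out = np.zeros_like(x)
    d1 = t[j + order - 1] - t[j]
    d2 = t[j + order] - t[j + 1]
    if d1 > 0:
        out += (x - t[j]) / d1 * bspline(j, order - 1, x, t)
    if d2 > 0:
        out += (t[j + order] - x) / d2 * bspline(j + 1, order - 1, x, t)
    return out

def zbspline(j, k, x, t):
    """ZB-spline of degree k via the difference of two scaled B-splines."""
    left = bspline(j, k + 1, x, t) / (t[j + k + 1] - t[j])
    right = bspline(j + 1, k + 1, x, t) / (t[j + k + 2] - t[j + 1])
    return (k + 1) * (left - right)

def integrate(func, t, n_nodes=8):
    """Gauss-Legendre quadrature applied on every non-empty knot interval."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    breaks = np.unique(t)
    total = 0.0
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        xm = 0.5 * (hi - lo) * nodes + 0.5 * (hi + lo)
        total += 0.5 * (hi - lo) * float(np.dot(weights, func(xm)))
    return total

# Hypothetical setting: quadratic ZB-splines, g = 4 inner knots on [0, 1],
# knot sequence (2) with k + 1 coincident knots at each end.
k = 2
inner = np.array([0.2, 0.5, 0.6, 0.9])
t = np.concatenate([np.zeros(k + 1), inner, np.ones(k + 1)])
n_zb = len(inner) + k                      # dimension g + k

zb_integrals = [integrate(lambda s, j=j: zbspline(j, k, s, t), t)
                for j in range(n_zb)]
b_integral = integrate(lambda s: bspline(0, k + 1, s, t), t)
print(max(abs(v) for v in zb_integrals), b_integral)
```

All ZB-spline integrals vanish up to rounding, whereas the B-spline integral printed for comparison is strictly positive.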

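In the following, $\mathcal{Z}_k^{\Delta\lambda}[a,b]$ denotes the vector space of polynomial splines of degree $k\in\mathbb{N}_0$, defined on a finite interval $[a,b]$ with the sequence of knots $\Delta\lambda=\{\lambda_i\}_{i=0}^{g+1}$ and having zero integral over $[a,b]$,

$$\mathcal{Z}_k^{\Delta\lambda}[a,b]:=\left\{s_k(x)\in\mathcal{S}_k^{\Delta\lambda}[a,b]:\int_a^b s_k(x)\,dx=0\right\}\subset\mathcal{S}_k^{\Delta\lambda}[a,b].$$

The dimension of $\mathcal{Z}_k^{\Delta\lambda}[a,b]$ is finite,

$$\dim\left(\mathcal{Z}_k^{\Delta\lambda}[a,b]\right)=g+k,$$

and smaller by one than the dimension of $\mathcal{S}_k^{\Delta\lambda}[a,b]$ due to the zero integral condition, see [18]. Finally, the set of $g+k$ ZB-splines $Z_i^{k+1}(x)$, $i=-k,\dots,g-1$, defined on the set $\Delta\Lambda$ (2) with the coincident additional knots forms a basis of the space $\mathcal{Z}_k^{\Delta\lambda}[a,b]$, see [18] for more details. Thus, every spline $s_k(x)\in\mathcal{Z}_k^{\Delta\lambda}[a,b]$ has a unique representation

$$s_k(x)=\sum_{i=-k}^{g-1}z_i\,Z_i^{k+1}(x),$$

which can be written in matrix notation as

$$s_k(x)=\mathbf{Z}_{k+1}(x)\,\mathbf{z},$$

where $\mathbf{Z}_{k+1}(x)=\left(Z_{-k}^{k+1}(x),\dots,Z_{g-1}^{k+1}(x)\right)$ is a vector of ZB-spline functions and $\mathbf{z}=(z_{-k},\dots,z_{g-1})^{\top}$ is a vector of ZB-spline coefficients. The relation between the ZB-spline and B-spline representations is given by

$$s_k(x)=\mathbf{Z}_{k+1}(x)\,\mathbf{z}=\mathbf{B}_{k+1}(x)\,\mathbf{b}=\mathbf{B}_{k+1}(x)\,\mathbf{D}\mathbf{K}\,\mathbf{z},$$

where the matrices $\mathbf{D}\in\mathbb{R}^{g+k+1,\,g+k+1}$ and $\mathbf{K}\in\mathbb{R}^{g+k+1,\,g+k}$ follow from (3):

$$\mathbf{D}=(k+1)\,\operatorname{diag}\left(\frac{1}{\lambda_1-\lambda_{-k}},\dots,\frac{1}{\lambda_{g+k+1}-\lambda_g}\right),\qquad\mathbf{K}=\begin{pmatrix}1&0&0&\cdots&0\\-1&1&0&\cdots&0\\0&-1&1&\cdots&0\\\vdots&\ddots&\ddots&\ddots&\vdots\\0&0&\cdots&-1&1\\0&0&\cdots&0&-1\end{pmatrix}.$$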

We refer the reader to [18] for a detailed description.
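As a consistency check of the relation between the two representations, the sketch below builds $\mathbf{D}$ and $\mathbf{K}$ for a small example and compares $\mathbf{B}_{k+1}(x)\mathbf{D}\mathbf{K}$ with ZB-splines obtained independently from definition (3), here by a central finite difference of the B-splines of degree $k+1$. The function names, knots, evaluation grid and step size are assumptions chosen for illustration; the finite difference is only a numerical stand-in for the exact derivative.

```python
import numpy as np

def bspline(j, order, x, t):
    """B-spline of the given order starting at t[j]; 0/0 is treated as 0."""
    x = np.asarray(x, dtype=float)
    if order == 1:
        return np.where((t[j] <= x) & (x < t[j + 1]), 1.0, 0.0)
    out = np.zeros_like(x)
    d1 = t[j + order - 1] - t[j]
    d2 = t[j + order] - t[j + 1]
    if d1 > 0:
        out += (x - t[j]) / d1 * bspline(j, order - 1, x, t)
    if d2 > 0:
        out += (t[j + order] - x) / d2 * bspline(j + 1, order - 1, x, t)
    return out

# Hypothetical setting: k = 2, g = 4 inner knots on [0, 1], knot sequence (2).
k = 2
inner = np.array([0.2, 0.5, 0.6, 0.9])
t = np.concatenate([np.zeros(k + 1), inner, np.ones(k + 1)])
n = len(inner) + k                               # number of ZB-splines, g + k

# D: (k+1) * diag(1 / (lambda_{i+k+1} - lambda_i)), i = -k, ..., g
D = (k + 1) * np.diag(1.0 / (t[k + 1:] - t[:n + 1]))
# K: +1 on the diagonal, -1 on the first subdiagonal, shape (g+k+1) x (g+k)
K = np.zeros((n + 1, n))
K[np.arange(n), np.arange(n)] = 1.0
K[np.arange(1, n + 1), np.arange(n)] = -1.0

x = np.linspace(0.01, 0.99, 57)                  # interior evaluation points
B = np.column_stack([bspline(j, k + 1, x, t) for j in range(n + 1)])

# ZB-splines from definition (3): derivative of B-splines of degree k + 1,
# approximated by a central difference (numerical stand-in for d/dx).
h = 1e-6
Z_def = np.column_stack([
    (bspline(j, k + 2, x + h, t) - bspline(j, k + 2, x - h, t)) / (2.0 * h)
    for j in range(n)
])

max_diff = float(np.max(np.abs(Z_def - B @ D @ K)))
print(max_diff)
```

The printed difference is at the level of the finite-difference error, which supports the sign pattern of $\mathbf{K}$ and the scaling in $\mathbf{D}$ used above.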

2.5. Orthogonalization

The orthogonalization of the ZB-spline basis offers significant advantages in practical applications. We demonstrate the need for an efficient orthogonalization process on simplicial functional principal component analysis in Section 4. In this section, we provide a general description of orthogonalization that does not distinguish among the different methods. Later on, we discuss and compare three particular orthogonalization approaches:

  • Gram-Schmidt orthogonalization is a standard orthogonalization approach. As the method starts from the first basis function and continues to the next following their order in a set, we denote this approach as one-sided Gram-Schmidt orthogonalization.

  • Symmetric Gram-Schmidt orthogonalization is a modification of the standard method. It proceeds simultaneously from the first and from the last basis function in the set, in opposite directions, until the functions overlap. The remaining central basis functions are orthogonalized in a symmetric manner. Therefore, we refer to this approach as symmetric two-sided Gram-Schmidt orthogonalization.

  • Dyadic orthogonalization exploits the locality, and the resulting natural orthogonality, of particular subsets of basis functions and forms a net-like structure of orthogonal basis functions suitable for efficient approximation. In this paper, we call the resulting orthogonalized set, or rather net, of ZB-splines a ZB-splinet.

Regardless of the specific approach used, orthogonalization can be represented by a linear transformation $\Phi$ that maps the ZB-spline basis to an orthogonal spline basis. The new orthogonal basis functions are denoted as $O_i^{k+1}(x)$, $i=-k,\dots,g-1$. This transformation can be written in matrix form as

$$\mathbf{O}_{k+1}(x)=\Phi\,\mathbf{Z}_{k+1}(x),$$

where $\mathbf{O}_{k+1}(x)=\left(O_{-k}^{k+1}(x),\dots,O_{g-1}^{k+1}(x)\right)$ is a vector of the corresponding basis functions that are orthogonal, i.e.

$$\left\langle O_i^{k+1}(x),O_j^{k+1}(x)\right\rangle=\delta_{ij},\qquad i,j=-k,\dots,g-1.$$

Then splines $s_k(x)\in\mathcal{Z}_k^{\Delta\lambda}[a,b]$ can be represented as a linear combination of the new basis functions $O_i^{k+1}(x)$, $i=-k,\dots,g-1$,

$$s_k(x)=\sum_{i=-k}^{g-1}o_i\,O_i^{k+1}(x)=\mathbf{O}_{k+1}(x)\,\mathbf{o},$$

where $\mathbf{o}=(o_{-k},\dots,o_{g-1})^{\top}$ are the coefficients of the new spline basis representation.
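One concrete instance of such a transformation can be obtained from the Gram matrix of the ZB-splines. The sketch below is an illustration under stated assumptions, not the construction proposed in the paper: it uses piecewise-constant ZB-splines ($k=0$) on equidistant knots, whose Gram matrix follows directly from their definition, and computes $\Phi$ as the inverse Cholesky factor, which in exact arithmetic coincides with one-sided Gram-Schmidt orthonormalization. The variable names are hypothetical.

```python
import numpy as np

# Hypothetical setting: k = 0, g = 7 equidistant inner knots on [0, 1].
g, a, b = 7, 0.0, 1.0
h = (b - a) / (g + 1)

# Gram matrix of the g piecewise-constant ZB-splines:
#   <Z_i, Z_i>     =  2 / h   (values +1/h and -1/h on two intervals of length h)
#   <Z_i, Z_{i+1}> = -1 / h   (overlap on one interval, opposite signs)
G = (2.0 / h) * np.eye(g) - (1.0 / h) * (np.eye(g, k=1) + np.eye(g, k=-1))

# One-sided Gram-Schmidt expressed through the Cholesky factorisation G = L L^T:
# Phi = L^{-1} is lower triangular and Phi G Phi^T = I.
L = np.linalg.cholesky(G)
Phi = np.linalg.inv(L)
gram_new = Phi @ G @ Phi.T

print(np.round(gram_new, 10))
print(np.count_nonzero(np.abs(Phi) > 1e-12))   # number of nonzero coefficients
```

Row $i$ of `Phi` holds the ZB-spline coefficients of the $i$-th orthonormal function. In this example the lower triangle of `Phi` is completely filled, so the last orthonormal function involves every ZB-spline, which illustrates the loss of locality under one-sided Gram-Schmidt.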

2.6. Smoothing splines

The next step in spline approximation of PDFs is to reconstruct the underlying density based on discrete distributional observations, usually resulting from the aggregation of the raw values using histograms [18]. The approximation is formulated as a linear combination of basis functions and the quality of the estimate is usually measured by the mean square error. However, since the underlying density is assumed to be smooth, a smoothing spline is employed. The smoothing spline represents a trade-off between achieving the best fit to the observed data and ensuring that the resulting approximation remains smooth.

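Suppose we have discrete observations of a probability density function. These data are transformed with a discrete version of the clr transformation and the resulting data points are denoted as $(x_i,y_i)$, where $a\leq x_i\leq b$, $i=1,\dots,n$. The desired smoothing spline $s_k\in\mathcal{Z}_k^{\Delta\lambda}[a,b]$ minimizes the functional

$$J_l(s_k)=(1-\alpha)\int_a^b\left[s_k^{(l)}(x)\right]^2dx+\alpha\sum_{j=1}^{n}w_j\left[y_j-s_k(x_j)\right]^2,$$

where the weights used in the approximation are denoted as $w_i>0$, $i=1,\dots,n$. The parameter $\alpha\in(0,1]$ is a smoothing parameter that measures the compromise between the best fit to the data and the variability of the approximation controlled by the order of the spline derivative $l\in\{1,\dots,k-1\}$. A practical approach for selecting the smoothing parameter is to use cross-validation, which helps to identify the optimal value by evaluating model performance on a validation set. Alternatively, $\alpha$ can be chosen by the user according to the specific problem or dataset. The functional $J_l$ can be rewritten in matrix notation in the form of a quadratic function,

$$J_l(\mathbf{o})=(1-\alpha)\,\mathbf{o}^{\top}\mathbf{N}_{kl}\,\mathbf{o}+\alpha\left[\mathbf{y}-\mathbf{O}_{k+1}(\mathbf{x})\,\mathbf{o}\right]^{\top}\mathbf{W}\left[\mathbf{y}-\mathbf{O}_{k+1}(\mathbf{x})\,\mathbf{o}\right],$$

where $\mathbf{x}=(x_1,\dots,x_n)^{\top}$, $\mathbf{y}=(y_1,\dots,y_n)^{\top}$, $\mathbf{w}=(w_1,\dots,w_n)^{\top}$, $\mathbf{W}=\operatorname{diag}(\mathbf{w})$, $\mathbf{N}_{kl}$ is a positive semidefinite matrix

$$\mathbf{N}_{kl}=\left(n_{ij}^{kl}\right)_{i,j=-k}^{g-1},\qquad n_{ij}^{kl}=\left\langle\left(O_i^{k+1}(x)\right)^{(l)},\left(O_j^{k+1}(x)\right)^{(l)}\right\rangle,$$

and $\mathbf{O}_{k+1}(\mathbf{x})$ is the collocation matrix of orthogonalized ZB-splines at the given data points, i.e.

$$\mathbf{O}_{k+1}(\mathbf{x})=\begin{pmatrix}O_{-k}^{k+1}(x_1)&\cdots&O_{g-1}^{k+1}(x_1)\\\vdots&\ddots&\vdots\\O_{-k}^{k+1}(x_n)&\cdots&O_{g-1}^{k+1}(x_n)\end{pmatrix}.$$

Our objective now is to determine the minimum of the quadratic function $J_l(\mathbf{o})$, which can be written as

$$J_l(\mathbf{o})=\mathbf{o}^{\top}\left[(1-\alpha)\,\mathbf{N}_{kl}+\alpha\,\mathbf{O}_{k+1}^{\top}(\mathbf{x})\,\mathbf{W}\,\mathbf{O}_{k+1}(\mathbf{x})\right]\mathbf{o}-2\alpha\,\mathbf{o}^{\top}\mathbf{O}_{k+1}^{\top}(\mathbf{x})\,\mathbf{W}\mathbf{y}+\alpha\,\mathbf{y}^{\top}\mathbf{W}\mathbf{y}.\tag{4}$$

This function has exactly one minimum if and only if its Hessian is positive definite. From (4) one can see that

$$(1-\alpha)\,\mathbf{N}_{kl}+\alpha\,\mathbf{O}_{k+1}^{\top}(\mathbf{x})\,\mathbf{W}\,\mathbf{O}_{k+1}(\mathbf{x})\ \text{is p.d.}\quad\Leftrightarrow\quad\mathbf{O}_{k+1}(\mathbf{x})\ \text{is of full column rank.}$$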

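The collocation matrix $\mathbf{O}_{k+1}(\mathbf{x})$ is of full column rank iff there exists $\{u_{-k},\dots,u_{g-1}\}\subset\{x_1,\dots,x_n\}$ with $u_i<u_{i+1}$, such that $\lambda_i<u_i<\lambda_{i+k+2}$, $i=-k,\dots,g-1$. Then, in the case of a positive definite Hessian of the function $J_l(\mathbf{o})$, setting the gradient of (4) to zero shows that its minimum can be expressed as

$$\mathbf{o}=\alpha\left[(1-\alpha)\,\mathbf{N}_{kl}+\alpha\,\mathbf{O}_{k+1}^{\top}(\mathbf{x})\,\mathbf{W}\,\mathbf{O}_{k+1}(\mathbf{x})\right]^{-1}\mathbf{O}_{k+1}^{\top}(\mathbf{x})\,\mathbf{W}\mathbf{y}\tag{5}$$

and the corresponding smoothing spline $s_k(x)$ for the given data is

$$s_k(x)=\sum_{i=-k}^{g-1}o_i\,O_i^{k+1}(x)=\mathbf{O}_{k+1}(x)\,\mathbf{o},$$

where $\mathbf{o}=(o_{-k},\dots,o_{g-1})^{\top}$ is the vector of coefficients for the related basis functions. If the collocation matrix $\mathbf{O}_{k+1}(\mathbf{x})$ is not of full column rank, the Hessian of $J_l(\mathbf{o})$ is only positive semidefinite and a so-called optimal smoothing spline can be found by using the pseudoinverse; for more details, see [16].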
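In computational terms, the minimisation reduces to one symmetric linear system. The following sketch is a generic linear-algebra illustration with synthetic inputs (a random collocation matrix and a random positive semidefinite penalty matrix stand in for the spline quantities, and all names are assumptions). The factor `alpha` on the right-hand side comes from setting the gradient of the quadratic function (4) to zero.

```python
import numpy as np

def objective(o, O, N, w, y, alpha):
    """Quadratic function J_l(o): roughness penalty plus weighted residuals."""
    r = y - O @ o
    return (1.0 - alpha) * o @ N @ o + alpha * r @ (w * r)

def smoothing_coefficients(O, N, w, y, alpha):
    """Stationary point of J_l: solves
    [(1 - alpha) N + alpha O^T W O] o = alpha O^T W y."""
    W = np.diag(w)
    H = (1.0 - alpha) * N + alpha * O.T @ W @ O
    rhs = alpha * O.T @ W @ y
    if np.linalg.matrix_rank(O) == O.shape[1]:
        return np.linalg.solve(H, rhs)      # positive definite case
    return np.linalg.pinv(H) @ rhs          # rank-deficient case: pseudoinverse

# Synthetic stand-ins (assumptions for illustration, not spline data).
rng = np.random.default_rng(0)
n_obs, n_basis = 30, 6
O = rng.normal(size=(n_obs, n_basis))       # plays the role of the collocation matrix
A = rng.normal(size=(n_basis, n_basis))
N = A.T @ A                                 # positive semidefinite penalty matrix
w = np.ones(n_obs)
y = rng.normal(size=n_obs)
alpha = 0.7

o_hat = smoothing_coefficients(O, N, w, y, alpha)
H = (1.0 - alpha) * N + alpha * O.T @ (w[:, None] * O)
gradient = 2.0 * H @ o_hat - 2.0 * alpha * O.T @ (w * y)
print(np.linalg.norm(gradient))
```

The gradient norm printed at the end is zero up to rounding, which confirms that `o_hat` is the stationary point of the quadratic function.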

3. Dyadic orthogonalization

A spline basis can be orthogonalized in several ways. One of the well-known approaches is the Gram-Schmidt (GS) orthogonalization method, which recursively orthogonalizes each element of the basis with respect to the already orthogonalized subset of basis functions. However, GS orthogonalization violates the locality of a spline basis with small local supports, since the supports of the orthogonalized functions grow up to the entire interval. We refer to this original GS method as one-sided GS. A more efficient approach is a modified GS orthogonalization called the symmetric two-sided GS method. The idea is to apply one-sided GS to the left half of the interval in a left-to-right manner and to the right half of the interval in a right-to-left manner, i.e. from both ends towards the center. The remaining basis splines located in the center of the interval are symmetrically orthogonalized with respect to every basis function. For details, we refer to [14,19]. The size of the total support can be further reduced using the splinet approach, which is briefly described in the next subsection. Then, we introduce ZB-splinets as an effective tool for orthogonalizing ZB-splines. Finally, we discuss the quality of these three approaches in terms of locality and computational efficiency.

3.1. Splinets

Splinets were first introduced in [14] as an efficient approach for the orthogonalization of B-splines. Their main advantage is that they preserve some of the favourable properties of B-splines, including locality and computational efficiency. Locality means a small size of the spline supports, and computational efficiency results from the moderate number of orthogonalization steps required and, mainly, from the sparsity of the corresponding collocation and Gramian matrices. Because they maintain locality, the preferred property of the original splines, splinets are preferred over the other two methods mentioned earlier, namely one-sided GS orthogonalization and two-sided GS orthogonalization.

This orthogonal basis of splines is better visualized as a net than as a sequence of orthogonalized functions. Splinets are constructed recursively using a dyadic structure of support levels. Figure 3 shows the dyadic structure for first-degree B-splines (left) and the corresponding splinets (right). The smallest support intervals are at the lowest level, and the support intervals increase over the higher levels of the dyadic structure. For B-splines of degree $k$, we bind every $k$ adjacent B-splines into a group called a $k$-tuplet and then build the dyadic structure. For more details about the splinet algorithm, see [14,23]. We note that these references describe the splinet approach applied to B-splines with zero boundary conditions.

Figure 3. B-splines of degree 1 (left), the corresponding B-splinets (right).

A fascinating result of the splinet algorithm is that one can get a partially orthogonal basis if the algorithm stops at some sufficiently large number l of iterations. In a sense, the partially orthogonal basis is similar to the fully orthogonal basis with a minor error. However, it has the advantage of smaller support and a reduction in the number of inner products required compared to the fully orthogonal basis.

3.2. ZB-splinets

In this section, we apply the splinet approach to the ZB-spline basis. As ZB-splines of degree $k$ are defined as first derivatives of B-splines of degree $k+1$, it follows that for $k\in\mathbb{N}_0$

$$\operatorname{supp}Z_i^{k+1}(x)=\operatorname{supp}B_i^{k+2}(x)=[\lambda_i,\lambda_{i+k+2}).$$

In the sequel, we will limit our discussion to the fully dyadic case and equispaced knots to simplify computations and facilitate a clear discussion. To construct the dyadic structure, we determine the number of equispaced inner knots

$$g=(2^N-1)(k+1)-k,\tag{6}$$

where $N\in\mathbb{N}$ denotes the number of support levels. First, we bind every $k+1$ adjacent ZB-splines of degree $k$ into a $(k+1)$-tuplet (as we would bind $k+1$ adjacent B-splines of degree $k+1$). Second, we arrange the tuplets into $N$ support levels such that the tuplets at each level have disjoint supports. Figure 4 shows an example of the dyadic structure of first-degree ZB-splines, where every two adjacent ZB-splines are bound into a tuplet and the number of support levels is $N=4$. A similar dyadic structure of quadratic ZB-splines is presented in Figure 5. A general schematic of how the dyadic structure builds the splinets is presented in Figure 6. The $g+k$ basis functions $Z_i^{k+1}$, $i=-k,\dots,(2^N-2)(k+1)$, are divided into $(k+1)$-tuplets. The positions of the resulting $2^N-1$ tuplets in the dyadic structure are shown in the schematic diagram.
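The bookkeeping behind (6) can be spelled out in a few lines. The sketch below computes the number of inner knots, the members of each tuplet and one possible assignment of tuplets to support levels. The level layout (every other tuplet at the lowest level, every other remaining tuplet at the next level, and so on) is an assumption that is consistent with the disjoint-support requirement and with the counts given above; the function names are hypothetical.

```python
def inner_knots(N, k):
    """Number g of equispaced inner knots according to (6)."""
    return (2 ** N - 1) * (k + 1) - k

def tuplet_levels(N):
    """Tuplet numbers 1, ..., 2^N - 1 grouped by support level, lowest level
    first (assumed layout: level 1 holds the odd-numbered tuplets, etc.)."""
    return [list(range(2 ** (lev - 1), 2 ** N, 2 ** lev))
            for lev in range(1, N + 1)]

def tuplet_members(j, k):
    """Indices i of the ZB-splines Z_i^{k+1} bound into tuplet number j
    (indices start at -k, as in the text)."""
    first = -k + (j - 1) * (k + 1)
    return list(range(first, first + k + 1))

# The setting of Figure 4: first-degree ZB-splines with N = 4 support levels.
N, k = 4, 1
g = inner_knots(N, k)
n_basis = g + k
levels = tuplet_levels(N)

print(g, n_basis)                       # inner knots and basis functions
print([len(level) for level in levels]) # tuplets per level, lowest level first
print(tuplet_members(1, k), tuplet_members(2 ** N - 1, k))
```

Within the lowest level, the support $[\lambda_i,\lambda_{i+k+2})$ of the last member of one tuplet ends exactly where the support of the first member of the next tuplet at that level begins, so tuplets at that level do not overlap.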

Figure 4. ZB-splines of degree 1 (left), the corresponding ZB-splinets (right).

Figure 5. ZB-splines of degree 2 (left), the corresponding ZB-splinets (right).

Figure 6. Schematic diagram for the dyadic structure of splinets.

3.2.1. Dyadic orthogonalization

The algorithm for constructing ZB-splinets follows these steps:

  • Step 1:

    At the lowest level, the tuplets have disjoint supports, i.e. each tuplet is orthogonal to the other tuplets at that level. It remains to orthogonalize the splines within each tuplet with the symmetric two-sided GS algorithm described below.

  • Step 2:

    Orthogonalize all the upper levels with respect to the lowest one. With the dyadic structure in place, each tuplet only needs to be orthogonalized with respect to the two adjacent tuplets (one from each side).

  • Step 3:

    The splines at the lowest level are now orthogonal to all other splines and form the lowest level of the ZB-splinet structure. By removing the lowest level and repeating these steps for the remaining $N-1$ support levels, the entire set of ZB-splines can be orthogonalized in a recursive manner.

Although an alternative approach can be used to construct this dyadic structure, we found that our choice is a natural way to imitate the splinets algorithm. The pseudocode of this dyadic orthogonalization is given in Algorithm 1.

Algorithm 1. Dyadic orthogonalization of ZB-splines.
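The three steps can be sketched on the level of coefficient vectors, using only the Gram matrix of the ZB-splines. The Python code below is a simplified illustration under stated assumptions and not a transcription of Algorithm 1: the splines within a tuplet are orthonormalized by a Cholesky-based Gram-Schmidt step instead of the symmetric two-sided variant, projections are taken against all functions of the current lowest level instead of only the two adjacent tuplets (the result is the same because the other inner products vanish), and the demonstration uses piecewise-constant ZB-splines ($k=0$) on equidistant knots, for which the Gram matrix is known in closed form. All function names are hypothetical.

```python
import numpy as np

def tuplet_levels(N):
    """0-based tuplet positions grouped by level, lowest level first."""
    return [[j - 1 for j in range(2 ** (lev - 1), 2 ** N, 2 ** lev)]
            for lev in range(1, N + 1)]

def orthonormalize_rows(C, G):
    """Gram-Schmidt on the rows of C for the inner product <u, v> = u G v^T,
    carried out through a Cholesky factorisation (stand-in for the
    symmetric two-sided GS used within tuplets)."""
    L = np.linalg.cholesky(C @ G @ C.T)
    return np.linalg.solve(L, C)

def dyadic_orthogonalize(G, N, k):
    """Return C with rows holding ZB-spline coefficients of the orthonormal
    functions, so that C G C^T is the identity matrix."""
    m = k + 1                                   # tuplet size
    n = (2 ** N - 1) * m
    assert G.shape == (n, n)
    C = np.eye(n)
    levels = tuplet_levels(N)
    for depth, level in enumerate(levels):
        low = []
        for tup in level:                       # Step 1: within each tuplet
            rows = np.arange(tup * m, (tup + 1) * m)
            C[rows] = orthonormalize_rows(C[rows], G)
            low.extend(rows)
        low = np.array(low)
        higher = np.array([r for lev in levels[depth + 1:] for tup in lev
                           for r in range(tup * m, (tup + 1) * m)], dtype=int)
        if higher.size:                         # Step 2: upper levels vs. lowest
            proj = C[higher] @ G @ C[low].T
            C[higher] -= proj @ C[low]
        # Step 3: the loop continues with the next level as the new lowest one
    return C

# Demonstration: k = 0, N = 3, i.e. g = 7 equidistant inner knots on [0, 1].
N, k = 3, 0
n = (2 ** N - 1) * (k + 1)
h = 1.0 / (n + 1)
G = (2.0 / h) * np.eye(n) - (1.0 / h) * (np.eye(n, k=1) + np.eye(n, k=-1))

C = dyadic_orthogonalize(G, N, k)
print(np.allclose(C @ G @ C.T, np.eye(n)))
print(np.count_nonzero(np.abs(C) > 1e-12))      # nonzero coefficients in the net
```

Counting the nonzero coefficients gives a rough impression of locality: the functions at the lowest level involve a single ZB-spline each, and only the top-level function involves all of them.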

3.2.2. Symmetric two-sided Gram-Schmidt orthogonalization

A modification of the Gram-Schmidt orthogonalization method that preserves symmetry was proposed in [26]. Within the dyadic orthogonalization, we apply the symmetric two-sided GS to the ordered ZB-splines with respect to a central point, which is chosen here as the midpoint of the interval determined by their supports.

  • Step 1:

    Perform one-sided GS orthogonalization on the ZB-splines with supports in each of the two halves of the interval separately. The left-to-right orthogonalization is applied to the ZB-splines with supports in the left half, and the right-to-left orthogonalization to the ZB-splines with supports in the right half of the interval.

  • Step 2:
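    Orthogonalize the remaining central ZB-splines. Bind the first and the last of the remaining ZB-splines and orthogonalize them with respect to the already orthogonalized ones. Then, perform the pairwise symmetric orthogonalization as follows
    $$\tilde z_i = \left(\frac{1}{\sqrt{1+\langle z_i,z_j\rangle}} + \frac{1}{\sqrt{1-\langle z_i,z_j\rangle}}\right)\frac{z_i}{2} + \left(\frac{1}{\sqrt{1+\langle z_i,z_j\rangle}} - \frac{1}{\sqrt{1-\langle z_i,z_j\rangle}}\right)\frac{z_j}{2},$$
    $$\tilde z_j = \left(\frac{1}{\sqrt{1+\langle z_i,z_j\rangle}} - \frac{1}{\sqrt{1-\langle z_i,z_j\rangle}}\right)\frac{z_i}{2} + \left(\frac{1}{\sqrt{1+\langle z_i,z_j\rangle}} + \frac{1}{\sqrt{1-\langle z_i,z_j\rangle}}\right)\frac{z_j}{2},$$
    where the pair $z_i, z_j$ is the first pair of the remaining central splines, and $\langle z_i,z_j\rangle$ denotes their scalar product. Repeat this procedure with the pair formed by the second and the second-to-last remaining ZB-splines, and so forth with the other pairs of central ZB-splines, until either all central ZB-splines are orthogonalized (the total number of splines is even) or only one ZB-spline remains (the total number of splines is odd); in the latter case, orthogonalize the last one with respect to all previously orthogonalized splines.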

The corresponding pseudocode is given in Algorithm 2. For further details on the splinet approach in the context of B-splines orthogonalization we refer to [14].
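The pairwise symmetric step can be checked numerically. The sketch below uses vectors with the Euclidean scalar product as a stand-in for splines with the $L^2$ scalar product, and it assumes that both inputs have unit norm (an assumption of the sketch; with this normalization the two outputs are orthonormal). The function name `symmetric_pair` is hypothetical.

```python
import numpy as np


def symmetric_pair(zi, zj, inner=np.dot):
    """Symmetric orthonormalization of two unit-norm elements zi, zj."""
    c = inner(zi, zj)                       # scalar product <zi, zj>, |c| < 1
    a = 1.0 / np.sqrt(1.0 + c)
    b = 1.0 / np.sqrt(1.0 - c)
    zi_new = (a + b) * zi / 2 + (a - b) * zj / 2
    zj_new = (a - b) * zi / 2 + (a + b) * zj / 2
    return zi_new, zj_new


zi = np.array([1.0, 0.0, 0.0])
zj = np.array([0.6, 0.8, 0.0])              # unit norm, <zi, zj> = 0.6
ui, uj = symmetric_pair(zi, zj)
print(ui @ uj, ui @ ui, uj @ uj)            # close to 0, 1, 1 up to rounding
```

Swapping the two inputs swaps the two outputs, so neither element of the pair is favoured, in contrast to a one-sided Gram-Schmidt step.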

3.3. Efficiency of the ZB-splinets

The main advantage of ZB-splinets is that they inherit the locality and computational efficiency of splinets. The locality is indicated by the small size of the total support of ZB-splinets. For the three orthogonalizations introduced above, we compare the relative total support, defined as the ratio of the total support size of a basis to the size of its domain. The computational efficiency of ZB-splinets can be assessed in terms of the number of inner products that have to be evaluated in the orthogonalization process, or in terms of the computational cost of the smoothing spline representation, i.e. the number of multiplications needed for its construction. In the sequel, we limit the discussion to the fully dyadic case with equispaced knots to simplify the computations. The following propositions state these two properties; the proofs provided in the Appendix follow those in [14] derived for B-splines.

Proposition 3.1

Let $\Delta\Lambda = \{\lambda_i\}_{i=-k}^{g+k+1}$ be a dyadic set of knots with $g$ equispaced inner knots (2), $g = (2^N - 1)(k+1) - k$, $N \in \mathbb{N}$. Then the relative total support of orthogonalized ZB-splines of degree $k \in \mathbb{N}_0$

  • (i)

    for one-sided Gram-Schmidt orthogonalization is $\frac{g}{2} + k + 1 - \frac{1}{g+1}$,

  • (ii)

    for symmetric two-sided Gram-Schmidt orthogonalization is $\frac{g}{4} + k + \frac{7}{4} - \frac{2}{g+1}$,

  • (iii)

    for splinet orthogonalization is $(k+1)\log_2\left(\frac{g+k}{k+1} + 1\right)$.

Remark 3.1

It is worth noting that two-sided GS orthogonalization produces a relative total support that is approximately half of that produced by the one-sided orthogonalization. Both approaches result in a support size of order $g$, while the relative total support of the ZB-splinet is of order $\log(g)$, where $g$ is the number of inner knots. The sizes of the relative total support for specific values of the ZB-spline degree $k$ and the number of support levels $N$ are compared in Table 1.

Table 1.

Relative total support for different orthogonalization approaches.

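| approach | degree | N = 1 | N = 2 | N = 3 | N = 4 | N = 5 | N = 6 |
|---|---|---|---|---|---|---|---|
| one-sided GS | k = 1 | 2 | 4.3333 | 8.4286 | 16.4667 | 32.4839 | 64.4921 |
| one-sided GS | k = 2 | 3 | 6.3750 | 12.4500 | 24.4773 | 48.4891 | 96.4947 |
| one-sided GS | k = 3 | 4 | 8.4000 | 16.4615 | 32.4828 | 64.4918 | 128.4960 |
| one-sided GS | k = 4 | 5 | 10.4167 | 20.4688 | 40.4861 | 80.4934 | 160.4968 |
| symmetric two-sided GS | k = 1 | 2 | 3.6667 | 5.8571 | 9.9333 | 17.9677 | 33.9841 |
| symmetric two-sided GS | k = 2 | 3 | 5.2500 | 8.4000 | 14.4545 | 26.4783 | 50.4894 |
| symmetric two-sided GS | k = 3 | 4 | 6.8000 | 10.9231 | 18.9655 | 34.9836 | 66.9920 |
| symmetric two-sided GS | k = 4 | 5 | 8.3333 | 13.4375 | 23.4722 | 43.4868 | 83.4936 |
| ZB-splinet | k = 1 | 2 | 4 | 6 | 8 | 10 | 12 |
| ZB-splinet | k = 2 | 3 | 6 | 9 | 12 | 15 | 18 |
| ZB-splinet | k = 3 | 4 | 8 | 12 | 16 | 20 | 24 |
| ZB-splinet | k = 4 | 5 | 10 | 15 | 20 | 25 | 30 |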
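The entries of Table 1 follow directly from the closed forms of Proposition 3.1 with $g = (2^N - 1)(k+1) - k$. A short Python sketch that evaluates them is given below; the function name `relative_total_support` is hypothetical, and the formulas are the ones read from the proposition.

```python
from math import log2


def relative_total_support(k, N):
    """Relative total support for one-sided GS, two-sided GS and the ZB-splinet."""
    g = (2**N - 1) * (k + 1) - k             # number of inner knots in the dyadic case
    one_sided = g / 2 + k + 1 - 1 / (g + 1)
    two_sided = g / 4 + k + 7 / 4 - 2 / (g + 1)
    splinet = (k + 1) * log2((g + k) / (k + 1) + 1)
    return one_sided, two_sided, splinet


for N in range(1, 7):
    print(N, [round(v, 4) for v in relative_total_support(k=1, N=N)])
```

In the dyadic case the argument of the logarithm equals $2^N$, so the ZB-splinet value reduces to $(k+1)N$, which explains the integer entries in the last block of the table.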

Remark 3.2

From the previous proposition, it is evident that the relative total support of orthogonalized ZB-splines, whether obtained through one-sided or symmetric two-sided orthogonalization, depends on the location of the knots. In contrast, with splinets, the relative total support does not depend on the knot placement.

Proposition 3.2

Consider the dyadic structure case for the ZB-splines of degree $k$. Then the number of evaluations of inner products needed

  • (i)

    for one-sided orthogonalization is $(k+1)g + k(k-1)/2 - 1$,

  • (ii)

    for two-sided orthogonalization is $(k+1)(2g+k-2)/2 = (k+1)g + k(k-1)/2 - 1$,

  • (iii)
    for splinet orthogonalization is
    $$\left(\frac{5k+4}{2}\right)(g+k) - 2\log_2\left(\frac{g+k}{k+1} + 1\right)(k+1)^2.$$

Table 2 gives the number of inner products needed for the orthogonalization of a ZB-spline basis by the different approaches for selected spline degrees $k$ and numbers of support levels $N$ in the dyadic structure. The ZB-splinet approach requires about twice as many inner product evaluations as the Gram-Schmidt orthogonalization. However, all approaches have the same asymptotic order.

Table 2.

Number of inner products for different orthogonalization approaches.

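| approach | degree | N = 1 | N = 2 | N = 3 | N = 4 | N = 5 | N = 6 |
|---|---|---|---|---|---|---|---|
| one-sided/two-sided GS | k = 1 | 1 | 9 | 25 | 57 | 121 | 249 |
| one-sided/two-sided GS | k = 2 | 3 | 21 | 57 | 129 | 273 | 561 |
| one-sided/two-sided GS | k = 3 | 6 | 38 | 102 | 230 | 486 | 998 |
| one-sided/two-sided GS | k = 4 | 10 | 60 | 160 | 360 | 760 | 1560 |
| ZB-splinet | k = 1 | 1 | 11 | 39 | 103 | 239 | 519 |
| ZB-splinet | k = 2 | 3 | 27 | 93 | 243 | 561 | 1215 |
| ZB-splinet | k = 3 | 6 | 50 | 170 | 442 | 1018 | 2202 |
| ZB-splinet | k = 4 | 10 | 80 | 270 | 700 | 1610 | 3480 |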
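The statement that the ZB-splinet needs about twice as many inner products can be made slightly more precise by comparing the leading terms of Proposition 3.2. The following is a short worked computation, writing $N = \log_2\left(\frac{g+k}{k+1} + 1\right)$ for the number of support levels; it is a sketch derived from the formulas above rather than a result stated in the paper.

```latex
\[
\frac{\frac{5k+4}{2}\,(g+k) - 2N(k+1)^2}{(k+1)\,g + \frac{k(k-1)}{2} - 1}
\;\longrightarrow\; \frac{5k+4}{2(k+1)} \qquad (g \to \infty),
\]
\[
\frac{5k+4}{2(k+1)} = 2.25,\ 2.33,\ 2.375,\ 2.4 \quad \text{for } k = 1, 2, 3, 4 .
\]
```

The limit holds because $N$ grows only logarithmically in $g$. For $N = 6$ the ratios of the entries in Table 2 (for example $519/249 \approx 2.08$ for $k = 1$ and $3480/1560 \approx 2.23$ for $k = 4$) are still below these limiting values.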

Remark 3.3

The numbers of inner products required for Gram-Schmidt orthogonalization and ZB-splinet orthogonalization are both of order $O(g)$, where $g$ is the number of inner knots. It is worth emphasizing that while both methods share the same asymptotic order, ZB-splinet orthogonalization offers the advantage of smaller support. Consequently, the data decomposition into ZB-splinets involves fewer multiplications than the decomposition into a basis obtained by the Gram-Schmidt process. This reduction in computational cost represents a significant practical improvement. We compare the computational complexity in terms of the number of multiplications needed for PDF representation in more detail in the sequel.

When an orthogonalized basis of ZB-splines is available, the computational cost of a smoothing spline depends on the number of nonzero elements in the matrices $N_{kl}$ and $O_{k+1}(x)$, as they are involved in the computation of the corresponding coefficients (5). The sparser the matrices are, the more efficient the computation is. The number of nonzero elements of the matrix $N_{kl}$ can be derived as follows.

Proposition 3.3

Consider the dyadic structure case for the ZB-splines of degree $k$. Then the number of nonzero elements of the matrix $N_{kl}$

  • (i)

    for one-sided orthogonalization is $(g+k)^2$,

  • (ii)

    for two-sided orthogonalization is $(g+k)^2 - (g-1)^2/2$,

  • (iii)
    for splinet orthogonalization is
    $$(g+k)^2 - (k+1)^2\left(2^N + 2^{2N} - 2^{N+1}N - 2\right).$$

The numbers of nonzero elements of the matrices $N_{kl}$ for concrete choices of the spline degree $k$ and the number of support levels $N$ are given in Table 3. In contrast to the matrix $N_{kl}$, the number of nonzero elements of the collocation matrix $O_{k+1}(x)$ depends on the number $n$ and position of the data points. It is proportional to the relative total support of the spline basis, since a nonzero element appears only if the particular data point lies within the support of the corresponding basis function. Therefore, when an equidistant knot sequence and a uniform distribution of data points are considered, the number of nonzero elements is approximately $n$ times the relative total support of the corresponding orthogonalized spline basis given in Proposition 3.1. We demonstrate this comparison on two empirical datasets in Section 4.

Table 3.

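Number of nonzero elements of matrices $N_{kl}$.

| approach | degree | N = 1 | N = 2 | N = 3 | N = 4 | N = 5 | N = 6 |
|---|---|---|---|---|---|---|---|
| one-sided GS | k = 1 | 4 | 36 | 196 | 900 | 3844 | 15876 |
| one-sided GS | k = 2 | 9 | 81 | 441 | 2025 | 8649 | 35721 |
| one-sided GS | k = 3 | 16 | 144 | 784 | 3600 | 15376 | 63504 |
| one-sided GS | k = 4 | 25 | 225 | 1225 | 5625 | 24025 | 99225 |
| symmetric two-sided GS | k = 1 | 4 | 28 | 124 | 508 | 2044 | 8188 |
| symmetric two-sided GS | k = 2 | 9 | 63 | 279 | 1143 | 4599 | 18423 |
| symmetric two-sided GS | k = 3 | 16 | 112 | 496 | 2032 | 8176 | 32752 |
| symmetric two-sided GS | k = 4 | 25 | 175 | 775 | 3175 | 12775 | 51175 |
| ZB-splinet | k = 1 | 4 | 28 | 108 | 332 | 908 | 2316 |
| ZB-splinet | k = 2 | 9 | 63 | 243 | 747 | 2043 | 5211 |
| ZB-splinet | k = 3 | 16 | 112 | 432 | 1328 | 3632 | 9264 |
| ZB-splinet | k = 4 | 25 | 175 | 675 | 2075 | 5675 | 14475 |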
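The growth rates behind Table 3 become visible after substituting $g + k = (2^N - 1)(k+1)$ into Proposition 3.3. The following worked simplification is a sketch derived from the formulas above, not a statement taken from the paper.

```latex
\[
\text{one-sided GS:}\quad (g+k)^2 = (k+1)^2\,(2^N - 1)^2 ,
\]
\[
\text{ZB-splinet:}\quad
(k+1)^2\left[(2^N - 1)^2 - \left(2^N + 2^{2N} - 2^{N+1}N - 2\right)\right]
= (k+1)^2\left[(2N - 3)\,2^N + 3\right].
\]
```

For example, $k = 2$ and $N = 3$ give $9\,(3 \cdot 8 + 3) = 243$, in agreement with Table 3. Since $2^N$ is proportional to $g + k$ and $N$ is logarithmic in it, the ZB-splinet count grows like $(g+k)\log_2(g+k)$, whereas both Gram-Schmidt variants grow quadratically.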

The choice of the number and position of knots is an issue of its own. In this paper, we consider an equidistant and dyadic set of knots. Their number is arbitrary as long as it follows the level structure of ZB-splinets. For large datasets, the number of knots can be tied to the number of data points, e.g. $g = O(n^{1/5})$ (as this is the default choice in package stats in the statistical software R for cubic smoothing spline approximation). In this setting, since the number of nonzero elements of the collocation matrix $O_{k+1}(x)$ is approximately $n$ times the relative total support, it is $O(n^{6/5})$ for one-sided and two-sided Gram-Schmidt orthogonalization and $O(n\log(n))$ for the ZB-splinet approach.

4. Applications

To showcase the benefits of using an orthogonal basis for the representation of clr transformed densities and to compare the different orthogonalization approaches, two empirical datasets, from demography and geology, are analysed in this section. First, however, a short introduction to functional principal component analysis is given, as it is a tool of FDA where the orthogonality of the underlying basis plays a key role.

4.1. Functional principal component analysis

In many applications of multivariate and/or high-dimensional statistics, it is necessary to start with reducing the dimensionality of the data, i.e. with expressing the data in a new reduced vector space while keeping a sufficient amount of the original information intact. Principal component analysis (PCA) handles this by constructing new variables as linear combinations of the original ones which better capture the data variability. These variables, commonly called principal components, are obtained as eigenvectors of the sample covariance matrix and therefore create a sequence of mutually orthogonal vectors. This further ensures the interpretability of the respective eigenvalues as the amount of variability captured by the corresponding principal component. This way, it is possible to decrease the number of considered variables (in this case, principal components) while securing the main sources of variability and possibly filtering out potential noise.

In the context of FDA, one refers to functional principal component analysis (FPCA, [25]). For PDFs, a simplicial functional principal component analysis (SFPCA) was introduced in [10], where the analysis is performed directly on clr transformed PDFs. In this case, the newly constructed functional variables (called functional principal components) are derived as the eigenfunctions of the sample covariance operator. Using a spline representation of functional data simplifies this procedure, since the matrix of spline coefficients serves as the discrete representation of the data with respect to the given spline basis [25]. Analogously, the eigenfunctions of the sample covariance operator can be represented as linear combinations of the basis functions. For an orthogonal spline basis, the vectors of coefficients correspond to the eigenvectors of the sample covariance matrix built on the data matrix of coefficients. Thus, using an orthogonal basis such as ZB-splinets directly for the representation of the transformed PDFs reduces the problem to the well-known standard multivariate PCA [11] of the coefficient matrix without any additional corrections, and computations can be carried out directly with the ZB-splinet coefficients [10,22].
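The reduction to multivariate PCA can be sketched as follows. The sketch assumes that the basis is orthonormal, so that the Euclidean scalar product of coefficient vectors equals the $L^2$ scalar product of the corresponding functions; the coefficient matrix is synthetic stand-in data (its size mimics 114 observations and 9 basis functions, but the values are not taken from the paper), and the function name `pca_on_coefficients` is hypothetical.

```python
import numpy as np


def pca_on_coefficients(coefs):
    """PCA of an (observations x basis functions) matrix of coefficients
    with respect to an orthonormal basis."""
    centred = coefs - coefs.mean(axis=0)
    cov = centred.T @ centred / (coefs.shape[0] - 1)   # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(cov)               # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]
    eigval, eigvec = eigval[order], eigvec[:, order]
    explained = eigval / eigval.sum()                  # proportion of explained variability
    scores = centred @ eigvec
    return eigval, eigvec, explained, scores


rng = np.random.default_rng(1)
coefs = rng.standard_normal((114, 9)) * np.linspace(3.0, 0.5, 9)   # synthetic stand-in data
eigval, eigvec, explained, scores = pca_on_coefficients(coefs)
print(explained[:3])
```

Column `eigvec[:, 0]` then holds the basis coefficients of the first functional principal component, and `explained[0]` is the ratio of the first eigenvalue to the sum of all eigenvalues.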

In the multivariate case, a modification of PCA, called sparse principal component analysis, enables further simplification of the data structure. This approach alters the eigenvectors of the covariance matrix in such a way that the low absolute values are forced to 0 via a penalized regression using the elastic net penalty [8]. Essentially, the choice of sparsity parameter indirectly determines the sparsity of the resulting principal components (i.e. the number of zero-valued elements in individual eigenvectors), which helps with the search for the true patterns. This can be seen as the trade-off between the simplicity of the component structure and the amount of explained variability maintained, which naturally decreases with higher values of the sparsity parameter. While the functional counterpart of sparse PCA in the classical sense aims to produce functional principal components which are only nonzero in subregions [21], we focus here instead on simplifying the linear combination of the basis functions, where ZB-splinets can profit naturally from their local domain. Considering the data matrix of spline coefficients, this approach can be extended again even for the case of functional data such as PDFs.

4.2. Demographic example

In this section, different basis selections are showcased on an empirical demographic example through functional principal component analysis, where an orthogonal basis is beneficial and makes it possible to reduce the FPCA problem to a standard multivariate PCA of the corresponding coefficient matrix. The aim is to compare the different orthogonalization approaches applied to the ZB-spline basis rather than to provide a detailed statistical analysis and interpretation of the results, as done in [10] for the same dataset. The original dataset describing population age distributions in Upper Austria was obtained as a pre-aggregated set in the form of histograms representing the relative frequencies of men and women in 19 time intervals with centers $x_i = 2 + 5(i-1)$, $i = 1, \dots, 19$, living in 57 municipalities. For illustration, two observations corresponding to one selected municipality are depicted in Figure 7 (right), where the red and blue symbols represent relative frequencies for women and men, respectively. The histogram data were transformed with the discrete clr transformation and then smoothed by a quadratic smoothing spline ($k = 2$) described in Section 2.6. The smoothing was performed with respect to the first derivative of the spline, i.e. $l = 1$, the parameter $\alpha$ was fixed to $\alpha = 0.5$, and a sequence of nine equidistant knots on $[a,b] = [0,95]$, $\Delta\lambda = \{\lambda_i\}_{i=0}^{8}$, $\lambda_i = 95i/8$, was used.

Figure 7.

Example of two approximated functional observations in $L_0^2$ (left) and their $\mathcal{B}^2$ counterparts (right).

The resulting smoothing splines in $L_0^2[0,95]$ for the clr transformation of the two selected instances are depicted in Figure 7 (left). Using the inverse clr transformation (see [32] for more details), the smoothing splines in the Bayes space are obtained, see Figure 7 (right). The corresponding smoothing splines for the whole dataset consisting of 114 observations are shown in Figure 8.
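For readers less familiar with the transformation, a discrete clr transformation and its inverse can be sketched as below. The sketch uses the standard compositional form (logarithm minus its arithmetic mean), assumes equal-width histogram classes and strictly positive relative frequencies, and uses made-up input values; the precise discrete version used in the paper is defined outside this section.

```python
import numpy as np


def clr(p):
    """Discrete centred log-ratio transformation of strictly positive proportions."""
    logp = np.log(p)
    return logp - logp.mean()


def clr_inverse(z):
    """Map clr coefficients back to proportions summing to one."""
    w = np.exp(z - z.max())          # shift for numerical stability
    return w / w.sum()


p = np.array([0.10, 0.20, 0.30, 0.25, 0.15])   # hypothetical relative frequencies
z = clr(p)
print(z.sum(), clr_inverse(z))
```

The transformed values sum to zero, which is the discrete counterpart of the zero integral constraint in $L_0^2$, and the inverse recovers the original proportions. Zero frequencies would have to be treated before taking logarithms.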

Figure 8.

Age distribution dataset – spline representation of the data in $L_0^2$ (left) and their $\mathcal{B}^2$ counterparts (right).

4.2.1. Spline representation of the main principal component in SFPCA

To investigate the proposed approach for constructing the orthogonal basis in a real example setting, the impact of different factors was examined. First, the emphasis was placed on the type of orthogonalization approach for the ZB-spline basis and on the effect of the individual basis functions on the functional principal components in SFPCA. Orthogonal spline bases obtained through the symmetric two-sided and both one-sided Gram-Schmidt orthogonalizations and through the ZB-splinet dyadic orthogonalization were all considered here. Furthermore, two equidistant sequences of knots of different lengths were chosen for further comparison:

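$$\Delta\lambda^1 = \{\lambda_i^1\}_{i=0}^{8}, \quad \text{where } \lambda_i^1 = 95i/8,\ i = 0,\dots,8,$$
$$\Delta\lambda^2 = \{\lambda_i^2\}_{i=0}^{20}, \quad \text{where } \lambda_i^2 = 95i/20,\ i = 0,\dots,20.$$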

The first principal component and the behaviour of all eigenvalues are shown in Figure 9 for both $\Delta\lambda^1$ and $\Delta\lambda^2$. The left figure confirms that the shape of the first functional principal component stays consistent for both choices of knots. Note that the principal components do not depend on the type of (orthogonal) basis, only on the chosen sequence of knots. The screeplot of eigenvalues (Figure 9, right) suggests the dominance of the first component, as it is responsible for 56.8% and 48.0% of the original variability contained in the data (where the explained variability is obtained as the ratio of the first eigenvalue to the sum of all eigenvalues) for $\Delta\lambda^1$ and $\Delta\lambda^2$, respectively. Moreover, the shape of the first functional principal component also corresponds to the natural expectation about the main source of variability in the data, leading to a quite straightforward interpretation. Since the densities are measured separately for men and women, one can expect the main mode of variability to be present in the higher age region, capturing both the differences between male and female life expectancy and the variability in life expectancy across regions. This is confirmed by the larger deviation of the eigenfunction from zero in the age region from ca 75 to 95.

Figure 9.

The first principal component (left) and explained variability of eigenvalues (right) obtained through SFPCA for $\Delta\lambda^1$ (green), $\Delta\lambda^2$ (black).

Regardless of the type of orthogonal basis, we always get the same smoothing spline corresponding to the main principal component (eigenfunction), because it is determined uniquely. However, to compute its coefficients o from (5) we need to evaluate the matrix $N_{kl}$ and the collocation matrix $O_{k+1}(x)$. Here the effect of the chosen basis is already visible in the numbers of nonzero elements given in Table 4. The computational efficiency of the ZB-splinet approach is noticeable for the second choice of knots $\Delta\lambda^2$, with nearly half of the nonzero elements of the matrix $N_{kl}$ and a considerably lower number of nonzero elements of the matrix $O_{k+1}(x)$ compared to the one-sided GS approaches. The ZB-splinet approach is comparable with the two-sided GS approach for $\Delta\lambda^1$, but more efficient for $\Delta\lambda^2$. The sparsity patterns of the considered matrices for the choice of knots $\Delta\lambda^2$ are plotted in Figure 10 for all four orthogonalization approaches.
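The counts for the collocation matrix can be compared with the rule of thumb from Section 3.3, namely $n$ times the relative total support of Proposition 3.1. The sketch below assumes $n = 19$ data points (the histogram class centres), $k = 2$, and that the knot sequences with 9 and 21 knots correspond to $N = 2$ and $N = 3$ support levels (since $g = 7$ and $g = 19$); the function name `predicted_collocation_nnz` is hypothetical.

```python
from math import log2


def predicted_collocation_nnz(n, k, N):
    """Approximate nonzero counts of the collocation matrix: n times the relative total support."""
    g = (2**N - 1) * (k + 1) - k
    support = {
        "one-sided GS": g / 2 + k + 1 - 1 / (g + 1),
        "two-sided GS": g / 4 + k + 7 / 4 - 2 / (g + 1),
        "ZB-splinet": (k + 1) * log2((g + k) / (k + 1) + 1),
    }
    return {name: n * s for name, s in support.items()}


for N in (2, 3):
    print(N, predicted_collocation_nnz(n=19, k=2, N=N))
```

The predictions are roughly 121, 100 and 114 for the shorter knot sequence and 237, 160 and 171 for the longer one, close to the values 122/121, 100, 114 and 238/234, 162, 170 reported in Table 4.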

Table 4.

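Number of nonzero elements of matrices $N_{kl}$, $O_{k+1}(x)$ for both choices of knots $\Delta\lambda^1$, $\Delta\lambda^2$.

| approach | $N_{kl}$, $\Delta\lambda^1$ | $N_{kl}$, $\Delta\lambda^2$ | $O_{k+1}(x)$, $\Delta\lambda^1$ | $O_{k+1}(x)$, $\Delta\lambda^2$ |
|---|---|---|---|---|
| one-sided GS from left to right | 81 | 441 | 122 | 238 |
| one-sided GS from right to left | 81 | 441 | 121 | 234 |
| two-sided GS | 63 | 279 | 100 | 162 |
| ZB-splinet | 63 | 243 | 114 | 170 |

Figure 10.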

Nonzero elements of matrices $N_{kl}$ (first row), $O_{k+1}(x)$ (second row) for the choice of knots $\Delta\lambda^2$.

4.2.2. Importance of basis functions in SFPCA for different orthogonalization approaches

The focus of this subsection is on the contribution of individual basis functions constructed through different orthogonal approaches to the final form of the first principal component in SFPCA. The importance of basis functions corresponds to the absolute value of their respective coefficient, an element of the eigenvector of the sample covariance matrix of spline coefficients. Naturally, the higher the absolute value of the coefficient, the larger the effect of the corresponding basis function on the final estimate of the functional principal component.

The idea of this exploratory procedure is to reduce the number of basis functions used in the expansion by removing those that do not contribute significantly to the first principal component. A basis function is selected for the approximation if the absolute value of its coefficient is greater than 0.1 (which is a default choice, e.g. when working with package stats in the statistical software R [24]). We mark the selected basis functions as active with respect to the first principal component. The number of active basis functions indicates how well the spline basis is able to characterize the eigenfunction. A lower number of active basis functions suggests that the basis is well-suited for the expansion and can capture the source of data variability with fewer elements.
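The selection rule amounts to a simple threshold on the eigenvector of coefficients. A minimal sketch follows; the coefficient vector is hypothetical (it is not taken from the paper) and the function name `active_basis` is an illustrative choice.

```python
import numpy as np


def active_basis(coef, threshold=0.1):
    """Indices of basis functions whose coefficient exceeds the threshold in absolute value."""
    return np.flatnonzero(np.abs(coef) > threshold)


# Hypothetical unit-norm eigenvector of coefficients (not taken from the paper).
v = np.array([0.02, -0.05, 0.08, 0.15, -0.30, 0.55, 0.70, -0.25, 0.09])
v = v / np.linalg.norm(v)
idx = active_basis(v)
print(idx, len(idx))
```

Because the eigenvector has unit norm, the threshold is comparable across bases with different numbers of functions only in a rough sense, which is one reason to treat the rule as exploratory.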

The active basis functions for the different orthogonalization approaches are displayed in Figure 11 for both choices of knots, $\Delta\lambda^1$ (left) and $\Delta\lambda^2$ (right). The corresponding numbers of active basis functions for each approach are given in Table 5. In all cases, some inactive basis functions were present, as the total number of basis functions is 9 for $\Delta\lambda^1$ and 21 for $\Delta\lambda^2$. Since the goal is to decrease the number of active basis functions, the one-sided Gram-Schmidt orthogonalization from right to left is not convenient: it uses the highest number of active basis functions, leading to a more complex structure. The difference among the other orthogonalization approaches is more evident for the longer knot sequence, where the spline basis is composed of more elements. For $\Delta\lambda^2$, the number of active basis functions decreases from 21 to 7 for the one-sided Gram-Schmidt approach from left to right and for ZB-splinets.

Figure 11.

The functional bases used for SFPCA with $\Delta\lambda^1$ (left) and $\Delta\lambda^2$ (right). The first row corresponds to one-sided GS from left to right, the second to one-sided GS from right to left, the third to two-sided GS and the last one to ZB-splinets.

Table 5.

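Summary of the number of active basis functions for the first principal component for $\Delta\lambda^1$, $\Delta\lambda^2$.

| approach | $\Delta\lambda^1$ | $\Delta\lambda^2$ |
|---|---|---|
| one-sided GS from left to right | 4 | 7 |
| one-sided GS from right to left | 8 | 18 |
| two-sided GS | 5 | 10 |
| ZB-splinet | 5 | 7 |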

The dominance of the one-sided Gram-Schmidt approach from left to right over its counterpart from right to left can be explained as follows. The observed concentration of variability in the higher age group implies a need for more significant basis functions carrying this local information, i.e. functions having large variations in the higher age group interval while having small variations in the lower age group interval. Since this is exactly the case for the one-sided Gram-Schmidt approach from left to right, this orthogonal basis is well-suited for the considered dataset. The opposite would be true if the variability in the data were concentrated in the lower age group. Therefore, a versatile orthogonalization approach for the general case, such as the two-sided Gram-Schmidt or the ZB-splinet approach, is required. When comparing the two-sided Gram-Schmidt orthogonalization and ZB-splinets, there is no big difference in the number of active basis functions. However, the ZB-splinet approach gains in efficiency due to the local support of the corresponding basis functions. Moreover, the active basis functions for the ZB-splinet approach follow the variability in the higher age group interval: only those with support in this interval are selected, as observed from the dyadic structure in Figure 12.

Figure 12.

Dyadic structure of active basis functions for ZB-splinets constructed on $\Delta\lambda^1$ (left), $\Delta\lambda^2$ (right).

For comparison of the resulting active basis expansions and also to admit possible limitations of our approach, we provide the corresponding approximations of the first principal component, see Figure 13. Clearly, the approximation power of the reduced expansion declines with a decreasing number of active basis functions. Therefore, we cannot expect to profit from the significant reduction and improve the accuracy of the approximation simultaneously. However, the relative errors measured in L2 norm remain within a reasonable range for all orthogonalization approaches, see Table 6. Although the relative error for the ZB-splinet approach is the highest observed, we must note that it mainly corresponds to the low number of active basis functions; hereby, the one-sided GS from left to right clearly benefits from specific data structure, which would however hardly be so in the general case. The latter one-sided GS approach from right to left suffers from overparametrization (cf. Table 5). Thus, the real competitor here is two-sided GS, which better approximates the first principal component at the cost of a higher number of active basis functions.

Figure 13.

Approximation of the main principal component by one-sided GS from left to right (top left), one-sided GS from right to left (top right), two-sided GS (bottom left) and ZB-splinets (bottom right) for $\Delta\lambda^1$ (green) and $\Delta\lambda^2$ (black).

Table 6.

Relative errors of approximated first principal component (Figure 13).

| approach | $\Delta\lambda^1$ | $\Delta\lambda^2$ |
|---|---|---|
| one-sided GS from left to right | 0.0542 | 0.0982 |
| one-sided GS from right to left | 0.0818 | 0.1158 |
| two-sided GS | 0.1001 | 0.0854 |
| ZB-splinet | 0.1001 | 0.1424 |
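For an orthonormal basis, the relative $L^2$ error of a truncated expansion can be read off from the discarded coefficients alone. The following worked relation is a sketch under the assumptions that the basis functions $e_i$ are orthonormal, that $\xi = \sum_i c_i e_i$ is the first principal component with coefficient vector $c$, and that $A$ denotes the set of active indices; whether the values in Table 6 were computed in exactly this way is not stated in the text.

```latex
\[
\frac{\bigl\| \xi - \sum_{i \in A} c_i e_i \bigr\|_{L^2}}{\| \xi \|_{L^2}}
= \sqrt{\frac{\sum_{i \notin A} c_i^2}{\sum_{i} c_i^2}}
\;\le\; 0.1\,\sqrt{m - |A|}
\qquad \text{if } \|c\| = 1 \text{ and } |c_i| \le 0.1 \text{ for } i \notin A ,
\]
```

where $m$ is the total number of basis functions. For instance, with $m = 21$ and $|A| = 7$ the bound is $0.1\sqrt{14} \approx 0.37$, and with $m = 9$ and $|A| = 5$ it is $0.2$; the reported errors $0.1424$ and $0.1001$ for the ZB-splinet lie within these bounds.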

Rather than analysing the L2-error and its sources, based on the default criterion of active basis functions, we prefer to introduce a more sophisticated tool to reduce the basis expansion and to compare the suitability of each orthogonalization approach in terms of explained variability in the next section.

4.2.3. Effect of sparsity on explained variability

Comparison of the functions resulting from the aggregation of selected (active) basis functions gives a first insight into the quality of approximation of the first functional principal component. However, their effect can be explored in a different sense by using sparse principal component analysis (SPCA) [8], which in the case of an orthogonal spline basis is applied simply to the spline coefficients. Accordingly, the principal components are estimated using a lower number of basis functions by suppressing the redundant ones. These components can then be considered, in the above sense, as sparse (or reduced) principal components (sparse FPCA is actually defined somewhat differently [21]). Generally, the higher the sparsity parameter in SPCA, the lower the number of active basis functions used. The price to pay for the simpler structure of the principal component is the natural decline of the variability explained by the given component. Here, SPCA is used as a more sophisticated tool to show the significance of the effect of the basis functions on the principal components, i.e. how well the spline basis is able to capture the specific functional data structure.

To see the effect of the sparse approach, the results were investigated for different values of the sparsity parameter on the interval $[0,1]$. The evolution of the number of active basis functions is depicted in Figure 14. One can see a relatively fast decline in the number of active basis functions for all approaches, meaning that the full structure of the first functional principal component (FPC), i.e. the eigenfunction, is soon replaced by its estimate using a significantly lower number of basis functions. For $\Delta\lambda^1$, the one-sided Gram-Schmidt approach from left to right holds up the longest with at least one active basis function (i.e. before the method breaks down) compared to the other approaches, whereas for $\Delta\lambda^2$ both ZB-splinets and one-sided Gram-Schmidt from left to right offer comparable results as the most stable approaches in the sense of resistance to sparsity.

Figure 14.

The number of active basis functions forming the first principal component for $\Delta\lambda^1$ (left) and $\Delta\lambda^2$ (right) with respect to different values of the sparsity parameter, depicted for different orthogonalization approaches.

Figure 15 offers further insight into and comparison of the methods, as it shows the proportion of the overall variability explained by the first sparse principal component depending on the choice of the sparsity parameter. Here, the sparsity parameter is restricted to the values where all approaches lead to a nonzero first principal component. It is interesting to see that, while for the first FPC the one-sided Gram-Schmidt orthogonalization from left to right dominates significantly in terms of maintained variability for $\Delta\lambda^1$, both two-sided Gram-Schmidt and especially ZB-splinets offer comparable results for $\Delta\lambda^2$. We note that the dominance of one-sided GS from left to right, especially for the choice of $\Delta\lambda^1$, is not surprising, since the nature of the original data corresponds to the nature of this orthogonalization approach, as also observed earlier. However, the ZB-splinet approach, which gives similar results, is a more versatile tool for general data.

Figure 15.

Fraction of explained variability by the first principal component for $\Delta\lambda^1$ (left) and $\Delta\lambda^2$ (right), depicted for different orthogonalization approaches.

4.3. Geological example

In order to present a diverse application, we compare the different orthogonalization approaches on a geological example. The data set contains the particle size distributions of 96 soil samples from the Moravia region, Czechia. The location of sampling, Brodek u Přerova, represents an area with a relatively flat topography and a prominent loess deposition, resulting in a relatively high proportion of silt and sand [27]. In contrast to the example in the previous section, the number of data points is higher than the number of observations available: each observation contains 102 values of proportions of the given particle size in the sample, where the particle sizes are determined by the measuring device and distributed over the interval $[0.08, 1998]$ μm. In the further analysis, the particle sizes are processed on a logarithmic scale, as this enables better visualisation of the behaviour of the curves.

The soil samples were smoothed using quadratic splines ($k = 2$) with respect to the first derivative of the spline, i.e. $l = 1$; the parameter $\alpha$ was fixed to $\alpha = 0.5$ and a sequence of twenty-one equidistant knots on $[a,b] = [-2.5, 7.6]$, $\Delta\lambda = \{\lambda_i\}_{i=0}^{20}$, $\lambda_i = -2.5 + 10.1i/20$, was used. Figure 16 shows the obtained smoothed particle size distributions corresponding to individual samples. Figure 17 illustrates one particular sample and the respective smoothing spline. The representation of the data set in the $L_0^2$ space makes it possible to recognize patterns with rather small absolute differences that are difficult to distinguish in the $\mathcal{B}^2$ space (especially in the finest and coarsest grain sizes).

Figure 16.

Particle size distribution – spline representation of the data in $L_0^2$ (left) and their $\mathcal{B}^2$ counterparts (right).

Figure 17.

Particle size distribution – smoothing spline for one data object in $L_0^2$ (left) and its $\mathcal{B}^2$ counterpart (right).

Within this application, we focus mainly on the computational efficiency of different orthogonalization approaches for the first principal functional component representation. For the comparison, three equidistant sequences of knots of different lengths were considered:

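$$\Delta\lambda^1 = \{\lambda_i^1\}_{i=0}^{8}, \quad \text{where } \lambda_i^1 = -2.5 + 10.1i/8,\ i = 0,\dots,8,$$
$$\Delta\lambda^2 = \{\lambda_i^2\}_{i=0}^{20}, \quad \text{where } \lambda_i^2 = -2.5 + 10.1i/20,\ i = 0,\dots,20,$$
$$\Delta\lambda^3 = \{\lambda_i^3\}_{i=0}^{44}, \quad \text{where } \lambda_i^3 = -2.5 + 10.1i/44,\ i = 0,\dots,44.$$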
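The three knot sequences can be generated and checked against the dyadic condition $g = (2^N - 1)(k+1) - k$ of Proposition 3.1 with a few lines of Python. The sketch takes the left endpoint as $a = -2.5$, which is consistent with $b - a = 10.1$ and with the natural logarithm of the smallest particle size ($\log 0.08 \approx -2.53$, $\log 1998 \approx 7.60$); the assignment of $N = 2, 3, 4$ to the three sequences is derived here, not quoted from the text.

```python
import numpy as np

a, b, k = -2.5, 7.6, 2          # log-scale domain and spline degree from the text
for n_intervals, N in ((8, 2), (20, 3), (44, 4)):
    knots = np.linspace(a, b, n_intervals + 1)      # lambda_i = a + (b - a) * i / n_intervals
    g = len(knots) - 2                               # number of inner knots
    assert g == (2**N - 1) * (k + 1) - k             # dyadic condition of Proposition 3.1
    print(n_intervals, g, g + k, knots[:3])
```

The resulting numbers of basis functions $g + k$ are 9, 21 and 45, matching the sizes of the matrices $N_{kl}$ in the first rows of Table 7 ($81 = 9^2$, $441 = 21^2$, $2025 = 45^2$).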

The first principal component and the explained variability of all eigenvalues are depicted in Figure 18 for the three choices of knots. Comparing the two application examples, we see that here the proportion of explained data variability, shown in Figure 18 (right), is distributed more evenly over several functional principal components, meaning that the effect of the first component is not as prevalent. However, for consistency with the previous application, we will look at the first principal component, which reflects well the main mode of variability of particle size distributions. Specifically, for all knot choices, we see that its mode corresponds to the significant change in the particle size distribution approximately over the interval [2,6], i.e. in the silt-size and sand-size region.

Figure 18.

The first principal component (left) and explained variability of eigenvalues (right) obtained through SFPCA for $\Delta\lambda^1$ (green), $\Delta\lambda^2$ (black) and $\Delta\lambda^3$ (blue).

In order to compare the different orthogonalization techniques, we start with the computational efficiency of the corresponding spline basis when a smoothing spline is obtained through the complete basis expansion with $g + k$ orthogonal basis functions. We measure the computational complexity of the orthogonalization approaches in terms of the number of nonzero elements of the matrices $N_{kl}$ and $O_{k+1}(x)$ needed for the construction of the corresponding smoothing spline. The computational cost is compared in Table 7 for all the choices of $\Delta\lambda$ and for the symmetric two-sided and both one-sided Gram-Schmidt orthogonalizations and the ZB-splinet dyadic orthogonalization. The results correspond to the theoretical findings. The ZB-splinet approach shows significant savings in computations, especially for the finest sequence of knots $\Delta\lambda^3$, where the number of nonzero elements is reduced substantially compared to the other approaches.

Table 7.

Number of nonzero elements of matrices Nkl, Ok+1(x) for all three choices of knots.

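| approach | Nkl, Δλ1 | Nkl, Δλ2 | Nkl, Δλ3 | Ok+1(x), Δλ1 | Ok+1(x), Δλ2 | Ok+1(x), Δλ3 |
|---|---|---|---|---|---|---|
| one-sided GS from left to right | 81 | 441 | 2025 | 650 | 1272 | 2499 |
| one-sided GS from right to left | 81 | 441 | 2025 | 650 | 1268 | 2494 |
| two-sided GS | 63 | 279 | 1143 | 536 | 860 | 1747 |
| ZB-splinet | 63 | 243 | 747 | 612 | 918 | 1266 |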

Besides computational efficiency, the accuracy of the truncated basis expansion is an important question. We therefore show that the efficiency does not come at the cost of low approximation quality when only active basis functions are used. Here, a basis function is considered active if the absolute value of its coefficient is greater than 0.1 (the same threshold as in the demographic application). The number of active basis functions for representation of the first functional principal component is summarized in Table 8 for the different orthogonalization approaches, the corresponding truncated approximations are depicted in Figure 19 and their relative errors are presented in Table 9. The reduction in the number of active basis functions is noticeable in all cases. The two-sided GS approach and the one-sided GS from left to right are slightly better than ZB-splinet in this respect, whereas one-sided GS from right to left is not an appropriate choice. We focus on the comparison of the symmetric two-sided GS orthogonalization and the proposed ZB-splinet, as these approaches are versatile tools for different datasets. In particular, for the finest knot selection, the number of active basis functions is reduced from 45 to 9 for two-sided GS and to 10 for ZB-splinet. The corresponding relative errors are comparable: 0.1797 for two-sided GS and 0.1647 for ZB-splinet. Thus, these two orthogonalization methods are competitive with each other in terms of accuracy. However, the ZB-splinet approach is much more computationally efficient. This comparison shows once again that the ZB-splinet approach clearly outperforms its alternatives in terms of computational efficiency while still maintaining reasonable accuracy.

Table 8.

Summary of the number of active basis functions for the first principal component.

| approach | Δλ1 | Δλ2 | Δλ3 |
|---|---|---|---|
| one-sided GS from left to right | 3 | 4 | 8 |
| one-sided GS from right to left | 7 | 14 | 22 |
| two-sided GS | 5 | 6 | 9 |
| ZB-splinet | 5 | 6 | 10 |

Figure 19.

Approximation of the main principal component by one-sided GS from left to right (top left), one-sided GS from right to left (top right), two-sided GS (bottom left) and ZB-splinets (bottom right) for Δλ1 (red), Δλ2 (blue) and Δλ3 (black).

Table 9.

Relative errors of approximated first principal component (Figure 19).

| approach | Δλ1 | Δλ2 | Δλ3 |
|---|---|---|---|
| one-sided GS from left to right | 0.0651 | 0.1452 | 0.1614 |
| one-sided GS from right to left | 0.0754 | 0.1378 | 0.2901 |
| two-sided GS | 0.0396 | 0.1437 | 0.1797 |
| ZB-splinets | 0.0845 | 0.1635 | 0.1647 |
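
The selection of active basis functions and the resulting relative error can be illustrated with a minimal Python sketch. It assumes an orthonormal basis, so that the L2 norm of a function equals the Euclidean norm of its coefficient vector, and it assumes that the relative error is the norm of the discarded part divided by the norm of the full expansion; the exact error definition used for Table 9 is not restated in this section. The function name `truncate_expansion` and the coefficient values are hypothetical.

```python
from math import sqrt


def truncate_expansion(coeffs, threshold=0.1):
    """Keep only coefficients with |c| > threshold (the 'active' basis functions).

    Assumes the coefficients refer to an orthonormal basis, so norms of
    functions reduce to Euclidean norms of coefficient vectors.
    """
    active = [abs(c) > threshold for c in coeffs]
    truncated = [c if keep else 0.0 for c, keep in zip(coeffs, active)]
    dropped_norm = sqrt(sum((c - t) ** 2 for c, t in zip(coeffs, truncated)))
    full_norm = sqrt(sum(c ** 2 for c in coeffs))
    return sum(active), truncated, dropped_norm / full_norm


# Hypothetical coefficients of a principal component in an orthonormal basis.
example_coeffs = [0.9, -0.3, 0.05, 0.08, -0.02]
n_active, truncated, rel_error = truncate_expansion(example_coeffs)
print(n_active, truncated, round(rel_error, 4))
```

Under these assumptions the relative error depends only on the discarded coefficients, which is why a basis whose functions match the shape of the component needs few active functions.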

4.4. Results and discussion

In this application part, four approaches to the construction of an orthogonal spline basis were compared from several points of view. In the first application, where we focused on the structural properties of all approaches to spline approximation of concrete empirical PDF data, we concluded that both the ZB-splinets and the one-sided Gram-Schmidt orthogonalization from left to right have individual basis functions that correspond better to the first FPC than those of the other approaches. This means that fewer (active) basis functions are needed for a relatively accurate description of the component. This phenomenon is even more prominent with a higher number of knots (Δλ2). While the left-to-right Gram-Schmidt orthogonalization clearly takes advantage of the shape of the first component, i.e. varied behaviour on the right side of the overall support, ZB-splinets utilize basis functions with a local support within the same area. As the number of locally defined basis functions grows with the number of knots, the latter approach can be expected to prevail even more in such a setting. Furthermore, the existence of locally defined basis functions on the whole domain ensures that the efficiency of ZB-splinets does not depend on the shape of the (first) principal component. The dominance of the left-to-right Gram-Schmidt orthogonalization and ZB-splinets in this example is further emphasized by the fact that, with increasing value of the sparsity parameter, both approaches endure the longest without breaking down and therefore keep a nonzero amount of explained variability within the first FPC. While the other aspects put the two discussed approaches side by side, the indisputable advantage of ZB-splinets is their computational efficiency. This aspect was further examined in the second application with geological data, where the sparsity of the matrices involved in the construction of the smoothing spline representation of PDFs was studied. The optimal spline approximations confirm that the ZB-splinet approach is clearly computationally more efficient while keeping comparable accuracy. This is of utmost importance for statistical processing of large-scale functional data, which occur more and more frequently with automated data collection.

5. Conclusions

We have introduced a new orthogonal spline basis for the functional representation of probability density functions. Due to their specific properties of scale invariance and relative scale, probability density functions need a suitable functional representation that respects their relative nature. This is achieved with the Bayes spaces methodology, which also makes it possible to process PDFs in the standard Lebesgue space of square-integrable functions by means of the centred log-ratio transformation. Since this transformation induces a zero integral constraint, a new ZB-spline basis reflecting this condition was recently developed for a suitable basis expansion of probability density functions. In practical applications such as PCA, the orthogonality of the basis plays a crucial role in the proper function representation and interpretability of the results.

The construction of an orthogonal basis for the representation of probability density functions was addressed in this paper and an effective approach to the orthogonalization of ZB-splines was introduced. The proposed ZB-splinet approach has been shown to offer several advantages. Above all, ZB-splinets proved beneficial in terms of

  1. computational efficiency,

  2. locality of spline supports.

In particular, computational efficiency was measured in terms of the sparsity of the matrices involved in the smoothing spline construction. Locality was defined as a small relative total support, which gives ZB-splinets the potential to adapt flexibly to functions with local characteristics. These aspects of ZB-splinets were compared with one-sided Gram-Schmidt orthogonalization (from left to right and from right to left) and with the two-sided Gram-Schmidt orthogonalization approach. The ZB-splinet approach has been shown to surpass the others in both criteria: it has the lowest number of multiplications for smoothing spline computation and the smallest relative total support. The proposed approach was demonstrated on two empirical examples using functional PCA as one of the statistical tools where orthogonality of the basis is essential. Focusing on the first principal component, the presented results show that ZB-splinets offer comparable results concerning the preservation of data variability while maintaining both advantages of flexibility and computational efficiency.

A program package for constructing ZB-splinets in R is available at https://github.com/HibaNassarDTU/Z-Splinet. To promote the use of ZB-splinets among a broader audience in the functional data community, we plan to include statistical tools for advanced data analysis and thus provide a comprehensive program package.

Appendix. Proofs.

In this appendix we give the proofs of the propositions from Section 3.

Proof of Proposition 3.1.

  1. The first ZB-spline, obtained through orthogonalization using a one-sided Gram-Schmidt approach and defined on ΔΛ with coinciding additional knots, always has a relative support equal to $2/(g+1)$. The relative supports of the subsequent elements in the basis are $3/(g+1), 4/(g+1), \dots, 1, 1, \dots, 1$, where the entire interval is the support of the last k + 1 ZB-splines orthogonalized with the one-sided Gram-Schmidt approach. Then the relative total support size is $g/2+k+1-1/(g+1)$.

  2. The two-sided orthogonalized ZB-splines can be obtained by applying one-sided orthogonalization from both sides until the midpoint of the interval. This process yields two sets of splines with an individual relative support sequence of $2/(g+1), 3/(g+1), \dots, 1/2$. Additionally, we need to account for the final k + 1 splines in the center, which have a relative support of one. Thus, the total support for the two-sided orthogonalized ZB-splines can be calculated as follows:
    $$
    2\left(\frac{1}{2(g+1)}\left(\frac{g+1}{2}+2\right)\left(\frac{g-1}{2}\right)\right)+(k+1)=\frac{g}{4}+k+\frac{7}{4}-\frac{2}{g+1}.
    $$
  3. The relative total support of the ZB-splinets can be determined based on the number of levels N. At each level, the relative total support of the splinet is always equal to k + 1, which results in $(k+1)N$. Taking into consideration the relation $N=\log_2\left(\frac{g+k}{k+1}+1\right)$, the formula for the relative total support follows.
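
The three support counts in this proof can be checked numerically. The following Python sketch sums the support sequences described above with exact rational arithmetic and compares them with the closed forms. The function names are hypothetical, and the values k = 2 with g = 7, 19, 43 are an assumption inferred from the matrix sizes reported in Table 7; g is taken to be odd and to satisfy the dyadic relation.

```python
from fractions import Fraction


def dyadic_levels(g, k):
    """Return N with 2**N == (g + k)/(k + 1) + 1, or raise if g is not dyadic."""
    tuplets, remainder = divmod(g + k, k + 1)
    levels = (tuplets + 1).bit_length() - 1
    if remainder or 2 ** levels != tuplets + 1:
        raise ValueError("g + k must equal (2**N - 1) * (k + 1)")
    return levels


def support_one_sided(g, k):
    # Supports 2/(g+1), ..., g/(g+1), followed by k + 1 full supports.
    return sum(Fraction(j, g + 1) for j in range(2, g + 1)) + (k + 1)


def support_two_sided(g, k):
    # Two mirrored runs 2/(g+1), ..., 1/2 plus k + 1 central full supports.
    half_run = sum(Fraction(j, g + 1) for j in range(2, (g + 1) // 2 + 1))
    return 2 * half_run + (k + 1)


def support_splinet(g, k):
    # Each of the N levels contributes a relative total support of k + 1.
    return (k + 1) * dyadic_levels(g, k)


for g, k in [(7, 2), (19, 2), (43, 2)]:
    closed_one = Fraction(g, 2) + k + 1 - Fraction(1, g + 1)
    closed_two = Fraction(g, 4) + k + Fraction(7, 4) - Fraction(2, g + 1)
    assert support_one_sided(g, k) == closed_one
    assert support_two_sided(g, k) == closed_two
    print(g, float(closed_one), float(closed_two), support_splinet(g, k))
```

The one-sided and two-sided totals grow linearly in g, whereas the splinet total grows only with the number of levels N, i.e. logarithmically in g.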

Proof of Proposition 3.2.

  1. In one-sided orthogonalization, the first k + 1 splines require $0, 1, \dots, k$ inner products, respectively. Since the supports of ZB-splines are disjoint, the remaining $g-1$ splines all require k + 1 inner products. Therefore, the total count of inner products can be calculated as $(g-1)(k+1)+1+\dots+k=(k+1)g+k(k-1)/2-1$.

  2. The number of inner products required for two-sided orthogonalization involves the inner products needed for one-sided orthogonalization of splines from both sides of the interval until the midpoint, followed by the orthogonalization of k + 1 splines located in the center of the interval. In one-sided orthogonalization, the first k + 1 splines require $k(k+1)/2$ inner products. The remaining $(g+k)-3(k+1)=g-2k-3$ splines need k + 1 inner products for orthogonalization due to the disjointness of supports discussed in (i). The central k + 1 splines are gradually orthogonalized as described in Step 2 of Algorithm 2 using pairwise symmetric orthogonalization. This means that each of the central splines is orthogonalized with respect to those splines that were already orthogonalized with the one-sided approach. Due to the length of their support, the number of inner products at this point is $(k+1)(k+2)$.

    The pairs of central splines are then orthogonalized successively with respect to all pairs of central splines processed in the previous steps, whose number increases by two in each step. This accounts for $(k-1)(k+1)/2$ inner products for an even number or $k(k-2)/2$ inner products for an odd number of central splines, respectively. At the end of each step, the symmetric orthogonalization of the processed pair is performed. The number of central pairs gives the number of additional inner products, that is $(k+1)/2$ for an even and $k/2$ for an odd number of central splines. In the odd case, we need to add k inner products as the last odd spline needs to be orthogonalized with respect to all the other k central splines. Altogether, the orthogonalization of central splines needs
    $$
    (k+1)(k+2)+k(k+1)/2=(k+1)(3k+4)/2
    $$
    inner products for both even and odd numbers of central splines. Finally, the total number of inner products needed for two-sided orthogonalization is equal to
    $$
    k(k+1)+(g-2k-3)(k+1)+(k+1)(3k+4)/2=(k+1)(2g+k-2)/2.
    $$
    Thus the number of inner products is the same for one-sided and symmetric two-sided GS orthogonalization.
  3. For a splinet, orthogonalization of each (k+1)-tuplet requires $k(k+1)/2$ inner products. Thus, the bottom row in the dyadic structure with N rows needs $2^{N-1}k(k+1)/2$ inner products. There are $2^{N-1}-1$ (k+1)-tuplets above the bottom row, and each of them needs to be orthogonalized only with respect to two neighbouring (k+1)-tuplets taken from the top row of the already orthogonalized portion of the dyadic net of splines. Hence, each of these non-orthogonalized (k+1)-tuplets requires $2(k+1)^2$ inner products, which gives $2(k+1)^2(2^{N-1}-1)$ inner products altogether. The total number of inner products required at this step of the recurrence is
    $$
    2^{N-1}k(k+1)/2+2(k+1)^2\left(2^{N-1}-1\right)=2^{N-2}(k+1)(5k+4)-2(k+1)^2.
    $$
    As a result of each loop, the dimensions of the dyadic structure are reduced by one, see Algorithm 4 in [14]. Thus
    $$
    \sum_{j=1}^{N}\left(2^{N-j-1}(k+1)(5k+4)-2(k+1)^2\right)=\frac{(k+1)(5k+4)}{2}\sum_{j=1}^{N}2^{j-1}-2N(k+1)^2=\frac{(k+1)(5k+4)}{2}\left(2^{N}-1\right)-2N(k+1)^2.
    $$
    The final result follows from the relations $2^{N}=\frac{g+k}{k+1}+1$ and $N=\log_2\left(\frac{g+k}{k+1}+1\right)$.
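
The counting arguments of this proof can be replayed in a few lines of Python. The sketch below adds up the inner products step by step, as described in the three items, and compares the totals with the closed forms. The function names are hypothetical, and the tested values of g, k and N are illustrative assumptions chosen to satisfy the dyadic relation.

```python
from fractions import Fraction


def one_sided_inner_products(g, k):
    # The first k + 1 splines need 0, 1, ..., k; the remaining g - 1 need k + 1 each.
    return sum(range(k + 1)) + (g - 1) * (k + 1)


def two_sided_inner_products(g, k):
    # One-sided passes from both ends, then the k + 1 central splines.
    sides = k * (k + 1) + (g - 2 * k - 3) * (k + 1)
    centre = (k + 1) * (k + 2) + k * (k + 1) // 2
    return sides + centre


def splinet_inner_products(levels, k):
    # Sum of the per-loop cost 2^(N-j-1) (k+1)(5k+4) - 2 (k+1)^2 for j = 1..N.
    total = Fraction(0)
    for j in range(1, levels + 1):
        total += Fraction(2) ** (levels - j - 1) * (k + 1) * (5 * k + 4)
        total -= 2 * (k + 1) ** 2
    return total


for g, k, levels in [(7, 2, 2), (19, 2, 3), (43, 2, 4), (15, 3, 2)]:
    closed_one = (k + 1) * g + Fraction(k * (k - 1), 2) - 1
    closed_two = Fraction((k + 1) * (2 * g + k - 2), 2)
    closed_net = Fraction((k + 1) * (5 * k + 4), 2) * (2 ** levels - 1) \
        - 2 * levels * (k + 1) ** 2
    assert one_sided_inner_products(g, k) == closed_one
    assert two_sided_inner_products(g, k) == closed_two == closed_one
    assert splinet_inner_products(levels, k) == closed_net
    print(g, k, int(closed_one), int(closed_net))
```

The assertions confirm, for these cases, that the one-sided and two-sided counts coincide and that the geometric sum for the splinet collapses to the stated closed form.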

Proof of Proposition 3.3.

The elements of matrix Nkl correspond to inner products of the l-th derivative of orthogonalized basis functions. In contrast to the basis functions, their derivatives are not orthogonal in general. However, basis functions with disjoint supports are orthogonal naturally. The basis functions and their derivatives share the same supports. Thus, we can argue for each orthogonalization approach as follows.

  1. The one-sided Gram-Schmidt orthogonalization produces functions with growing size of their supports that overlap for each pair of the resulting basis. The matrix Nkl is full and contains no zero elements, in general.

  2. In two-sided Gram-Schmidt orthogonalization, the noncentral $(g+k)-(k+1)=g-1$ splines are divided into two parts having their supports only on the left half or right half of the entire interval, respectively. Thus these two subsets, each consisting of $(g-1)/2$ splines, have disjoint supports, which gives $2((g-1)/2)^2$ zero elements of matrix Nkl. Thus the total number of nonzero elements of matrix Nkl is $(g+k)^2-(g-1)^2/2$.

  3. The construction of ZB-splinets produces a net of functions with mutually disjoint supports. In particular, when the support is divided into two halves, there are two sets of $((g+k)-(k+1))/2$ functions with disjoint supports, leading to $((g+k)-(k+1))^2/2$ zero elements of matrix Nkl. In what follows, we divide the support into quarters, eighths and so forth. When the support is divided into $2^j$ parts, there are $2^j$ sets of $((g+k)-(2^j-1)(k+1))/2^j$ functions with disjoint supports. Thus, the total number of zero elements is
    $$
    \sum_{j=1}^{N-1}\frac{1}{2^j}\left((g+k)-(2^j-1)(k+1)\right)^2.
    $$
    For a dyadic net of functions where $g=(2^N-1)(k+1)-k$, the result follows
    $$
    (g+k)^2-(k+1)^2\left(2^{N}+2^{2N}-2^{N+1}N-2\right).
    $$
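
As a plausibility check, the closed forms of this proof can be evaluated and compared with the Nkl columns of Table 7. The Python sketch below does this under the assumption that k = 2 and g = 7, 19, 43, which is inferred from the reported matrix sizes 81, 441 and 2025 and is not stated explicitly in this section. It also counts overlapping supports in an idealized dyadic net in which every (k+1)-tuplet occupies one dyadic interval; this is a simplified model of the splinet structure, and all function names are hypothetical.

```python
def nonzero_counts(g, k):
    """Closed-form numbers of nonzero elements of N_kl for the three approaches."""
    n = g + k                                      # number of basis functions
    levels = (n // (k + 1) + 1).bit_length() - 1   # N with 2^N = (g+k)/(k+1) + 1
    if (2 ** levels - 1) * (k + 1) != n:
        raise ValueError("g must satisfy g = (2**N - 1)(k + 1) - k")
    one_sided = n ** 2                             # full matrix
    two_sided = n ** 2 - (g - 1) ** 2 // 2
    zeros = (k + 1) ** 2 * (2 ** levels + 2 ** (2 * levels)
                            - levels * 2 ** (levels + 1) - 2)
    return one_sided, two_sided, n ** 2 - zeros


def splinet_nonzeros_by_supports(levels, k):
    """Count overlapping support pairs in an idealized dyadic net of tuplets."""
    supports = []
    for p in range(1, 2 ** levels):
        half = p & -p                  # half-width 2^(level - 1) of tuplet p
        supports.append((p - half, p + half))
    overlapping = sum(
        1
        for a_lo, a_hi in supports
        for b_lo, b_hi in supports
        if a_lo < b_hi and b_lo < a_hi   # open intervals intersect
    )
    return overlapping * (k + 1) ** 2    # each tuplet pair is a (k+1) x (k+1) block


for g in (7, 19, 43):
    print(g, nonzero_counts(g, 2))
```

Under these assumptions the formulas give 81, 63, 63 for Δλ1, 441, 279, 243 for Δλ2 and 2025, 1143, 747 for Δλ3, in agreement with the Nkl columns of Table 7, and the interval-based count returns the same splinet values.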

Correction Statement

This article has been corrected with minor changes. These changes do not impact the academic content of the article.

Funding Statement

We gratefully acknowledge the support of this research and researchers by the following grants: IGA PrF 2024 006 Mathematical models, IGA PrF 2025 015 Mathematical models, and the Czech Science Foundation grants 22-15684L Generalized relative data and robustness in Bayes spaces and 25-15447S Distributional data analysis for geochemical mapping. This work was supported by Univerzita Palackého v Olomouci.

Disclosure statement

No potential conflict of interest was reported by the author(s).

References

  • 1. Alavi J. and Aminikhah H., Orthogonal cubic spline basis and its applications to a partial integro-differential equation with a weakly singular kernel, Comput. Appl. Math. 40 (2021), Article No. 55.
  • 2. Basna R., Nassar H., and Podgórski K., Data driven orthogonal basis selection for functional data analysis, J. Multivar. Anal. 189 (2022), pp. 104868.
  • 3. Basna R., Nassar H., and Podgórski K., Splinets–orthogonal splines and FDA for the classification problem, arXiv preprint. (2023), arXiv:2311.17102.
  • 4. Bracewell R.N., The Fourier Transform and Its Applications, McGraw-Hill, New York, 1978.
  • 5. De Boor C., A Practical Guide to Splines, Springer-Verlag, New York, 1978.
  • 6. Dierckx P., Curve and Surface Fitting with Splines, Monographs on Numerical Analysis, Clarendon Press, Oxford, 1993.
  • 7. Egozcue J., Díaz–Barrero J., and Pawlowsky–Glahn V., Hilbert space of probability density functions based on Aitchison geometry, Acta Math. Sin. 22 (2006), pp. 1175–1182.
  • 8. Erichson N.B., Zheng P., Manohar K., Brunton S.L., Nathan Kutz J., and Aravkin A.Y., Sparse principal component analysis via variable projection, SIAM J. Appl. Math. 80 (2020), pp. 977–1002.
  • 9. Flickner M., Hafner J., Rodriguez E., and Sanz J.L.C., Periodic quasi-orthogonal spline bases and applications to least-squares curve fitting of digital images, IEEE Trans. Image Process. 5 (1996), pp. 71–88.
  • 10. Hron K., Menafoglio A., Templ M., Hruzová K., and Filzmoser P., Simplicial principal component analysis for density functions in Bayes spaces, Comput. Stat. Data Anal. 94 (2016), pp. 330–350.
  • 11. Johnson R. and Wichern D., Applied Multivariate Statistical Analysis, 6th ed., Prentice Hall, Upper Saddle River, 2007.
  • 12. Kamada M., Toraichi K., and Mori R., Periodic spline orthonormal bases, J. Approx. Theory 55 (1988), pp. 27–34.
  • 13. Kokoszka P. and Reimherr M., Introduction to Functional Data Analysis, CRC Press, Boca Raton, 2017.
  • 14. Liu X., Nassar H., and Podgórski K., Dyadic diagonalization of positive definite band matrices and efficient B-spline orthogonalization, J. Comput. Appl. Math. 414 (2022), pp. 114444.
  • 15. Liu X., Nassar H., and Podgórski K., Splinets: Functional data analysis using splines and orthogonal spline bases; (2023). R package version 1.5.0; Available at https://CRAN.R-project.org/package=Splinets.
  • 16. Machalová J., Optimal interpolating and optimal smoothing spline, J. Electr. Eng. 53 (2002), pp. 79–82.
  • 17. Machalová J., Hron K., and Monti G., Preprocessing of centred logratio transformed density functions using smoothing splines, J. Appl. Stat. 43 (2016), pp. 1419–1435.
  • 18. Machalová J., Talská R., Hron K., and Gába A., Compositional splines for representation of density functions, Comput. Stat. 36 (2021), pp. 1031–1064.
  • 19. Mason J., Rodriguez G., and Seatzu S., Orthogonal splines based on B-splines – with applications to least squares, smoothing and regularisation problems, Numer. Algorithms 5 (1993), pp. 25–40.
  • 20. Nassar H. and Podgórski K., Empirically driven orthonormal bases for functional data analysis, in Numerical Mathematics and Advanced Applications ENUMATH 2019, Lecture Notes in Computational Science and Engineering; Vol. 139; Springer, Egmond aan Zee, the Netherlands, 2021. pp. 1–12.
  • 21. Nie Y. and Cao J., Sparse functional principal component analysis in a new regression framework, Comput. Stat. Data Anal. 152 (2020), pp. 107016.
  • 22. Pavlů I., Machalová J., Tolosana-Delgado R., Hron K., and Bachmann K., Principal component analysis for distributions observed by samples in Bayes spaces, Math. Geosci. (2023). Submitted.
  • 23. Podgórski K., Splinets–splines through the Taylor expansion, their support sets and orthogonal bases, arXiv preprint. (2021), arXiv:2102.00733.
  • 24. R Core Team, R: A language and environment for statistical computing, Vienna, Austria: R Foundation for Statistical Computing; (2021). Available at: https://www.R-project.org/.
  • 25. Ramsay J. and Silverman B.W., Functional Data Analysis, Springer, New York, 2005.
  • 26. Redd A., A comment on the orthogonalization of B-spline basis functions and their derivatives, Stat. Comput. 22 (2012), pp. 251–257.
  • 27. Šimíček D., Bábek O., Hron K., Pavlu I., and Kapusta J., Separating provenance and palaeoclimatic signals from geochemistry of loess-paleosol sequences using advance statistical tools: Central European loess belt, Sediment. Geol. 419 (2021), pp. 105907.
  • 28. Talská R., Hron K., and Matys Grygar T., Compositional scalar-on-function regression with application to sediment particle size distributions, Math. Geosci. 53 (2021), pp. 1667–1695.
  • 29. Talská R., Menafoglio A., Hron K., Egozcue J.J., and Palarea-Albaladejo J., Weighting the domain of probability densities in functional data analysis, Stat 9 (2020), pp. e283.
  • 30. Talská R., Menafoglio A., Machalová J., Hron K., and Fišerová E., Compositional regression with functional response, Comput. Stat. Data Anal. 123 (2018), pp. 66–85.
  • 31. van den Boogaart K.G., Egozcue J., and Pawlowsky-Glahn V., Bayes linear spaces, Stat. Oper. Res. Trans. 34 (2010), pp. 201–222.
  • 32. van den Boogaart K.G., Egozcue J., and Pawlowsky-Glahn V., Hilbert Bayes spaces, Aust. N. Z. J. Stat. 54 (2014), pp. 171–194.

Articles from Journal of Applied Statistics are provided here courtesy of Taylor & Francis
