Author manuscript; available in PMC: 2024 Feb 16.
Published in final edited form as: Biometrics. 2023 Mar 13;79(4):3778–3791. doi: 10.1111/biom.13846

A Nonparametric Test of Group Distributional Differences for Hierarchically-Clustered Functional Data

Alexander S Long 1,*, Brian J Reich 1,**, Ana-Maria Staicu 1,***, John Meitzen 2,****
PMCID: PMC10695330  NIHMSID: NIHMS1945190  PMID: 36805970

Summary:

Biological sex and gender are critical variables in biomedical research, but are complicated by the presence of sex-specific natural hormone cycles, such as the estrous cycle in female rodents, typically divided into phases. A common feature of these cycles is fluctuating hormone levels, which induce sex differences in many behaviors controlled by the electrophysiology of neurons. Such electrophysiological properties, for example the neuronal membrane potential in response to an electrical stimulus, are typically summarized using a priori defined metrics. In this paper, we propose a method to test for differences in electrophysiological properties across estrous cycle phases without first defining a metric of interest. We do this by modeling membrane potential data in the frequency domain as realizations of a bivariate process that also depends on the electrical stimulus, adopting existing methods for longitudinal functional data. We then extract the main features of the bivariate signals through a set of basis function coefficients. We use these coefficients for testing, adapting methods for multivariate data to account for an induced hierarchical structure that is a product of the experimental design. We illustrate the performance of the proposed approach in simulations and then apply the method to experimental data.

Keywords: Bivariate functional data, Functional data analysis, hierarchically-clustered data, multivariate testing

1. Introduction

Biological sex and gender are critical variables for biomedical research, especially for addressing underserved aspects of women’s health (Tannenbaum et al., 2019; Arnegard et al., 2020; Galea et al., 2020). Complicating this consideration is the presence of sex-specific natural hormone cycles in both females and males, such as the menstrual cycle in female humans and the estrous cycle in female rodents, which can influence experimental outcomes (Proaño et al., 2018; Mamlouk et al., 2020). These cycles can be divided into phases featuring different hormone concentrations. Hormone level fluctuations induce sex differences in many behaviors, including those related to motivation and disorders such as depression and addiction. These behaviors are controlled by the electrophysiology of specific neurons which communicate with each other between designated brain regions via electrical impulses called action potentials. Thus, it is of high research interest to determine if the properties of neurons change throughout the estrous cycle.

The most prominent and widely employed experimental procedure is the whole-cell patch clamp (WHPC), which measures how the neuron membrane potential changes as different levels of current are injected over a fixed period of time. These electrophysiological properties can be studied in vitro by measuring the membrane potential of a neuron in response to an artificial stimulus such as an electrical current. Proaño et al. (2018) and Cao et al. (2018) are two examples of such experiments. An example of the observed membrane potential of a neuron from such an experiment is shown in the left panel of Figure 1; each curve depicts the membrane potential in response to a constant current applied to the neuron from 0 to 0.6 seconds, with the current increasing from curve to curve. Applying higher current raises the membrane potential and eventually generates action potentials, seen in the plot as spikes.

Figure 1.


Left: The observed membrane potential curves are shown for all currents (0 to +0.14 nA) injected for one replicate of one medium spiny neuron from a rat in the diestrus phase of the estrous cycle. Curves corresponding to currents of +0.04 nA, +0.09 nA, and +0.14 nA are shown in black, red, and green, respectively. Right: The log-periodogram of the membrane potential curves shown in the left panel. The colored curves correspond to the membrane potential curves shown on the left in the same color. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.

Existing approaches to analyze such data rely heavily on summaries of the rich data produced by WHPC experiments. One example of this is the heavy emphasis on a priori-defined experimental metrics; while they have strong neurological justification, they limit analysis to only those assessed metrics. For example, the observed membrane potential curves can be summarized using features like action potential frequency and amplitude. A one-way ANOVA or Kruskal-Wallis test can be used to test for group differences with an adjustment for multiple comparisons if necessary. Alternatively, principal components analysis (PCA) of these experimental metrics has been used in the analysis of similar membrane potential data (Druckmann et al., 2012; Hernáth et al., 2019). Developing methods that better account for the complex dependence and structure of these data may enhance discovery beyond what is possible using these neurologically relevant metrics.

In this paper, we describe a method to test whether the distribution of membrane potential behavior in response to stimulus differs between phases of the estrous cycle in rats, without requiring a priori metrics of interest. We accomplish this by working with the periodogram of the observed membrane potential and viewing it as the realization of a random function observed on a finite grid of frequencies. Considerable work has been devoted to methods for testing for distributional differences between groups of independent functional data. Tests for equality of the mean function have been proposed by Cuesta-Albertos and Febrero-Bande (2010); Horváth et al. (2013); Zhang and Liang (2014); and Zhang et al. (2019). Tests for equality of the covariance function have been proposed by Fremdt et al. (2013) and Paparoditis and Sapatinas (2016). More generally, Pomann et al. (2016) and Wynne and Duncan (2020) tested for differences in distribution.

A limitation of these testing procedures is that they require independent functional data. In the motivating application, multiple neurons are observed from each rat, and multiple observations are made on each neuron. This experimental design naturally induces a hierarchical structure on the data, so an assumption of independence is not reasonable. Such clustered data are commonly modeled using functional mixed effects models. For example, Di et al. (2009) developed multilevel functional PCA (MFPCA); Li et al. (2015) and Xu et al. (2018) proposed extensions of MFPCA for the analysis of three-level hierarchies. Testing procedures for clustered functional data have been proposed as well. Abramovich and Angelini (2006) and Antoniadis and Sapatinas (2007) consider testing for mean differences in a mixed model framework. Staicu et al. (2015) proposed an L2-norm based testing procedure for group mean differences in clustered data. Xu et al. (2018) introduced a testing procedure for hierarchically clustered functional data, but considered only tests concerning a smooth mean function.

An additional complication of this motivating dataset is the importance of the applied electrical stimulus. As Figure 1 shows, the membrane potential curves can vary substantially depending on the applied current. We could incorporate the effect of the stimulus using function-on-scalar regression (Ramsay and Silverman, 2005). Statistical inference for such models has been studied both without (Fan and Zhang, 2000) and with a hierarchical structure (Zhu et al., 2012) as in the motivating dataset. Such models, however, require assumptions about the relationship between the stimulus and the observed membrane potential that may not be supported by the biological processes responsible for these data. Instead, we characterize this relationship using a set of empirical basis functions. Specifically, we consider the membrane potential to be the realization of a latent process that depends on the stimulus, and we use the eigenfunctions of an appropriate covariance function to describe the variation of the membrane potential associated with the stimulus. The membrane potentials are observed densely in time, but at comparatively few unique levels of current. We note the similarity to longitudinal functional data: rather than functional observations being made at several times for each subject, we make functional observations at several levels of applied current. Thus, we use existing methods for longitudinal functional data (Chen and Müller, 2012; Park and Staicu, 2015; Chen et al., 2017).

We represent the membrane potential variation using the same data-driven basis, with coefficients that depend on the applied current. The basis coefficients are recovered as the inner products between the response function and the estimated basis functions; as a result, they preserve the dependence of the response profiles. We use methods from longitudinal functional data analysis to estimate the basis system. A multivariate testing procedure is then applied to the coefficients of the basis function expansion. To account for the known hierarchical structure in the data, we approximate the null distribution of the test statistic by bootstrapping over independent observational units.

2. Data description

The dataset that motivates this work is from an experiment employing the WHPC technique to assess the electrophysiological properties of medium spiny neurons in the acute brain slice preparation of the nucleus accumbens core of adult female rats [see Proaño et al. (2018) for details]. The overall goal of this experiment was to test the hypothesis that these properties change across phases of the estrous cycle. We focus on data generated when the recorded neurons were injected with excitatory current for 0.6 seconds, with the membrane potential time series recorded throughout the injection. The injected current started at 0 nA, to provide a reading of the baseline resting membrane potential of the neuron, and was then increased until the scientist performing the experiment observed a decrease in the number of action potentials.

The experiment included 26 rats, observed across 3 phases of the estrous cycle: diestrus (11), proestrus (8), and estrus (7). From each rat, 1-4 neurons were collected, with multiple replicates per neuron. Thus, the data have a natural nested hierarchical structure: (i) estrous cycle phase, (ii) rat, (iii) neuron, and (iv) replicate. For a single replicate within a neuron, the current is increased, typically in steps of +0.01 nA, until an observed decrease in action potential frequency, indicating the limit of the physiological range of the neuron's response properties. All neurons had at least seven different non-zero currents injected, and over half of the neurons received at least 18 different currents.

An example of the data collected from a single replicate of a neuron from a rat in the diestrus phase is provided in Figure 1. The left plot shows the membrane potential response to all levels of current injected into the neuron. During current injection, the membrane potential increases until it plateaus, which is typical of medium spiny neurons but not of all neuron types. After 0.6 seconds, the current injection stops and the membrane potential returns to the resting membrane potential. If sufficient current is applied, causing a large enough depolarization of the membrane potential, action potentials can be generated, seen as rapid spikes in the membrane potential. In this neuron type, once an initial action potential is generated for a fixed level of current, action potentials typically repeat at an approximately constant frequency while the current is applied.

Because the data have both smooth and spiky features across varying currents, it is reasonable to represent them in the frequency domain. We use a Fourier transform to decompose the current-specific curves into their constituent frequencies and estimate the spectral density of each curve. Before taking the Fourier transform, all current-specific curves measured on a fixed neuron and replicate are truncated to a time interval that has a scientific interpretation (see the vertical lines in Figure 1). Restricted to this region, the process is approximately stationary, in which case the Fourier transform retains all the information in the original data. To assess the sensitivity of the results to the choice of truncation point, multiple points were considered; they had minimal impact on the subsequent results.
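The truncation-then-periodogram step can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the sampling rate `fs` and the window limits `t_start`/`t_end` are hypothetical inputs standing in for the recording rate and the scientifically chosen truncation points, the function name is ours, and the periodogram is the raw |DFT|²/n estimate, which agrees with other common normalizations up to a constant.

```python
import numpy as np

def log_periodogram(v, fs, t_start, t_end, eps=1e-12):
    """Log-periodogram of one membrane-potential trace restricted to [t_start, t_end).

    v:  1-D array of membrane potential samples recorded at rate fs (Hz).
    Returns the nonnegative Fourier frequencies (Hz) and log(|DFT|^2 / n).
    eps is a small hypothetical offset that guards against log(0).
    """
    t = np.arange(v.size) / fs
    seg = np.asarray(v, dtype=float)[(t >= t_start) & (t < t_end)]
    n = seg.size
    # The mean is NOT removed, so the zero-frequency bin carries the resting level.
    pgram = np.abs(np.fft.rfft(seg)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return freqs, np.log(pgram + eps)

# Hypothetical trace: resting level of -80 mV plus a 50 Hz oscillation.
fs = 1000.0
t = np.arange(1000) / fs
v = -80.0 + np.sin(2 * np.pi * 50.0 * t)
freqs, lp = log_periodogram(v, fs, 0.1, 0.6)
```

Applied to every current of one neuron replicate, the resulting vectors can be stacked into a frequency-by-current array, one column per current.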

The right panel of Figure 1 shows, on a log scale, the periodogram of each current-specific membrane potential curve for a single replicate from a single neuron. When the injected current generates no action potentials, the log-periodogram has a spike at a frequency of 0 and very small values at all other frequencies. As increasing current causes a larger depolarizing change in the membrane potential, and eventually the generation of action potentials, the value of the log-periodogram at higher frequencies increases.

As seen in the periodogram plot in Figure 1, the fixed-current spectral profiles look like noisy realizations of smooth, monotonically decreasing signals. To account for the different current levels, we view the log-periodogram as a realization of a bivariate function of both the frequency and the current applied to the neuron. In Figure 2, the log-periodogram is shown as a bivariate function for three neurons in the diestrus and estrus phases. The log-periodograms from the diestrus group appear to have higher values at lower currents and low frequencies than those from the estrus group. If the electrophysiological properties of the neuron differ across phases of the estrous cycle, we expect those differences to appear in the bivariate log-periodogram.

Figure 2.


Log-periodogram across current and frequency for three neurons from a single rat in the diestrus (top row) and estrus phase (bottom). This figure appears in color in the electronic version of this article, and any mention of color refers to that version.

3. Statistical framework

3.1. Model framework

Consider the following hierarchical data: for each group $g = 1, \ldots, G$, we observe measurements on a number of units $r = 1, \ldots, n_g$, and for each subunit $i = 1, \ldots, n_{gr}$ within a unit, the observed data are $\{(t_{gri,k}, u_{gri,\ell}, Y_{gri,k\ell})_{(k,\ell)}\}_i$, where $k = 1, \ldots, K_{gri}$ and $\ell = 1, \ldots, L_{gri}$. We assume that $Y_{gri,k\ell}$ is a noisy evaluation of a bivariate function at $(t_{gri,k}, u_{gri,\ell})$; in other words, let $X_{gri}(\cdot,\cdot): \mathcal{T} \times \mathcal{U} \to \mathbb{R}$ be such that $Y_{gri,k\ell} = X_{gri}(t_{gri,k}, u_{gri,\ell}) + \epsilon_{gri,k\ell}$, where $\epsilon_{gri,k\ell}$ is measurement error. We assume $K_{gri}$ is large and the grid of points $\{t_{gri,k}: k\}$ is fine in $\mathcal{T}$, and we consider the case when $L_{gri}$ is small for each $i$ but $\{u_{gri,\ell}: \ell, i, r, g\}$ is dense in $\mathcal{U}$. In our data application, $g$ denotes estrous cycle phase, $r$ denotes rat, $i$ denotes replicate within rat, and $Y_{gri,k\ell}$ is the log-periodogram at frequency $t_k$ and current $u_\ell$; to simplify notation, we do not explicitly account for the neuron level. For current, we directly use a normalized current based on the maximum current for each observation and take $\mathcal{U} = [0,1]$. By an abuse of notation, we assume that $X_{gri}(\cdot,\cdot) \overset{d}{=} X_g(\cdot,\cdot)$ for all $r, i$, where $\overset{d}{=}$ denotes that the random quantities have the same distribution. This is justified because all neurons belong to the same region of the brain and information such as the relative location of neurons is lost due to the collection methods. Our objective is to develop a testing procedure to formally assess

$$H_0: X_1(\cdot,\cdot) \overset{d}{=} \cdots \overset{d}{=} X_G(\cdot,\cdot), \qquad (1)$$

versus the alternative that $X_g(\cdot,\cdot) \overset{d}{\neq} X_{g'}(\cdot,\cdot)$ for some $g \neq g' \in \{1, \ldots, G\}$.

Testing the equality of groups of curves is not new; for example, Staicu et al. (2014), Pomann et al. (2016), and Zhang et al. (2019) study this problem for independent and/or univariate curves. In our situation, the bivariate structure of the curves and the mixed dense/sparse sampling design, together with the complex hierarchical dependence, make the problem more challenging.

We propose to model $X_{gri}(t,u) = \mu(t,u) + V_{gri}(t,u)$, where $\mu(t,u)$ is the overall mean function and $V_{gri}(t,u)$ is the subunit deviation. Let $\{\phi_p(\cdot)\}_{p \ge 1}$ be an orthonormal basis in $L^2(\mathcal{T})$ and represent the deviations as $V_{gri}(t,u) = \sum_{p=1}^{\infty} \xi_{gri,p}(u)\,\phi_p(t)$, where the $\xi_{gri,p}(u) = \int_{\mathcal{T}} V_{gri}(t,u)\,\phi_p(t)\,dt$ are the corresponding basis coefficients, which have mean zero and are uncorrelated across $g$, $r$, and $p$. As in Chen and Müller (2012), Park and Staicu (2015), and Chen et al. (2017), we then propose a similar decomposition of the $\xi_{gri,p}(u)$'s. That is, $\xi_{gri,p}(u) = \sum_{q=1}^{\infty} \zeta_{gri,pq}\,\psi_{pq}(u)$, where $\{\psi_{pq}(\cdot)\}_{q \ge 1}$ is an orthonormal basis in $L^2(\mathcal{U})$ and the $\zeta_{gri,pq}$'s are the corresponding mean-zero basis coefficients. Thus, combining all components, we obtain $V_{gri}(t,u) = \sum_{p=1}^{\infty}\sum_{q=1}^{\infty} \zeta_{gri,pq}\,\phi_p(t)\,\psi_{pq}(u)$.
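To make the truncated double expansion concrete, the sketch below (Python/NumPy, our illustration rather than the authors' code) evaluates $V_{gri}(t,u)$ on fixed grids from given scores and basis functions, and stacks the scores into a single vector of the kind used later for testing. The function names and the array layout (one row per basis function, evaluated on fixed grids) are assumptions, and the basis values in the usage lines are hypothetical and not orthonormalized.

```python
import numpy as np

def reconstruct_deviation(zeta, phi, psi):
    """Evaluate the truncated V(t,u) = sum_p sum_q zeta_pq phi_p(t) psi_pq(u) on grids.

    zeta: list of P arrays; zeta[p] holds the Q_p scores zeta_{p1}, ..., zeta_{pQ_p}.
    phi:  (P, n_t) array, phi[p] = phi_p evaluated on the t-grid.
    psi:  list of P arrays; psi[p] has shape (Q_p, n_u), rows psi_pq on the u-grid.
    """
    V = np.zeros((phi.shape[1], psi[0].shape[1]))
    for p, z_p in enumerate(zeta):
        xi_p = np.asarray(z_p) @ psi[p]   # xi_p(u) = sum_q zeta_pq psi_pq(u)
        V += np.outer(phi[p], xi_p)       # add the surface phi_p(t) xi_p(u)
    return V

def feature_vector(zeta):
    """Stack (zeta_1^T, ..., zeta_P^T)^T into one vector of length sum_p Q_p."""
    return np.concatenate([np.asarray(z, dtype=float) for z in zeta])

# Hypothetical basis values with P = 2, Q_1 = 3, Q_2 = 2.
t = np.linspace(0, 1, 7)
u = np.linspace(0, 1, 4)
phi = np.vstack([np.ones_like(t), t])
psi = [np.vstack([np.ones_like(u), u, u ** 2]), np.vstack([np.ones_like(u), u])]
zeta = [np.array([1.0, 2.0, 3.0]), np.array([-1.0, 0.5])]
V = reconstruct_deviation(zeta, phi, psi)
z_vec = feature_vector(zeta)
```

Computing $\xi_p(u)$ first and then forming one outer product per $p$ mirrors the two-stage structure of the expansion: a $t$-basis whose coefficients are themselves expanded in a $p$-specific $u$-basis.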

There are typically two possible ways to select the basis system for the above representation. One option is to use a pre-specified set of basis functions. We pursue a different option: similar to Park and Staicu (2015), we select $\{\phi_p(\cdot)\}_{p \ge 1}$ to be the eigenbasis of the marginal covariance function $\Sigma_{\mathcal{T}}(t,t') = \int_{\mathcal{U}} \Sigma(t,t',u,u)\,f(u)\,du$, where $\Sigma(t,t',u,u') = \mathrm{Cov}\{X_{gri}(t,u), X_{gri}(t',u')\}$ and $f(\cdot)$ is the sampling density of $u$. We also select $\{\psi_{pq}(\cdot)\}_{q \ge 1}$ to be the eigenbasis of the covariance of the coefficients of the initial decomposition, $\Sigma_{\mathcal{U},p}(u,u') = \mathrm{Cov}\{\xi_{gri,p}(u), \xi_{gri,p}(u')\}$. This representation allows us to explain the variation in the bivariate functional data by sets of eigenfunctions for each argument, $t$ and $u$, separately. Furthermore, this framework allows us to extract the main features of the bivariate signals through the set of basis function coefficients. This approach has recently been considered by Scheffler et al. (2018).

In practice, we truncate the infinite basis expansions; let $P$ and $Q_1, \ldots, Q_P$ denote the truncation levels for the bases $\phi_p(\cdot)$ and $\psi_{pq}(\cdot)$, respectively. It follows that the vector $\zeta_{gri} = (\zeta_{gri,1}^T, \ldots, \zeta_{gri,P}^T)^T$, where $\zeta_{gri,p} = (\zeta_{gri,p1}, \ldots, \zeta_{gri,pQ_p})^T$, represents a feature extraction of the bivariate signal $X_{gri}(\cdot,\cdot)$. We thus reduce testing the null hypothesis (1) to testing that the distribution of $\zeta_{gri}$ does not vary across groups. That is, assume $\zeta_{gri} \sim f_g$, where $f_g$ is any probability distribution with sample space $\mathbb{R}^{\sum_{p=1}^{P} Q_p}$ that depends on the group $g$; the null hypothesis of interest is reduced to

$$H_0: f_1 = \cdots = f_G. \qquad (2)$$

In this regard, we consider testing procedures from the multivariate statistics literature; to account for the hierarchical dependence in the data we propose a bootstrap-based null distribution approximation.

In the next section we discuss estimation of the model components, including selection of the number of basis functions and estimation of the basis function coefficients. In Section 4 we describe the testing procedure.

3.2. Estimation

The roadmap of the estimation procedure is as follows. First, we estimate the marginal mean function. Using the centered data, we then estimate the marginal covariance function $\Sigma_{\mathcal{T}}(t,t')$ and its eigencomponents. The coefficients of this initial decomposition are then used to estimate the marginal covariance functions $\Sigma_{\mathcal{U},p}(u,u')$ and their eigencomponents. We use existing methods to estimate all model components; additional details of these methods are provided in the supporting information.

We estimate the marginal mean function μ(t,u) by using the bivariate sandwich smoother (Xiao et al., 2013) and a working independence assumption. In the numerical investigation we use the sandwich smoother constructed using cubic B-spline basis functions for t and u and select the tuning parameters by generalized cross validation (GCV).

Let $\tilde Y_{gri,k\ell} = Y_{gri,k\ell} - \hat\mu(t_k, u_\ell)$ be the demeaned data. We use the demeaned data to first estimate the marginal covariance function $\Sigma_{\mathcal{T}}(t,t')$. To estimate this covariance function, and subsequently its eigenfunctions, we use the FACE estimator (Xiao et al., 2016), which smooths the traditional sample covariance,

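$$S(t,t') = \frac{\sum_{g=1}^{G}\sum_{r=1}^{n_g}\sum_{i=1}^{n_{gr}}\sum_{\ell=1}^{L_{gri}} \tilde Y_{gri}(t,u_{gri,\ell})\,\tilde Y_{gri}(t',u_{gri,\ell})}{\sum_{g=1}^{G}\sum_{r=1}^{n_g}\sum_{i=1}^{n_{gr}} L_{gri}}.$$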

This estimator is a special case of the sandwich smoother used to estimate the mean function. As with the sandwich smoother, this method depends on a smoothing parameter that can be selected using GCV. The final estimator is made symmetric and positive semidefinite by zeroing the negative eigenvalues. Let $\{\hat\phi_p(\cdot), \hat\lambda_p\}_{p \ge 1}$ be the pairs of estimated eigenfunctions and eigenvalues obtained by spectral decomposition of the smoothed estimate of $\Sigma_{\mathcal{T}}(\cdot,\cdot)$. The truncation parameter $P$ can be determined from a pre-specified percentage of variance explained (PVE) using the estimated eigenvalues (Di et al., 2009).
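A minimal, unsmoothed stand-in for this step, assuming the demeaned curves share a common, equally spaced $t$-grid, is sketched below in Python/NumPy. It pools the raw products over all curves as in $S(t,t')$, enforces symmetry, zeroes negative eigenvalues, rescales eigenvectors to unit $L^2$ norm with a Riemann weight, and picks $P$ by PVE. It deliberately omits the FACE smoothing of Xiao et al. (2016), so it is illustrative only; the function name and the synthetic usage data are hypothetical.

```python
import numpy as np

def marginal_t_eigen(Y_tilde, t_grid, pve=0.95):
    """Unsmoothed sketch of the marginal t-covariance eigendecomposition.

    Y_tilde: (n_curves, n_t) demeaned curves, one row per (g, r, i, l), so the
             average of row outer products pools over currents as in S(t, t').
    Returns eigenvalues, eigenfunctions (columns), and the PVE-selected P.
    """
    n = Y_tilde.shape[0]
    S = Y_tilde.T @ Y_tilde / n                # raw covariance S(t, t')
    S = (S + S.T) / 2                          # enforce symmetry
    h = t_grid[1] - t_grid[0]                  # Riemann weight of the integral operator
    evals, evecs = np.linalg.eigh(S * h)
    order = np.argsort(evals)[::-1]            # largest eigenvalues first
    evals, evecs = evals[order], evecs[:, order]
    evals = np.clip(evals, 0.0, None)          # zero out negative eigenvalues
    phi = evecs / np.sqrt(h)                   # so that sum(phi_p ** 2) * h == 1
    frac = np.cumsum(evals) / evals.sum()
    P = int(np.searchsorted(frac, pve) + 1)    # smallest P with PVE >= threshold
    return evals, phi, P

# Hypothetical rank-2 curves with component variances 4 and 1.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 101)
f1 = np.sqrt(2) * np.sin(2 * np.pi * t)
f2 = np.sqrt(2) * np.cos(2 * np.pi * t)
Y = rng.normal(0, 2, (2000, 1)) * f1 + rng.normal(0, 1, (2000, 1)) * f2
evals, phi, P = marginal_t_eigen(Y, t)
```

Multiplying the covariance matrix by the grid spacing before the eigendecomposition makes the eigenvalues approximate those of the covariance operator rather than of the raw matrix, which is what the PVE rule is meant to use.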

Let $\hat\xi_{gri,p}(u_\ell) = \int \tilde Y_{gri}(t,u_\ell)\,\hat\phi_p(t)\,dt$ be the estimated coefficient of the $p$th eigenfunction for the $\ell$th current applied to the $i$th neuron in the $g$th group and $r$th rat; $\hat\xi_{gri,p}(u_\ell)$ can be approximated well via numerical integration because $\{t_{gri,k}: k\}$ is dense in $\mathcal{T}$. We use these estimated coefficients separately for each $p$ to estimate $\Sigma_{\mathcal{U},p}(u,u')$ and its eigencomponents. Because the data need not be observed on a regular grid in $\mathcal{U}$, as they are in $\mathcal{T}$, we estimate $\Sigma_{\mathcal{U},p}(u,u')$ and its eigencomponents using methods for sparse functional data (Yao et al., 2005) rather than the method used to estimate $\Sigma_{\mathcal{T}}(t,t')$. This approach also obtains a smoothed estimate of $\Sigma_{\mathcal{U},p}(u,u')$ by smoothing the raw covariances, here calculated as $S_{p,gri}(u_\ell, u_{\ell'}) = \hat\xi_{gri,p}(u_\ell)\,\hat\xi_{gri,p}(u_{\ell'})$. Let $\{\hat\psi_{pq}(\cdot), \hat\gamma_{pq}\}_{q \ge 1}$ be the pairs of estimated eigenfunctions and eigenvalues obtained by spectral decomposition of the smoothed estimate of $\Sigma_{\mathcal{U},p}(\cdot,\cdot)$. As when choosing $P$, the truncation parameters $Q_p$ can be determined using PVE. Once the eigenfunctions of $\Sigma_{\mathcal{U},p}(\cdot,\cdot)$ are estimated, the scores $\zeta_{gri,pq}$ can be estimated using a mixed model framework as described in Yao et al. (2005).
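The projection and raw-covariance steps can be sketched as follows (Python/NumPy; the function names are hypothetical). The integral defining $\hat\xi_{gri,p}(u_\ell)$ is approximated with trapezoidal weights on the dense $t$-grid, and the raw products $S_{p,gri}(u_\ell, u_{\ell'})$ for one subunit are returned as scattered triples. The subsequent smoothing and the mixed-model score estimation of Yao et al. (2005) are not reproduced here.

```python
import numpy as np

def trapezoid_weights(t_grid):
    """Weights w such that sum(w * f) is the trapezoidal approximation of the integral of f."""
    dt = np.diff(t_grid)
    w = np.zeros(len(t_grid))
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

def project_scores(Y_tilde_i, phi_hat, t_grid):
    """xi_hat[l, p] approximates the integral of Y_tilde(t, u_l) * phi_hat_p(t) dt.

    Y_tilde_i: (L, n_t) demeaned curves for one subunit, one row per current u_l.
    phi_hat:   (n_t, P) estimated eigenfunctions evaluated on t_grid.
    """
    w = trapezoid_weights(t_grid)
    return (Y_tilde_i * w) @ phi_hat

def raw_u_covariances(xi_hat_i, u_i, p):
    """Triples (u_l, u_l', xi_hat_p(u_l) * xi_hat_p(u_l')) for one subunit and component p."""
    x = xi_hat_i[:, p]
    return [(u_i[a], u_i[b], x[a] * x[b])
            for a in range(len(u_i)) for b in range(len(u_i))]

# Hypothetical subunit: three currents whose curves are multiples of one eigenfunction.
t = np.linspace(0, 1, 201)
phi_hat = (np.sqrt(2) * np.sin(2 * np.pi * t))[:, None]
c = np.array([0.5, -1.0, 2.0])
Y_i = c[:, None] * phi_hat[:, 0][None, :]
xi = project_scores(Y_i, phi_hat, t)
triples = raw_u_covariances(xi, np.array([0.1, 0.5, 0.9]), p=0)
```

Pooled over all subunits, these triples are the scattered points on $\mathcal{U} \times \mathcal{U}$ that a sparse-data covariance smoother works with; few currents per subunit but many subunits is what makes the pooled design dense.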

4. Testing procedure

In this section, we describe the testing procedure. Recall that the null hypothesis of interest (1) is simplified to the null hypothesis that the vector of basis function coefficients has the same distribution across groups; see null hypothesis (2). Although our modeling emphasizes the marginal mean and covariance functions (Section 3.2), under the alternative hypothesis we place no restrictions on how the distributions may differ between groups.

To test this hypothesis, k-sample multivariate testing procedures can be used; examples include Bathke et al. (2008) and Heller et al. (2013). We use the Heller-Heller-Gorfine (HHG) test (Heller et al., 2013) because of its minimal assumptions and its sensitivity to many forms of departure from the null hypothesis; we note that many other multivariate tests could be used similarly, depending on the objectives of the analysis. The HHG test statistic is based on all pairwise norm differences of the data. We describe the test as if the vectors of basis coefficients were known; in practice, we replace the basis coefficients by their estimates obtained as described in Section 3.2. Consider a fixed pair of observations, indexed by (1) $g$, $r$, and $i$ and (2) $g'$, $r'$, and $i'$, with $i \neq i'$; denote the norm difference between the coefficients of these two observations by $R_0 = \|\zeta_{gri} - \zeta_{g'r'i'}\|$. The value $R_0$ depends on the indices $g, g', r, r', i, i'$, but we suppress this dependence until the end for notational simplicity. Using $R_0$, we create and summarize a $2 \times G$ contingency table from the remaining data as follows. For $g^* = 1, \ldots, G$, let

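$$A_{1g^*} = \sum_{\tilde g=1}^{G}\sum_{\tilde r=1}^{n_{\tilde g}}\sum_{\tilde i=1}^{n_{\tilde g\tilde r}} I\left(\|\zeta_{gri} - \zeta_{\tilde g\tilde r\tilde i}\| > R_0\right) I(\tilde g = g^*) \quad \text{and}$$
$$A_{2g^*} = \sum_{\tilde g=1}^{G}\sum_{\tilde r=1}^{n_{\tilde g}}\sum_{\tilde i=1}^{n_{\tilde g\tilde r}} I\left(\|\zeta_{gri} - \zeta_{\tilde g\tilde r\tilde i}\| \le R_0\right) I(\tilde g = g^*).$$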

Additionally, denote by $A_{i^*\cdot}$ and $A_{\cdot g^*}$ the row and column sums, respectively. Lastly, denote by $T(gri; g'r'i')$ the Pearson's score for this partition; that is,

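$$T(gri; g'r'i') = \sum_{i^*=1}^{2}\sum_{g^*=1}^{G} \frac{(A_{i^*g^*} - E_{i^*g^*})^2}{E_{i^*g^*}}, \quad \text{where} \quad E_{i^*g^*} = \frac{A_{i^*\cdot}\,A_{\cdot g^*}}{\sum_{g=1}^{G}\sum_{r=1}^{n_g} n_{gr}}.$$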

The overall test statistic for the sample is obtained by summing over all pairs; that is, $T_{HHG} = \sum_{g,g'=1}^{G}\sum_{r,r'=1}^{n_g}\sum_{i,i'=1;\, i \neq i'}^{n_{gr}} T(gri; g'r'i')$.
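A direct, unoptimized sketch of this statistic in Python/NumPy is given below; it is our illustration and not the implementation in the HHG R package. Two choices are assumptions: the "remaining data" are taken to exclude the two observations that define $R_0$, and the expected counts use the total of that $2 \times G$ table; cells with zero expected count are skipped. The function name and the toy data are hypothetical.

```python
import numpy as np

def hhg_k_sample(Z, groups):
    """Sum over ordered pairs (a, b), a != b, of Pearson scores of 2 x G tables.

    Z:      (N, d) array, one row per coefficient vector zeta_gri.
    groups: length-N array of group labels.
    For each pair, R0 = ||Z[a] - Z[b]||; the other N - 2 rows are classified by
    group and by whether their distance to Z[a] exceeds R0.
    """
    Z = np.asarray(Z, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    N = Z.shape[0]
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)    # pairwise distances
    onehot = (groups[:, None] == labels[None, :]).astype(float)    # (N, G) indicators
    total = 0.0
    for a in range(N):
        for b in range(N):
            if a == b:
                continue
            keep = np.ones(N, dtype=bool)
            keep[[a, b]] = False
            far = (D[a, keep] > D[a, b]).astype(float)
            near = 1.0 - far
            A = np.vstack([far @ onehot[keep], near @ onehot[keep]])   # 2 x G table
            E = A.sum(axis=1, keepdims=True) * A.sum(axis=0, keepdims=True) / A.sum()
            mask = E > 0
            total += np.sum((A[mask] - E[mask]) ** 2 / E[mask])
    return total

# Toy data: two well-separated groups versus an arbitrary alternating labeling.
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(0.0, 1.0, (15, 2)), rng.normal(4.0, 1.0, (15, 2))])
g_sep = np.repeat([0, 1], 15)
g_mix = np.tile([0, 1], 15)
t_sep = hhg_k_sample(Z, g_sep)
t_mix = hhg_k_sample(Z, g_mix)
```

The cost is cubic in the number of observations, which is manageable for samples of the size considered here but is one reason optimized package implementations are preferable in practice.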

Heller et al. (2013) derived the null distribution of the classical HHG test, and an approximation to it based on random permutations of the group assignments, under the assumption that the observations within a group are independent and identically distributed. This assumption does not hold in our case, where, recall, we assume independence of $X_{gri}(\cdot,\cdot)$ only over $r$; applying the testing procedure while ignoring the dependence results in an inflated type I error rate. We propose a bootstrap-based approach to approximate the null distribution of the HHG statistic by modifying the permutation procedure to account for the hierarchical structure of the data; see Algorithm 1. In each bootstrap iteration, the observed data are resampled with replacement by unit-level identifier. We briefly comment on step 5 of Algorithm 1: re-estimating the model components in each iteration is computationally burdensome, but it is necessary to prevent the results of the test from being conditional on the estimated eigenfunctions, and it better accounts for the uncertainty of that estimation step. The test statistic calculated from the observed data is then compared to the distribution of test statistics after resampling; the p-value is estimated by the proportion of bootstrap statistics that exceed the observed statistic.

Algorithm 1.

Resampling of the unit level data

1: for $b \in \{1, \ldots, B\}$ do
2:   Re-sample the group-unit index pairs from $\{(1,1), \ldots, (1,n_1), \ldots, (G,n_G)\}$ with replacement. Let $R^{(b)}$ be the resulting sample.
3:   Define the $b$th bootstrap data by:
    $\text{data}^{(b)} = [\{Y_{gri,k\ell}\}_{(k,\ell)} : (g,r) \in R^{(b)},\ i = 1, \ldots, n_{gr}]$
4:   Reassign the group indices by unit, so that $g = 1$ for the first $n_1$ units, $g = 2$ for the next $n_2$ units, and so on. Re-define the $b$th bootstrap data accordingly.
5:   Using $\text{data}^{(b)}$, estimate the model components and recover the estimated coefficient vectors, $\hat\zeta_{gri}^{(b)}$, as described in Section 3.2.
6:   Calculate the HHG test statistic and denote it by $T_{HHG}^{(b)}$.
7: end for
8: Calculate the p-value as $\sum_{b=1}^{B} I(T_{HHG}^{(b)} > T_{HHG})/B$.
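The resampling loop of Algorithm 1 can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: units are supplied as `(group, data)` pairs sorted by group, and `stat_fn` is a hypothetical user-supplied callable standing in for steps 5 and 6 (re-estimating the model components and computing $T_{HHG}$), which are problem-specific. The toy statistic `mean_gap` and the toy data are likewise hypothetical.

```python
import numpy as np

def cluster_bootstrap_pvalue(units, stat_fn, B=1000, seed=None):
    """Sketch of Algorithm 1: approximate the null by resampling whole units.

    units:   list of (group, data) pairs, one per unit (rat), sorted by group so
             the first n_1 entries belong to the first group, and so on.
    stat_fn: hypothetical callable mapping a list of (group, data) pairs to the
             test statistic; it stands in for steps 5-6 of Algorithm 1.
    """
    rng = np.random.default_rng(seed)
    labels = [g for g, _ in units]          # positional labels: n_1 of group 1, ...
    t_obs = stat_fn(units)
    exceed = 0
    for _ in range(B):
        # Step 2: draw unit indices from the pooled units, with replacement.
        idx = rng.integers(0, len(units), size=len(units))
        # Steps 3-4: keep each unit's data intact and relabel groups by position.
        boot = [(labels[pos], units[j][1]) for pos, j in enumerate(idx)]
        if stat_fn(boot) > t_obs:           # steps 5-6 and the indicator of step 8
            exceed += 1
    return exceed / B                       # step 8

# Toy statistic: absolute gap between the pooled means of two groups.
def mean_gap(us):
    a = np.concatenate([d for g, d in us if g == 0])
    b = np.concatenate([d for g, d in us if g == 1])
    return abs(a.mean() - b.mean())

rng = np.random.default_rng(2)
units = ([(0, rng.normal(0.0, 1.0, 4)) for _ in range(8)]
         + [(1, rng.normal(3.0, 1.0, 4)) for _ in range(8)])
p_value = cluster_bootstrap_pvalue(units, mean_gap, B=200, seed=3)
```

Because each unit's observations travel together, the within-unit dependence is preserved in every bootstrap sample, while the positional relabeling of step 4 breaks any link between a unit's data and its original group, which is what makes the resampled statistics a null reference.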

5. Simulation study

In this section, we present simulation studies to illustrate the performance of our proposed approach described in Section 3. We consider two distinct frameworks. First, we generate multivariate data to isolate the performance of the resampling method in a simpler setting. Then we generate functional data to assess the performance of the entire method as presented. For both frameworks, we describe the scenarios used to assess the performance of our method, introduce comparative approaches, and present the results.

5.1. Framework 1: Multivariate data

5.1.1. Generation of multivariate data.

To evaluate the type I error rate, we first generate data under the null hypothesis of interest, that the distributions of the responses are the same across groups; to assess power, we also generate data under different forms of departure from the null hypothesis.

We generate data $Y_{gijk} \in \mathbb{R}^P$ as

$$Y_{gijk,p} = \alpha_{gi,p} + \beta_{gij,p} + \epsilon_{gijk,p}, \quad p = 1, \ldots, P, \qquad (3)$$

where $g = 1, \ldots, 3$, $i = 1, \ldots, n_1$, $j = 1, \ldots, n_2$, and $k = 1, \ldots, 3$ are indices for the observations that induce the hierarchical structure. The model components are generated according to $\alpha_{gi,p} \sim N(0, \sigma^2_{\alpha,p})$, $\beta_{gij,p} \sim N(0, \sigma^2_{\beta,p})$, and $\epsilon_{gijk,p} \sim N(0, \sigma^2_{\epsilon,p})$. Further, $\alpha_{gi,p}$, $\beta_{gij,p}$, and $\epsilon_{gijk,p}$ are mutually independent and are independent across all indices. The hierarchical structure induced by this model is analogous to the structure of the motivating dataset described in Section 2: $g$ denotes estrous cycle phase, $i$ denotes rat, $j$ denotes neuron, and $k$ denotes replicate. The sample sizes used in the simulations reflect the size of the motivating dataset: $n_1 = 7, 10$ and $n_2 = 3, 5$.

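For the dimension of $Y_{gijk}$, we consider $P = 2$ and $5$. When $P = 2$, we borrow from Scenario 1 in Xu et al. (2018) to specify the variances of the components in (3): $(\sigma^2_{\alpha,1}, \sigma^2_{\alpha,2}) = (1, 0.25)$, $(\sigma^2_{\beta,1}, \sigma^2_{\beta,2}) = (0.5, 0.25)$, and $(\sigma^2_{\epsilon,1}, \sigma^2_{\epsilon,2}) = (5, 0.5)$. When $P = 5$, we instead let $(\sigma^2_{\alpha,1}, \ldots, \sigma^2_{\alpha,5}) = (1, 0.5, 0.33, 0.25, 0.2)$ and $\sigma^2_{\beta,p} = \sigma^2_{\epsilon,p} = \sigma^2_{\alpha,p}$.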
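Model (3) with these variance settings can be simulated in a few lines. The sketch below (Python/NumPy, our illustration) relies on broadcasting so that $\alpha_{gi,p}$ is shared by all neurons and replicates of a rat and $\beta_{gij,p}$ by all replicates of a neuron; the function name and the array layout `(G, n1, n2, K, P)` are assumptions.

```python
import numpy as np

def simulate_model3(n1, n2, var_alpha, var_beta, var_eps, G=3, K=3, seed=None):
    """Draw Y_{gijk,p} = alpha_{gi,p} + beta_{gij,p} + eps_{gijk,p} as in model (3).

    var_alpha, var_beta, var_eps: length-P sequences of component variances.
    Returns an array of shape (G, n1, n2, K, P) indexed by phase g, rat i,
    neuron j, replicate k, and coordinate p.
    """
    rng = np.random.default_rng(seed)
    sd_a, sd_b, sd_e = (np.sqrt(np.asarray(v, dtype=float))
                        for v in (var_alpha, var_beta, var_eps))
    P = sd_a.size
    # Singleton axes let each effect broadcast over the levels nested below it.
    alpha = rng.normal(size=(G, n1, 1, 1, P)) * sd_a   # one draw per rat
    beta = rng.normal(size=(G, n1, n2, 1, P)) * sd_b   # one draw per neuron
    eps = rng.normal(size=(G, n1, n2, K, P)) * sd_e    # one draw per replicate
    return alpha + beta + eps

# P = 2 setting with n1 = 7 rats per phase and n2 = 3 neurons per rat.
Y = simulate_model3(7, 3, (1, 0.25), (0.5, 0.25), (5, 0.5), seed=0)
# Same rats and neurons with the replicate-level noise switched off.
Y_no_eps = simulate_model3(7, 3, (1, 0.25), (0.5, 0.25), (0, 0), seed=0)
```

With the replicate-level variance set to zero, all replicates of a neuron coincide, which is a quick check that the nesting is encoded correctly.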

To assess power, we generate data under three types of alternative hypotheses. First, we consider a shift in the mean: $\tilde Y_{gijk,p} = \mu_g + Y_{gijk,p}$, where $Y_{gijk,p}$ is as in (3), $\mu_1 = 0$, $\mu_2 = \delta$, and $\mu_3 = \delta$, and $\delta > 0$ controls the difference in the element-wise mean between groups. Second, we consider a shift in the second moment, which we implement in two ways. In the first, we modify model (3) by generating $\alpha_{1i,p} \sim N(0, \sigma^2_{\alpha,p} + \delta)$; in the second, we instead generate $\beta_{1ij,p} \sim N(0, \sigma^2_{\beta,p} + \delta)$. In the first setting, $\delta$ controls the difference in the inter-subject variance between the first group and the remaining two groups, whereas in the second setting $\delta$ controls the difference in the intra-subject variance. Finally, we consider a shift in the third moment, modifying model (3) by first generating $\alpha_{1i,p} \sim \chi^2_\delta$ and $-\alpha_{2i,p} \sim \chi^2_\delta$ and then standardizing these variables to have mean 0 and variance $\sigma^2_{\alpha,p}$, so that the mean and variance are the same across groups. Concurrently, we make analogous changes for $\beta_{gij,p}$ and $\epsilon_{gijk,p}$. Overall, in this setting the coefficients for one group are positively skewed, the coefficients for another are negatively skewed, and the coefficients for the final group follow model (3) directly and thus are not skewed; $\delta$ controls the difference in skewness between groups.
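The third-moment alternative requires chi-square draws rescaled to a target mean and variance. A minimal sketch (Python/NumPy; the function name is hypothetical) standardizes with the theoretical chi-square moments, mean $\delta$ and variance $2\delta$, and flips the sign to obtain negative skew. Whether the original study standardized with theoretical or sample moments is not stated, so this choice is an assumption.

```python
import numpy as np

def skewed_effects(size, var, df, sign, rng):
    """Chi-square(df) draws standardized to mean 0 and variance `var`.

    sign=+1 gives positive skew and sign=-1 negative skew; the mean and
    variance match the Gaussian effects of model (3), only the skewness differs.
    """
    x = rng.chisquare(df, size=size)        # mean df, variance 2 * df
    z = (x - df) / np.sqrt(2.0 * df)        # theoretical standardization
    return sign * np.sqrt(var) * z

rng = np.random.default_rng(0)
alpha_pos = skewed_effects(200_000, 1.0, 3, +1, rng)   # e.g. group 1, delta = 3
alpha_neg = skewed_effects(200_000, 1.0, 3, -1, rng)   # e.g. group 2, delta = 3
```

Since the skewness of a $\chi^2_\delta$ variable is $\sqrt{8/\delta}$, smaller $\delta$ produces more strongly skewed effects, so the departure from the null grows as $\delta$ decreases.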

5.1.2. Competing methods and metrics.

To evaluate the resampling based testing methodology when using multivariate data, we implement the proposed method, denoted by HHG-CB, using the hhg.test.k.sample() function in the R package HHG to calculate the HHG test statistic (Brill and Kaufman, 2019).

As comparative methods, we also consider classic multivariate analysis of variance (MANOVA), implemented using the manova() function in the R package stats (R Core Team, 2019), with Pillai's trace statistic for its robustness properties. MANOVA relies on independence across observations, which is clearly violated in this setting. Thus, we also consider an extension of MANOVA that approximates the null distribution using the same resampling-based approach used for the primary method (denoted MANOVA-CB). We also borrow from the approach described in Pomann et al. (2016) and use an element-wise Anderson-Darling (AD) test with a Bonferroni adjustment for multiple comparisons; this adjustment is conservative, and we use it because the number of comparisons is small. The AD test is implemented using the ad.test() function in the R package kSamples (Scholz and Zhu, 2019). Lastly, we consider the clustered Wilcoxon rank sum test (CW), applied separately to each element of the random vector (Datta and Satten, 2005). As with the AD test, we use a Bonferroni adjustment for multiple comparisons. This is implemented using the clusWilcox.test() function in the R package clusrank (Jiang, 2018).

The performance of each method is evaluated using the estimated type I error rates and power, each calculated as the average proportion of rejections of the null hypothesis across Monte Carlo replicates. When assessing type I error rates, we use 5000 Monte Carlo replicates; when assessing power, we use 1000 Monte Carlo replicates. When necessary, we use 1000 replicates to approximate the null distribution by resampling.
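To read the estimated rejection rates against their Monte Carlo noise (our addition, assuming independent replicates and a nominal level of $\alpha = 0.05$), the binomial standard error of a rejection proportion over $M$ replicates is

$$\mathrm{SE} = \sqrt{\frac{\alpha(1-\alpha)}{M}}, \qquad \sqrt{\frac{0.05 \times 0.95}{5000}} \approx 0.0031, \qquad \sqrt{\frac{0.05 \times 0.95}{1000}} \approx 0.0069.$$

With 5000 replicates, an empirical type I error rate outside roughly $0.05 \pm 0.006$ (two standard errors) suggests a genuine departure from the nominal level rather than simulation noise.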

5.1.3. Results.

In the interest of space, all tables and figures for this section are provided in the supporting information; the results and interpretation are very similar to those presented in Section 5.2.3 when considering functional data.

5.2. Framework 2: Functional data

5.2.1. Generation of functional data.

We generate hierarchically-clustered functional data according to the model

$$Y_{gijk}(t,u) = \sum_{p=1}^{2} \sum_{q=1}^{Q_p} \zeta_{gijk,pq}\,\phi_p(t)\,\psi_{pq}(u) + \varepsilon_{gijk}(t,u), \qquad (4)$$

where $Q_1 = 3$ and $Q_2 = 2$, and the indices are as described in the multivariate setting. The vector of coefficients $\zeta_{gijk} = (\zeta_{gijk,11}, \zeta_{gijk,12}, \zeta_{gijk,13}, \zeta_{gijk,21}, \zeta_{gijk,22})$ is generated according to model (3) under the null hypothesis. The functions $\phi_p(t)$ and $\psi_{pq}(u)$ are taken to be the leading eigenfunctions estimated from the motivating dataset, so that the simulated data mimic the motivating data. Additionally, $\varepsilon_{gijk}(t,u)$ is a white noise process, independent of $\zeta_{gijk}$, with zero mean and variance $\sigma^2_{WN}$; $\sigma^2_{WN}$ is chosen to correspond to a signal-to-noise ratio (SNR) of 5. We use an equispaced grid of 100 locations for $t \in [0,1]$ and an equispaced grid of 10 locations for $u \in [0,1]$; for simplicity, the domains $\mathcal{T}$ and $\mathcal{U}$ are rescaled after estimating the eigenfunctions.
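To make model (4) concrete, the following Python sketch generates noisy surfaces on the 100 × 10 grid described above. It is illustrative only: the sine and cosine functions stand in for the eigenfunctions estimated from the motivating data, independent normal coefficients stand in for the hierarchical model (3), and the SNR is assumed to be the ratio of the pooled signal variance to the white-noise variance; the paper's exact definition may differ. All names are hypothetical.

```python
import math
import random

T_GRID = [i / 99 for i in range(100)]  # 100 equispaced points on [0, 1]
U_GRID = [j / 9 for j in range(10)]    # 10 equispaced points on [0, 1]
Q = {1: 3, 2: 2}                       # Q_1 = 3 and Q_2 = 2, as in model (4)


def phi(p, t):
    # Hypothetical stand-in for the estimated eigenfunction phi_p(t).
    return math.sqrt(2.0) * math.sin(p * math.pi * t)


def psi(p, q, u):
    # Hypothetical stand-in for psi_pq(u); the same cosine family is reused for each p.
    return math.sqrt(2.0) * math.cos(q * math.pi * u)


def signal(zeta):
    """Noise-free surface sum_p sum_q zeta[(p, q)] * phi_p(t) * psi_pq(u) on the grid."""
    return [[sum(zeta[(p, q)] * phi(p, t) * psi(p, q, u)
                 for p in Q for q in range(1, Q[p] + 1))
             for u in U_GRID]
            for t in T_GRID]


def simulate(n_curves, snr=5.0, seed=0):
    """Return (noisy surfaces indexed [curve][t][u], white-noise standard deviation)."""
    rng = random.Random(seed)
    signals = []
    for _ in range(n_curves):
        # Hypothetical coefficient law: independent N(0, 1 / (p * q)) draws.
        zeta = {(p, q): rng.gauss(0.0, 1.0 / math.sqrt(p * q))
                for p in Q for q in range(1, Q[p] + 1)}
        signals.append(signal(zeta))
    values = [y for s in signals for row in s for y in row]
    mean = sum(values) / len(values)
    signal_var = sum((y - mean) ** 2 for y in values) / len(values)
    # Assumed definition: SNR = pooled signal variance / white-noise variance.
    sigma_wn = math.sqrt(signal_var / snr)
    noisy = [[[y + rng.gauss(0.0, sigma_wn) for y in row] for row in s] for s in signals]
    return noisy, sigma_wn
```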

As in the multivariate setting, we also generate functional data when the null hypothesis is not true to assess statistical power. We generate these functional data by modifying the generation of the coefficients as described in Section 5.1.1 for the multivariate setting. We again consider differences in the mean, inter- and intra-subject variance, and skewness.

In this functional data framework, we also consider the performance of the proposed method under a non-additive data-generating mechanism. In this setting, we again generate functional data using model (4), but we modify the generative model for $\zeta_{gijk}$: in lieu of model (3), we generate the coefficients as $\zeta_{gijk,p} = \alpha_{gi,p} + \beta_{gij,p} + \epsilon_{gijk,p} + \alpha_{gi,p}\beta_{gij,p}$, where $\alpha_{gi,p}$, $\beta_{gij,p}$, and $\epsilon_{gijk,p}$ are as defined above. To assess power in this setting, we consider differences in the mean similar to those described above.
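The difference between the additive model and the non-additive mechanism is easiest to see in code. The Python sketch below draws rat-level effects α, neuron-level effects β, and residuals ϵ for a nested design and optionally adds the interaction term α·β. Normal draws with unit standard deviations are an assumption made for illustration; model (3), defined earlier in the paper, governs the actual simulations, and the function name is hypothetical.

```python
import random


def simulate_coefficients(n_groups, n_rats, n_neurons, n_obs, dim,
                          additive=True, sd=(1.0, 1.0, 1.0), seed=0):
    """Return {(g, i, j, k): [zeta_1, ..., zeta_dim]} for a nested design.

    alpha_{gi,p}: rat-level effect, beta_{gij,p}: neuron-level effect,
    eps_{gijk,p}: residual. Normal draws with standard deviations `sd`
    are an illustrative assumption.
    """
    rng = random.Random(seed)
    sd_a, sd_b, sd_e = sd
    out = {}
    for g in range(n_groups):
        for i in range(n_rats):
            alpha = [rng.gauss(0.0, sd_a) for _ in range(dim)]
            for j in range(n_neurons):
                beta = [rng.gauss(0.0, sd_b) for _ in range(dim)]
                for k in range(n_obs):
                    eps = [rng.gauss(0.0, sd_e) for _ in range(dim)]
                    zeta = [a + b + e for a, b, e in zip(alpha, beta, eps)]
                    if not additive:
                        # Non-additive mechanism: add the interaction alpha_{gi,p} * beta_{gij,p}.
                        zeta = [z + a * b for z, a, b in zip(zeta, alpha, beta)]
                    out[(g, i, j, k)] = zeta
    return out
```

Because the interaction term is applied after all draws, calling the function twice with the same seed yields additive and non-additive coefficients that differ only by the product of the rat-level and neuron-level effects.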

5.2.2. Competing methods and implementation.

For testing, after estimating the coefficients $\zeta_{gijk,pq}$, we apply the multivariate testing methods as described in Section 5.1.2 and again assess type I error rates and power. We now describe the implementation of the modeling step that estimates the necessary eigenfunctions and coefficients, using functions available in the R package refund (Goldsmith et al., 2018). We first estimate the common bivariate mean function $\mu(\cdot,\cdot)$ using the fbps() function and center the data. Then, we estimate $\{\phi_p(\cdot)\}_{p \ge 1}$ and the coefficients $\xi_{gijk,p}(u)$ using the fpca.face() function, selecting the truncation parameter $P$ with a 95% PVE threshold. Finally, we estimate $\{\psi_{pq}(\cdot)\}_{q \ge 1}$ and the coefficients $\zeta_{gijk,pq}$ using the fpca.sc() function; the truncation parameters $Q_p$ are also selected using a 95% PVE threshold.
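Conceptually, the two-step decomposition first projects each centered surface onto φ_p(t) to obtain ξ_p(u) and then projects ξ_p(u) onto ψ_pq(u) to obtain ζ_pq. The Python sketch below does this by trapezoidal integration on the observation grids, assuming the eigenfunctions are already known. It is a simplified stand-in: fpca.face() and fpca.sc() in refund compute their scores internally and need not coincide with plain numerical integration. The function names, including example_basis, are hypothetical.

```python
import math


def trapezoid(values, grid):
    """Trapezoidal-rule approximation of the integral of `values` over `grid`."""
    return sum((grid[i + 1] - grid[i]) * (values[i] + values[i + 1]) / 2.0
               for i in range(len(grid) - 1))


def two_step_scores(y_centered, t_grid, u_grid, phis, psis):
    """Project a centered surface y_centered[a][b] = Y(t_a, u_b) onto a two-step basis.

    phis: list of P functions of t; psis[p - 1]: list of Q_p functions of u.
    Returns {(p, q): zeta_pq} with p and q starting at 1.
    """
    scores = {}
    for p, phi_p in enumerate(phis, start=1):
        # Step 1: xi_p(u_b) = integral over t of Y(t, u_b) * phi_p(t).
        xi_p = [trapezoid([y_centered[a][b] * phi_p(t) for a, t in enumerate(t_grid)], t_grid)
                for b in range(len(u_grid))]
        # Step 2: zeta_pq = integral over u of xi_p(u) * psi_pq(u).
        for q, psi_pq in enumerate(psis[p - 1], start=1):
            scores[(p, q)] = trapezoid([x * psi_pq(u) for x, u in zip(xi_p, u_grid)], u_grid)
    return scores


def example_basis(x):
    # Hypothetical orthonormal basis function on [0, 1], used only for illustration.
    return math.sqrt(2.0) * math.sin(math.pi * x)
```

For instance, with example_basis used as both φ_1 and ψ_11, the surface 2·φ_1(t)ψ_11(u) on grids of 101 and 21 points returns a score of 2 for (p, q) = (1, 1), up to rounding.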

5.2.3. Results.

We first consider the simulation results when the data are generated with no distributional differences across groups. The estimated type I error rates are shown in Table 1. The estimated type I error rates for the HHG-CB and MANOVA-CB methods are close to the nominal rate, whereas the CW method is much more conservative.

Table 1.

Estimated type I error rates for the methods described in Section 5.2.2 when applied to functional data based on 5000 replicates. Nominal type I error rate is 0.05. Standard errors < 0.004.

n1    n2    HHG-CB    MANOVA-CB    CW
7     3     0.042     0.058        0.002
10    3     0.042     0.055        0.004
7     5     0.040     0.058        0.003
10    5     0.043     0.055        0.005

We next consider the estimated power of each method; the estimated power curves are shown in Figure 3. In the interest of space, we do not include the power curves for the CW method, as its power is substantially lower than that of the other methods; see the supporting information for these figures. We start by considering a difference in the mean. The HHG-CB and MANOVA-CB approaches perform similarly, although MANOVA-CB has moderately higher power. That MANOVA-CB performs well in this setting is not surprising, as MANOVA is designed to detect differences in the mean. Figure 3 also shows that increasing the sample size, either by increasing the number of rats or the number of neurons, increases power. Adding rats, and therefore adding observations that are independent of the rest of the data, appears to be of greater benefit than adding neurons per rat, which is to be expected.
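A rough way to see why additional rats help more than additional neurons per rat is Kish's design effect for clustered samples. Under an exchangeable correlation model with intraclass correlation ρ between neurons from the same rat, n_rats × n_neurons correlated observations carry about as much information as n_rats·n_neurons / (1 + (n_neurons - 1)ρ) independent ones. This is a heuristic that ignores the multivariate and distributional features of the test; the Python function below is hypothetical and the value ρ = 0.3 is an illustrative assumption.

```python
def effective_sample_size(n_rats, n_neurons, icc):
    """Kish-style effective sample size for n_rats clusters of n_neurons each.

    icc: assumed intraclass correlation between neurons from the same rat.
    """
    n = n_rats * n_neurons
    design_effect = 1.0 + (n_neurons - 1) * icc
    return n / design_effect
```

With ρ = 0.3, 10 rats with 3 neurons each give an effective sample size of 18.75, whereas 7 rats with 5 neurons each give about 15.9, despite the larger number of neurons. As n_neurons grows, the effective sample size approaches n_rats / ρ, so adding neurons per rat has diminishing returns.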

Figure 3.

Estimated power curves for the HHG-CB and MANOVA-CB methods when applied to functional data, for detecting differences in (a) the mean, (b) the variance of the rat-level effect, (c) the variance of the neuron-level effect, (d) the skewness, and (e) the mean under a non-additive model. All estimates are based on 1000 replicates. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.

The results for other forms of group distributional differences are generally similar to those discussed above, with the key exception that only the proposed approach detects these differences; neither of the other two approaches is sensitive to differences in second-order moments between groups. When considering differences in skewness, all methods perform poorly, although the proposed approach has the highest, albeit still low, power. In preliminary simulations, the power of the proposed method to detect differences in skewness improved substantially when the data were generated without noise, which is consistent with the results in the multivariate framework.

6. Analysis of a WHPC Experiment

We now discuss the analysis of the motivating dataset described in Section 2, the objective of which is to test for distributional differences in medium spiny neuron electrophysiological properties between phases of the estrous cycle. First, we examine the estimated mean function for each estrous cycle phase, shown in Figure 4, which displays the estimated bivariate mean functions and the univariate trajectories obtained by fixing either the current or the frequency. To estimate the group-specific means, we apply the sandwich smoother (Xiao et al., 2013) described in Section 3.2 to the data from each group separately. Generally, for a fixed current, the mean log-periodogram decreases with increasing frequency. Further, for a fixed phase of the estrous cycle, the rate of decay decreases as current increases. The mean log-periodograms of the three phases are very similar when no current is injected, and differences between the phases become apparent as the applied current increases. The mean log-periodogram for the diestrus phase is noticeably larger than the means for the other two phases when a small amount of current (e.g., +0.05 nA) is applied. This suggests that an important difference between phases is the amount of current necessary to generate an action potential. As current increases, the mean log-periodogram increases until it plateaus at a level that depends on the frequency but not on the phase of the estrous cycle. While neurons in the diestrus phase differ from those in the other two phases, neurons in the estrus and proestrus phases appear to behave similarly. We note that for the majority of the data, the applied current is less than +0.2 nA; this explains why the estimated mean functions are less smooth at higher currents. Because of the comparatively few observations at currents above +0.2 nA, the estimated mean (and other estimates) likely have larger standard errors there than at lower currents. However, because we are ultimately focused on hypothesis testing rather than estimation, and because the observations are projected onto a common set of estimated eigenfunctions that do not differ by group, we do not expect the greater uncertainty at high currents to have a meaningful impact on the proposed method or the results.
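For readers less familiar with the frequency-domain representation, the periodogram of a mean-centered series x_0, ..., x_{n-1} at Fourier frequency k/n is I(k/n) = |Σ_t x_t exp(-2πikt/n)|²/n, and the analysis works with its logarithm. The Python sketch below computes this directly with a hypothetical function name; the normalization and the choice of frequencies are common conventions assumed here, and the preprocessing used in the paper may differ.

```python
import cmath
import math


def log_periodogram(x):
    """Log-periodogram of a real series at Fourier frequencies k/n, k = 1..floor(n/2).

    Uses I(k) = |sum_t x_t exp(-2*pi*i*k*t/n)|^2 / n after removing the sample mean.
    A direct O(n^2) DFT is used for clarity; an FFT would be used in practice.
    """
    n = len(x)
    mean = sum(x) / n
    xc = [v - mean for v in x]
    out = []
    for k in range(1, n // 2 + 1):
        dft = sum(v * cmath.exp(-2j * math.pi * k * t / n) for t, v in enumerate(xc))
        power = abs(dft) ** 2 / n
        out.append((k / n, math.log(power) if power > 0 else float("-inf")))
    return out
```

A pure cosine with 4 cycles over 64 samples produces a single peak at frequency 4/64 with log-periodogram value log 16.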

Figure 4.

The estimated mean of the log-periodogram of the membrane potential for each phase in the estrous cycle. (Top) The estimated bivariate mean function for the diestrus (left), estrus (center), proestrus (right) phases. (Bottom left) The different phases of the cycle are indicated by line style. The mean trajectories corresponding to different amounts of current are indicated by color. (Bottom right) The estimated mean of the log-periodogram for fixed frequency and changing current are shown for each phase in the estrous cycle. The mean trajectories corresponding to different frequencies are indicated by color. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.

To estimate the eigenfunctions of the marginal covariance function, we first estimate the mean function using the sandwich smoother and use it to center the data. We then estimate the eigenfunctions as described in Section 3.2. To select the truncation parameters $P$ and $Q_p$, we use a 0.95 PVE threshold, separately for each parameter. This choice results in $P = 2$, $Q_1 = 4$, and $Q_2 = 4$. The estimates $\{\hat\phi_p(\cdot)\}_{p=1}^{P=2}$ of the eigenfunctions of the marginal covariance function of frequency from the initial decomposition are shown in the left panel of Figure 5. The leading eigenfunction indicates a deviation from the mean that is approximately constant across frequency. The second eigenfunction indicates a large positive deviation from the mean at low frequencies and a small negative deviation at higher frequencies; an observation with a large positive coefficient on this second eigenfunction would likely have action potentials occurring at a higher frequency than an average observation. The estimates $\{\hat\psi_{1q}(\cdot)\}_{q=1}^{Q_1=4}$ and $\{\hat\psi_{2q}(\cdot)\}_{q=1}^{Q_2=4}$ are shown in the middle and right panels of Figure 5, respectively. The leading eigenfunction of the covariance of the coefficients of the leading eigenfunction from the initial decomposition, $\hat\psi_{1,1}(\cdot)$, is shown in black in the middle panel of Figure 5. This eigenfunction indicates a large negative deviation from the mean at all frequencies when the current applied to the neuron is low, and little deviation from the mean across all frequencies when the current is high.
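The PVE rule used to choose P and Q_p keeps the smallest number of leading components whose eigenvalues account for at least the target share of total variance. A minimal Python sketch, assuming the estimated eigenvalues are available and discarding non-positive ones (which can arise from smoothed covariance estimates); the function name is hypothetical and is not the refund interface.

```python
def select_by_pve(eigenvalues, threshold=0.95):
    """Smallest number of leading components whose cumulative variance share reaches threshold."""
    vals = sorted((v for v in eigenvalues if v > 0), reverse=True)
    total = sum(vals)
    if total <= 0:
        raise ValueError("need at least one positive eigenvalue")
    cumulative = 0.0
    for k, v in enumerate(vals, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return k
    return len(vals)
```

For eigenvalues (60, 25, 8, 4, 3), the cumulative shares are 0.60, 0.85, 0.93, 0.97, and 1.00, so a 0.95 threshold selects four components and a 0.90 threshold selects three.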

Figure 5.

(Left) The estimated eigenfunctions in frequency from the initial decomposition of the data: $\hat\phi_1(\cdot)$ in black, $\hat\phi_2(\cdot)$ in red. (Middle) The estimated eigenfunctions $\hat\psi_{1q}(\cdot)$ for $q = 1$ (black), 2 (red), 3 (green), and 4 (blue). (Right) The estimated eigenfunctions $\hat\psi_{2q}(\cdot)$, colored as for $\hat\psi_{1q}(\cdot)$. This figure appears in color in the electronic version of this article, and any mention of color refers to that version.

We use the HHG test with the proposed bootstrap-based procedure to test for differences in the log-periodogram across phases of the estrous cycle, while accounting for the hierarchical structure of the observed data. As in the simulations, we use 1000 bootstrap samples to estimate the null distribution of the test statistic. The p-value of the proposed test is 0.003, indicating a significant difference in the log-periodogram across phases. As a sensitivity analysis, we considered the impact of the number of components selected using PVE on the results; across all considered scenarios, the results are robust to changes in the number of components, with the p-value always ≤ 0.005 (see supporting information).
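The mechanics of a resampling-based p-value for clustered data can be sketched as follows. The first function implements a generic two-stage cluster bootstrap (rats resampled with replacement, then neurons resampled within each drawn rat); it is one common scheme and not necessarily the authors' procedure, and how group labels are handled under the null hypothesis is described earlier in the paper rather than reproduced here. The second function computes a p-value from B resampled statistics using the (1 + count)/(1 + B) convention, which is an assumption. Both function names are hypothetical.

```python
import random


def two_stage_resample(clusters, rng):
    """Two-stage cluster bootstrap: resample rats, then neurons within each drawn rat.

    clusters: list of rats, each a list of neuron-level observations.
    """
    out = []
    for _ in range(len(clusters)):
        rat = rng.choice(clusters)
        out.append([rng.choice(rat) for _ in range(len(rat))])
    return out


def bootstrap_p_value(observed, null_stats):
    """Share of resampled statistics at least as large as the observed one,
    with a +1 correction so the p-value is never exactly zero."""
    b = len(null_stats)
    exceed = sum(1 for s in null_stats if s >= observed)
    return (1 + exceed) / (1 + b)
```

Under this convention, the smallest attainable p-value with B = 1000 is 1/1001, or about 0.001.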

Since we detect a significant difference in the log-periodogram between phases of the estrous cycle, we can use the estimated coefficients of the eigenfunctions to explore how the groups differ. We do this by testing each coefficient one at a time; rather than resampling the functional data to account for the hierarchical structure, we resample the coefficients themselves to approximate the null distribution of the test statistic. This analysis shows that the differences in the log-periodogram across phases of the estrous cycle are explained by differences in the coefficients of the leading eigenfunctions. For example, the coefficient $\zeta_{1,1}$ differs significantly across phases. Examining the distribution of its estimates by estrous cycle phase, we see that observations from the diestrus phase tend to have large negative values for this coefficient. This indicates that neurons in the diestrus phase exhibit above-average log-periodogram values across all frequencies when the applied current is low, consistent with the mean functions shown in Figure 4.

Our analysis provides new insight into the neurophysiological properties originally described in Proaño et al. (2018). We found that neurons from rats in the diestrus phase required less current to generate an action potential than those in either of the other two phases of the estrous cycle. This was one of the properties of interest in the original analysis of this dataset, and our findings are consistent with those described in Proaño et al. (2018). Although the two analyses reach similar conclusions, our method did not require us to pre-specify this parameter of interest and process the data accordingly.

7. Discussion

In this paper, we propose a testing procedure to detect group distributional differences in hierarchically-clustered functional data. We applied this method to the motivating dataset to show that the electrophysiological properties of certain neurons in adult female rats differ across phases of the estrous cycle. While the focus of this paper was on this specific application, the proposed method can be applied in other settings in which hierarchically-clustered functional data are observed. To that end, we evaluated the performance of the proposed method in various simulations and illustrated its advantages over alternative methods. A limitation of the proposed method is that the resampling used to approximate the null distribution of the test statistic can be computationally intensive, particularly for larger datasets. However, this step can easily be parallelized to shorten the run-time.

Acknowledgements

The authors thank Dr. Stephanie Proaño who was crucial in generating the motivating dataset.

Footnotes

Supporting Information

Web Appendices, Tables, and Figures referenced in Sections 3-6, the data analyzed in Section 6, and R code for the analyses and simulations described in Sections 5 and 6 are available with this paper at the Biometrics website on Wiley Online Library.

Data Availability Statement

The data that support the findings in this paper are available in the supporting information section of this paper.

References

  1. Abramovich F and Angelini C (2006). Testing in mixed-effects FANOVA models. Journal of Statistical Planning and Inference 136, 4326–4348.
  2. Antoniadis A and Sapatinas T (2007). Estimation and inference in functional mixed-effects models. Computational Statistics & Data Analysis 51, 4793–4813.
  3. Arnegard ME, Whitten LA, Hunter C, and Clayton JA (2020). Sex as a Biological Variable: A 5-Year Progress Report and Call to Action. Journal of Women’s Health 29, 858–864.
  4. Bathke AC, Harrar SW, and Madden LV (2008). How to compare small multivariate samples using nonparametric tests. Computational Statistics & Data Analysis 52, 4951–4965.
  5. Brill B and Kaufman S (2019). HHG: Heller-Heller-Gorfine Tests of Independence and Equality of Distributions. R package version 2.3.2.
  6. Cao J, Dorris DM, and Meitzen J (2018). Electrophysiological properties of medium spiny neurons in the nucleus accumbens core of prepubertal male and female Drd1a-tdTomato line 6 BAC transgenic mice. Journal of Neurophysiology 120, 1712–1727.
  7. Chen K, Delicado P, and Müller H-G (2017). Modelling function-valued stochastic processes, with applications to fertility dynamics. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 79, 177–196.
  8. Chen K and Müller H-G (2012). Modeling repeated functional observations. Journal of the American Statistical Association 107, 1599–1609.
  9. Cuesta-Albertos JA and Febrero-Bande M (2010). A simple multiway ANOVA for functional data. TEST 19, 537–557.
  10. Datta S and Satten GA (2005). Rank-Sum Tests for Clustered Data. Journal of the American Statistical Association 100, 908–915.
  11. Di C-Z, Crainiceanu CM, Caffo BS, and Punjabi NM (2009). Multilevel functional principal component analysis. The Annals of Applied Statistics 3, 458–488.
  12. Druckmann S, Hill S, Schürmann F, Markram H, and Segev I (2012). A Hierarchical Structure of Cortical Interneuron Electrical Diversity Revealed by Automated Statistical Analysis. Cerebral Cortex 23, 2994–3006.
  13. Fan J and Zhang W (2000). Simultaneous confidence bands and hypothesis testing in varying-coefficient models. Scandinavian Journal of Statistics 27, 715–731.
  14. Fremdt S, Steinbach JG, Horváth L, and Kokoszka P (2013). Testing the Equality of Covariance Operators in Functional Samples. Scandinavian Journal of Statistics 40, 138–152.
  15. Galea LA, Choleris E, Albert AY, McCarthy MM, and Sohrabji F (2020). The promises and pitfalls of sex difference research. Frontiers in Neuroendocrinology 56, 100817.
  16. Goldsmith J, Scheipl F, Huang L, Wrobel J, Gellar J, Harezlak J, McLean MW, Swihart B, Xiao L, Crainiceanu C, and Reiss PT (2018). refund: Regression with Functional Data. R package version 0.1-17.
  17. Heller R, Heller Y, and Gorfine M (2013). A consistent multivariate test of association based on ranks of distances. Biometrika 100, 503–510.
  18. Hernáth F, Schlett K, and Szücs A (2019). Alternative classifications of neurons based on physiological properties and synaptic responses, a computational study. Scientific Reports 9, 13096.
  19. Horváth L, Kokoszka P, and Reeder R (2013). Estimation of the mean of functional time series and a two-sample problem. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75, 103–122.
  20. Jiang Y (2018). clusrank: Wilcoxon Rank Sum Test for Clustered Data. R package version 0.6-2.
  21. Li H, Kozey Keadle S, Staudenmayer J, Assaad H, Huang JZ, and Carroll RJ (2015). Methods to assess an exercise intervention trial based on 3-level functional data. Biostatistics 16, 754–771.
  22. Mamlouk GM, Dorris DM, Barrett LR, and Meitzen J (2020). Sex bias and omission in neuroscience research is influenced by research model and journal, but not reported NIH funding. Frontiers in Neuroendocrinology 57, 100835.
  23. Paparoditis E and Sapatinas T (2016). Bootstrap-based testing of equality of mean functions or equality of covariance operators for functional data. Biometrika 103, 727–733.
  24. Park SY and Staicu A-M (2015). Longitudinal functional data analysis. Stat (International Statistical Institute) 4, 212–226.
  25. Pomann G-M, Staicu A-M, and Ghosh S (2016). A Two Sample Distribution-Free Test for Functional Data with Application to a Diffusion Tensor Imaging Study of Multiple Sclerosis. Journal of the Royal Statistical Society: Series C (Applied Statistics) 65, 395–414.
  26. Proaño SB, Morris HJ, Kunz LM, Dorris DM, and Meitzen J (2018). Estrous cycle-induced sex differences in medium spiny neuron excitatory synaptic transmission and intrinsic excitability in adult rat nucleus accumbens core. Journal of Neurophysiology 120, 1356–1373.
  27. R Core Team (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  28. Ramsay JO and Silverman BW (2005). Functional Data Analysis. Springer.
  29. Scheffler A, Telesca D, Li Q, Sugar CA, Distefano C, Jeste S, and Şentürk D (2018). Hybrid principal components analysis for region-referenced longitudinal functional EEG data. Biostatistics 21, 139–157.
  30. Scholz F and Zhu A (2019). kSamples: K-Sample Rank Tests and their Combinations. R package version 1.2-9.
  31. Staicu A-M, Lahiri SN, and Carroll RJ (2015). Significance tests for functional data with complex dependence structure. Journal of Statistical Planning and Inference 156, 1–13.
  32. Staicu A-M, Li Y, Crainiceanu CM, and Ruppert D (2014). Likelihood Ratio Tests for Dependent Data with Applications to Longitudinal and Functional Data Analysis. Scandinavian Journal of Statistics 41, 932–949.
  33. Tannenbaum C, Ellis RP, Eyssel F, Zou J, and Schiebinger L (2019). Sex and gender analysis improves science and engineering. Nature 575, 137–146.
  34. Wynne G and Duncan AB (2020). A kernel two-sample test for functional data.
  35. Xiao L, Li Y, and Ruppert D (2013). Fast bivariate P-splines: the sandwich smoother. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 75, 577–599.
  36. Xiao L, Zipunnikov V, Ruppert D, and Crainiceanu C (2016). Fast covariance estimation for high-dimensional functional data. Statistics and Computing 26, 409–421.
  37. Xu Y, Li Y, and Nettleton D (2018). Nested Hierarchical Functional Data Modeling and Inference for the Analysis of Functional Plant Phenotypes. Journal of the American Statistical Association 113, 593–606.
  38. Yao F, Müller H-G, and Wang J-L (2005). Functional Data Analysis for Sparse Longitudinal Data. Journal of the American Statistical Association 100, 577–590.
  39. Zhang J-T, Cheng M-Y, Wu H-T, and Zhou B (2019). A new test for functional one-way ANOVA with applications to ischemic heart screening. Computational Statistics & Data Analysis 132, 3–17. Special Issue on Biostatistics.
  40. Zhang J-T and Liang X (2014). One-Way ANOVA for Functional Data via Globalizing the Pointwise F-test. Scandinavian Journal of Statistics 41, 51–71.
  41. Zhu H, Li R, and Kong L (2012). Multivariate varying coefficient model for functional responses. The Annals of Statistics 40, 2634–2666.
