Abstract
We define conditions under which sums of dependent spatial data will be approximately normally distributed. A theorem on the asymptotic distribution of a sum of dependent random variables defined on a 3-dimensional lattice is presented. Examples are also presented.
Keywords: Central Limit Theorem, spatial data, dependent data
1. Introduction
Imaging techniques provide non-invasive tools to track the development and progression of chronic pathologic processes. In early-phase drug development, imaging may be used on animals to evaluate the usefulness of potential new therapeutic drugs. For example, the efficacy of a new cancer drug may be assessed by measuring the reduction in tumor size as seen on an image. This avoids the need to sacrifice the animals, or at least delays the sacrifice, and allows multiple assessments on the same animal. Human imaging studies are also useful in the study of cancer as well as neurologic disorders such as dementia and schizophrenia, where abnormalities of tissue are known to occur. Imaging enables the study of these abnormalities without surgery, and is particularly useful for neurodegenerative diseases, where direct examination of brain tissue is rarely an option. Imaging has therefore become a critical tool for the study of these chronic processes and the evaluation of new treatments for them. However, imaging is expensive, so studies tend to be of small or moderate size.
Each image, itself, is an extremely rich data source. The images are broken down into hundreds of thousands if not millions of volume elements, or voxels, each of which contains information about the region or tissue being studied. These voxels may be thought of as data points on a 3-dimensional lattice. Due to the underlying anatomy and biology of the disease process, these data are likely to be highly correlated locally with the correlation decreasing as the Euclidean distance between points increases. There is, therefore, a need to perform sensible data reduction in such a way that under reasonable conditions, the reduced summary measures will yield approximately normally distributed measures for use in statistical analyses. Demonstration of the approximate normality of summary data from high-dimensional imaging would support use of standard linear model techniques for analysis even with small numbers of patients or animals, for example, use of two-sample t-tests to compare two treatments in small pre-clinical experiments.
Asymptotic distributions of sums of dependent data in one dimension have been studied extensively in the literature, using various dependence schemes. Many researchers have proved asymptotic normality for m-dependent random variables, in which random variables more than m units apart in the sequence are assumed to be independent. Hoeffding and Robbins (1948) assumed a fixed value for m, while Berk (1973) and Romano and Wolf (2000) allowed m to change with the sample size and to grow without bound. Serfling (1968) examined asymptotic properties of sums of random variables under less stringent dependence structures.
Sajjan (2000) developed a central limit theorem for a stationary dependent process on the 2-dimensional lattice. For the lattice case, the idea of m-dependence is extended to (m1, m2)-dependence. Christofides and Mavrikiou (2003) also focused on the two-dimensional case, but relaxed the assumption of stationarity. In doing so, they made strong assumptions that were difficult to meet in practice. They also defined a dependence structure for the lattice, called ρ-radius dependence, that will be discussed below.
Use of linear combinations of data located on a lattice offers promise for the study of spatially distributed pathological processes, which may be observed through structural magnetic resonance images (MRI) or positron emission tomography (PET) scans as well as other imaging modalities. We present notation followed by a theorem describing conditions necessary for asymptotic normality of the sum of variables located on a 3-dimensional lattice. We provide the proof of the theorem as well as several examples for which the conditions of the theorem hold or the theorem might be useful.
2. Theoretical Considerations
Christofides and Mavrikiou (2003) defined the following concept of local dependence appropriate for the issue considered here:
Definition: For a positive integer r, let N^r denote the r-dimensional positive integer lattice and let {Xi, i ∈ N^r} be an array of random variables defined on a common probability space (Ω, 𝒜, P). Let ρ ≥ 0. The random variables {Xi, i ∈ N^r} are said to be ρ-radius dependent if Xi and Xj are independent whenever d(i, j) > ρ, where d(i, j) is the Euclidean distance between the lattice points i and j.
Let {Xi, i = (i1, i2, i3) ≤ (n1, n2, n3)} be an array of ρ-radius dependent three-dimensionally indexed random variables, where the inequality is taken componentwise. The Xi represent either individual values or weighted values, and we are interested in the asymptotic properties of their sum. Here n1 is the extent of the lattice from back to front, n2 from left to right, and n3 from bottom to top.
Let ρ* = ⌈ρ⌉, the smallest integer greater than or equal to ρ.
Let νn be a positive integer greater than ρ*, which is allowed to change with n = (n1, n2, n3).
Let Ti1,i2,i3 = {(j1, j2, j3) : ik − νn ≤ jk ≤ ik + νn, k = 1, 2, 3} so that Ti1,i2,i3 is the (2νn + 1) × (2νn + 1) × (2νn + 1) cube whose center is the point (i1, i2, i3).
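For each such cube, define the cube sum as the sum of Xj1,j2,j3 over all (j1, j2, j3) ∈ Ti1,i2,i3, that is, the total (or weighted total) in the cube.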
The lattice may be divided into a set of independent cubes and borders separating the cubes, beginning in the back left corner of the bottom layer. The first cube is centered at (kn, kn, kn), where kn = νn + 1. Cube centers are regularly spaced at intervals of λn = 2νn + ρ* + 1, where λn represents the width of a cube plus the adjacent border. Because the variables Xi1,i2,i3 are ρ-radius dependent and the cubes are separated by a border of width ρ* ≥ ρ, any two points in different cubes are at distance at least ρ* + 1 > ρ, so the cube sums are independent. We assume that ni + ρ* is evenly divisible by λn, so that there are no partial cubes in the ith dimension (i = 1: back to front, i = 2: left to right, i = 3: bottom to top). The cubes are centered at (i1, i2, i3) = (kn + j1λn, kn + j2λn, kn + j3λn), where 0 ≤ ji ≤ Di − 1 and Di = (ni + ρ*)/λn is the number of cubes in the ith dimension.
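The cube layout can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cube_layout` and `min_gap`, and the example values ρ = 1.5, νn = 3 and (n1, n2, n3) = (16, 16, 25), are hypothetical and do not come from the text.

```python
import math
from itertools import combinations, product


def cube_layout(n, nu, rho):
    """Return (centers, lam) for the cube decomposition of an n1 x n2 x n3 lattice.

    n   : tuple (n1, n2, n3); lattice indices run from 1 to n_i on each axis
    nu  : cube half-width nu_n, so every cube has side 2 * nu + 1
    rho : dependence radius
    """
    rho_star = math.ceil(rho)
    if nu <= rho_star:
        raise ValueError("nu must be greater than ceil(rho)")
    lam = 2 * nu + rho_star + 1        # cube side plus one adjacent border
    k = nu + 1                         # first cube centre on each axis
    axes = []
    for n_i in n:
        if (n_i + rho_star) % lam != 0:
            raise ValueError("n_i + rho* must be divisible by lambda")
        d_i = (n_i + rho_star) // lam  # number of cubes along this axis
        axes.append([k + j * lam for j in range(d_i)])
    return list(product(*axes)), lam


def min_gap(centers, nu):
    """Smallest Euclidean distance between points of two different cubes."""
    best = math.inf
    for a, b in combinations(centers, 2):
        # Per-axis distance between the cubes (0 when their projections overlap).
        gaps = [max(abs(x - y) - 2 * nu, 0) for x, y in zip(a, b)]
        best = min(best, math.sqrt(sum(g * g for g in gaps)))
    return best


# Hypothetical example: rho = 1.5 gives rho* = 2, and nu = 3 gives lambda = 9.
centers, lam = cube_layout((16, 16, 25), nu=3, rho=1.5)
gap = min_gap(centers, nu=3)
print(len(centers), lam, gap)
```

In this configuration λn = 9, there are 2 × 2 × 3 = 12 cubes, and the closest points of any two distinct cubes are ρ* + 1 = 3 lattice units apart, which exceeds ρ = 1.5.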
The regions of points that do not belong to any of the cubes may be divided into seven zones for each cube (Figure 1). Three regions are the “flat” rectangular areas adjacent to the right, front, and top of a cube, but not symbolically extending past the edges of the cube. Next we identify three strips adjacent to the edges but not extending past the corners of the cube. Finally, the remaining region is the small cube adjacent to the top right front corner.
Figure 1. The different components of the border region associated with a cube: “A” = above, “R” = right, “F” = front, “RF” = right and front, “AR” = above and right, “AF” = above and front, “RFA” = right, front, and above.
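As a quick check on the geometry, the sketch below counts the lattice points in each of the seven zones and confirms that, together with the cube, they fill one λn × λn × λn block. The function name `border_zone_sizes` and the values νn = 3, ρ* = 2 are illustrative assumptions; the zone labels follow the figure caption.

```python
def border_zone_sizes(nu, rho_star):
    """Number of lattice points in each of the seven border zones of one cube."""
    s = 2 * nu + 1      # side length of the cube
    b = rho_star        # width of the border
    return {
        "R": s * s * b,   # slab to the right of the cube
        "F": s * s * b,   # slab in front of the cube
        "A": s * s * b,   # slab above the cube
        "RF": s * b * b,  # strip along the right-front edge
        "AR": s * b * b,  # strip along the above-right edge
        "AF": s * b * b,  # strip along the above-front edge
        "RFA": b ** 3,    # small cube at the top right front corner
    }


# Hypothetical example values.
nu, rho_star = 3, 2
sizes = border_zone_sizes(nu, rho_star)
total = sum(sizes.values())
lam = 2 * nu + rho_star + 1
# The cube and its seven zones tile a lam x lam x lam block exactly.
assert total + (2 * nu + 1) ** 3 == lam ** 3
print(sizes, total)
```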
For each cube Tk, where k = (i1, i2, i3), define its border sum to be the sum of the random variables at the points in the seven boundary regions associated with it.
The proof of the theorem relies on taking limits in such a fashion that the normalized sum of the sums of the random variables in the cubes is asymptotically normal, by standard central limit theorem arguments, while the normalized sum of the random variables in the boundary regions goes to zero in probability.
2.1. Theorem
The main theorem for the asymptotic distribution of the sum of random variables located on a spatial lattice is given below.
Theorem 1. Let {Xi1,i2,i3, (i1, i2, i3) ≤ (n1, n2, n3)} be an array of ρ-radius dependent three dimensionally indexed random variables. Without loss of generality, assume that these variables have mean zero. Let νn be a positive integer greater than ρ* = ⌈ρ⌉, as described above in the notation, and λn = 2νn + ρ* + 1. Let nc = D1D2D3 be the total number of cubes. Let be the variance of and let also . Assume the finiteness of these quantities, and define
Assume that n1, n2, n3 → ∞ monotonically, that nc → ∞, and that νn → ∞ at a rate slower than n1, n2, n3, such that
| (1) |
| (2) |
Let
Then
where .
Proof. The proof of this theorem uses a method first introduced by Bernstein (1927). We first note that we can decompose the sum of the components of the lattice as follows:
so that
We prove the Theorem using Slutsky’s Theorem, which requires showing that
and that
where .
We begin by showing the first part. As mentioned above, the cubes defined by
are independent. They have mean zero, and a finite variance, . Lyapounov’s condition (Billingsley (1995)) is satisfied for δ = 1, since
Note that
where . By the Lyapounov version of the Central Limit Theorem,
We now show the second part:
By Chebyshev’s inequality, if we can show that
then
| (3) |
| (4) |
where (3) follows because each cube and its border regions only touch borders of the 26 adjacent cubes that share faces, edges, or corners. Also, Var(A + B) ≤ VarA + VarB + 2σAσB and . Line (4) follows from the assumptions and the proof is complete.
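For readers who want the variance inequality spelled out, a short derivation is sketched below for two generic random variables A and B with finite variances. The first bound is the Cauchy–Schwarz inequality. The final step is a standard elementary bound added here for illustration; it is an assumption about how such an inequality is typically used, not necessarily the exact relation in the authors' argument.

```latex
\begin{align*}
\operatorname{Var}(A+B)
  &= \operatorname{Var}(A) + \operatorname{Var}(B) + 2\operatorname{Cov}(A,B) \\
  &\le \operatorname{Var}(A) + \operatorname{Var}(B) + 2\sigma_A\sigma_B
     && \text{(Cauchy--Schwarz: } |\operatorname{Cov}(A,B)| \le \sigma_A\sigma_B\text{)} \\
  &= (\sigma_A + \sigma_B)^2 \\
  &\le 2\left(\sigma_A^2 + \sigma_B^2\right)
     && \text{(since } 2\sigma_A\sigma_B \le \sigma_A^2 + \sigma_B^2\text{)}.
\end{align*}
```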
2.2. Notes
The assumptions of the theorem are intuitive and are likely to be met by measures of interest related to image data. They are also similar to assumptions used in other versions of the Central Limit Theorem. In particular, Assumption (2) is the standard Central Limit Theorem assumption that the distribution does not have extremely long tails. Assumption (1) guarantees that the lattice may be decomposed into independent blocks that contribute the dominant share of information (variance) when we sum over the lattice, while the share of the border regions can be made arbitrarily small by choosing large enough n = n1n2n3. One way to allow for the blocks to provide the majority of the information in the lattice is to allow the block size to grow as the overall dimension of the lattice grows (something that had not been considered by Christofides and Mavrikiou (2003)).
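For orientation, the textbook form of the Lyapounov condition with δ = 1 (see Billingsley (1995)) is sketched below for generic independent, zero-mean summands. The symbols Y_k, σ_k, s_m, γ and σ are illustrative notation introduced here, not the notation of Theorem 1; in the theorem, the role of the independent summands is played by the cube sums.

```latex
% Y_1, ..., Y_m independent, E[Y_k] = 0, Var(Y_k) = \sigma_k^2 < \infty.
\[
  s_m^{2} = \sum_{k=1}^{m} \sigma_k^{2},
  \qquad
  \frac{1}{s_m^{3}} \sum_{k=1}^{m} \mathbb{E}\,|Y_k|^{3} \;\longrightarrow\; 0
  \quad\Longrightarrow\quad
  \frac{1}{s_m} \sum_{k=1}^{m} Y_k \;\xrightarrow{\;d\;}\; N(0,1).
\]
% Worked special case: identically distributed summands with
% E|Y_k|^3 = \gamma and Var(Y_k) = \sigma^2.
\[
  \frac{1}{s_m^{3}} \sum_{k=1}^{m} \mathbb{E}\,|Y_k|^{3}
  = \frac{m\,\gamma}{m^{3/2}\sigma^{3}}
  = \frac{\gamma}{\sigma^{3}\sqrt{m}}
  \;\longrightarrow\; 0 .
\]
```

The special case shows why a finite absolute third moment suffices when the summands are identically distributed: the ratio decays like the inverse square root of the number of summands.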
Researchers may question the assumption of zero mean in the theorem, since an image made up of zero-mean voxels would not be very interesting. However, if all of the voxels had the same mean, the theorem would still hold. Segmentation strategies for structural images often assume that intensities of particular tissue type, such as grey matter in the brain, come from a mixture of Gaussian distributions (Ashburner and Friston (2005)). Therefore, it is assumed that voxels belonging to a particular class of tissue all have the same mean. If voxels have different means, subtracting a three-dimensional array of the voxel means from the image would result in an array of zero-mean variables in which the correlation structure has been preserved. Therefore, as long as the conditions of the theorem still hold on this transformed array, the linear combinations of the voxel-level data will be approximately normally distributed (see Example 2.2). Finally, patterns in the image may be explained by demographic or clinical information. If linear regression models are used to predict voxel-level data, the array consisting of residuals from these models would have zero mean at each position. Summaries based on this residual array would be helpful for residual diagnostics to determine if there is any additional structure remaining in the data. Therefore, there are many contexts in which the image data do not have zero mean in which the theorem may still be applied.
It is important to note that in this imaging setting, n = n1n2n3 represents the size of a single image that generates the desired spatial process, not the number of individuals imaged. Taking the limit as the size of the lattice grows infinitely large may be thought of as looking at larger and larger images or regions. Therefore, the theorem states that under suitable conditions, linear summaries of imaging data will be approximately normal if the image has enough voxels, rather than requiring a large number of subjects (as is the usual interpretation of n → ∞ in other Central Limit Theorems). This theorem supports the use of standard normal-theory approaches for analysis of studies with even fairly modest numbers of people using a variety of common-sense linear combinations of voxel-level data. In addition, it provides a rationale for the use of standard linear model techniques with normality assumptions to examine the role of clinical or environmental variables in explaining variation in image data.
However, the theorem says only that summary measures derived by adding combinations of the voxel-level data will be approximately normally distributed, since in practice the number of voxels will always be finite. Given a fixed number of voxels, if the spatial dependence is too strong, this approximation may not be appropriate, so it is important to understand the properties of the underlying data. Some preliminary work we have done with structural images suggests that the spatial correlation of error terms drops off sharply as the distance between voxels increases (data not shown), but that cannot be guaranteed for all data generated from images. Researchers should also be careful if summaries are based on small regions of the brain, since again, the approximation may not be appropriate. However, many brain regions of interest, even though anatomically small, may still contain relatively large numbers of voxels. For example, a recent publication by Schuff et al. (2009) presents average volumes of the hippocampus, a small region of interest in aging studies, that range from about 1600 mm³ to over 2000 mm³. Since voxels in MRI are often 1 mm³, these volumes translate to well over 1000 voxels on which the summaries are based. In situations of strong spatial dependence or small regions, more robust statistical analytic techniques that do not require the assumption of exact normality, such as resampling or permutation methods, should be considered.
2.3. Examples
Example 2.1 In the simplest case of independent, identically distributed random variables (ρ-radius dependent with ρ = 0), the border regions are non-existent, since the width of the border region is ρ* = ⌈ρ⌉ = 0. Therefore, the lattice may be decomposed entirely into independent cubes and it is easy to show that the conditions of the theorem are met as long as the distribution has finite mean, variance, and absolute third moment.
Example 2.2 Now consider a slightly more difficult case, which incorporates non-normal data as well as dependence. Many uses of imaging involve the identification of abnormalities (e.g. abnormal tissue or function). Therefore, it would be useful to identify conditions under which the sum of a set of dependent Bernoulli random variables is approximately normal. Let {Xi1,i2,i3, (i1, i2, i3) ≤ (n1, n2, n3)} be an array of ρ-radius dependent three dimensionally indexed Bernoulli (pi1,i2,i3) random variables with pi1,i2,i3 bounded away from zero and 1 (0 < pmin ≤ pi1,i2,i3 ≤ pmax < 1 for all (i1, i2, i3)), where pmin and pmax are the smallest and largest probabilities of a success across the voxels. Assume that the overall proportion of successes (abnormalities) in the entire lattice remains constant as the lattice grows and that the random variables are positively correlated. The assumption of positive correlation is reasonable, since damage is thought to spread locally, so that if one voxel is damaged, surrounding voxels are more likely to also be damaged. All conditions of the theorem are met in this case (see the sketch of the proof below), so that summaries consisting of simple sums of the voxel-level data over sufficiently large images will be approximately normal.
Sketch of proof: We first transform the variables so that they have zero mean: Yi1,i2,i3 = Xi1,i2,i3 − pi1,i2,i3. Each cube has dimension (2νn + 1) × (2νn + 1) × (2νn + 1) and consists of a total of (2νn + 1)³ random variables (voxels). The border regions associated with each cube consist of a total of 3(2νn + 1)²ρ* + 3(ρ*)²(2νn + 1) + (ρ*)³ random variables. It is then easy to show that for a given cube size, are both finite.
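The two counts above already indicate why the border becomes negligible as the cubes grow. Writing λn = (2νn + 1) + ρ*, the ratio of border points to cube points is worked out below. This is a statement about counts only, offered as intuition; it does not by itself establish the variance condition in Assumption (1).

```latex
\[
\frac{\lambda_n^{3} - (2\nu_n+1)^{3}}{(2\nu_n+1)^{3}}
  = \frac{3\rho^{*}}{2\nu_n+1}
  + \frac{3(\rho^{*})^{2}}{(2\nu_n+1)^{2}}
  + \frac{(\rho^{*})^{3}}{(2\nu_n+1)^{3}}
  \;\longrightarrow\; 0
  \quad \text{as } \nu_n \to \infty .
\]
```

For instance, with ρ* = 2 the ratio is 386/343 (about 1.13) when νn = 3, but roughly 0.06 when νn = 50.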
To show Assumption (1),
| (5) |
| (6) |
| (7) |
In the above, C1 and C2 are constants that do not depend on n1, n2, or n3. To show (5), use the definition of the variance of a sum of random variables and note that . (6) follows from the definition of the variance of a sum of random variables and from the assumption of positively correlated random variables. Finally, (7) follows from the assumption that νn grows at a rate slower than n1, n2, n3, and, therefore, slower than nc.
To show Assumption (2),
| (8) |
| (9) |
Once again, C3 is a constant that does not depend on n1, n2, or n3. By using the triangle inequality and then expanding the cubic polynomial in , (8) can be shown. Finally, because we assume that νn grows at a rate slower than n1, n2, n3 (and therefore, nc), (9) follows and the assumption is met.
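The setting of Example 2.2 can be explored with a small simulation. The construction below is an assumption made purely for illustration and is not taken from the text: each voxel is set to 1 when the sum of independent standard normal noise over its 3 × 3 × 3 neighbourhood exceeds a threshold. Voxels whose neighbourhoods do not overlap are independent, so the field is ρ-radius dependent with ρ = 2√3, nearby voxels are nonnegatively correlated, and every success probability equals 0.5. The lattice is deliberately tiny, so this is a sanity check rather than a validation of the theorem.

```python
import random
import statistics


def dependent_bernoulli_sum(n, rng, threshold=0.0):
    """Sum of locally dependent Bernoulli voxels on an n x n x n lattice."""
    m = n + 2  # pad by one voxel on every side so each window is complete
    z = [[[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(m)]
         for _ in range(m)]
    total = 0
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            for k in range(1, n + 1):
                # Sum the noise over the 3 x 3 x 3 window centred at (i, j, k).
                s = 0.0
                for a in (i - 1, i, i + 1):
                    for b in (j - 1, j, j + 1):
                        row = z[a][b]
                        s += row[k - 1] + row[k] + row[k + 1]
                if s > threshold:
                    total += 1
    return total


rng = random.Random(0)  # fixed seed so the run is reproducible
sums = [dependent_bernoulli_sum(8, rng) for _ in range(200)]
mu = statistics.mean(sums)
sd = statistics.stdev(sums)
# Under normality, about 95% of the sums should lie within 1.96 sd of the mean.
coverage = sum(abs(s - mu) <= 1.96 * sd for s in sums) / len(sums)
print(f"mean={mu:.1f} sd={sd:.1f} fraction within 1.96 sd={coverage:.3f}")
```

A histogram or normal quantile plot of `sums` would give a fuller picture than the single coverage figure. Increasing the lattice size n while holding the window fixed mimics the limit taken in the theorem.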
Example 2.3 Many biologically driven summaries of image data will be linear combinations of the voxel-level data with scalar multipliers different from one (the case in Example 2.2). However, a minor constraint on the multipliers guarantees that the conditions of the theorem are met. Let {Xi1,i2,i3, (i1, i2, i3) ≤ (n1, n2, n3)} be an array of ρ-radius dependent three dimensionally indexed Bernoulli (pi1,i2,i3) random variables, as described in the previous example. Consider the new array {Ci1,i2,i3Xi1,i2,i3, (i1, i2, i3) ≤ (n1, n2, n3)}, where |Ci1,i2,i3| ≤ M for some constant M. This new array satisfies the conditions of the theorem. Therefore, as long as the multipliers used in the linear combination are bounded (and the conditions of the theorem are met for the original voxel-level data), these summaries will also be approximately normal.
Example 2.4 As a practical example in neuroimaging research, consider a set of positron emission tomography (PET) scans measuring the glucose metabolism of the brain. Suppose the voxel values have been standardized to a normal, healthy population by centering at the normal mean and dividing by the standard deviation of the normal population. The hypothesis of interest is that the scans of these subjects show reduced glucose metabolism relative to the healthy population. Under the null hypothesis of metabolism similar to the healthy individuals, the standardized voxel values (Xi1,i2,i3) have zero mean. If we further assume that E|Xi1,i2,i3|³ is finite and that only those voxels that are close by (for example, sharing a side) are positively correlated, the conditions of the theorem hold and the sum of the voxel values will be approximately normally distributed. The verification of the assumptions is similar to that given above for Example 2.2.
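The standardization step in this example amounts to a voxel-wise z-score followed by a sum. A minimal sketch follows, assuming the image and the reference means and standard deviations are supplied as flat lists of equal length; the function name `standardized_sum` and the toy numbers are hypothetical.

```python
def standardized_sum(voxels, ref_mean, ref_sd):
    """Sum of voxel z-scores relative to a healthy reference population."""
    if not (len(voxels) == len(ref_mean) == len(ref_sd)):
        raise ValueError("all inputs must have the same length")
    # Centre each voxel at the reference mean and scale by the reference sd.
    return sum((x - m) / s for x, m, s in zip(voxels, ref_mean, ref_sd))


# Toy values, not real PET data.
total = standardized_sum([4.0, 5.0, 3.0], [5.0, 5.0, 5.0], [1.0, 2.0, 1.0])
print(total)  # -3.0
```

Turning this sum into a test statistic would also require an estimate of its variance, which depends on the spatial correlation between voxels and is not addressed by this sketch.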
3. Conclusion
We present an asymptotic result for spatially correlated data and provide examples for which the theorem applies. The theorem suggests that linear combinations of voxel-level data over a large enough region (obtained through an image) may be used as outcomes in a regression setting. We have shown that linear combinations generated from images of people or animals with similar underlying characteristics will be approximately normally distributed even for small to moderate numbers of individuals, provided the images themselves cover a large enough region with a high enough resolution (large number of voxels). Such linear combinations include simple sums of the image data or weighted sums with weights defined according to an underlying biological hypothesis. Applications where this theorem might be useful include understanding patterns of tissue or metabolic abnormalities seen on structural MRI or PET scans or activation patterns observed through functional MRI, all imaging approaches featuring high-resolution images, provided the region of interest is substantially larger than the local correlation radius.
Acknowledgments
The authors would like to thank the two referees for their extremely helpful comments, which greatly improved the manuscript.
Footnotes
This work was supported by NIH/NIA grant # 5 P30 AG010129
References
- Ashburner J, Friston KJ. Unified segmentation. Neuroimage. 2005;26:839–851. doi: 10.1016/j.neuroimage.2005.02.018.
- Berk KN. A central limit theorem for m-dependent random variables with unbounded m. The Annals of Probability. 1973;1:352–354.
- Bernstein S. Sur l’extension du théorème limite du calcul des probabilités aux sommes de quantités dépendantes. Mathematische Annalen. 1927;97:1–59.
- Billingsley P. Probability and Measure. New York: John Wiley & Sons; 1995.
- Christofides TC, Mavrikiou PM. Central limit theorem for dependent multidimensionally indexed random variables. Statistics & Probability Letters. 2003;63:67–78.
- Hoeffding W, Robbins H. The central limit theorem for dependent random variables. Duke Mathematical Journal. 1948;15:773–780.
- Romano JP, Wolf M. A more general central limit theorem for m-dependent random variables with unbounded m. Statistics & Probability Letters. 2000;47:115–124.
- Sajjan SG. A note on central limit theorems for lattice models. Journal of Statistical Planning and Inference. 2000;83:283–290.
- Schuff N, Woerner N, Boreta L, Kornfield T, Shaw LM, Trojanowski JQ, Thompson PM, Jack CR Jr, Weiner MW, The Alzheimer’s Disease Neuroimaging Initiative. MRI of hippocampal volume loss in early Alzheimer’s disease in relation to ApoE genotype and biomarkers. Brain. 2009;132:1067–1077. doi: 10.1093/brain/awp007.
- Serfling RJ. Contributions to central limit theory for dependent variables. The Annals of Mathematical Statistics. 1968;39:1158–1175.

