Wu et al. present distributional independent component analysis (DICA), a generalized framework that can extract features from different imaging modalities. We commend the authors on this highly innovative and useful development. ICA is at the forefront of neuroimaging analysis, recognized for its ability to uncover the hidden spatiotemporal structure contained within brain imaging data (Calhoun et al., 2009). However, as Wu et al. point out, integrating information across different imaging modalities is nontrivial. The framework that Wu et al. present has the flexibility to extract source signals from diverse types of data because it takes a fundamentally different approach from that of classical ICA. Instead of performing source separation directly on the observed data, they model the observed data with a mixture distribution whose component distributions are tailored to the data representation of a particular imaging modality. The mixture model comprises these component distributions together with a set of weight parameters that represent the loading on each component distribution. DICA then performs the ICA decomposition on the posterior weights. This setup allows the component distributions to depend on the imaging modality while keeping the weights comparable across modalities.
There are several exciting advantages to this new modeling approach. One is that DICA is the first ICA method that allows source separation of diffusion tensors estimated from a single subject's diffusion-weighted magnetic resonance imaging (DWI) scans. Diffusion tensor magnetic resonance imaging (DTI) is a popular approach for studying normal brain development and aging, as well as changes in various brain disorders, because of its unique ability to identify microstructural abnormalities. To date, most DTI studies have focused on multiple-subject group comparisons, largely because of the lack of statistical techniques for single-subject analyses. Although a group-level analysis is sufficient when the effects of interest are located in the same anatomic structures across subjects, it is not ideal when effects are expected to be focal, with spatial distributions that are specific to individual subjects (Chung et al., 2008). For example, progress in the study of traumatic brain injury (TBI) has been slow in recent decades. Although DTI has provided an important means to study the pathophysiology of TBI, challenges in proposing effective treatment strategies stem largely from the heterogeneity of the pathology, which makes it hard to understand the effects of TBI on an individual basis in clinical settings (Ware et al., 2017). Statistical techniques that can be applied to DTI data on an individual basis are therefore needed for the discovery of reliable biomarkers of injury. The innovative DICA approach proposed by Wu et al. is an important step in this direction.
As briefly mentioned earlier, the other major advantage of DICA is its ability to analyze data from different imaging modalities. It is often of interest to compare findings obtained from multiple types of data (e.g., EEG, MEG, fMRI, PET, sMRI, DTI), as each has its own advantages and disadvantages related to resolution, safety, and feasibility. Multiple techniques can therefore be combined to help compensate for the limitations of any one modality (Tulay et al., 2019). There are various ways to combine data obtained from different modalities. Calhoun and Sui (2016) categorized these approaches as follows: (a) visual inspection: the unimodal analysis results are visualized separately; (b) data integration: the data from each unimodal technique are analyzed individually and the results are then overlaid; and (c) data fusion: either one modality constrains another or all modalities are analyzed jointly. As Wu et al. point out, most existing methods have had to adopt a different analytic framework adapted to each modality. Such differing frameworks are not ideal for integrating and comparing results across modalities, because it is difficult to discern whether disagreements between modalities reflect true differences in the brain mechanisms driving the observed data or are simply artifacts of the differing analysis approaches. A general framework such as that presented by Wu et al. is therefore desirable, as it allows a cleaner interpretation when comparing results across data types, aiding approaches (a) and (b) in Calhoun and Sui's categorization. Wu et al. plan to address category (c), data fusion, in future work; they discuss an extension of DICA that performs joint decomposition across multiple modalities.
In addition to the clear advantages of the method, we would like to discuss some potential disadvantages. The authors propose an expectation-maximization (EM) algorithm to estimate the parameters in the mixture distribution model. A point of caution with this approach is that EM is only guaranteed to converge to a stationary point, typically a local maximum of the data likelihood, and not necessarily to the global maximum. It is therefore often recommended to run the algorithm multiple times, initializing the estimation routine with different starting values, and to choose as the final estimates the parameter set that produces the highest likelihood over the series of runs. There have been many proposals for choosing starting values, including a grid search (Laird, 1978), principal components analysis (Karlis & Xekalaki, 2003), clustering approaches (Leroux, 1992; Woodward et al., 1984), and estimates from other estimation schemes such as the method of moments (Lindsay & Basak, 1993). The choice of starting values greatly affects not only the ability of the algorithm to find the global maximum but also its speed of convergence (Karlis & Xekalaki, 2003). This choice should therefore be well thought out, and multiple starting points should be explored.
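The multi-start strategy described above can be sketched in a few lines. The following is a minimal illustration, assuming a univariate Gaussian mixture fitted by standard EM; it is not the DICA model or Wu et al.'s implementation. The function names (em_single_run, em_multistart) are hypothetical, the demo data are synthetic, and the starting means are simply randomly chosen data points, a simpler scheme than the initialization strategies cited above.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def e_step(data, weights, means, variances):
    """Return posterior membership probabilities and the observed-data log-likelihood."""
    resp, loglik = [], 0.0
    for x in data:
        logs = [math.log(w) + log_normal_pdf(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        lse = logsumexp(logs)
        resp.append([math.exp(lg - lse) for lg in logs])
        loglik += lse
    return resp, loglik


def em_single_run(data, k, rng, max_iter=500, tol=1e-8, min_var=1e-6):
    """One EM run for a univariate k-component Gaussian mixture (hypothetical helper)."""
    n = len(data)
    # Random starting values: k data points as means, the pooled variance
    # for every component, and equal mixing weights.
    means = rng.sample(data, k)
    mu = sum(data) / n
    pooled = sum((x - mu) ** 2 for x in data) / n
    variances = [pooled] * k
    weights = [1.0 / k] * k
    prev = -math.inf
    for _ in range(max_iter):
        resp, loglik = e_step(data, weights, means, variances)
        if loglik - prev < tol:  # stop once the log-likelihood stops increasing
            break
        prev = loglik
        # M-step: closed-form updates weighted by the posterior probabilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, min_var)  # floor guards against degenerate spikes
    _, loglik = e_step(data, weights, means, variances)
    return loglik, (weights, means, variances)


def em_multistart(data, k, n_starts=10, seed=0):
    """Run EM from several random starts; keep the fit with the highest log-likelihood."""
    rng = random.Random(seed)
    runs = [em_single_run(data, k, rng) for _ in range(n_starts)]
    return max(runs, key=lambda run: run[0])


if __name__ == "__main__":
    # Synthetic toy data: two well-separated normal clusters.
    gen = random.Random(1)
    data = [gen.gauss(-3.0, 1.0) for _ in range(150)] + [gen.gauss(3.0, 1.0) for _ in range(150)]
    best_loglik, (weights, means, variances) = em_multistart(data, k=2, n_starts=5)
    print(round(best_loglik, 2), sorted(round(m, 2) for m in means))
```

Keeping the run with the highest log-likelihood is exactly the selection rule described in the text; replacing the random starting means with a grid, clustering, or method-of-moments initializer would change only the first lines of em_single_run.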
A second point of caution with the proposed framework concerns the selection of the number of components in the mixture model, as well as the number of independent components in the ICA decomposition. Both choices are important. Choosing too many components in the mixture model may cause the mixture to overfit the data, leading to poor interpretations, whereas choosing too few may leave the model unable to approximate the true underlying data structure (Huang et al., 2017). Wu et al. suggest choosing the number of components in the mixture model according to the Bayesian information criterion (BIC). The computational burden of this can be heavy, especially given the need to initialize the algorithm at multiple starting points for each candidate number of components to improve the chance of reaching the global maximum. As shown by Kim and Seo (2014), the presence of multiple local maximizers, which is common in Gaussian mixture models, affects the performance of the model selection techniques used to choose the number of components. We therefore recommend that future work explore techniques such as those presented in Kim and Seo (2014), Seo and Kim (2012), or Chen and Tan (2009) to efficiently choose the number of mixture components. Likewise, as with all ICA approaches, one should choose the number of independent components with caution. Although statistical criteria exist to optimally select the number of independent components, there is no single “best” dimensionality for the underlying neurophysiology of multiple distributed systems (Cole et al., 2010). There will always be multiple valid separations of the data, depending on the level of granularity at which one wishes to study the system.
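To make the cost and sensitivity of BIC-based selection concrete, the following worked formula assumes, purely for illustration, a mixture of K Gaussian components in p dimensions with unrestricted covariance matrices. DICA's modality-specific component distributions would have a different parameter count d_K, and Wu et al. may use an equivalent sign convention in which the criterion is maximized rather than minimized.

```latex
\mathrm{BIC}(K) = -2\,\ell(\hat{\theta}_K) + d_K \log n,
\qquad
d_K = K\left(1 + p + \frac{p(p+1)}{2}\right) - 1,
\qquad
\hat{K} = \arg\min_{K \in \{1,\dots,K_{\max}\}} \mathrm{BIC}(K)
```

Here \ell(\hat{\theta}_K) is the maximized log-likelihood of the K-component model and n is the number of observations; for p = 1 this gives d_K = 3K - 1. Two consequences follow. First, if a run stops at a local rather than the global maximum, \ell(\hat{\theta}_K) is understated, so BIC(K) is inflated and the selected K can shift. Second, with R random starts per candidate K, the search requires R K_max full EM fits; for example, R = 20 and K_max = 10 already amount to 200 fits.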
Despite the limitations described above, we are delighted by the work presented by Wu et al. The DICA method fills an important need for a unified ICA approach that can analyze diverse imaging modalities as well as single-subject DTI data. As discussed, this approach lays a foundation for future work on fusing data from multiple imaging modalities and opens the door to exciting work on individual-level analyses that may lead to better detection of brain disease and injury, as well as better treatment plans, in clinical settings.
ACKNOWLEDGMENTS
This work was supported by the National Institute of Biomedical Imaging and Bioengineering, Grant/Award Number: R01EB024559 (Simpson).
REFERENCES
- Calhoun VD, Liu J & Adali T (2009) A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. Neuroimage, 45(1), S163–S172.
- Calhoun VD & Sui J (2016) Multimodal fusion of brain imaging data: a key to finding the missing link(s) in complex mental illness. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging, 1(3), 230–244.
- Chen J & Tan X (2009) Inference for multivariate normal mixtures. Journal of Multivariate Analysis, 100(7), 1367–1383.
- Chung S, Pelletier D, Sdika M, Lu Y, Berman JI & Henry RG (2008) Whole brain voxel-wise analysis of single-subject serial DTI by permutation testing. Neuroimage, 39(4), 1693–1705.
- Cole DM, Smith SM & Beckmann CF (2010) Advances and pitfalls in the analysis and interpretation of resting-state FMRI data. Frontiers in Systems Neuroscience, 4, 8.
- Huang T, Peng H & Zhang K (2017) Model selection for Gaussian mixture models. Statistica Sinica, 27, 147–169.
- Karlis D & Xekalaki E (2003) Choosing initial values for the EM algorithm for finite mixtures. Computational Statistics & Data Analysis, 41(3–4), 577–590.
- Kim D & Seo B (2014) Assessment of the number of components in Gaussian mixture models in the presence of multiple local maximizers. Journal of Multivariate Analysis, 125, 100–120.
- Laird N (1978) Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association, 73(364), 805–811.
- Leroux BG (1992) Consistent estimation of a mixing distribution. The Annals of Statistics, 20, 1350–1360.
- Lindsay BG & Basak P (1993) Multivariate normal mixtures: a fast consistent method of moments. Journal of the American Statistical Association, 88(422), 468–476.
- Seo B & Kim D (2012) Root selection in normal mixture models. Computational Statistics & Data Analysis, 56(8), 2454–2470.
- Tulay EE, Metin B, Tarhan N & Arikan MK (2019) Multimodal neuroimaging: basic concepts and classification of neuropsychiatric diseases. Clinical EEG and Neuroscience, 50(1), 20–33.
- Ware JB, Hart T, Whyte J, Rabinowitz A, Detre JA & Kim J (2017) Inter-subject variability of axonal injury in diffuse traumatic brain injury. Journal of Neurotrauma, 34(14), 2243–2253.
- Woodward WA, Parr WC, Schucany WR & Lindsey H (1984) A comparison of minimum distance and maximum likelihood estimation of a mixture proportion. Journal of the American Statistical Association, 79(387), 590–598.
