Abstract
Unsupervised latent variable models—blind source separation (BSS) especially—enjoy a strong reputation for their interpretability. Yet they seldom combine the rich diversity of information available in multiple datasets, even though jointly analyzed datasets yield insightful solutions unavailable from any dataset in isolation.
We present a direct, principled approach to multidataset combination that takes advantage of multidimensional subspace structures. In turn, we extend BSS models to capture the underlying modes of shared and unique variability across and within datasets. Our approach leverages joint information from heterogeneous datasets in a flexible and synergistic fashion. We call this method multidataset independent subspace analysis (MISA).
Methodological innovations exploiting the Kotz distribution for subspace modeling, in conjunction with a novel combinatorial optimization for evasion of local minima, enable MISA to produce a robust generalization of independent component analysis (ICA), independent vector analysis (IVA), and independent subspace analysis (ISA) in a single unified model.
We highlight the utility of MISA for multimodal information fusion, including sample-poor regimes (N = 600) and low signal-to-noise ratio, promoting novel applications in both unimodal and multimodal brain imaging data.
Index Terms—BSS, MISA, multidataset, fusion, ICA, ISA, IVA, subspace, unimodal, multimodality, multiset data analysis, unify
I. Introduction
BLIND source separation (BSS) [1], [2] is the recovery of unknown latent source signals from their observed mixtures without knowing the mixing process. It is widely adopted across signal, image, and video processing, including chemometrics [3], speech [4], multispectral imaging [5], [6], medical imaging [7], [8], and video [9], [10]. The “blind” property (unknown sources and mixing) is highly valuable, especially in applications lacking a precise model of the measured system(s) and with data confounded by noise of unknown or variable characteristics.
In our recent review [1], we introduced a unified multidataset multidiversity multidimensional framework for subspace modeling. It provided a fresh perspective on BSS, identifying both single-dataset multidimensional (SDM) and multidataset unidimensional (MDU) research as subproblems, and outlining a path to reconcile them. In turn, a new class of multidataset multidimensional (MDM) problems became apparent, emphasizing the potential benefits of general latent subspace correspondence across datasets.
Models designed for MDM problems are extremely flexible. A single joint model not only encodes higher complexity through features of flexible dimensionality (the subspaces yk) but also accommodates arbitrary links among these features over multiple datasets/modalities (xm). To illustrate (Fig. 1), we consider a multivariate information functional I(y) that operates simultaneously on the joint probability density function (pdf) of all subspaces p(yk). It captures the association modes underlying multidatasets while adaptively learning multiple linear transformations Wm (dashed lines). When datasets represent modalities, this directly leverages multimodal joint information and lets it guide the decompositions naturally. Combining different multimodal views of the same system, this generalized approach to multimodal fusion offers broader, unique insights into its underlying properties and behavior.
Fig. 1. Subspace identification from multidatasets with MISA.

We consider the general case of M datasets/modalities (xm) jointly decomposed, without loss of generality, into C sources ymi each, via linear transformations Wm. Here, each xm could be, e.g., an audio or a video stream, indicating fusion via the joint analysis of all datasets. Sources are combined into dk-dimensional subspaces yk, and all-order statistics are utilized to gauge their associations and pursue subspace independence. Only a single correspondence “axis” is required, e.g., time, meaning there is a video frame for each audio sample in audio/video (a/v) data fusion, although the method is not limited to a/v data, fusion, or temporal synchrony specifically. Subspaces establish links among groups of sources across different datasets/modalities. Therefore, multidataset independent subspace analysis (MISA) blindly recovers hidden linked features of flexible dimensionality from multiple datasets and modalities.
Aiming at generality, we pursue statistical independence among subspaces yk to achieve joint BSS for MDM. Initial investigation of this approach [11]–[13] indicated the presence of critical issues. These included premature convergence to local minima, rigid hard-coded subspace distribution parameters, and a restricted orthogonal regularization for Wm.
Here, we propose a substantially improved extension to address these issues. We use combinatorial optimization to search over subspace configurations P (Fig. 2) and escape local minima, all-order statistics (i.e., both second- and higher-order statistics—SOS and HOS, respectively) to model p(yk) via the more general Kotz distribution [14], and a scale-controlled formulation for numerical stability. We also generalize usage to non-orthogonal Wm without data reduction. We refer to this robust, performant approach simply as multidataset independent subspace analysis (MISA) (Fig. 1). In the formulation below, p(y) represents the joint pdf of all sources, and p(yk) the pdf of the k-th subspace.
Fig. 2. General architecture of linear MDM problems.

The lower layer corresponds to one Vm × 1 observation of each input data stream xm. The middle layer represents the Cm sources. The top layer establishes the K subspaces yk, which are collections of statistically dependent sources (indicated by same-colored connections), following the compositions laid out in the assignment matrix P. This architecture suggests a natural hierarchy among models [1], [15, Ch. 8] in accordance with the number of datasets comprising x and the occurrence of multidimensional sources within any single dataset. MDU and SDM problems include the simpler SDU case, and the most general MDM problem contains all others as special cases.
Let I(y) be the Kullback-Leibler (KL) divergence, an information functional useful for comparing two pdfs p(y) and q(y), where, here, $q(\mathbf{y}) = \prod_{k=1}^{K} p(\mathbf{y}^k)$ is the desired factor pdf of p(y). Then let h(·) be the joint differential entropy, $h(\mathbf{z}) = -E[\ln p(\mathbf{z})]$, for a random vector z with pdf p(z), $E[\cdot]$ being the expected value operator, and let Pk be the subset of P assigning specific sources into subspace k, i.e., $\mathbf{y}^k = \mathbf{P}_k\mathbf{y}$. Consequently,

$$ I(\mathbf{y}) = \mathrm{KL}\!\left(p(\mathbf{y})\,\middle\|\,q(\mathbf{y})\right) = \sum_{k=1}^{K} h\!\left(\mathbf{y}^k\right) - h(\mathbf{y}). \tag{1} $$
We propose to estimate a collection of linear transformations y = Wx simultaneously from all datasets by solving:

$$ \widehat{\mathbf{W}} = \arg\min_{\mathbf{W}}\; I(\mathbf{y}), \qquad \mathbf{y} = \mathbf{W}\mathbf{x}, \tag{2} $$
for any W, subspace assignments P, and data streams x. This convenient formulation, which gives mutual information (MI) when the random vector y is two-dimensional, only attains its lower bound of I(y) = 0 when p(y) = q(y), implying that the identified subspaces are indeed statistically independent. A sketch of the convergence proof for this approach is provided as supplemental material.
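To make (1) and (2) concrete, a minimal numerical sketch follows (Python/NumPy; the sizes and the Gaussian-entropy surrogate are illustrative assumptions—this captures SOS dependence only, whereas the full model in Section IV uses Kotz-based subspace likelihoods):

```python
import numpy as np

def independence_cost(W, X, subspaces):
    """Evaluate sum_k h(y_k) - h(y) up to the constant h(x), cf. (1)-(2),
    using Gaussian entropy estimates (SOS only). W must be square here."""
    Y = W @ X
    cost = -np.linalg.slogdet(W)[1]            # -ln|det W|; h(x) dropped
    for idx in subspaces:
        Sk = np.atleast_2d(np.cov(Y[idx, :]))  # subspace sample covariance
        d = len(idx)
        # Gaussian differential entropy: 0.5 * ln((2*pi*e)^d |Sigma_k|)
        cost += 0.5 * (d * np.log(2 * np.pi * np.e) + np.linalg.slogdet(Sk)[1])
    return cost

# Toy usage: four sources, candidate P with one 2-D and two 1-D subspaces.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)) @ rng.laplace(size=(4, 2000))
subspaces = [[0, 2], [1], [3]]                 # P encoded as index lists
W = np.linalg.inv(rng.standard_normal((4, 4)))
print(independence_cost(W, X, subspaces))
```

Lower values indicate subspace configurations better supported by the data; the cost attains its minimum when the identified subspaces are statistically independent.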
With MISA, direct study of the interactions and associations among multiple datasets and modalities becomes feasible, in a truly synergistic way. Consequently, joint sources yk emerge naturally as a direct result of the shared variability estimated from all-order statistical dependences among datasets. Breaking from the limited, rigid paradigm of MDU models dominating current multimodal research [15, Ch. 8], it allows general subspace associations, including features absent from specific datasets. As a unifying toolkit, MISA can execute many general, unconventional BSS tasks as well as classical special cases such as independent component analysis (ICA) [16], independent subspace analysis (ISA) [17], and independent vector analysis (IVA) [18]. Also, it outperforms several algorithms in each of these tasks, successfully achieving generalized subspace identification from multidatasets. This uniform implementation, built on the umbrella formulation and methodologies introduced here, makes the approach accessible and intuitive to use.
In the current paper, we demonstrate that MISA (our proposed method) outperforms algorithms such as Infomax [19], [20], Laplace IVA (IVA-L) [18], and Gaussian-Laplace IVA (IVA-GL) [21] in challenging experiments and realistic scenarios satisfying the requisites outlined in [22]. MISA’s remarkable performance and stability in certain extremely noisy cases (signal-to-noise ratio (SNR) of 0.0043dB) highlight the benefit of careful multidataset subspace dependence modeling with all-order statistics. Likewise, MISA with greedy permutations (MISA-GP) clearly outperforms joint blind diagonalization with SOS (JBD-SOS) [23] and EST_ISA [24] even at low SNR levels (SNR of 3dB). This shows the benefit of combinatorial optimization to escape local minima in subspace analyses.
Hybrid data results on representative biomedical imaging features and realistic data dimensionality further support the high estimation quality and flexibility of MISA. These include novel applications in high-temporal-resolution functional magnetic resonance imaging (MRI), and multimodal fusion of heterogeneous neurobiological images and signals. The latter also demonstrates feasibility of data fusion even at low SNR and sample-poor regimes (number of observations N = 600), with examples involving functional, structural, and diffusion MRI, as well as electroencephalography (EEG) data. Subspace analysis in its general MDM form has not yet been conducted in a multimodal fusion setting. To the best of our knowledge, MISA is the only approach which can directly investigate this use-case using all-order statistics. Original code and data are available at https://github.com/rsilva8/MISA, with examples to accompany the descriptions in supplemental material (Sections II-B and II-D therein), and detailed derivation of the gradients.
In the following, Section II states the general MDM problem. Section III puts our contributions in context with related works, followed by our methodology description in Section IV. Finally, Sections V and VI present our results and conclusions, respectively. Frequently used acronyms are listed in Table I.
TABLE I.
Frequently used acronyms.
| SDU single dataset unidimensional | SOS second-order statistics |
| SDM single dataset multidimensional | HOS higher-order statistics |
| MDU multidataset unidimensional | pdf probability density function |
| MDM multidataset multidimensional | GP greedy permutations |
| ICA independent component analysis | PCA principal component analysis |
| ISA independent subspace analysis | IVA independent vector analysis |
II. Background
The MDM problem can be formally stated as follows. Given N observations of M ≥ 1 datasets, identify an unobservable latent source random vector $\mathbf{y} = [\mathbf{y}_1^\top, \ldots, \mathbf{y}_M^\top]^\top \in \mathbb{R}^{C}$, with $C = \sum_{m} C_m$ (Cm sources per dataset), from an observed random vector $\mathbf{x} = [\mathbf{x}_1^\top, \ldots, \mathbf{x}_M^\top]^\top \in \mathbb{R}^{V}$, with $V = \sum_{m} V_m$ (Vm-dimensional datasets), generated via a mixture vector function f (y, θ) with unknown parameters θ. The m-th Vm × N data matrix containing N observations of xm along its columns is denoted Xm, and the matrix concatenating all Xm is denoted simply as X (likewise for Y and Ym). Both y and f (y, θ) have to be learned blindly, i.e., without knowledge of either of them. For tractability, assume:
the number of latent sources Cm, which may differ in each dataset, is known to the experimenter;
f (y, θ) = Ay is a linear transformation, with θ = A;
A is a block diagonal matrix with M blocks, describing a separable layout structure [1] representing xm = Amym, m = 1…M, where $\mathbf{x}_m \in \mathbb{R}^{V_m}$, $\mathbf{y}_m \in \mathbb{R}^{C_m}$, each block Am is Vm × Cm, and Vm is the intrinsic dimensionality of each dataset;
some latent sources ymi ∈ y are statistically related to each other, and this dependence is undirected (non-causal), occurring within and/or across datasets;
related sources establish dk-dimensional subspaces1 yk, k = 1…K, with K and the subspace compositions laid out by the experimenter in sparse assignment matrices $\mathbf{P}_k \in \{0, 1\}^{d_k \times C}$, such that $\mathbf{P} = [\mathbf{P}_1^\top \cdots \mathbf{P}_K^\top]^\top$ is a permutation matrix;
subspaces do not relate to each other, i.e., either $p(\mathbf{y}) = \prod_{k=1}^{K} p(\mathbf{y}^k)$ or the cross-correlations ρk,k′ = 0, k ≠ k′.
Under these assumptions, recovering sources y amounts to finding a linear transformation W for the unmixing vector function y = Wx. This occurs when W = A−, the pseudoinverse of A, implying W is also block diagonal and satisfies ym = Wmxm. The experimenter’s priors on the subspace structure within/between one or more datasets, plus the type of statistics describing within/between subspace relation, determines how P is set and, thus, whether and how the model simplifies to the classical special cases [1]. Our focus will be on MDM models driven by statistical independence among subspaces and dependence within subspaces, namely MISA, in the case of an overdetermined system with Vm ≥ Cm, without implying W is square via the typical principal component analysis (PCA). Table II summarizes our key notations.
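To ground these assumptions, the following sketch (Python/NumPy; all sizes hypothetical) generates a toy MDM system with block-diagonal mixing and a single 2-D subspace linking two datasets:

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 2, 1000
C = [3, 3]                                   # C_m sources per dataset
V = [5, 4]                                   # V_m observed dimensions

# Subspace 1 links source 1 of both datasets (d_1 = 2, one source per dataset);
# the remaining sources form unidimensional, independent subspaces.
shared = rng.laplace(size=N)
y = [np.vstack([shared + 0.3 * rng.laplace(size=N),   # dependent across datasets
                rng.laplace(size=(C[m] - 1, N))])
     for m in range(M)]

# Separable (block-diagonal) linear mixing: x_m = A_m y_m.
A = [rng.standard_normal((V[m], C[m])) for m in range(M)]
x = [A[m] @ y[m] for m in range(M)]
```

Recovering the sources from the observed x alone, with the subspace assignment P as the only structural prior, is exactly the MDM task addressed here.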
TABLE II.
Key notations.
| M, m | Number of datasets/modalities, counter |
| N, n | Number of observations, counter |
| K, k | Number of subspaces, counter |
| C, c | Number of sources, counter |
| Vm | intrinsic dimension of m-th dataset |
| dk | dimension of k-th subspace |
| dmk | dimension of k-th subspace in m-th dataset |
| X, x | Data matrix, and vector |
| Y, y, y | Source matrix, vector, and element |
| ym | Sources from m-th dataset |
| ymc | c-th source in m-th dataset |
| yk | k-th subspace |
| A, Am | mixing matrix, and its m-th block |
| W, Wm | unmixing matrix, and its m-th block |
| P, Pk | full and k-th subset assignment matrices |
| Σk, Rk | covariance and correlation matrices of yk |
| Patterns for Special Cases | |
| SDU: | M = 1, K = C, dk = 1 ∀k |
| SDM: | M = 1, K < C, 1 ≤ dk < C ∀k |
| MDU: | M > 1, K = Cm ∀m, dmk = 1 ∀k, m |
In multimodal brain imaging research, various types of data can be utilized. MRI scans (e.g., structural, diffusion, functional, etc.) typically consist of 3D images, sometimes with an extra dimension. EEGs record the temporal evolution of scalp electric potentials, typically from dozens of electrodes simultaneously. After collecting two or more such modalities on the same subjects, the information is often summarized into a single 3D image and/or time series per modality. These summary features are obtained from multiple subjects and jointly analyzed with data fusion. Usually, only in-brain signal is considered from 3D images. The in-brain voxels (volume pixels) are stacked into a single 1D vector prior to fusion. Other modalities, data preparation, and feature generation approaches exist but will not be discussed in this work.
III. Related Work
A. Applications
MDM problems permeate many fields and yet are largely undeveloped. In multimodal fusion of heterogeneous data [25], [26], robust identification of flexible joint features (yk) originating from all data modalities (xm) can yield one-of-a-kind views into a system’s properties. This is a prominent direction in mental health research for biomarker identification and early diagnosis, with potential to convey new strategies for disease severity assessment and translation into personalized treatments [1]. In classification, the association/dependence inherent to multimodal features yk means that good separability in one dataset promotes features with similar property in other datasets, and vice-versa.
The benefits of model flexibility are also notable in various multiset analyses. In the case of multisubject unimodal data (xm) [27]–[32], it would better preserve subject specificity. In analyses that combine multi-site datasets (xm) from different scanners/devices, it could naturally mitigate harmonization issues [33], [34] since site/device-variability would seldom explain multidataset associations. In sensor fusion [5], [25], [35], [36], where noise characteristics can be similar if multiple sensors (xm) share the same environment, it would allow better detection (and potential removal) of noise. For hyperspectral imaging [37]–[39], hyperspectral features (yk) of higher complexity could be identified in time-lapse studies. For domain-adaptive image recognition [40]–[44], enhanced common and unique representations (yk) could be identified across image domains (xm). For multi-view image and video processing [9], [45], [46], objects with complex temporal patterns could be better characterized using (unimodal) higher-dimensional yk, not to mention potential fusion with audio features [47]–[50] via multimodal yk.
B. Methods
Our review of BSS in brain imaging [1] studied the underlying strategies of many methods. It offered a general, broad view of how different methods relate to each other by defining a common hierarchical taxonomy to accurately describe them. The unified framework introduced in that work provided a clear path for general MDM model development, which we adopted here to break from current MDU paradigms [15, Ch. 8]. However, it did not consider any of the issues addressed here, including combinatorial optimization, scale control, and non-orthogonal Wm. These were also missing from our early investigations in [11]–[13]. Besides the vastly expanded methodology—which also introduces the general Kotz distribution for MISA—the current work presents a large number of new experiments and realistic applications.
Notably, the Kotz distribution was first introduced for BSS in [51] but applications were limited to MDU problems (IVA specifically). Consequently, that work cannot be applied to cases where dmk > 1. In addition, its implementation treated the iteratively updated subspace covariances as constant with respect to W (previously, [30], [52] had hard-coded ). The gradients derived for our MISA implementation do not make that assumption and, thus, yield a different search direction than [51] at each step during optimization, even for the IVA case. Also, the optimization approach in [51] was based on simple line search, which is rather different from the interior-point barrier optimization (with bounds and option for non-linear constraints) we utilize here. We also note the use of our novel scale control formulation for numerical stability.
Another work [53] also explores identification of subspace structures in the general MDM setting. However, it is limited to subspaces with Gaussian distribution and, thus, can only leverage SOS to identify subspaces. In contrast to our approach with the Kotz distribution, the approach in [53] cannot leverage HOS for subspace identification. Moreover, our choice of the Kotz distribution implies that it suffices to set the parameters in (4) to ψG (Section IV-B) for our model to simplify to the one in [53], highlighting the generality of MISA. The same argument applies to [23], [54].
Finally, premature convergence to local minima due to the mis-assignment of sources to subspaces is a known challenge for SDM model fitting [55]. However, general MDM problems have drastically more intricate within- and cross-dataset subspace-to-subspace interactions. When subspaces span multiple datasets, a combinatorially larger number of possible local minima (quantified in Section V-B4) undermines the numerical optimization performance. While combinatorial issues are common in other research areas [56]–[58], they have been largely neglected in the BSS literature because of how simple (and often irrelevant) they are for ICA.
In Sections IV-D and IV-E we propose novel combinatorial optimization algorithms for evasion of local minima in the numerical optimization of (1). To the best of our knowledge, this is the first attempt at disentangling these permutation ambiguities in the general MDM case. In contrast to [59], our approach serves only to move a particular solution out of a local minimum so that the numerical optimization may resume. Plus, the structural subspace priors contained in Pk guide our combinatorial procedures without relying on ancillary objective functions to determine residual source dependences.
IV. Methodology
A. Scale Control
An inherent property of independence is invariance to arbitrary scaling of any source (i.e., multiplication by a non-zero scalar value), which is why ICA sources have scale ambiguity. This has an important implication on the geometry of the objective function we seek to optimize. First, visualize the elements of W stacked into a single vector w = vec(W), as would be done in a typical numerical optimization setting. Due to scale invariance, evaluation of the objective function on either w or aw, where a is a non-zero scalar, yields the same value.
Since the objective function evaluates to the same values along the line2 spanned by w, only certain changes in the direction of w incur changes in the objective function. Consequently, it suffices to look for a solution on the surface of the hypersphere associated with a given a, since the landscape of objective function values would be identical across concentric (hyper) shells (Fig. 3 (a)). Moreover, scale invariance induces a “star” shape to the contour lines of the objective function in this scenario (Fig. 3 (b)). Since gradients are orthogonal to contour lines, they also ought to be orthogonal to w and lie on the tangent hyperplane of any given hypersphere (Fig. 3 (c)).
Fig. 3. Geometry of the independence-driven objective function in SDU problems.

(a) Due to the scale invariance property of statistical independence, evaluation of the objective function on either w or aw, where a is a non-zero scalar, yields the same value. Only certain changes in the direction of w incur changes in the objective function. Thus, the solution space of independence-driven SDU problems lies on a hypersphere. (b) Scale invariance induces a “star” shape on the contour lines. (c) Consequently, the gradient of a scale invariant function must lie on the tangent hyperplane of the hypersphere associated with a given w.
The main implication is that stepping in the (negative) direction of the gradient towards a local minimum will likely inflate w and lead the search direction in an outward spiral with respect to w. This can be a problem if the norm of w grows indefinitely and eventually becomes numerically unstable. More importantly, as the norm of w increases toward outer shells, the landscape of the objective function starts to stretch (because its values are kept the same while the surface area of the hypersphere grows). Consequently, the gradient grows shorter regardless of its proximity to any local minimum. The smaller gradient will then lead to shorter step lengths, likely yielding very little improvement at later stages of the numerical optimization and deterring convergence.
This issue is often disregarded in the literature (incidentally, the Infomax algorithm [19] is free of this issue) and should be addressed prior to evaluation of the efficient relative gradient [2, Ch. 4]. One simple approach to address it is to constrain the norm of w. While direct, implementing this approach can be quite inefficient. Rather, since any scale is equally acceptable (at least in theory), we propose to control the estimated source scales by fixing them in the model. Specifically, this is accomplished by assigning the estimated subspace correlation matrix $\widehat{\mathbf{R}}_k$ as the model dispersion matrix Dk in the Kotz distribution, effectively making the objective function scale selective rather than scale invariant (Section IV-B). Therefore, whenever the source estimates from the data do not support the model variances associated with the choice $\mathbf{D}_k = \widehat{\mathbf{R}}_k$, the mismatch induces changes in W that lead their variances towards the prescribed ones. In summary, the proposed scale selective formulation eliminates scaling issues without the need for a formal constraint.
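To illustrate the distinction numerically, consider this Gaussian SDU sketch (Python/NumPy; a surrogate for the full Kotz objective): the entropy-based cost is unchanged by rescaling W, whereas fixing the model variances makes the cost scale selective.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((3, 3)) @ rng.laplace(size=(3, 5000))

def mi_cost(W):
    """Gaussian-entropy independence surrogate: scale invariant."""
    Y = W @ X
    return 0.5 * np.log(2 * np.pi * np.e * Y.var(axis=1)).sum() \
           - np.linalg.slogdet(W)[1]

def scale_controlled_cost(W):
    """Same surrogate with model variances fixed at 1: scale selective."""
    Y = W @ X
    return 0.5 * (Y ** 2).mean(axis=1).sum() - np.linalg.slogdet(W)[1]

W = rng.standard_normal((3, 3))
print(mi_cost(5 * W) - mi_cost(W))                              # ~0: invariant
print(scale_controlled_cost(5 * W) - scale_controlled_cost(W))  # large: scale is pinned
```

In the scale-controlled variant, rescaling W trades off the quadratic term against the log-determinant, so the optimizer settles on a definite scale instead of spiraling outward.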
B. Objective Function
Equation (1) admits some simplifications following a few manipulations. First, we note that h(y) = h(Wx) = h(x) + ln|det(W)|, and h(x) can be discarded since it is constant with respect to W. Second, $\ln\left|\det(\mathbf{W})\right| = \sum_{m=1}^{M} \ln\left|\det(\mathbf{W}_m)\right|$ since W is block diagonal. Finally, when Vm ≠ Cm, for any m, the determinant of Wm is undefined. In order to circumvent this issue, we propose to substitute the determinant by the product of the singular values of Wm, i.e., $\prod_{i=1}^{C_m} \sigma_{mi}$, where σmi are the diagonal elements of $\mathbf{S}_m$ originating from the singular value decomposition $\mathbf{W}_m = \mathbf{U}_m \mathbf{S}_m \mathbf{V}_m^\top$. We note that $\prod_{i} \sigma_{mi} = \left|\det(\mathbf{W}_m)\right|$ when W is non-singular and square. Altogether, we can recast (1) as:

$$ I(\mathbf{y}) = -\sum_{m=1}^{M}\sum_{i=1}^{C_m} \ln \sigma_{mi} \;-\; \sum_{k=1}^{K} E\!\left[\ln p\!\left(\mathbf{y}^k\right)\right] + \mathrm{const}, \tag{3} $$

where the constant absorbs h(x) and other terms independent of W, and yk = PkWx.
This formulation is still incomplete because p(yk) is undefined. Here we choose to model each subspace pdf as a multivariate Kotz distribution [14], [60]:

$$ p\!\left(\mathbf{y}^k\right) = \frac{\beta_k\, \Gamma\!\left(\frac{d_k}{2}\right) \lambda_k^{\nu_k}}{\pi^{\frac{d_k}{2}}\, \Gamma(\nu_k)\, \left|\mathbf{D}_k\right|^{\frac{1}{2}}} \left(\mathbf{y}^{k\top}\mathbf{D}_k^{-1}\mathbf{y}^{k}\right)^{\eta_k - 1} e^{-\lambda_k \left(\mathbf{y}^{k\top}\mathbf{D}_k^{-1}\mathbf{y}^{k}\right)^{\beta_k}}, \tag{4} $$

where dk is the subspace dimensionality, βk > 0 controls the shape of the pdf, λk > 0 the kurtosis (i.e., the degree of peakedness), and ηk > 0 the hole size, while $\nu_k = \frac{2\eta_k + d_k - 2}{2\beta_k}$ for brevity. Γ(·) denotes the gamma function. The positive definite dispersion matrix Dk is related to the covariance matrix by $\boldsymbol{\Sigma}_k = \frac{\Gamma\!\left(\nu_k + \frac{1}{\beta_k}\right)}{d_k\, \lambda_k^{1/\beta_k}\, \Gamma(\nu_k)}\, \mathbf{D}_k$.
This is a good choice of pdf since it includes the multivariate power exponential family, particularly the classical multivariate Gaussian and multivariate Laplace distributions when the parameter set ψk = [βk, λk, ηk] is set to $\psi_G = [1, \tfrac{1}{2}, 1]$ and to a Laplace setting $\psi_L$ with $\beta_k = \tfrac{1}{2}$ and $\eta_k = 1$, respectively.
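As a quick check of (4) and its Gaussian special case, a sketch of the Kotz log-density follows (Python/SciPy); the Laplace parameterization would follow analogously:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import multivariate_normal

def kotz_logpdf(y, D, beta, lam, eta):
    """Log-density of the Kotz distribution (4) in its standard form.
    y: (d,) point; D: (d, d) positive definite dispersion matrix."""
    d = len(y)
    nu = (2 * eta + d - 2) / (2 * beta)
    r2 = y @ np.linalg.solve(D, y)            # y^T D^{-1} y
    logC = (np.log(beta) + gammaln(d / 2) + nu * np.log(lam)
            - (d / 2) * np.log(np.pi) - gammaln(nu)
            - 0.5 * np.linalg.slogdet(D)[1])
    return logC + (eta - 1) * np.log(r2) - lam * r2 ** beta

# Sanity check: psi_G = [1, 1/2, 1] recovers the Gaussian with covariance D.
y, D = np.array([0.3, -1.2]), np.eye(2)
print(kotz_logpdf(y, D, beta=1.0, lam=0.5, eta=1.0),
      multivariate_normal(mean=np.zeros(2), cov=D).logpdf(y))  # identical
```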
Minimizing (3) is equivalent to maximizing the (log-)likelihood of yk. In the following, we estimate $\boldsymbol{\Sigma}_k$ from the data. This is appealing because the sample average $\widehat{\boldsymbol{\Sigma}}_k = \frac{1}{N}\mathbf{Y}^k\mathbf{Y}^{k\top}$ is readily available and can be conveniently combined with W to produce an approximation of $\boldsymbol{\Sigma}_k$ for substitution in Dk. This simple choice permits the reparameterization of $\mathbf{D}_k$ as a function of W, specifically $\widehat{\boldsymbol{\Sigma}}_k = \frac{1}{N}\mathbf{P}_k\mathbf{W}\mathbf{X}\mathbf{X}^\top\mathbf{W}^\top\mathbf{P}_k^\top$.
Two well-conceived dispersion matrix parameter choices are proposed for the Kotz distribution, one emphasizing invariance to source scales and the other not, resulting in two useful objective functions. Firstly, we let Yk = PkWX and use n to index each of the N observations used in the sample mean approximation of the expected value in (3). Secondly, based on the log-likelihood ln p(yk), we define $\mathbf{y}_n^k$ as the n-th column of Yk and $r_{kn}^2 = \mathbf{y}_n^{k\top}\mathbf{D}_k^{-1}\mathbf{y}_n^{k}$. Then, we let $\mathbf{D}_k = \widehat{\boldsymbol{\Sigma}}_k$ for the standard scale invariant case:

$$ \breve{I}(\mathbf{y}) = C_0 - \sum_{m=1}^{M}\sum_{i=1}^{C_m}\ln\sigma_{mi} + \sum_{k=1}^{K}\left[\frac{1}{2}\ln\left|\mathbf{D}_k\right| - (\eta_k - 1)\,\frac{1}{N}\sum_{n=1}^{N}\ln r_{kn}^2 + \lambda_k\,\frac{1}{N}\sum_{n=1}^{N} r_{kn}^{2\beta_k}\right], \tag{5} $$

where $C_0$ collects the terms of (4) that do not depend on W. Its gradient (6), derived in supplemental material, is arranged so that the rows ik of ∇Ĭ(W)m correspond to the source indices assigned to subspace k.

For the scale-controlled approach, we let $\mathbf{D}_k = \alpha_k\,\widehat{\mathbf{R}}_k$, with the correlation matrix $\widehat{\mathbf{R}}_k = \mathrm{diag}(\widehat{\boldsymbol{\Sigma}}_k)^{-\frac{1}{2}}\,\widehat{\boldsymbol{\Sigma}}_k\,\mathrm{diag}(\widehat{\boldsymbol{\Sigma}}_k)^{-\frac{1}{2}}$, and αk the scalar implied by the dispersion-covariance relation below (4) under unit variances. In this case, only correlations are estimated from the data, while variances are fixed at αk. The advantage of this choice is that it controls the scale of the sources rather than letting them be arbitrarily large/small. In the scale-controlled case, Ĭ(y) is identical to (5), except that $\mathbf{D}_k = \alpha_k\,\widehat{\mathbf{R}}_k$ replaces $\widehat{\boldsymbol{\Sigma}}_k$; its gradient (7) is likewise derived in supplemental material.
While the equations presented above are general and support any choice of subspace specific parameters ψk, in the examples presented here we opted to use the same set ψk = ψL for all subspaces, modeling subspaces as multivariate Laplace distributions with correlation estimation. The derivation of the gradients can be found in supplemental material, along with a description of the relative gradient update ∇Ĭ(W)W⊤W [2, Ch. 4], [61]. We used this update together with the L-BFGS algorithm with bounds (L-BFGS-B) [62], [63], available in the nonlinear constraint optimization function fmincon of MATLAB’s Optimization Toolbox. Nonlinear constraints such as those shown next can be easily incorporated in fmincon’s interior-point barrier method [64, Ch. 19], [65].
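For concreteness, a SciPy stand-in for this optimization setup is sketched below: an L-BFGS-B run on a square, noiseless ICA instance with unit-variance Laplace marginals. Finite-difference gradients replace the analytic relative gradient, and all sizes are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
C = V = 4
N = 2000
S = rng.laplace(scale=1 / np.sqrt(2), size=(C, N))   # unit-variance Laplace
X = rng.standard_normal((V, C)) @ S                  # square mixing

def cost(w):
    """Negative log-likelihood per sample, up to an additive constant:
    -ln|det W| + sum_i E[-ln p(y_i)] for unit-variance Laplace marginals."""
    W = w.reshape(C, V)
    Y = W @ X
    return np.sqrt(2) * np.abs(Y).mean(axis=1).sum() - np.linalg.slogdet(W)[1]

w0 = (np.eye(C) + 0.01 * rng.standard_normal((C, V))).ravel()
res = minimize(cost, w0, method='L-BFGS-B')          # bounds/constraints optional
W_hat = res.x.reshape(C, V)
print(cost(w0), res.fun)                             # objective decreases
```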
C. Pseudoinverse Reconstruction Error
In the overdetermined case, i.e., when Vm > Cm and W is wide, it is necessary to constrain W in order to evade ill-conditioned solutions. The error incurred by W in reconstructing the data samples can indirectly guide and constrain W. The mean squared error (MSE) between x and $\hat{\mathbf{x}} = \hat{\mathbf{A}}\mathbf{y}$ gives the following formulation of the reconstruction error (RE):

$$ E = E\!\left[\left\|\mathbf{x} - \hat{\mathbf{A}}\mathbf{W}\mathbf{x}\right\|_2^2\right]. \tag{8} $$
Firstly, the optimal linear estimator of x based on y for a system with estimation error e’, such as y = Wx + e’, is Ây, where Â is the minimizer of the MSE:

$$ \hat{\mathbf{A}} = \arg\min_{\mathbf{A}}\; E\!\left[\left\|\mathbf{x} - \mathbf{A}\mathbf{y}\right\|_2^2\right] = \boldsymbol{\Sigma}_x\mathbf{W}^\top\left(\mathbf{W}\boldsymbol{\Sigma}_x\mathbf{W}^\top + \boldsymbol{\Sigma}_{e'}\right)^{-1}, \tag{9} $$
and Σx is the data covariance. In the high SNR regime, diag(WΣxW⊤) ≫ diag(Σe’) element-wise and, as discussed in [66], yields

$$ \hat{\mathbf{A}} = \boldsymbol{\Sigma}_x\mathbf{W}^\top\left(\mathbf{W}\boldsymbol{\Sigma}_x\mathbf{W}^\top\right)^{-1}. \tag{10} $$
This choice of  always minimizes the error no matter how far W is from the true W⋆ and serves little as a constraint.
Assuming unit source variances and data whitened such that Σx = I, in ICA problems W⋆ must be row orthonormal, i.e., W⋆W⋆⊤ = I. Our previous work [12] utilized Â = W⊤ to reconstruct x as $\hat{\mathbf{x}} = \mathbf{W}^\top\mathbf{y}$ instead. Under the whitening assumption, this can be implemented in (8) as a soft regularizer, provably equivalent to Frobenius-norm-type regularization when the regularizer constant approaches infinity [67]. Therefore, this approach effectively penalizes non-orthogonal W.
Here, our investigation of the singular value decomposition (SVD) of W reveals that, if the matrix has orthonormal rows, then its singular values are all 1 and W = USV⊤ = UV⊤, where S = I, U are the left singular vectors of W, and V its right singular vectors. Therefore, W⊤W = VU⊤UV⊤ = VV⊤. Since W is wide, V is tall, which implies VV⊤ ≠ I, in general. Thus, using Â = W⊤, the RE simplifies as:

$$ E^\top = E\!\left[\left\|\mathbf{x} - \mathbf{V}\mathbf{V}^\top\mathbf{x}\right\|_2^2\right]. \tag{11} $$
This clearly shows that RE with  = W⊤ implicitly acts as a constraint on the right singular vectors of W, selecting those whose outer product approximates the identity matrix I.
If not orthonormal, W⊤W = VS2V⊤ since S ≠ I. Thus, we propose to use the pseudoinverse W− = W⊤(WW⊤)−1 in lieu of W⊤, with $\hat{\mathbf{x}} = \mathbf{W}^-\mathbf{y}$. Then, this pseudoinverse RE (PRE) (E−) also simplifies as (11). This result follows from the SVD of the pseudoinverse, W− = VS−1U⊤, and W−W = VS−1U⊤USV⊤ = VV⊤. Unlike before, this formulation effectively constrains V in the general case. Note that since Σx = I in the case of white data, the optimal estimator (10) simplifies to Â = W⊤(WW⊤)−1 = W−, i.e., the pseudoinverse gives the least error when the data is white (if the SNR is high), regardless of the values contained in W. Thus, for white data, we conclude that the RE formulation (E⊤) is more appropriate than PRE (E−). Our experience, however, suggests that W is far more likely non-orthogonal in real noisy, non-white data, justifying our preference for E−.
Furthermore, we introduce a normalization term, dividing E− by xnorm, the average power in the data, to get the proportion of power missed:

$$ E_{PRE} = \frac{E^-}{x_{norm}}, \tag{12} $$

where $x_{norm} = E\!\left[\left\|\mathbf{x}\right\|_2^2\right]$. Its gradient (13) is derived in supplemental material.
Since X and W are block-diagonal, these operations can be computed separately on each dataset by replacing X with Xm and W with Wm. This can be used either as a data reduction approach or as a nonlinear constraint for optimization.
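A compact sketch of the PRE computation (12) follows (Python/NumPy, hypothetical sizes), numerically confirming the right-singular-vector form (11):

```python
import numpy as np

def pre(W, X):
    """Pseudoinverse reconstruction error (12): the proportion of data power
    missed when reconstructing x through the sources, x_hat = W^- W x."""
    R = X - np.linalg.pinv(W) @ (W @ X)
    return np.sum(R ** 2) / np.sum(X ** 2)

rng = np.random.default_rng(2)
W = rng.standard_normal((3, 5))              # wide W (overdetermined, V > C)
X = rng.standard_normal((5, 100))

# Equivalent form via the right singular vectors, cf. (11):
_, _, Vt = np.linalg.svd(W, full_matrices=False)
R = X - Vt.T @ (Vt @ X)
print(pre(W, X), np.sum(R ** 2) / np.sum(X ** 2))   # identical values
```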
Finally, in MDU problems, when there is prior knowledge supporting linear dependence (i.e., correlation) within subspaces, then one useful and popular approach is to use group PCA projection to initialize all blocks of W [68]. It works by performing a single data reduction step on datasets concatenated along the V dimension. We have investigated this approach in a separate work [69], offering efficient algorithms to enable this procedure when the number of datasets is very large (M > 10000). For comparison purposes, we also considered the use of group PCA (gPCA) as an alternate initialization approach for W in our experiments.
D. MISA with Greedy Permutations (SDM Case)
We present a greedy optimization approach to counter local minima resulting from arbitrary source permutations. To illustrate, consider a single dataset and assume Pk is a user-specified prior. Using abbreviated notation throughout, suppose P1 = [1 1 1 0 0] and P2 = [0 0 0 1 1] define a partitioning of five sources into two subspaces: p(y) = p(yk=1)p(yk=2) = p(y1, y2, y3)p(y4, y5), where p(·) is a joint pdf. It would be equally acceptable if the data supported either p(y) = p(y4, y5)p(y1, y2, y3) (entire subspace permutation) or p(y) = p(y1, y3, y2)p(y5, y4) (within-subspace permutation) or even some combination of these two cases. However, if the data supported p(y) = p(y1, y4)p(y2, y3, y5), then that would not be equivalently acceptable, denoting a local minimum.
When these occur, the numerical optimization in Section IV-B stops early, at the newly found local minimum. At that point, we propose to check whether another permutation of sources would attain a lower objective value. This entails two challenges: 1) given the combinatorial nature of the task, even mild numbers of sources lead to huge numbers of candidate permutations, and 2) when the optimization stops early, most sources are still mixed and there is not enough refinement to establish which sources are dependent and belong in the same subspace. The low refinement compounds the combinatorial problem since it hinders the ability to distinguish between dependent and independent sources in the first place.

Firstly, therefore, we propose to transform the single-dataset multidimensional (SDM) ISA task into single-dataset unidimensional (SDU) ICA. We do that by temporarily voiding and replacing subspaces of size dk ≥ 2 by multiple sources (each with dk = 1), and then restarting the numerical optimization from the current W estimate (local minimum). This pushes all sources towards being independent from each other. However, dependent sources will only be as independent as possible and will retain some of their dependence. Partly motivated by [59], this approach secures enough refinement to distinguish among subspaces. Thus, given sources that are as independent as possible, we propose a greedy search for any residual dependence among them. The greedy solution is valid because the specific ordering within subspaces is irrelevant. Unlike [59], our approach does not require accessory objective functions to detect dependent sources. Instead, it uses the same scale invariant objective defined in (5).
The procedure is 1) switch to the ICA model (effectively, make P = I), 2) numerically optimize it, 3) reassign sources into subspaces one at a time. In the latter, as indicated in Algorithm 1 (GP), each source is assigned sequentially to each subspace (if two or more are assigned to the same subspace, they are reassigned together thereafter). Thus, the model changes with every assignment, and simple evaluation of the objective cost(·) (without numerical optimization) produces a value for each particular assignment. The scale invariant formulation ensures source variances do not influence the estimation. The assignment minimizing the objective function determines to which subspace a source belongs. Here, assume that k = K + 1 inserts one more row in P for a new subspace; [:, p] are the contents of columns indexed by p (conversely for rows); find(·) recovers the indexes of all non-zero elements; remove_empty_rows(P) removes rows from P containing only zero entries; eps is the machine’s precision.
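A minimal sketch of this greedy reassignment is given below (Python/NumPy; cost(·) stands in for evaluation of the scale invariant objective (5) at the current W, and the bookkeeping is simplified relative to Algorithm 1):

```python
import numpy as np

def greedy_permutations(cost, W, P):
    """Sketch of GP: greedily reassign each source to the subspace that
    minimizes the objective. cost(W, P) evaluates the objective; P is a
    (K, C) 0/1 assignment matrix with one subspace per row."""
    C = P.shape[1]
    eps = np.finfo(float).eps
    for c in range(C):
        r = np.flatnonzero(P[:, c])[0]        # row currently holding source c
        members = np.flatnonzero(P[r])        # co-assigned sources move together
        best_val, best_P = cost(W, P), P
        for k in range(P.shape[0] + 1):       # existing subspaces + one new row
            P_try = np.vstack([P, np.zeros((1, C), dtype=P.dtype)])
            P_try[:, members] = 0
            P_try[k, members] = 1
            P_try = P_try[P_try.sum(axis=1) > 0]   # remove_empty_rows
            val = cost(W, P_try)
            if val < best_val - eps:
                best_val, best_P = val, P_try
        P = best_P
    return P
```

After the sweep, match(·) reorders the recovered subspaces against the prescribed P, and the corresponding rows of W are permuted accordingly before numerical optimization resumes.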

After repeating this procedure for all sources, in an attempt to solve the original model, we order the identified subspaces so as to match the original prescribed subspace structure P as closely as possible. This final sorting (match(·)) defines a specific permutation of the sources, which we then use to reorder the rows of the local minimum solution W for the original ISA problem, effectively moving that solution out of the local minimum. After that, we resume the numerical optimization of the original ISA problem until another minimum is found. In our experiments, repeating this procedure just twice in a row (T = 2) and taking the best out of three solutions sufficed to drastically improve results. In Algorithm 2 (MISA-GPSDM), MISA(·) represents the numerical optimization (Section IV-B).
A direct benefit of this approach is that more dependence tends to be retained within subspaces as compared to [59]. That is a desirable property because it leaves room for further post-processing and investigation. Another advantage of our approach is that it can match source assignments to user-prescribed subspace priors (P) when they are available.
E. MISA with Greedy Permutations (MDM Case)
The previous approach addresses cross-subspace interference issues due to incorrect allocation of the sources and, therefore, is appropriate for SDM problems. However, such a procedure is not sufficient in MDM problems since ambiguities may also occur at the subspace level, i.e., incorrect allocation of the dataset-specific subspaces.
Consider the following example for a model with three subspaces spanning two datasets, each dataset containing five sources. Assume the correct assignment of sources is as follows: p1(y11, y21, y22)p2(y12, y13, y23)p3(y14, y15, y24, y25), where the notation ymi refers to source i from dataset m, and pk(·) is the joint pdf of subspace k. Since MISA-GPSDM is designed for single datasets, at best, it produces p1(y11)p2(y12, y13)p3(y14, y15) for m = 1 and p1(y21, y22)p2(y23)p3(y24, y25) for m = 2. Then, from a global perspective, these solutions would yield the correct subspace assignment above, thus solving the MDM problem. However, it is equally acceptable for SDM solvers to produce either p1(y11)p2(y14, y15)p3(y12, y13) for m = 1 or p1(y24, y25)p2(y23)p3(y21, y22) for m = 2 if the datasets are evaluated separately (notice the bold subscripts). Together they imply p1(y11, y24, y25)p2(y14, y15, y23)p3(y12, y13, y21, y22), which does not match the correct assignment and, thus, fails to produce a solution for the MDM problem. What we have illustrated here is that within-dataset permutations of equal-sized subspaces may induce mismatches across datasets if the datasets are processed separately. A further complication arises when subspaces are absent from a particular dataset.

Borrowing from the ideas in Section IV-D, we propose three approaches to address these issues. The first extends the greedy search to all datasets by sequentially assigning each source (in every dataset) to every subspace and accepting the assignments that reduce the objective function. This would yield a complexity of at least O(CmKM), and substantially more in the (unlikely) worst case. The second processes each dataset separately (as in the previous example) and then applies the same greedy strategy at the level of subspaces instead. Effectively, this approach cycles through each subspace sequentially, trying to determine which of them can be combined to form a larger subspace. This yields a complexity of O(CmKM) + O(K2M). The final approach is to test all possible permutations of subspaces with the same size, after processing each dataset separately, which yields O((K!)M). While this can quickly become computationally prohibitive, it can also identify better solutions since it evaluates all subspace permutations of interest. In this work, we elected to use the third approach when the number of sources is small and the second when that number becomes larger (subspace_perm(·)). Full procedures are indicated in Algorithm 3 (MISA-GP), with a sketch of the permutation search below.
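A sketch of the exhaustive variant follows (Python; for simplicity it tests all K! row orderings per dataset, a superset of the equal-size subspace permutations that actually matter):

```python
import itertools
import numpy as np

def subspace_perm(cost, W, P_list):
    """Exhaustive subspace permutation search, O((K!)^M): try every row
    (subspace) ordering of each dataset's assignment matrix and keep the
    combination with the lowest objective value."""
    K, M = P_list[0].shape[0], len(P_list)
    best_val, best = np.inf, P_list
    for perms in itertools.product(itertools.permutations(range(K)), repeat=M):
        cand = [P[list(p), :] for P, p in zip(P_list, perms)]
        val = cost(W, cand)
        if val < best_val:
            best_val, best = val, cand
    return best
```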
V. Results
We present results on multiple experiments satisfying the requisites outlined in [22], including a summary of various controlled simulations on carefully crafted synthetic data, as well as hybrid data and comparisons with several algorithms.
A. General Simulation Setup and Evaluation
In the following, we consider the problem of identifying statistically independent subspaces. Thus, in all experiments, each subspace yk is a random sample with N observations from a Laplace distribution. Subspace observations are linearly mixed via a random A as x = Ay + e, where e is additive white sensor noise. A is generated from a standard Gaussian distribution. Its singular values are then adjusted to yield the condition number cond(A) prescribed in Table III. Also, the white Gaussian noise e (zero mean and unit variance) is multiplied by a scalar value in order to attain the SNR prescribed in Table III. The SNR is the power ratio between the noisy signal x and the noise e; the equality $\mathrm{SNR}_{dB} = 10\log_{10}(\mathrm{SNR})$ permits decibel (dB) specifications, as sketched below.
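One plausible reading of this setup in code (Python/NumPy; the uniform spacing of singular values and the exact SNR convention are assumptions consistent with the text):

```python
import numpy as np

def make_mixing(V, C, cond, rng):
    """Gaussian mixing matrix whose singular values are rescaled to span
    [1, cond], so that cond(A) takes the prescribed value."""
    U, _, Vt = np.linalg.svd(rng.standard_normal((V, C)), full_matrices=False)
    return U @ np.diag(np.linspace(cond, 1, min(V, C))) @ Vt

def add_noise(X, snr_db, rng):
    """Scale unit-variance white Gaussian noise so that the power ratio
    between the noisy signal x and the noise e equals 10^(snr_db/10).
    (Requires snr_db > 0, as in Table III.)"""
    E = rng.standard_normal(X.shape)
    snr = 10 ** (snr_db / 10)
    a = np.sqrt(np.sum(X ** 2) / ((snr - 1) * np.sum(E ** 2)))
    return X + a * E
```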
TABLE III.
Summary of simulation results. (a, b) Median (over 10 dataset instances) of best MISI (over 10 initializations per dataset). (c, d) Median MISI (over 10 initializations, 1 dataset instance).
| | SNRdB | 30 | 10 | 3 | 0.46 | 0.0043 |
|---|---|---|---|---|---|---|
| ICA1 | PRE+Infomax | 0.0222 | 0.0292 | 0.0685 | 0.0928 | 0.2576 |
| PRE+MISA | 0.0145 | 0.0165 | 0.0261 | 0.1932 | 0.2743 | |
| IVA2 | PRE+IVA-L | 0.0111 | 0.0136 | 0.0197 | 0.0277 | 0.5158 |
| PRE+MISA | 0.0059 | 0.0088 | 0.0113 | 0.0151 | 0.0338 | |
| gPCA+IVA-L | 0.0081 | 0.0090 | 0.0095 | 0.0094 | 0.1271 | |
| gPCA+MISA | 0.0044 | 0.0045 | 0.0049 | 0.0065 | 0.0205 | |
| ISA3 | PRE+JBD-SOS | 0.2700 | 0.2804 | 0.2996 | 0.3255 | 0.3712 |
| PRE+MISA | 0.1153 | 0.1275 | 0.1320 | 0.1495 | 0.3404 | |
| PRE+MISA-GP | 0.0366 | 0.0670 | 0.0794 | 0.1140 | 0.3404 | |
| (a) Varying SNRdB, Fixed cond(A) = 7 | ||||||
| | cond(A) | 1 | 3 | 7 | 15 |
|---|---|---|---|---|---|
| ICA1 | PRE+Infomax | 0.0804 | 0.0188 | 0.0216 | 0.0493 |
| PRE+MISA | 0.1934 | 0.0148 | 0.0161 | 0.0267 | |
| IVA2 | PRE+IVA-L | 0.1923 | 0.1013 | 0.0749 | 0.0505 |
| PRE+MISA | 0.0052 | 0.0045 | 0.0049 | 0.0078 | |
| gPCA+IVA-L | 0.0090 | 0.0086 | 0.0092 | 0.0095 | |
| gPCA+MISA | 0.0052 | 0.0045 | 0.0049 | 0.0067 | |
| ISA3 | PRE+JBD-SOS | 0.2905 | 0.2792 | 0.2815 | 0.2962 |
| PRE+MISA | 0.1008 | 0.1065 | 0.1202 | 0.1351 | |
| PRE+MISA-GP | 0.0395 | 0.0330 | 0.0612 | 0.0743 | |
| (b) Fixed SNRdB = 3, Varying cond(A) | |||||
| ρk,max | 0 | 0.1 | 0.23 | 0.39 | 0.5 | 0.65 |
|---|---|---|---|---|---|---|
| IVA-GL | 0.4767 | 0.0361 | 0.0114 | 0.0199 | 0.0184 | 0.0186 |
| MISA | 0.0273 | 0.0098 | 0.0072 | 0.0062 | 0.0061 | 0.0049 |
| (c) IVA1: Increasing max. subspace correlation ρk,max | ||||||
| | ISA1 (ρk = 0) | | ISA2 (ρk > 0.2) | |
|---|---|---|---|---|
| | dk = k | dk = 4 | dk = k | dk = 4 |
| EST_ISA | – | 0.7557 | – | 0.7766 |
| JBD-SOS | 0.2600 | 0.3496 | 0.2826 | 0.3739 |
| MISA | 0.0239 | 0.0162 | 0.0369 | 0.0326 |
| (d) Varying vs Fixed subspace dimensionality dk | ||||
The quality of results is evaluated using the normalized multidataset Moreau-Amari intersymbol interference (MISI) (14), which extends the ISI [70], [71] to multiple datasets.
$$ \mathrm{MISI}\!\left(\widehat{\mathbf{W}}, \mathbf{A}\right) = \frac{1}{2K(K-1)}\left[\sum_{i=1}^{K}\left(\frac{\sum_{j=1}^{K} h_{ij}}{\max_{j} h_{ij}} - 1\right) + \sum_{j=1}^{K}\left(\frac{\sum_{i=1}^{K} h_{ij}}{\max_{i} h_{ij}} - 1\right)\right], \tag{14} $$

where H is a matrix with elements $h_{ij}$, i, j = 1…K, i.e., the sum of absolute values from all elements of the interference matrix ŴA corresponding to subspaces i and j, and Ŵ is the solution being evaluated.
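A direct transcription of (14) (Python/NumPy; subspaces lists, per subspace, the source indices into the rows/columns of the concatenated interference matrix ŴA):

```python
import numpy as np

def misi(W_hat, A, subspaces):
    """Normalized multidataset Moreau-Amari index, cf. (14). H[i, j] sums the
    absolute entries of W_hat @ A linking estimated subspace i to true j."""
    G = np.abs(W_hat @ A)
    K = len(subspaces)
    H = np.empty((K, K))
    for i, ri in enumerate(subspaces):
        for j, cj in enumerate(subspaces):
            H[i, j] = G[np.ix_(ri, cj)].sum()
    row = (H / H.max(axis=1, keepdims=True)).sum() - K
    col = (H / H.max(axis=0, keepdims=True)).sum() - K
    return (row + col) / (2 * K * (K - 1))

# Perfect separation up to subspace permutation and scale gives MISI = 0:
P = np.eye(4)[[1, 0, 2, 3]] * 2.0
print(misi(P, np.eye(4), [[0], [1], [2], [3]]))   # 0.0
```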
For fairness, all algorithms are initialized with the same W0. See optimization parameters in supplemental material.
B. Summary of Synthetic Data Simulations
The performance of MISA in a series of synthetic data experiments with different properties is summarized below (Table III). Complete details are available as supplemental material online.
ICA 1: effects of additive noise (a) and condition number (b) are assessed in a moderately large ICA problem (M = 1) with rectangular mixing matrix A at a fairly small sample size regime (N = 3500). Under low SNR (b), MISA outperforms Infomax when cond (A) ≠ 1. At high SNR (a), MISA outperforms Infomax more often than not.
IVA 1 (Vm < N, Vm = Cm): MISA performance is assessed in an IVA problem (c), in which subspaces span all of M = 10 datasets. Specifically, we study the case when no data reduction is required (i.e., Vm = Cm = 16), noise is absent, and observations are abundant (N = 32968). The striking feature observed here is that the performance of IVA-GL [21] is much more variable than that from MISA, especially with high correlation within the subspaces. MISA performs well even at low within-subspace correlation levels and is highly stable when these correlations are larger than 0.2.
IVA 2 (Vm < N): Effects of additive noise (a) and condition number (b) are assessed in a larger IVA problem (Cm = 75, M = 16) with rectangular mixing matrix A (Vm = 250) and an abundant number of observations N = 32968. Data reduction with either group PCA (gPCA) or pseudoinverse RE (PRE) produced equivalent results in this large N scenario. Under low SNR, increasing the condition number had a fairly small detrimental effect on the performance of both IVA-L [18] and MISA. More importantly, while both IVA-L and MISA performed very well at mild-to-high SNR levels, the performance of MISA on extremely noisy scenarios (SNRdB = 0.0043) is remarkable (0.1 < MISI < 0.01), irrespective of using PRE or gPCA.
ISA 1 and 2: MISA performance is assessed in ISA problems (d), in which subspaces are multidimensional, with M = 1. Specifically, we study the case when no data reduction is required (i.e., Vm = Cm), noise is absent, and the number of observations N is abundant. Fixed and varying configurations of K = 7 subspaces are considered, at two subspace correlation ρk settings. The striking feature observed here is that the performance of both JBD-SOS [23] and EST_ISA [24] is very poor in all cases, even when within-subspace correlations are present. MISA-GP is the only method with good performance, highlighting the large benefit of our approach for evasion of local minima.
ISA 3: Effects of additive noise (a) and condition number (b) are assessed in a mildly large ISA problem (M = 1) with variable subspace dimensionalities dk and rectangular mixing matrix A, at a fairly small sample size regime (N = 5250). Under a challenging SNR, JBD-SOS and MISA fail in virtually all cases (MISI > 0.1). Inclusion of combinatorial optimization enables MISA-GP to perform quite well at mild-to-high SNR levels (SNRdB ≥ 3).
Execution times for Table III (a–b) are reported in Table IV. The timings were recorded on a Linux server (Ubuntu 16.04) with an Intel Xeon E5-2630 v4 (10-core, 20-thread, 3.5GHz) CPU and 256GB RAM (DDR4, 2.4GHz). The code was executed in native Matlab without any optimizations.
TABLE IV.
Timing summary. (a, b) Mean (over 10 dataset instances) of median time (over 10 initializations per dataset). Times are reported in seconds.
| | SNRdB | 30 | 10 | 3 | 0.46 | 0.0043 |
|---|---|---|---|---|---|---|
| ICA1 | PRE+Infomax | 22.6 | 20.5 | 4.4 | 3.3 | 2.7 |
| PRE+MISA | 743.3 | 1392.3 | 2397.5 | 1543.3 | 52.0 | |
| IVA2 | PRE+IVA-L | 3128.7 | 3230.1 | 3094.7 | 3794.6 | 3375.4 |
| PRE+MISA | 11645 | 11087 | 9391.6 | 11323 | 8134.5 | |
| gPCA+IVA-L | 2352.3 | 2170.1 | 2080.5 | 2369.5 | 3167.1 | |
| gPCA+MISA | 4610.2 | 4432.4 | 4408.5 | 5068.8 | 6354.5 | |
| ISA3 | PRE+JBD-SOS | 2965.3 | 3007.2 | 3088.1 | 3016.6 | 2934.2 |
| PRE+MISA | 222.8 | 503.5 | 729.4 | 897.6 | 655.8 | |
| PRE+MISA-GP | 2811.4 | 3479.7 | 3998.7 | 4290.0 | 1429.1 | |
| (a) Varying SNRdB, Fixed cond(A) = 7 | ||||||
| | cond(A) | 1 | 3 | 7 | 15 |
|---|---|---|---|---|---|
| ICA1 | PRE+Infomax | 4.2 | 25.1 | 25.1 | 7.2 |
| PRE+MISA | 1582.8 | 391.9 | 740.1 | 1889.0 | |
| IVA2 | PRE+IVA-L | 2120.3 | 1784.0 | 1929.4 | 2376.6 |
| PRE+MISA | 4112.9 | 3895.4 | 4238.5 | 6205.9 | |
| gPCA+IVA-L | 2803.7 | 2393.1 | 2318.0 | 2548.4 | |
| gPCA+MISA | 5689.4 | 4635.0 | 4544.7 | 5267.6 | |
| ISA3 | PRE+JBD-SOS | 2999.9 | 2935.2 | 2934.6 | 3021.1 |
| PRE+MISA | 1145.7 | 483.6 | 780.3 | 1012.5 | |
| PRE+MISA-GP | 3307.3 | 1354.9 | 3413.6 | 3893.3 | |
| (b) Fixed SNRdB = 3, Varying cond(A) | |||||
The timings are higher in Table IV (a) than in Table IV (b) for the PRE-based ICA1 and IVA2 experiments. This is consistent with the corresponding degradation in MISI, which was due to a less strict stopping condition on the PRE gradient norm. This suggests that allowing more noise to leak through the PRE step not only yields poorer MISI performance but also significantly slows down convergence (about 3–4 times slower than comparable experiments in Table IV (b)).
In ICA1, Infomax is 1–2 orders of magnitude faster, owing to its inherently different stochastic optimization strategy and gradient implementation, which is optimized for a single dataset. The difference, however, is not due to a difference in algorithmic complexity. Importantly, unlike MISA, Infomax cannot generalize beyond SDU problems.
In IVA2, MISA takes at least twice as long to converge as IVA-L but attains better results in terms of MISI. Note that the maximum number of iterations in IVA-L was set to four times the total number of iterations until convergence for MISA on the same problem, from the same starting point.
In ISA3, MISA-GP timings are comparable to those of JBD-SOS. However, MISA-GP attains about one order of magnitude better results in terms of MISI.
Overall, the reported timings support that the computational cost of MISA is tractable, especially given it enables universal application to different problems.
C. Hybrid Data Experiments
We present three major results on novel applications of BSS to brain image analysis, open sourcing realistic hybrid data standards (https://github.com/rsilva8/MISA) that test estimation limits at small sample size. The first pushes the conditions of experiment ICA 1 and emulates a single-subject temporal ICA of functional MRI (fMRI). The second investigates the use of IVA with Vm > N for multimodal fusion of brain MRI-derived data. Finally, the last experiment evaluates the value of MDM models without data reduction for fusion of functional MRI (fMRI) and EEG neural signals.
Given the real features from prior publications utilized here, our experiments indeed reflect the usual size of fMRI, sMRI, and EEG datasets in neuroimaging multimodal fusion. Typically, studies combine 2–4 modalities (here, 2–3) with intrinsic dimensionality Vm of 15k–300k voxels, and 600 timepoints. The last example also illustrates how MISA can recover sources even without data reduction of the Vm dimension. Moreover, we illustrate source estimation with 600–1000 subjects, which is 3–10 times larger than typical multimodal fusion datasets. Furthermore, the typical number of sources in multimodal fusion ranges from 4 to 30 (our experiments use 4 to 20). Lastly, to the best of our knowledge, no other work has attempted general subspace estimation (dmk > 1) in multimodal fusion, which is feasible with MISA, as we demonstrate in our last experiment.
1). Single-Subject Temporal ICA of fMRI:
Here we consider temporal ICA of fast acquisition fMRI. The dimensionality of the data is V = voxels ≈ 60k and N = time points ≈ 1300. In order to better assess the performance of MISA in a realistic scenario, we propose to set the mixing matrix A as the real part of the data. First, we let C = 20 sources. Then, A must be a 60k × 20 matrix. In order to have it correspond to real data, we assign to it the first twenty well-established aggregate spatial maps (3D volumes) published in [72].
For the synthetic part of the data, we propose to simulate a 20 × 1334 matrix of timecourses y by generating realistic autocorrelated samples that mimic observed fMRI timecourses to a good extent. Sampling 20 such timecourses that retain independence with respect to each other is challenging because independently sampled autocorrelated time series tend to be correlated with one another. Building on the simulation principles outlined in [22], we seek to avoid randomly correlated timecourses (sources) in order to prevent mismatches to the underlying ICA model we wish to test. In the same spirit, we also wish to have sources sampled from the same distribution used in the model, here a Laplace distribution. We developed the following steps in order to meet all these requirements:
1) Design a joint autocorrelation matrix Ryy for all sources. For the example above, this means a block-diagonal correlation matrix with 20 blocks of size N × N. Each block is designed with an exponentially decaying autocorrelation function, with autocorrelation around 0.85 between time points n and n − 1, and around 0.2 between n and n − 10. This structure retains autocorrelation within each N-long section of an observation while retaining uncorrelation/independence among sections.
2) Generate 50k (20·N)-dimensional observations using a Gaussian copula [73] and the autocorrelation matrix Ryy from step 1. Using copulas enables transformation of the marginal distributions while retaining their correlation/dependence.
3) For each of the 50k copula-sampled observations, transform the marginals to a Laplace distribution.
4) For each of the 50k transformed (20·N)-dimensional observations, reshape them into a 20 × N matrix and compute the resulting 20 × 20 source correlation matrix Ry.
5) Compute the median correlation matrix $\bar{\mathbf{R}}_y$ over the 50k observed Ry.
6) Retain the transformed observation whose Ry is closest to $\bar{\mathbf{R}}_y$ and reject the rest.
This type of rejection sampling effectively produces the desired outcome; a scaled-down sketch follows below. Finally, Gaussian noise is added to the mixture for a low SNRdB = 3. The condition number of A was 4.59.
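A scaled-down sketch of steps 1)–6) (Python/SciPy; 500 draws instead of 50k, and closeness to the identity correlation as a simple stand-in for closeness to the median over draws):

```python
import numpy as np
from scipy.stats import norm, laplace

rng = np.random.default_rng(3)
C, N, DRAWS = 20, 1334, 500

# 1) Exponentially decaying autocorrelation for one N-long block.
lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
R_block = 0.85 ** lags                  # ~0.85 at lag 1, ~0.2 at lag 10
L = np.linalg.cholesky(R_block + 1e-10 * np.eye(N))

best, best_dev = None, np.inf
for _ in range(DRAWS):
    # 2) Gaussian copula sample with block-diagonal R_yy (blocks drawn i.i.d.),
    # 3) transformed to Laplace marginals, 4) reshaped to C x N.
    Z = (L @ rng.standard_normal((N, C))).T
    Y = laplace.ppf(norm.cdf(Z))
    # 5)-6) keep the draw whose source correlation is closest to identity.
    Ry = np.corrcoef(Y)
    dev = np.abs(Ry - np.eye(C)).max()
    if dev < best_dev:
        best, best_dev = Y, dev
```

Each retained row of Y is an autocorrelated, Laplace-distributed timecourse, and the rejection step suppresses spurious cross-source correlations that would violate the ICA model being tested.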
In the results, the data was reduced using PRE and then processed with MISA to obtain independent timecourses. The correlation between ground-truth (GT) and PRE+MISA spatial map estimates (RM) is presented in Fig. 4, and the spatial maps (estimating A from Ŵ−) in Fig. 5. MISI = 0.0365.
Fig. 4. Correlation with the ground-truth (hybrid temporal ICA).

The correlation between the spatial map estimates from MISA with PRE (RM) and the ground-truth (GT) is very high with little residual similarity across sources, suggesting the analysis was successful.
Fig. 5. Side-by-side comparison with the ground-truth (hybrid temporal ICA).

The clear resemblance to the ground-truth maps suggests a successful recovery of the mixing matrix A. The sample correlation r is shown below each matched pair. Maps are sorted from highest to lowest correlation.
2). Multimodal IVA of sMRI, fMRI, and FA:
In this multimodal fusion of structural MRI (sMRI), fMRI, and Fractional Anisotropy (FA) diffusion MRI data, the dimensionalities are V1 = voxels ≈ 300k, V2 = voxels ≈ 67k, V3 = voxels ≈ 15k, respectively, and N = subjects = 600 (each modality measured on the same subject). We pursue a hybrid setting where only the mixing matrices Am are taken from real datasets to overcome typically small N in patient population studies. First, we let Cm = 20 sources in each dataset. Then, A1, A2, and A3 must be 300k×20, 67k×20, and 15k×20, respectively. To each, we assign the first twenty aggregate 3D spatial maps published in [74], [72], [75], respectively.
For the simulated part of the data, we generate three 20×600 matrices of subject expression levels y. K = 20 subspaces, each with dk = 3 and N = 600 observations, were sampled independently from a Gaussian copula, using an inverse exponential autocorrelation function with maximal correlation varying from 0.65 to 0.85 for each subspace. These were transformed to Laplace distribution marginals (not multivariate Laplace) so as to induce a controlled mismatch between the data (only SOS dependence) and the model subspace distributions (multivariate Laplace—all-order dependence). Finally, Gaussian noise was added separately in each dataset for a low SNRdB = 3. The condition numbers of A1, A2, and A3 were 1.52, 4.59, 1.63, respectively.
In the results, the data was reduced using PRE and then processed with MISA to obtain independent subject expression levels. Per-modality correlation between ground-truth and PRE+MISA spatial maps are presented in Fig. 6, and spatial maps (estimating A from Ŵ−) in Fig. 7. MISI = 0.0273.
Fig. 6. Correlation with the ground-truth (multimodal IVA).

The correlation between the spatial map estimates from MISA with PRE (RM) and the ground-truth (GT) is very high in all modalities, with little residual similarity across sources, suggesting the analysis was successful.
Fig. 7. Summary of multimodal IVA maps.

In each panel, ground-truth (GT) maps are presented on the left and maps estimated from MISA with PRE (RM) on the right. Each subspace represents the multimodal set of maps (joint features) with highest, median, and minimum correlation with the GT, from top to bottom, respectively. No IVA-L comparison available since it failed to converge, likely due to the small sample size (N = 600) or inability to detect SOS dependence.
3). Multimodal MISA of fMRI, and EEG:
We show the value of MDM models without data reduction for fusion of EEG event-related potentials (ERP) and fMRI datasets, with dimensionality V1 = time points ≈ 600 and V2 = voxels ≈ 67k, respectively, and N = subjects = 1001. Let C1 = 4 and C2 = 6 sources in the ERP and fMRI datasets, respectively, organized into K = 4 subspaces spanning both datasets (ymi represents source i from dataset m).
Utilizing real spatial maps and timecourses, A1 and A2 must be 600 × 4 and 67k × 6, respectively, this time ensuring they form column-orthogonal mixings (with Gram-Schmidt).
For the simulated part of the data, we generate 4×1001 and 6 × 1001 matrices of subject expression levels for ERP and fMRI datasets, respectively. A total of K = 4 dk-dimensional subspaces with N = 600 observations each were sampled from a multivariate Laplace distribution, using an inverse exponential autocorrelation function with maximal correlation of 0.65 for each subspace. Noise was absent in both datasets. The condition number was 1.00 for both A1 and A2.
Fig. 8 shows the results obtained from constrained MISA-GP, i.e., with  = W⊤ RE constraint using (8). No data reduction was performed on the data. The spatial fMRI maps and ERP timecourses were produced by estimating A from Ŵ⊤. Since subspace independence is invariant to linear transformations (arbitrary basis) within any subspace [17], the estimation yields timecourses (red) and maps (middle) that do not match the GT exactly. In an attempt to correct for that, we performed additional within-modality ICAs on the columns of Am corresponding to subspaces. This effectively selected for a particular basis within each subspace (right maps and cyan timecourses). The ability to choose a particular representation demonstrates the kinds of post-processing enabled by MDM models. Overall, this result validates and illustrates the benefit of a constrained optimization approach.
Fig. 8. Multimodal MISA of fMRI and ERP.

GT maps are presented on the left of each panel, MISA-GP estimates in the middle, and corrected MISA-GP estimates on the right. GT ERPs are presented in blue, MISA-GP ERPs in dashed red, and corrected MISA-GP ERPs in dashed cyan.
VI. Conclusion
We have presented MISA, an approach that solves multiple BSS problems (including ICA, IVA, ISA, and more) under the same framework, with remarkable performance and improved robustness even at low SNR. In particular, we have derived a general formulation that controls for source scales by leveraging the flexible Kotz distribution within an interior-point nonlinear constrained optimization. PRE provides a general and flexible formulation for either direct subspace estimation or dimensionality reduction, while a combinatorial optimization for evading local minima permits self-correction toward the closest subspace structures supported by the data (MISA-GP). Altogether, the proposed methods permit all-order-statistics linkage across multidatasets, as well as features of higher complexity, to be identified and fully exploited in a direct, principled, and synergistic way, even at sample sizes as low as N = 600.
Flexible approaches like MISA are key to meeting the growing complexity of multidataset tasks. These complexities are incorporated in the hybrid dataset standards we open-source here, built from relevant results published in the brain imaging BSS literature. Generalizations building on this work could easily be developed by exploring other divergence families. Future work will focus on compiling real multimodal datasets to validate MISA’s ability to capture reliable modes of shared and unique variability across and within modalities.
It is also worth noting the natural trade-off between flexibility and complexity. In practice, given a problem specification and prior information, a dedicated algorithm offers the simplest solution, but its lack of flexibility often limits its utility for exploring different scenarios. Our work considers a more general case, in which one general solution is easily simplified by taking domain information into account for a given problem. The complexity remains comparable to that of a dedicated algorithm, yet the general algorithm makes it easy to switch between models and explore different solutions.
We note that further optimization for computational efficiency is certainly possible.
Acknowledgments
This work was primarily funded by NIH grant R01EB005846 and NIH NIGMS Centers of Biomedical Research Excellence (COBRE) grant 5P20RR021938/P20GM103472 to Vince Calhoun (PI), NSF 1539067, NSF 1631838, and NSF-CCF 1618551.
Biographies

Rogers F. Silva (Member, IEEE) received the B.Sc. degree in Electrical Engineering in 2003 from the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Brazil, the M.S. degree in Computer Engineering (with minors in Statistics and in Mathematics) in 2011, and the Ph.D. degree in Computer Engineering with distinction in 2017, both from The University of New Mexico, Albuquerque, NM, USA. He is currently a Research Scientist at the tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS) with Georgia State University, Georgia Institute of Technology, and Emory University. Previously, he was a Postdoctoral Fellow at The Mind Research Network, a Data Scientist with Datalytic Solutions, and worked as an engineer, lecturer, and consultant. As a multidisciplinary scientist, he develops algorithms for statistical and machine learning, image analysis, numerical optimization, memory-efficient large-scale data reduction, and distributed analyses, focusing on multimodal, multi-subject neuroimaging data from thousands of subjects. His interests include multimodal data fusion; statistical and machine learning; image, video, and data analysis; multiobjective, combinatorial, and constrained optimization; signal processing; and neuroimaging.

Sergey M. Plis received the Ph.D. degree in Computer Science in 2007 from The University of New Mexico, Albuquerque, NM, USA. Currently, he is an Associate Professor of Computer Science at Georgia State University and the Director of the Machine Learning Core at the tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS). His research interests lie in developing novel techniques, and applying existing ones, for the analysis of large-scale datasets in multimodal brain imaging and other domains. He develops tools that fall within the fields of machine learning and data science. One of his key goals is to take advantage of the strengths of imaging modalities and infer structure and patterns that are hard to obtain non-invasively and/or that are unavailable for direct observation. In the long term, this amounts to developing methods capable of revealing mechanisms used by the brain to form task-specific transient interaction networks and their cognition-inducing interactions via multimodal fusion at the feature and interaction levels. His ongoing work focuses on inferring multimodal probabilistic and causal descriptions of these function-induced networks based on fusion of fast and slow imaging modalities, including feature estimation via deep learning-based pattern recognition and learning causal graphical models.

Tülay Adalı (Fellow, IEEE) received the Ph.D. degree in Electrical Engineering from North Carolina State University, Raleigh, NC, USA, in 1992 and joined the faculty at the University of Maryland Baltimore County (UMBC), Baltimore, MD, the same year. She is currently a Distinguished University Professor in the Department of Computer Science and Electrical Engineering at UMBC.
She has been active in conference and workshop organization. She was the general or technical co-chair of the IEEE Machine Learning for Signal Processing (MLSP) and Neural Networks for Signal Processing Workshops from 2001 to 2008, and helped organize a number of conferences, including the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). She has served, or is currently serving, on numerous editorial boards and technical committees of the IEEE Signal Processing Society. She was the Chair of the MLSP Technical Committee in 2003–2005 and 2011–2013, the Technical Program Co-Chair for ICASSP 2017, and the Special Sessions Chair for ICASSP 2018 and ICASSP 2024. She is currently serving as the Vice President for Technical Directions of the IEEE Signal Processing Society.
Prof. Adali is a Fellow of the IEEE and the AIMBE, a Fulbright Scholar, and an IEEE Signal Processing Society Distinguished Lecturer. She is the recipient of a 2020 Humboldt Research Award, 2010 IEEE Signal Processing Society Best Paper Award, 2013 University System of Maryland Regents’ Award for Research, and an NSF CAREER Award. Her current research interests are in the areas of statistical signal processing, machine learning, and their applications, with emphasis on medical image analysis and fusion.

Marios S. Pattichis (Senior Member, IEEE) received the B.Sc. degree (Hons.) in computer sciences and the B.A. degree (Hons.) in mathematics in 1991, the M.S. degree in electrical engineering in 1993, and the Ph.D. degree in computer engineering in 1998, all from The University of Texas at Austin, Austin, TX, USA. He is currently a Professor with the Department of Electrical and Computer Engineering, University of New Mexico (UNM), Albuquerque. His current research interests include digital image and video processing, video communications, dynamically reconfigurable computer architectures, and biomedical, space, and educational image processing applications.
At UNM, he holds the Gardner Zemke Professorship in Teaching at ECE, and he is a Fellow of the Center for Collaborative Research and Community Engagement within the College of Education and Human Sciences. He was a recipient of the 2016 Lawton-Ellis Award and the 2004 Distinguished Teaching Award from the Department of Electrical and Computer Engineering, UNM. For his development of the digital logic design labs at UNM, he was recognized by the Xilinx Corporation in 2003 and by the UNM School of Engineering's Harrison Faculty Excellence Award in 2006. He was a Founding Co-PI of the Configurable Space Microsystems Innovations and Applications Center (COSMIAC) at UNM, where he is currently the Director of ivPCL. He was the General Chair of the 2008 IEEE Southwest Symposium on Image Analysis and Interpretation (SSIAI) and the General Co-Chair of the 2020 IEEE SSIAI. He is currently a Senior Associate Editor for the IEEE Transactions on Image Processing and has also served as a Senior Associate Editor for the IEEE Signal Processing Letters. He has been an Associate Editor of the IEEE Transactions on Image Processing and the IEEE Transactions on Industrial Informatics, and has served as a Guest Associate Editor for the IEEE Transactions on Information Technology in Biomedicine.

Vince D. Calhoun (Fellow, IEEE) received the B.S. degree in electrical engineering from the University of Kansas, Lawrence, KS, USA, in 1991, the M.S. degrees in biomedical engineering and information systems from The Johns Hopkins University, Baltimore, MD, USA, in 1993 and 1996, respectively, and the Ph.D. degree in electrical engineering from the University of Maryland Baltimore County, Baltimore, in 2002.
Dr. Calhoun is the founding director of the tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS) and a Georgia Research Alliance Eminent Scholar in brain health and image analysis, with appointments at Georgia State University, Georgia Institute of Technology, and Emory University. He was previously the President of the Mind Research Network and a Distinguished Professor of Electrical and Computer Engineering at the University of New Mexico. He is the author of more than 800 full journal articles and over 850 technical reports, abstracts, and conference proceedings. His work includes the development of flexible methods to analyze functional magnetic resonance imaging data, such as independent component analysis (ICA), deep learning for neuroimaging, data fusion of multimodal imaging and genetics data, neuroinformatics tools, and the identification of biomarkers for disease. His research is funded by the NIH and NSF, among other funding agencies. Dr. Calhoun is a Fellow of the Institute of Electrical and Electronics Engineers, the American Association for the Advancement of Science, the American Institute for Medical and Biological Engineering, the American College of Neuropsychopharmacology, and the International Society for Magnetic Resonance in Medicine. He served as the Chair of the Organization for Human Brain Mapping in 2018–2019 and is a past Chair of the IEEE Machine Learning for Signal Processing Technical Committee. He currently serves on the IEEE BISP Technical Committee and is also a member of the IEEE Data Science Initiative Steering Committee.
Footnotes
This paper has supplementary downloadable material available at http://ieeexplore.ieee.org. The material includes a document with additional details. Code is available at https://github.com/rsilva8/MISA. Data is available at https://github.com/rsilva8/MISA-data.
The subspace terminology stems from [17] in which the columns of A corresponding to yk form a linear (sub)space.
Strictly speaking, this line is only a portion of the entire hypersurface (polyhedron) of ambiguity.
Contributor Information
Tülay Adalı, Dept. of CSEE, University of Maryland Baltimore County, Baltimore, MD, USA.
Marios S. Pattichis, Dept. of ECE, The University of New Mexico, Albuquerque, NM, USA.
Vince D. Calhoun, tri-institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, and Emory University, Atlanta, GA, USA; The Mind Research Network, Albuquerque, NM, USA; Dept. of ECE, The University of New Mexico, Albuquerque, NM, USA.
References
- [1] Silva RF, Plis SM, Sui J, Pattichis MS, Adalı T, and Calhoun VD, “Blind source separation for unimodal and multimodal brain networks: A unifying framework for subspace modeling,” IEEE J Sel Topics Signal Process, vol. 10, no. 7, pp. 1134–1149, 2016.
- [2] Comon P and Jutten C, Handbook of Blind Source Separation, 1st ed. Oxford, UK: Academic Press, 2010.
- [3] Yi L, Dong N, Yun Y, Deng B, Ren D, Liu S, and Liang Y, “Chemometric methods in data processing of mass spectrometry-based metabolomics: A review,” Anal Chim Acta, vol. 914, pp. 17–34, 2016.
- [4] Saito S, Oishi K, and Furukawa T, “Convolutive Blind Source Separation Using an Iterative Least-Squares Algorithm for Non-Orthogonal Approximate Joint Diagonalization,” IEEE/ACM Trans Audio Speech Lang Process, vol. 23, no. 12, pp. 2434–2448, 2015.
- [5] Nielsen AA, “Multiset canonical correlations analysis and multispectral, truly multi-temporal remote sensing data,” IEEE Trans Image Process, vol. 11, no. 3, pp. 293–305, 2002.
- [6] Ammanouil R, Ferrari A, Richard C, and Mary D, “Blind and Fully Constrained Unmixing of Hyperspectral Images,” IEEE Trans Image Process, vol. 23, no. 12, pp. 5510–5518, 2014.
- [7] Calhoun VD, Liu J, and Adalı T, “A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data,” NeuroImage, vol. 45, no. 1, Supplement 1, pp. S163–S172, 2009.
- [8] Calhoun VD and Sui J, “Multimodal fusion of brain imaging data: A key to finding the missing link(s) in complex mental illness,” Biol Psychiatry Cogn Neurosci Neuroimag, vol. 1, no. 3, pp. 230–244, 2016.
- [9] Bhinge S, Levin-Schwartz Y, and Adalı T, “Data-driven fusion of multi-camera video sequences: Application to abandoned object detection,” in Proc IEEE ICASSP 2017, 2017, pp. 1697–1701.
- [10] Nicolaou MA, Pavlovic V, and Pantic M, “Dynamic probabilistic CCA for analysis of affective behavior and fusion of continuous annotations,” IEEE Trans Pattern Anal Mach Intell, vol. 36, no. 7, pp. 1299–1311, 2014.
- [11] Silva RF, Plis SM, Adalı T, and Calhoun VD, “Multidataset independent subspace analysis,” in Proc OHBM 2014, Hamburg, Germany, 2014, Poster 3506.
- [12] Silva RF, Plis SM, Adalı T, and Calhoun VD, “Multidataset independent subspace analysis extends independent vector analysis,” in Proc IEEE ICIP 2014, France, 2014, pp. 2864–2868.
- [13] Silva RF, Plis SM, Pattichis MS, Adalı T, and Calhoun VD, “Incorporating second-order statistics in multidataset independent subspace analysis,” in Proc OHBM 2015, Honolulu, HI, 2015, Poster 3743.
- [14] Kotz S, “Multivariate distributions at a cross road,” in Proc NATO Advanced Study Institute, Statistical Distributions in Scientific Work. Calgary, Canada: Springer, 1974, pp. 247–270.
- [15] Silva RF and Plis SM, How to Integrate Data from Multiple Biological Layers in Mental Health? Springer, 2019, pp. 135–159.
- [16] Comon P, “Independent component analysis, a new concept?” Signal Process, vol. 36, no. 3, pp. 287–314, 1994.
- [17] Cardoso J-F, “Multidimensional independent component analysis,” in Proc IEEE ICASSP 1998, vol. 4, Seattle, WA, 1998, pp. 1941–1944.
- [18] Kim T, Eltoft T, and Lee T-W, “Independent vector analysis: An extension of ICA to multivariate components,” in Proc ICA 2006, Charleston, SC, 2006, vol. 3889, pp. 165–172.
- [19] Bell A and Sejnowski T, “An information-maximization approach to blind separation and blind deconvolution,” Neural Comput, vol. 7, no. 6, pp. 1129–1159, 1995.
- [20] Amari S-I, “Natural gradient works efficiently in learning,” Neural Comput, vol. 10, no. 2, pp. 251–276, 1998.
- [21] Anderson M, Adalı T, and Li XL, “Joint blind source separation with multivariate Gaussian model: Algorithms and performance analysis,” IEEE Trans Signal Process, vol. 60, no. 4, pp. 1672–1683, 2012.
- [22] Silva RF, Plis SM, Adalı T, and Calhoun VD, “A statistically motivated framework for simulation of stochastic data fusion models applied to multimodal neuroimaging,” NeuroImage, vol. 102, Part 1, pp. 92–117, 2014.
- [23] Lahat D, Cardoso J, and Messer H, “Second-order multidimensional ICA: Performance analysis,” IEEE Trans Signal Process, vol. 60, no. 9, pp. 4598–4610, 2012.
- [24] Le QV, Zou WY, Yeung SY, and Ng AY, “Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis,” in Proc CVPR 2011, 2011, pp. 3361–3368.
- [25] Lahat D, Adalı T, and Jutten C, “Multimodal data fusion: An overview of methods, challenges, and prospects,” Proc IEEE, vol. 103, no. 9, pp. 1449–1477, 2015.
- [26] Miller KL, Alfaro-Almagro F, Bangerter NK, Thomas DL, Yacoub E, Xu J, Bartsch AJ, Jbabdi S, Sotiropoulos SN, Andersson JLR, Griffanti L, Douaud G, Okell TW, Weale P, Dragonu I, Garratt S, Hudson S, Collins R, Jenkinson M, Matthews PM, and Smith SM, “Multimodal population brain imaging in the UK Biobank prospective epidemiological study,” Nat Neurosci, vol. 19, no. 11, pp. 1523–1536, 2016.
- [27] Calhoun VD and Adalı T, “Multisubject independent component analysis of fMRI: A decade of intrinsic networks, default mode, and neurodiagnostic discovery,” IEEE Rev Biomed Eng, vol. 5, pp. 60–73, 2012.
- [28] Seghouane A and Iqbal A, “Sequential Dictionary Learning From Correlated Data: Application to fMRI Data Analysis,” IEEE Trans Image Process, vol. 26, no. 6, pp. 3002–3015, 2017.
- [29] Mohammadi-Nejad AR, Hossein-Zadeh GA, and Soltanian-Zadeh H, “Structured and sparse canonical correlation analysis as a brain-wide multi-modal data fusion approach,” IEEE Trans Med Imaging, vol. 36, no. 7, pp. 1438–1448, 2017.
- [30] Lee J-H, Lee T-W, Jolesz F, and Yoo S-S, “Independent vector analysis (IVA): Multivariate approach for fMRI group study,” NeuroImage, vol. 40, no. 1, pp. 86–109, 2008.
- [31] Bhinge S, Mowakeaa R, Calhoun VD, and Adalı T, “Extraction of time-varying spatio-temporal networks using parameter-tuned constrained IVA,” IEEE Trans Med Imaging, 2019.
- [32] Pakravan M and Shamsollahi MB, “Extraction and Automatic Grouping of Joint and Individual Sources in Multi-Subject fMRI Data Using Higher Order Cumulants,” IEEE J Biomed Health Inform, 2018.
- [33] Yu M, Linn KA, Cook PA, Phillips ML, McInnis M, Fava M, Trivedi MH, Weissman MM, Shinohara RT, and Sheline YI, “Statistical harmonization corrects site effects in functional connectivity measurements from multi-site fMRI data,” Hum Brain Mapp, vol. 39, no. 11, pp. 4213–4227, 2018.
- [34] Mirzaalian H, Ning L, Savadjiev P, Pasternak O, Bouix S, Michailovich O, Grant G, Marx C, Morey R, Flashman L, George M, McAllister T, Andaluz N, Shutter L, Coimbra R, Zafonte R, Coleman M, Kubicki M, Westin C, Stein M, Shenton M, and Rathi Y, “Inter-site and inter-scanner diffusion MRI data harmonization,” NeuroImage, vol. 135, pp. 311–323, 2016.
- [35] Alam F, Mehmood R, Katib I, Albogami NN, and Albeshri A, “Data Fusion and IoT for Smart Ubiquitous Environments: A Survey,” IEEE Access, vol. 5, pp. 9533–9554, 2017.
- [36] Elmadany NED, He Y, and Guan L, “Information Fusion for Human Action Recognition via Biset/Multiset Globality Locality Preserving Canonical Correlation Analysis,” IEEE Trans Image Process, vol. 27, no. 11, pp. 5275–5287, 2018.
- [37] Uzair M, Mahmood A, and Mian A, “Hyperspectral Face Recognition With Spatiospectral Information Fusion and PLS Regression,” IEEE Trans Image Process, vol. 24, no. 3, pp. 1127–1137, 2015.
- [38] Yao J, Meng D, Zhao Q, Cao W, and Xu Z, “Nonconvex-sparsity and Nonlocal-smoothness Based Blind Hyperspectral Unmixing,” IEEE Trans Image Process, 2019.
- [39] Villa A, Benediktsson JA, Chanussot J, and Jutten C, “Hyperspectral Image Classification With Independent Component Discriminant Analysis,” IEEE Trans Geosci Remote Sens, vol. 49, no. 12, pp. 4865–4876, 2011.
- [40] Xu H, Zheng J, Alavi A, and Chellappa R, “Cross-domain visual recognition via domain adaptive dictionary learning,” arXiv preprint, 2018. [Online]. Available: http://arxiv.org/abs/1804.04687
- [41] Fan J, Zhao T, Kuang Z, Zheng Y, Zhang J, Yu J, and Peng J, “HD-MTL: Hierarchical Deep Multi-Task Learning for Large-Scale Visual Recognition,” IEEE Trans Image Process, vol. 26, no. 4, pp. 1923–1938, 2017.
- [42] Lu H, Shen C, Cao Z, Xiao Y, and van den Hengel A, “An embarrassingly simple approach to visual domain adaptation,” IEEE Trans Image Process, vol. 27, no. 7, pp. 3403–3417, 2018.
- [43] Long M, Wang J, Ding G, Sun J, and Yu PS, “Transfer joint matching for unsupervised domain adaptation,” in Proc IEEE CVPR 2014, 2014, pp. 1410–1417.
- [44] Patel VM, Gopalan R, Li R, and Chellappa R, “Visual domain adaptation: A survey of recent advances,” IEEE Signal Process Mag, vol. 32, no. 3, pp. 53–69, 2015.
- [45] Cai Z, Wang L, Peng X, and Qiao Y, “Multi-view super vector for action recognition,” in Proc IEEE CVPR 2014, 2014, pp. 596–603.
- [46] Tang L, Yang Z, and Jia K, “Canonical Correlation Analysis Regularization: An Effective Deep Multi-View Learning Baseline for RGB-D Object Recognition,” IEEE Trans Cogn Devel Syst, 2018.
- [47] Sargin ME, Yemez Y, Erzin E, and Tekalp AM, “Audiovisual Synchronization and Fusion Using Canonical Correlation Analysis,” IEEE Trans Multimedia, vol. 9, no. 7, pp. 1396–1403, 2007.
- [48] Gao L, Zhang R, Qi L, Chen E, and Guan L, “The Labeled Multiple Canonical Correlation Analysis for Information Fusion,” IEEE Trans Multimedia, vol. 21, no. 2, pp. 375–387, 2019.
- [49] Narvor P, Rivet B, and Jutten C, “Audiovisual speech separation based on independent vector analysis using a visual voice activity detector,” in Proc LVA/ICA 2017, Grenoble, France, 2017, pp. 247–257.
- [50] Nesta F, Mosayyebpour S, Koldovský Z, and Paleček K, “Audio/video supervised independent vector analysis through multimodal pilot dependent components,” in Proc EUSIPCO 2017, 2017, pp. 1150–1164.
- [51] Anderson M, Fu G-S, Phlypo R, and Adalı T, “Independent vector analysis, the Kotz distribution, and performance bounds,” in Proc IEEE ICASSP 2013, Vancouver, Canada, 2013, pp. 3243–3247.
- [52] Hyvärinen A and Köster U, “FastISA: A fast fixed-point algorithm for independent subspace analysis,” in Proc ESANN, 2006, pp. 371–376.
- [53] Lahat D and Jutten C, “Joint independent subspace analysis using second-order statistics,” IEEE Trans Signal Process, vol. 64, no. 18, pp. 4891–4904, 2016.
- [54] Anderson M, Li X-L, and Adalı T, “Nonorthogonal independent vector analysis using multivariate Gaussian model,” in Proc LVA/ICA 2010, ser. Lecture Notes in Computer Science. France: Springer, 2010, vol. 6365, pp. 354–361.
- [55] Hyvärinen A, Hurri J, and Hoyer P, Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, 1st ed., ser. Computational Imaging and Vision. Springer, 2009, vol. 39.
- [56] Bouwmans T, Javed S, Zhang H, Lin Z, and Otazo R, “On the Applications of Robust PCA in Image and Video Processing,” Proc IEEE, vol. 106, no. 8, pp. 1427–1457, 2018.
- [57] Pont-Tuset J, Arbeláez P, Barron JT, Marques F, and Malik J, “Multiscale combinatorial grouping for image segmentation and object proposal generation,” IEEE Trans Pattern Anal Mach Intell, vol. 39, no. 1, pp. 128–140, 2017.
- [58] El-Zehiry NY and Grady L, “Contrast driven elastica for image segmentation,” IEEE Trans Image Process, vol. 25, no. 6, pp. 2508–2518, 2016.
- [59] Szabó Z, Póczos B, and Lőrincz A, “Separation theorem for independent subspace analysis and its consequences,” Pattern Recognit, vol. 45, no. 4, pp. 1782–1791, 2012.
- [60] Nadarajah S, “The Kotz-type distribution with applications,” Statistics, vol. 37, no. 4, pp. 341–358, 2003.
- [61] Cardoso JF and Laheld BH, “Equivariant adaptive source separation,” IEEE Trans Signal Process, vol. 44, no. 12, pp. 3017–3030, 1996.
- [62] Byrd RH, Lu P, Nocedal J, and Zhu C, “A limited memory algorithm for bound constrained optimization,” SIAM J Sci Comput, vol. 16, no. 5, pp. 1190–1208, 1995.
- [63] Zhu C, Byrd RH, Lu P, and Nocedal J, “Algorithm 778: L-BFGS-B: Fortran subroutines for large-scale bound-constrained optimization,” ACM Trans Math Softw, vol. 23, no. 4, pp. 550–560, 1997.
- [64] Nocedal J and Wright S, Numerical Optimization, 2nd ed. New York, NY: Springer, 2006.
- [65] Waltz R, Morales J, Nocedal J, and Orban D, “An interior algorithm for nonlinear optimization that combines line search and trust region steps,” Math Prog, vol. 107, no. 3, pp. 391–408, 2006.
- [66] Haufe S, Meinecke F, Görgen K, Dähne S, Haynes J-D, Blankertz B, and Bießmann F, “On the interpretation of weight vectors of linear models in multivariate neuroimaging,” NeuroImage, vol. 87, pp. 96–110, 2014.
- [67] Le Q, Karpenko A, Ngiam J, and Ng A, “ICA with reconstruction cost for efficient overcomplete feature learning,” in Proc NIPS 2011, Granada, Spain, 2011, pp. 1017–1025.
- [68] MIALAB, “Group ICA of fMRI Toolbox (GIFT),” 2015. [Online]. Available: http://trendscenter.org/trends/software/gift/index.html
- [69] Rachakonda S, Silva RF, Liu J, and Calhoun VD, “Memory efficient PCA methods for large group ICA,” Front Neurosci, vol. 10, p. 17, 2016.
- [70] Amari S-I, Cichocki A, and Yang HH, “A new learning algorithm for blind signal separation,” Proc NIPS 1996, vol. 8, pp. 757–763, 1996.
- [71] Macchi O and Moreau E, “Self-adaptive source separation by direct or recursive networks,” in Proc ICDSP 1995, Cyprus, 1995, pp. 122–129.
- [72] Allen EA, Erhardt EB, Damaraju E, Gruner W, Segall JM, Silva R, Havlicek M, Rachakonda S, Fries J, Kalyanam R, et al., “A baseline for the multivariate comparison of resting-state networks,” Front Syst Neurosci, vol. 5, 2011.
- [73] Nelsen R, An Introduction to Copulas, 2nd ed., ser. Springer Series in Statistics. New York, NY: Springer New York, 2006, vol. 1.
- [74] Segall J, Allen E, Jung R, Erhardt E, Arja S, Kiehl K, and Calhoun V, “Correspondence between structure and function in the human brain at rest,” Front Neuroinform, vol. 6, p. 10, 2012.
- [75] Wu L, Calhoun VD, Jung RE, and Caprihan A, “Connectivity-based whole brain dual parcellation by group ICA reveals tract structures and decreased connectivity in schizophrenia,” Hum Brain Mapp, vol. 36, no. 11, pp. 4681–4701, 2015.