Abstract
The brain transcriptome plays a crucial role in understanding cortical organization and the development of brain structure and function. Two challenges, incomplete data and the high dimensionality of the transcriptome, remain unsolved. Here we present a novel training scheme that successfully adapts the U-net architecture to the problem of volume recovery. By analogy to denoising autoencoders, we hide a portion of each training sample so that the network learns to recover missing voxels from the context. On the completed volumes, we then show that restricted Boltzmann machines (RBMs) can be used to infer co-occurrences among voxels, providing a foundation for dividing the cortex into discrete subregions. By stacking multiple RBMs to form a deep belief network (DBN), we progressively map the high-dimensional raw input into abstract representations and create a hierarchy of transcriptome architecture. A coarse-to-fine organization emerges from the network layers. This organization corresponds to anatomical structures, suggesting a close link between structure and its genetic underpinnings. Thus, we demonstrate a new way of learning transcriptome-based hierarchical organization using RBMs and DBNs.
Keywords: deep belief network, fully convolutional neural network, Restricted Boltzmann Machines, transcriptome architecture
1. Introduction
The brain transcriptome plays a crucial role in understanding cortical organization and the development of brain structure. It provides a relatively stable platform for research on cortical organization because the transcriptome is not altered much by behavioral manipulations or cognitive states, but varies strongly between anatomical locations, cell types, and developmental stages [1]. Previous research has revealed extensive regional heterogeneity of the transcriptome. For instance, laminar- and cell-type-specific genes have been identified through gene expression studies comparing subregions of the neocortex [2], [3], microarray analyses comparing purified populations of neuronal subtypes [4], [5], and large-scale in situ hybridization (ISH) studies [6]. The discovery of region-specific and cell-type-related genes lays the foundation for elucidating the detailed mechanisms that control the specification and differentiation of areas, brain development and functioning, and genetic dysregulation at large.
Unsupervised machine learning methods have shown advantages in discovering transcriptome architecture. Similarity-based clustering [7], [8] and non-linear embedding techniques [9] have been applied to gene expression data, revealing areal and laminar structures in the mouse neocortex [10]. Singular value decomposition [11] and matrix factorization methods are also popular approaches for identifying genes with similar coexpression patterns [12], [13]. In our previous work [14], we met two main challenges, incomplete data and high dimensionality, that have not yet been solved.
A complete dataset is usually a prerequisite for many statistical and analytical tools. Yet data loss is common due to the challenging and time-consuming acquisition process: any mistreatment of a tissue slice, loss of focus during imaging, or misalignment during registration can result in corresponding data loss. The second challenge is the high-dimensional and complex structure of the transcriptome, which makes feature learning very difficult. For example, in [9], Barnes-Hut Stochastic Neighbor Embedding (BH-SNE) is shown to be superior to linear methods because of its ability to capture non-linear relations. However, BH-SNE still failed to produce voxel clusters corresponding to brain structures when no prior dimension reduction was performed.
Here we resolve both of these challenges, with the final goal of understanding transcriptome organization in the brain a priori. To estimate the missing values, we propose the volume recovery network (VRN). The VRN borrows its idea from denoising autoencoders (DAs) [15]: instead of feeding the network manually corrupted data and teaching it to undo the noise, we hide a portion of each training sample so that the network learns to recover the missing voxels from the context. To handle the high dimensionality, we consider restricted Boltzmann machines (RBMs) and deep belief networks (DBNs) because of their larger representational power through the composition of nonlinearities. A DBN is a multilayer generative model formed by stacking multiple layers of RBMs. RBMs are well suited to inferring a transcriptome-based brain parcellation because, through training, they learn which units in the visible layer tend to co-occur and record the significant activations in the hidden layer.
The co-occurrences of voxels provide a foundation for dividing the cortex into discrete subregions. As we stack multiple RBMs to form a DBN, we progressively map the high-dimensional raw input into abstract representations and create a hierarchical, data-driven transcriptome architecture, presumably revealing how brain subregions interact with one another in a hierarchical manner.
2. BACKGROUND
2.1. Incomplete Data
Incomplete data is a problem frequently encountered in genomic data analysis. The simplest solution is to ignore the missing entries. In one of our prior works [13], we worked around the problem by first studying the coexpression networks slice-wise and then inferring the gene-gene interactions by considering only the slices with data. With these two steps, we focused on the known interactions. Yet the strategy is only applicable to a specific problem, and an extra step is required for data integration.
Alternatively, it is possible to use image inpainting methods for missing-value estimation because ISH volumes are inherently image structures. Classical inpainting methods [16], [17] restore the image based on either local or non-local information. Most existing methods require continuous textures or contours across the known and missing regions. However, this assumption often does not hold, especially when the missing region is large and arbitrarily shaped. Other methods resort to an external database for a possible match for the missing region [18]; failures occur when the test image is significantly different from the database. Recently, learning-based methods have shown superior performance on the image completion problem [19], [20]. Instead of hand-designing features for patch editing or matching, dictionaries or a neural network are learned from data [19], [20]. Deep neural networks have shown great promise in filling large missing regions in images, a more challenging task that requires a deeper understanding of the image; these models provide a plausible completion by learning the semantic meaning [21], [22].
2.2. High-dimensional and complex structure of transcriptome
The second challenge is posed by the enormous dimensionality of the transcriptome. Multiple models such as principal component analysis (PCA) [23], independent component analysis (ICA) [24], classical multidimensional scaling (cMDS) [25], and sparse coding [14] have been applied to analyze transcriptomic data. As helpful as they are in reducing dimensionality, they are all shallow linear mappings and inadequate for inferring complex non-linear structures in the data. For example, it has been reported that features captured by principal components can sometimes degrade cluster quality [26].
Deep models such as deep neural networks (DNNs) show larger representational power through the composition of many nonlinearities. For instance, DAs have been used to learn a compact representation of yeast microarray expression profiles [27]; the clusters obtained from the learned codes are more consistent with the pre-assigned ground-truth labels than those obtained by clustering the raw data. Relatedly, an ensemble of DAs has proved effective in extracting stable expression signatures from public gene expression data spanning diverse experiments [28]. In addition to their expressive power, DNNs have a hierarchical structure in which higher-level features are obtained by composing lower-level ones. Such compositional hierarchies are also seen in many natural signals, such as the signaling systems of cells [29]. In several recent publications, DNNs have been successfully used to simulate cellular signaling systems [30], [31].
A DBN is a multilayer generative model formed by stacking multiple layers of restricted Boltzmann machines (RBMs). Both RBMs and DBNs have been demonstrated effective in extracting features from various modalities including text [32], video, audio [33], gene microarray data [31], and neuroimages [34].
3. METHODS
The computational pipeline of the proposed framework is illustrated in Fig. 1. First, we train a VRN for volume completion. Then we train an RBM and a DBN on the completed volumes.
Fig. 1.
Computational pipeline of the framework. Axial view of (a) the raw ISH volume and (b) the completed volume using the volume recovery network. The foreground voxels of the completed volumes are fed to (f) the RBM and (g) the DBN. (c-e) Visualization of selected weight maps in different layers of the DBN.
3.1. Volume Recovery Network
3.1.1. Training strategy
The VRN borrows its idea from DAs. Instead of adding noise to the inputs to teach the network to undo the noise, we hide a portion of each training sample to teach the network to recover the missing voxels from the context. The design of which portion to hold out is important for helping the network learn to recover the volume. By observation, the data loss in the AMBA usually involves one or more slices along the coronal axis. This loss pattern is a result of the acquisition step, in which the brain tissues were sectioned along the coronal axis and then digitally processed, stitched, registered, gridded, and quantified (Fig. 1a).
Based on the patterns of missing data, we designed three strategies for which portion of data to hold out. First, we hide a random slice by setting all voxels on the slice to −1, which represents missing values. Second, we randomly pick two consecutive slices to hide. Third, we randomly pick a slice, sample from existing missing-data patterns on that slice, and mask out part of the slice. To ensure the third strategy does not overlap with the first, we restrict the percentage of allowed missing data on that slice to between 10% and 80%. These three ways of simulating missing data enable the network to learn to recover missing data from the previous and subsequent slices as well as from the same slice.
The choice of partial slice mask is important. Frequently used image masks for face/natural image completion are central square masks or random blocks [21]. The AMBA, however, provides dozens of existing missing-slice masks. Instead of arbitrarily generating partial slice masks, we sample these masks from existing data because they come from the exact distribution of the data to be completed. On average, each slice has about 135 different masks with 10-80 percent of the voxels of the slice missing. As we will show later, the inclusion of the partial masks is essential to prevent the network from learning low-level features that latch onto the mask boundaries. Additionally, the limit on the percentage of missing voxels is essential to prevent the partial-slice training from degenerating into strategy 1.
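The three hold-out strategies can be sketched as follows. This is an illustrative reimplementation, not the authors' code; the names (`corrupt_volume`, `MISSING`) are ours, and the partial masks are assumed to be supplied as boolean slice-shaped arrays sampled from the dataset.

```python
import numpy as np

MISSING = -1.0  # sentinel value for missing voxels, as in the paper

def corrupt_volume(vol, strategy, rng, partial_masks=None):
    """Hide part of a training volume along the coronal (first) axis.

    strategy 1: hide one random coronal slice
    strategy 2: hide two consecutive coronal slices
    strategy 3: hide part of a random slice, using a mask sampled from
                existing missing-data patterns (10-80% of the slice)
    Returns the corrupted copy and a boolean mask of the hidden voxels.
    """
    out = vol.copy()
    hidden = np.zeros(vol.shape, dtype=bool)
    n_slices = vol.shape[0]
    if strategy == 1:
        s = rng.integers(n_slices)
        hidden[s] = True
    elif strategy == 2:
        s = rng.integers(n_slices - 1)
        hidden[s:s + 2] = True
    else:
        s = rng.integers(n_slices)
        mask = partial_masks[rng.integers(len(partial_masks))]
        frac = mask.mean()
        # enforce the 10-80% constraint so strategy 3 never degenerates
        # into strategy 1 (full slice) or a trivially small hole
        assert 0.10 <= frac <= 0.80, "partial mask must hide 10-80% of the slice"
        hidden[s] = mask
    out[hidden] = MISSING
    return out, hidden
```

During training, one strategy would be drawn per sample, so each volume yields thousands of distinct corruptions without any other augmentation.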
3.1.2. Network Architecture
Fig. 2 shows the architecture of our network, a 3D U-net [35]. It consists of an encoder and a decoder. In the contracting path, repeated convolutions with 3×3×3 filters, each followed by a rectified linear unit (ReLU), and 2×2×2 pooling layers are used to aggregate features and increase the size of the receptive field. In the expansive path, 2×2×2 deconvolutional layers are used to propagate context information to higher-resolution layers. The skip connections between the encoding and decoding paths ensure that high-resolution features are retained and localized [35]. The architecture is fully convolutional, which means the network accepts input volumes of arbitrary shape.
Fig. 2.
Volume recovery network architecture. Each training sample is a volume of size 72×48×64. (a) We designed three schemes of holding out data: hide a partial slice, hide a single slice, and hide two consecutive slices. (b) The volume is first corrupted in one of the three ways before it is input to the network. (c) The output is the predicted volume of the same size as the input. The MSE loss is calculated between the predicted volume and the raw input. (d) The VRN consists of an encoder and a decoder, and the mirrored layers are connected via skip layers. The type of each layer and the number and size of kernels are denoted in the box.
The training is achieved by regressing to the ground-truth content of the entire volume, including the held-out region. The mean squared error (MSE) is used as our reconstruction loss function. As the ground-truth volume might itself contain missing values, only the losses of the voxels with ground truth are counted.
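A minimal sketch of this masked loss, assuming (as in the corruption scheme) that −1 marks voxels without ground truth; the function name is ours:

```python
import numpy as np

def masked_mse(pred, target, missing=-1.0):
    """MSE over voxels that have ground truth; voxels equal to the
    missing-value sentinel in the target are excluded from the loss."""
    valid = target != missing          # voxels with known ground truth
    diff = pred[valid] - target[valid]
    return (diff ** 2).mean()
```

In a Keras training loop the same idea would be expressed as a custom loss that multiplies the squared error by the validity mask before averaging.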
The model was implemented using the Keras package [36]. The initial learning rate was 10^−6 and the decay rate was 10^−6. Adam [37] was used as the optimizer. To ensure a seamless tiling of the output, each training sample is padded on each side so that the full volume is of size 72×48×64. The number of training samples is 3300 and the number of validation samples is 330, together comprising 85% of the data; the remaining 15% is used for testing. During training, we consider all three strategies for each training sample, and each time only one strategy is applied. Assuming that each volume has 57 coronal slices in use, the first two strategies generate 57×2=114 ways of corruption. For the third strategy, the average number of partial masks per slice is 135, so with 57 coronal slices there are 135×57=7695 ways of corruption. Altogether, for each volume we can generate ~7800 new samples; therefore, no additional data augmentation is required. Each epoch took about 40 hours on a 12 GB Nvidia GeForce GPU.
3.1.3. Evaluation
In addition to the MSE, we use two more metrics to evaluate the quality of the predictions. The first is the structural similarity index (SSIM) [38]. SSIM estimates the holistic similarity between two images and has been a useful metric for evaluating algorithms designed for image compression, reconstruction, denoising, and super-resolution. The second is the peak signal-to-noise ratio (PSNR), which directly measures the difference in pixel values. The evaluation scheme mirrors the training strategy and consists of three conditions: one slice missing, two consecutive slices missing, and a partial slice missing. Instead of using the entire volume, we evaluate the metrics only on the completed slice(s) against the ground truth. For the partial-slice condition, to reduce computation, we estimate the performance using the same ten randomly selected missing masks for each slice.
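PSNR on the completed slices can be computed as below. This is a sketch; taking the peak to be the dynamic range of the ground-truth slice is our assumption (expression energies have no fixed 255-style maximum), and in practice a library such as scikit-image provides both PSNR and SSIM.

```python
import numpy as np

def psnr(pred, target, peak=None):
    """Peak signal-to-noise ratio between a completed slice and the
    ground truth. `peak` defaults to the dynamic range of the target."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")          # identical slices
    if peak is None:
        peak = target.max() - target.min()
    return 10.0 * np.log10(peak ** 2 / mse)
```

SSIM, in contrast, compares local means, variances, and covariances over sliding windows, which is why a tested library implementation is preferable to a hand-rolled one.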
3.2. Volume recovery by mean estimation from neighbors (MEN)
Missing values are estimated as the mean of the foreground voxels in the 26-neighborhood. In each iteration, the mean calculation is performed for every missing voxel; a voxel whose surrounding voxels are all missing is skipped in the current iteration. The estimation stops when all missing values have been filled. MEN is a simple yet effective method and is used as the baseline model.
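The MEN baseline can be sketched as follows. This is our illustrative implementation; treating "foreground" as any voxel that is neither missing nor background-valued is an assumption, and all updates within one iteration are computed before any are applied.

```python
import numpy as np

def men_fill(vol, missing=-1.0, background=0.0):
    """Mean estimation from neighbors: iteratively replace each missing
    voxel with the mean of the known foreground voxels among its 26
    neighbors, until no missing voxels remain."""
    vol = vol.astype(float).copy()
    while True:
        miss = np.argwhere(vol == missing)
        if len(miss) == 0:
            return vol                       # everything filled
        updates = {}
        for x, y, z in miss:
            vals = []
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    for dz in (-1, 0, 1):
                        if dx == dy == dz == 0:
                            continue
                        i, j, k = x + dx, y + dy, z + dz
                        if (0 <= i < vol.shape[0] and 0 <= j < vol.shape[1]
                                and 0 <= k < vol.shape[2]):
                            v = vol[i, j, k]
                            if v != missing and v != background:
                                vals.append(v)
            if vals:  # voxels with no known foreground neighbor wait a round
                updates[(x, y, z)] = np.mean(vals)
        if not updates:
            return vol                       # isolated region: nothing fillable
        for idx, v in updates.items():
            vol[idx] = v
```

Because filled values enter later iterations, the estimate diffuses inward from the known boundary, which is exactly why MEN smooths out expression gradients.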
3.3. Volume recovery by three-dimensional convolutional autoencoder (3D CAE)
We also compared the results with those obtained from a 3D CAE [39]. The network of the 3D CAE is the same as that of the U-net except that all skip connections are removed. All hyperparameters, loss functions, and training strategies remain the same as those of the VRN.
3.4. Restricted Boltzmann Machine and deep belief network
An RBM [40] is a probabilistic energy-based model whose objective is to fit a probability distribution over a set of visible random variables to the observed data. The model restricts the interactions in the Boltzmann energy function to those between visible and hidden neurons, so an RBM can be graphically represented as a bipartite graph (Fig. 1c). For binary visible units v ∈ {0,1}^D and hidden units h ∈ {0,1}^F, the energy function is defined as follows:
(1)  $E(v, h) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i,j} v_i w_{ij} h_j$
where θ={W,a,b} are the model parameters, weights W connect the visible units (v) and the hidden units (h) and a and b are their biases.
The joint distribution over the visible and hidden units can be obtained via the energy function:
(2)  $p(v, h) = \frac{1}{Z} e^{-E(v, h)}$
where Z is the normalization term. It is given by summing over all possible pairs of visible and hidden vectors and is thus intractable.
The probability that the network assigns to a visible vector v is given by summing over all possible hidden vectors (3), and the objective of an RBM is to maximize the log likelihood of all data points.
(3)  $p(v) = \frac{1}{Z} \sum_{h} e^{-E(v, h)}$
Following (3), the model updates can be obtained from the derivatives of the log likelihood given a set of observations v (4). As the equation shows, there are two ways to increase the probability that the network assigns to a training image: adjust the weights and biases to lower the energy of that image, or raise the energy of other images, which lowers the partition function.
(4)  $\frac{\partial \log p(v)}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}$
where the angle brackets denote the expectation with respect to the specified distribution.
Sampling without bias from 〈vihj〉data is easy because there are no direct connections among hidden units or among visible units. However, obtaining an unbiased sample of 〈vihj〉model is difficult because of the lengthy Gibbs sampling it requires. This problem is solved by contrastive divergence (CD) [41], whose key ideas are to (1) initialize the Markov chain with a distribution close to the training data and (2) use samples from a few steps of Gibbs sampling as a close approximation.
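A minimal CD with k=1 step for a binary-binary RBM, following the quantities in (1)-(4), can be sketched as below. This is an illustrative numpy sketch, not the deepnet implementation; the function names are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, a, b, lr, rng):
    """One contrastive-divergence (k=1) step for a binary-binary RBM.
    v0: batch of visible vectors (batch x D); W: D x F weight matrix;
    a, b: visible and hidden biases. Returns updated parameters."""
    # positive phase: approximate <v h>_data from the training batch
    ph0 = sigmoid(v0 @ W + b)                   # p(h=1 | v0)
    h0 = (rng.random(ph0.shape) < ph0) * 1.0    # sampled hidden states
    # one Gibbs step to approximate <v h>_model (chain starts at the data)
    pv1 = sigmoid(h0 @ W.T + a)                 # p(v=1 | h0)
    v1 = (rng.random(pv1.shape) < pv1) * 1.0
    ph1 = sigmoid(v1 @ W + b)
    n = v0.shape[0]
    W = W + lr * (v0.T @ ph0 - v1.T @ ph1) / n
    a = a + lr * (v0 - v1).mean(axis=0)
    b = b + lr * (ph0 - ph1).mean(axis=0)
    return W, a, b
```

Note the chain is initialized at the data and run for only one step, exactly the two CD shortcuts described above.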
In our work, the observed expression energies are real-valued, v ∈ R^D. We use a variant of the RBM, the Gaussian-binary RBM [42], for modelling real-valued vectors. The model assumes that each visible unit has independent Gaussian noise. Given real-valued visible units v ∈ R^D and hidden units h ∈ {0,1}^F, the energy function is defined as
(5)  $E(v, h) = \sum_{i} \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_{j} b_j h_j - \sum_{i,j} \frac{v_i}{\sigma_i} w_{ij} h_j$
where σi is the standard deviation of the Gaussian noise for visible unit i.
The conditional probability of the visible units given the hidden units is modeled by a Gaussian distribution whose mean is a function of the hidden units (6).
(6)  $p(v_i \mid h) = \mathcal{N}\!\left(a_i + \sigma_i \sum_{j} w_{ij} h_j,\; \sigma_i^2\right)$
The update of the model parameters takes a very similar form to that of the RBM with binary visible units (7).
(7)  $\Delta w_{ij} \propto \left\langle \frac{v_i}{\sigma_i} h_j \right\rangle_{\text{data}} - \left\langle \frac{v_i}{\sigma_i} h_j \right\rangle_{\text{model}}$
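The two conditionals of the Gaussian-binary RBM follow directly from the energy function (5): the hidden units stay sigmoidal, while the visible units become Gaussian with a hidden-dependent mean as in (6). A sketch with our own function name; with σ fixed to 1, as in Section 3.4, the divisions are no-ops.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gaussian_rbm_conditionals(v, h, W, a, b, sigma=1.0):
    """Conditionals of a Gaussian-binary RBM (fixed, shared sigma).
    p(h_j=1 | v) = sigmoid(b_j + sum_i (v_i / sigma) w_ij)
    p(v_i | h)   = N(a_i + sigma * sum_j w_ij h_j, sigma^2)
    Returns hidden activation probabilities and visible means."""
    p_h_given_v = sigmoid((v / sigma) @ W + b)
    mean_v_given_h = a + sigma * (h @ W.T)
    return p_h_given_v, mean_v_given_h
```

In a CD step for this model, the visible reconstruction is drawn from (or set to) `mean_v_given_h` rather than sampled as a binary vector.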
Unlike a single-layer RBM, a DBN captures features using multiple layers in a stochastic manner. The top two layers, with undirected connections, form an RBM, and the lower layers have directed connections (Fig. 1g). Each RBM was trained layer-wise in a greedy manner: the hidden units of the first RBM (hidden layer 1) are taken as the visible units of the second RBM, and the hidden units of the second RBM (hidden layer 2) are fed as the visible units into the third RBM. It has been shown that each added RBM improves the variational lower bound on the log probability of the training data [41].
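The greedy layer-wise procedure can be sketched as follows. This is illustrative: `train_rbm` stands in for any single-RBM trainer (e.g. CD-1) and returns one layer's weights and biases; the names are ours.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def greedy_pretrain(data, layer_sizes, train_rbm, rng):
    """Greedy layer-wise DBN pretraining: train an RBM on the input,
    then feed its hidden activations p(h | v) to the next RBM, and so on.
    `train_rbm(x, n_hidden, rng) -> (W, a, b)` trains a single RBM.
    Returns the trained (W, a, b) per layer and the top-layer activations."""
    params, x = [], data
    for n_hidden in layer_sizes:          # e.g. [1024, 256, 64] as in Sec. 3.4
        W, a, b = train_rbm(x, n_hidden, rng)
        params.append((W, a, b))
        x = sigmoid(x @ W + b)            # activations become next visible layer
    return params, x
```

Each pass freezes the layer just trained; only the propagated activations change, which is what makes the procedure greedy.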
In this work, deepnet (https://github.com/nitishsrivastava/deepnet), a publicly available package, was used to train the RBM and DBN. To handle real-valued data, Gaussian visible units were used. Each Gaussian visible unit was set to have unit variance (σi = 1), which was kept fixed and not learned. The DBN for exploring the voxel co-occurrences consists of 60144 Gaussian visible units followed by 1024 binary hidden units in the first hidden layer, 256 hidden units in the second hidden layer, and 64 in the third hidden layer. Each layer of weights was trained using CD with k = 1 steps of Gibbs sampling [41].
The learning rates in all hidden layers are set initially to 0.001 and decrease as the inverse of time with a decay half-life of 5000. The activation function used is the hyperbolic tangent. The initial and final momentum are 0.5 and 0.9, with the change occurring at step 5000. The batch size is 100. The weights are initialized from a zero-mean Gaussian distribution with a standard deviation of 0.01. We want the weights to be sparse because most genes are expressed in a small percentage of cells [43]; thus, we add a regularization term, the ℓ1 norm of the weights, to induce weight sparsity in each RBM. We found that an ℓ1 coefficient of 0.1 works well in practice across multiple experiments. It has been reported that the ℓ1 constraint does not dominate RBM learning results [34].
4. Experimental materials
The Allen Mouse Brain Atlas (AMBA) is a genome-wide, cellular-resolution map of gene expression using ISH that offers brain-wide anatomical coverage of the mouse brain [43]. An inbred mouse strain is used to reduce animal-to-animal variation in brains. For each tested gene, the mouse brain was sectioned into a series of tissues in coronal or sagittal planes and then imaged. To enable three-dimensional volumetric representations from the acquired coronal or sagittal series images, a common coordinate space, the three-dimensional (3D) reference atlas [44], was first created so that the ISH images of each gene could be consistently registered to the same space and aligned. Each image was then uniformly divided into 200×200 μm grids, and gene-expression statistics were computed from the detected signals for each voxel. The resulting voxelized expression grids, encoding the spatial information of over 4,345 genes in coronal sections and 21,718 genes in sagittal sections, make up the key components of the AMBA.
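The gridding step can be illustrated in miniature. This is a simplified sketch under stated assumptions: the real AMBA pipeline computes several statistics per 200×200 μm voxel from detected ISH signal and operates on registered image stacks; here we only average pixel values per block, and the function name is ours.

```python
import numpy as np

def voxelize(image, grid=200):
    """Illustrative gridding: divide a 2D expression image into
    grid x grid pixel blocks and average the signal per block."""
    h, w = image.shape
    h2, w2 = h - h % grid, w - w % grid   # drop the ragged border
    blocks = image[:h2, :w2].reshape(h2 // grid, grid, w2 // grid, grid)
    return blocks.mean(axis=(1, 3))       # one value per voxel
```

Stacking such gridded sections along the sectioning axis yields the 3D expression-energy volumes used throughout this work.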
We downloaded the 4,345 3D volumes of expression energy from coronal sections, as well as the corresponding reference atlas, from the website of the Allen Brain Atlas (ABA) (http://mouse.brain-map.org/) to perform our analysis. Coronal sections were chosen because they register more accurately to the reference model than their sagittal counterparts. The dimension of all 3D volumes used in this study is 67×41×58.
5. Results
5.1. Volume recovery
5.1.1. Comparison with CAE and MEN
To demonstrate that the VRN is able to complete missing data, we manually hide a portion of a volume and then compare the results predicted by the VRN, CAE, and MEN. In the first and second experiments, part of slice 24 was masked (Fig. 3b,c); the masks were sampled from existing missing-data patterns of the AMBA. As shown, the missing regions predicted by the VRN and CAE preserve the gradient of expression across the isocortex layers, whereas the gradient is smoothed out by MEN. The visual differences are also reflected in the MSE, SSIM, and PSNR in both cases (Fig. 3b,c). We then tested masking out the entire slice 24 (Fig. 3d). Only blurry isocortex layers remained for MEN; in contrast, the performance of the VRN and CAE did not deteriorate. The predicted slice 24 emulates the patterns of the raw energies, suggesting the deep models use the previous and subsequent slices for prediction. For the CAE, the middle gradient generated in the isocortex is not as sharp as that of the VRN (Fig. 3d, white arrows). Next, we evaluated the performance of the methods on predicting slice 24 when the previous slice (Fig. 3e) or the subsequent slice (Fig. 3f) is also missing. In both cases, the missing slice filled by the VRN is sharper and the layer gradient remains clear, whereas the slices filled by the CAE and MEN lose the details around the somatosensory and motor regions. Overall, the VRN preserves high-resolution details during missing-voxel recovery that are captured by neither MEN nor the CAE.
Fig. 3.
Comparison of volume completion by the VRN, CAE, and MEN. The gene acronym is syt7. The input is the entire volume; only slices 23, 24, and 25 are shown. White regions indicate missing values. In all cases, slice 24 is predicted. (a) Ground truth for slices 23, 24, and 25. The input volume is corrupted by hiding (b)(c) part of slice 24, (d) the entire slice 24, (e) slices 23 and 24, or (f) slices 24 and 25.
We also compared the three methods across all transcripts (Table 1). In all three conditions, where one slice, two consecutive slices, or part of a slice is missing, the predictions made by the VRN are consistently better than those of MEN and the CAE. The improvement is confirmed by all evaluation metrics. The comparison also confirms the importance of skip connections in maintaining the details in the volume data.
Table 1.
Performance comparison between the VRN, CAE, and MEN. The averages of the three metrics over all slices are presented.
| Method | Split | MSE (one slice) | SSIM (one slice) | PSNR (one slice) | MSE (two slices) | SSIM (two slices) | PSNR (two slices) | MSE (partial slice) | SSIM (partial slice) | PSNR (partial slice) |
|---|---|---|---|---|---|---|---|---|---|---|
| VRN | train | 1.42 | 0.87 | 23.30 | 2.07 | 0.81 | 20.78 | 0.54 | 0.94 | 29.28 |
| VRN | test | 1.32 | 0.88 | 23.21 | 1.98 | 0.82 | 20.29 | 0.50 | 0.94 | 28.88 |
| CAE | train | 2.10 | 0.81 | 21.77 | 3.30 | 0.75 | 19.59 | 0.82 | 0.93 | 27.80 |
| CAE | test | 1.82 | 0.83 | 21.39 | 2.87 | 0.79 | 19.28 | 0.73 | 0.92 | 27.96 |
| MEN | train | 2.71 | 0.82 | 18.80 | 4.71 | 0.75 | 18.09 | 1.02 | 0.93 | 26.97 |
| MEN | test | 2.31 | 0.86 | 18.78 | 3.81 | 0.79 | 17.82 | 0.88 | 0.94 | 27.36 |
5.1.2. Importance of including partial slice training
It is worth noting the importance of incorporating partial-slice training into the training strategy. To demonstrate this point, we trained a network without the strategy of hiding only part of a slice. As a baseline, we first compare the performance of both networks when the entire slice is missing. As seen in Fig. 4f,g,m,n, the completed slices from both networks are close to the ground truth and very similar to each other: quantitatively, the SSIM is 0.98 for both, the PSNR is about 33, and the MSEs are about 0.6. The performance diverges in the case where only part of the slice is missing. The network with partial-slice training shows improvements (Fig. 4j) or remains at a similar level (Fig. 4c), probably due to the extra information on the slice. In contrast, for the network trained without partial-slice training, the voxels near the boundaries of the slice mask show obvious cracks (Fig. 4d,k, black arrows). The discontinuities are also reflected in a much higher MSE and lower SSIM and PSNR. For slice 32, even though over half of the slice is given, the MSE is almost the same as in the case where the entire slice is hidden. The performance on slice 22 is even worse: the MSE of the prediction on a portion of the slice is over five times higher (3.46) than that of the prediction on the entire slice (0.63). These results indicate that without partial-slice training, the information on the same slice does not help the network complete the missing region but instead confuses it. The incorporation of partial-slice training is essential, as it trains the filters to integrate information from the previous and subsequent slices as well as from the slice itself.
Fig. 4.
Illustration of the importance of incorporating partial-slice corruption into the training strategy. Slices 22 (row 1) and 32 (row 2) of gene Itfg1 are presented. The first image (a,h) is the ground truth. The partial slice (b,i) is corrupted with sampled masks; the missing region is denoted in white. We then show the slices completed with and without partial-slice training when part of the slice is missing (c,d,j,k) or the entire slice is corrupted (f,g,m,n).
5.2. RBM to infer single-level transcriptome architecture
An RBM serves as a helpful learning tool, as it models the density of the visible variables by introducing a set of conditionally independent latent variables. In the context of unveiling the transcriptome organization based on gene expression profiles, it is trained to learn which foreground voxels in the visible layer tend to co-occur across the given set of gene expression patterns and to record the significant activations in the hidden layer. We assume that co-occurring voxels should belong to the same region and that these co-occurring patterns are fundamental and intrinsic to the transcriptome data. For convenience of discussion, we denote the spatial map of each presented weight as a weight map.
To discover voxel co-occurrence, we set the number of hidden units to 1024. The trained weights were linearly projected to the input space for intuitive interpretation of the representations. The weight maps help us understand how much each foreground voxel contributes to a specific activation pattern. By visual inspection, most spatial distributions of these weights form tight, continuous clusters. This clustering agrees with the brain's organizational principle that transcriptome similarities are strongest between spatial neighbors, both between cortical areas and between cortical layers [45]. The delineations are in general symmetric and match major canonical brain regions including the caudoputamen (Fig. 5a), hippocampus (Fig. 5b), isocortex (Fig. 5f–i), thalamus (Fig. 5), hypothalamus (Fig. 5k), medulla (Fig. 5l), and midbrain (Fig. 5m), as well as the ventricular systems (Fig. 5n) and fiber tracts (Fig. 5o). Interestingly, the features learned in the first layer often correspond to a finer breakdown of known brain regions. For instance, field CA1, field CA3, the dentate gyrus (DG), and the subiculum are identified individually (Fig. 5b–e). The isocortex is further divided into the primary somatosensory area (Fig. 5g), primary visual area (Fig. 5h), and primary motor area (Fig. 5i). We also made comparisons with the clustering results obtained using K-means and hierarchical clustering (Supplementary material). The clusters obtained by the RBM and DBN are more coherent and robust to noise.
Fig. 5.
Visualization of weight maps learned by RBMs. Results were obtained from an RBM with 1024 hidden units. In each subfigure, three views are shown: axial, sagittal, and coronal.
5.3. DBN to infer a hierarchy of transcriptome architecture
One key advantage of deep neural networks over shallow models is that they can capture a hierarchy of features. As we stack multiple RBMs to form a DBN, we create a hierarchy of transcriptome architecture, which we demonstrate with Fig. 6. A visualization of the weight maps of the DBN shows that the features learned in the first layer are generally localized and clustered, and in this case mostly correspond to subregions of the caudoputamen or nearby regions such as the olfactory tubercle and striatum (Fig. 6, green shadow); yet the differences among the patterns are discernible. In the second layer, these fine anatomical subregions learned in the first layer start to merge into larger areas (Fig. 6, blue shadow). Intuitively, weight maps are more likely to merge with those with similar spatial distributions, because the second RBM learns the co-occurrences of the subregions identified by the first RBM. Indeed, all weight maps that were combined show spatial overlaps. For example, weight map 292 (Fig. 6f) in the first layer shows strong signals in the medial caudoputamen and is combined with weight map 764 (Fig. 6g), which is activated in the caudal part of the caudoputamen. In addition to the merging of subregions, it is common to see the merging of spatially adjacent regions. Weight map 776 (Fig. 6m) features high values at the olfactory tubercle, and weight map 519 (Fig. 6l) shows higher weights at the rostral nucleus accumbens; both are summarized by weight map 227 (Fig. 6d) in the second hidden layer. It is worth noting that a lower-layer weight map can connect to multiple higher-level weight maps. For example, layer-1 weight map 519 (Fig. 6l), which shows high values at the rostral nucleus accumbens, is used by both layer-2 weight maps 199 (Fig. 6e) and 227 (Fig. 6d).
At the third layer, we see further combinations of the layer-2 weight maps, obtaining a layer-3 weight map spanning the entire caudoputamen, nucleus accumbens, and olfactory tubercle, as well as the piriform area, substantia innominata, and other regions (Fig. 6a).
Fig. 6.
Visualization of a hierarchy of transcriptome architecture. In each subfigure, three views are shown: axial (left), sagittal (top right), and coronal (bottom right). The weight maps of hidden layers 1, 2, and 3 are colored in green, blue, and orange, respectively. The index of each weight map in its layer is noted under the subfigure. The weights of higher-layer hidden units were visualized as weighted linear combinations of the weights of the Gaussian RBM. Four layer-2 weight maps were selected, and for each layer-2 weight map, two layer-1 maps were selected for presentation. The arrows indicate a compositional relationship: each weight map in a layer is a weighted linear combination of the weight maps in the previous layer.
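The linear back-projection described in the caption can be sketched numerically. In the snippet below the weight matrices are random stand-ins (in the paper, W1 is learned by the Gaussian RBM on voxel data and W2, W3 by the stacked binary RBMs); the composition rule, however, is the same: a higher-layer unit's voxel-space map is a weighted linear combination of the maps one layer below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in weight matrices with illustrative sizes
n_voxels, n_h1, n_h2, n_h3 = 1000, 16, 8, 4
W1 = rng.normal(size=(n_voxels, n_h1))
W2 = rng.normal(size=(n_h1, n_h2))
W3 = rng.normal(size=(n_h2, n_h3))

maps1 = W1                 # layer-1 maps live directly in voxel space
maps2 = W1 @ W2            # each layer-2 map: linear combination of layer-1 maps
maps3 = W1 @ W2 @ W3       # each layer-3 map: linear combination of layer-2 maps

# e.g. the voxel-space map of layer-2 unit j
j = 0
assert np.allclose(maps2[:, j], maps1 @ W2[:, j])
```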
6. Discussion
The architecture of VRN is a convolutional autoencoder with skip layers linking the encoder and decoder. The skip layers are essential for maintaining high-resolution details, as they pass image details from the convolutional layers to the deconvolutional layers. These high-resolution details are important for gene expression data. For example, the expression gradient is a key feature associated with how genes regulate brain functions: many of the differences reported between functionally distinct cortical regions are due not to selective expression in functionally discrete regions but rather to discontinuous sampling across a gradient [46].
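At the level of tensor shapes, the role of a skip layer can be sketched as follows. The pooling and upsampling functions below are crude numpy stand-ins for the convolutional and deconvolutional stages, not the actual VRN layers; the point is only that the concatenation hands the decoder features at the original resolution, which the bottleneck alone has lost.

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample(x):
    """Crude stand-in for an encoder stage: 2x2 average pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample(x):
    """Crude stand-in for a decoder (deconvolution) stage."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

x = rng.random((8, 8, 4))                    # input feature map (H, W, channels)
bottleneck = downsample(x)                   # (4, 4, 4): fine detail is lost here
decoded = upsample(bottleneck)               # (8, 8, 4): resolution restored, detail not
skip = np.concatenate([decoded, x], axis=-1) # (8, 8, 8): the skip layer re-injects
                                             # the full-resolution features
```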
The right training strategy is also essential to the performance of VRN. The scheme of hiding a single slice or a partial slice effectively trains the convolutional filters to learn from the previous and subsequent slices as well as from the surrounding voxels on the same slice. The strategy of hiding two consecutive slices trains the network to integrate higher-level semantic information from regions farther apart. Owing to these novel and effective designs, VRN outperforms mean estimation from neighbors and the CAE.
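A minimal sketch of this corruption scheme, assuming volumes are numpy arrays and that the reconstruction loss is evaluated only on the hidden voxels (the function and argument names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)

def hide_slices(volume, axis=0, n_slices=1):
    """Zero out n consecutive slices along `axis`; return the corrupted
    volume and a boolean mask marking the hidden voxels."""
    start = int(rng.integers(0, volume.shape[axis] - n_slices + 1))
    index = [slice(None)] * volume.ndim
    index[axis] = slice(start, start + n_slices)
    mask = np.zeros(volume.shape, dtype=bool)
    mask[tuple(index)] = True
    corrupted = volume.copy()
    corrupted[mask] = 0.0
    return corrupted, mask

vol = rng.random((16, 16, 16))
x, m = hide_slices(vol, axis=0, n_slices=2)   # hide two consecutive slices

# training target: reconstruct only the hidden voxels from their context
prediction = np.zeros_like(vol)               # placeholder for the network output
loss = np.mean((prediction[m] - vol[m]) ** 2)
```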
RBMs learned clusters corresponding to classical neuroanatomy. The separation of these subregions demonstrates not only the heterogeneity of the transcriptome among these regions but also the ability of RBMs to extract voxel co-occurrences and infer transcriptome architecture. It should be noted that the input to an RBM is not constrained to image data. If each visible node is a transcript, an RBM can learn the co-occurrences among genes, i.e., the coexpression patterns.
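As a toy illustration of how an RBM picks up voxel co-occurrences, the sketch below trains a small Bernoulli RBM with one step of contrastive divergence (CD-1) on synthetic data containing two groups of co-activating "voxels". All sizes and hyperparameters here are illustrative, not the paper's settings.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b, c, lr=0.1):
    """One in-place CD-1 update for a Bernoulli RBM (Hinton, 2002)."""
    ph0 = sigmoid(v0 @ W + c)                        # P(h=1 | data)
    h0 = (rng.random(ph0.shape) < ph0).astype(float) # sample hidden states
    pv1 = sigmoid(h0 @ W.T + b)                      # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + c)
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)

# Two groups of "voxels" that always co-activate
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 50, dtype=float)
W = 0.01 * rng.normal(size=(6, 2))
b, c = np.zeros(6), np.zeros(2)
for _ in range(1000):
    cd1_update(data, W, b, c)

# After training, the hidden units respond differently to the two
# co-occurrence patterns
p_a = sigmoid(data[0] @ W + c)
p_b = sigmoid(data[1] @ W + c)
```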
Another reason for choosing RBMs is that they are the building blocks of deeper models such as the DBN. These deep models can capture features that a shallow model cannot. Having validated the clusters obtained from RBMs, we stacked the RBMs into a three-layer DBN to create a hierarchy. The components learned at lower levels are localized and match subregions of canonical neuroanatomy, in agreement with the principle of brain organization that transcriptome similarities are strongest between anatomical neighbors. At higher levels, these localized features merge and interact with adjacent groups. Overall, a coarse-to-fine organization emerges from the network layers, and we show how the subregions merge and interact with one another. This organization corresponds well to the anatomical structures, suggesting a close link between brain structures and their genetic underpinnings. Thus, we demonstrated a new way of learning a transcriptome-based hierarchical organization of the mouse brain using RBMs and DBNs.
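The greedy layer-wise stacking can be sketched with scikit-learn's `BernoulliRBM` as a stand-in for the paper's RBMs (the data, layer sizes, and training settings below are illustrative): each RBM is trained on the hidden representation produced by the one below it.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)

# Toy binary "voxel" data with two co-activating groups, plus a little noise
v = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 100, dtype=float)
v = np.abs(v - (rng.random(v.shape) < 0.05))

# Greedy layer-wise training of a two-layer DBN
rbm1 = BernoulliRBM(n_components=4, n_iter=50, random_state=0).fit(v)
h1 = rbm1.transform(v)        # layer-1 hidden probabilities for each sample
rbm2 = BernoulliRBM(n_components=2, n_iter=50, random_state=0).fit(h1)
h2 = rbm2.transform(h1)       # layer-2 representation of each sample
```

Each layer thus re-encodes the representation of the previous one, which is what lets the higher layers capture increasingly abstract, larger-scale structure.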
7. Conclusions
The objective of this work is to understand the organization of brain structure from the transcriptome's point of view. This objective is achieved by exploring a public dataset, the Allen Mouse Brain Atlas, using the data-driven methods RBM and DBN. Two challenges, the incomplete data and the high dimensionality of transcriptome data, have been addressed in this paper. For incomplete data, we designed the VRN, whose key idea is to hide a portion of each training sample and teach the network to recover the missing voxels from the context. The results show that the VRN is adequate for recovering regions with a partial or entire slice missing, or even with two consecutive slices missing.
To handle the high dimensionality of the data, we showed that RBM and DBN are effective tools for studying the transcriptome architecture. Specifically, an RBM can learn the co-occurrence patterns among voxels, providing a transcriptome-based anatomy. The 3D visualizations of the weight maps show that the voxel clusters match well with classical neuroanatomy, providing strong evidence of a close link between the transcriptome and brain structure. Further, with the DBN, we built a hierarchical data-driven transcriptome architecture. A coarse-to-fine organization emerges from the network, revealing how brain subregions interact with one another in a hierarchical manner.
Transcriptomic similarity provides useful hints about similarities in structure and function as well as brain connectivity [47]. In future work, we will extend the current framework by including other imaging modalities such as diffusion tensor imaging, neuronal tracing data, or functional magnetic resonance imaging. A joint representation of micro-scale gene expression and macro-scale neuroimages may reveal correlations across scales and modalities, providing a deeper understanding of the organizational architecture of the brain.
Acknowledgment
T. Liu is supported by NIH R01 DA-033393, NSF CAREER Award IIS-1149260, NIH R01 AG-042599, NSF BME-1302089, NSF BCS-1439051 and NSF DBI-1564736.
Biographies
Yujie Li is a Ph.D. student of computer science at the University of Georgia. She received her B.S. degree from Huazhong University of Science and Technology, China. Her research interests focus on biomedical image analysis using machine learning.
Heng Huang is a Ph.D. student at Northwestern Polytechnical University, China. He is also a visiting student at the University of Georgia. His research interests include fMRI, data mining, and artificial intelligence.
Hanbo Chen obtained his Ph.D. degree in computer science at the University of Georgia, USA. His Ph.D. study centered on understanding the architecture of the brain connectome, for which he developed a set of smart computational methods to tackle challenges associated with big neuroimaging data. He is now a senior researcher at Tencent AI Lab, deploying and developing state-of-the-art deep learning algorithms for computer-aided medical image diagnosis systems.
Tianming Liu is a distinguished research professor of computer science at The University of Georgia. His research interest focuses on brain imaging and mapping. He has published over 280 peer-reviewed papers in this area. He is the recipient of the NIH Career Award and the NSF CAREER award, both in the area of brain mapping.
Contributor Information
Yujie Li, Department of Computer Science, University of Georgia, Athens, GA 30602.
Heng Huang, Department of Computer Science, University of Georgia, Athens, GA 30602; School of Automation, Northwestern Polytechnical University, Xi’an, China.
Hanbo Chen, Department of Computer Science, University of Georgia, Athens, GA 30602.
Tianming Liu, Department of Computer Science, University of Georgia, Athens, GA 30602.
References
- [1].Lein ES, Belgard TG, Hawrylycz M, and Molnár Z, “Transcriptomic Perspectives on Neocortical Structure, Development, Evolution, and Disease,” Annu. Rev. Neurosci, vol. 40, no. 1, pp. 629–652, 2017. [DOI] [PubMed] [Google Scholar]
- [2].Liu Q, Dwyer ND, and O’Leary DD, “Differential expression of COUP-TFI, CHL1, and two novel genes in developing neocortex identified by differential display PCR.,” J. Neurosci, vol. 20, no. 20, pp. 7682–7690, 2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Zhong Y, Takemoto M, Fukuda T, Hattori Y, Murakami F, Nakajima D, Nakayama M, and Yamamoto N, “Identification of the genes that are expressed in the upper layers of the neocortex,” Cereb. Cortex, vol. 14, no. 10, pp. 1144–1152, 2004. [DOI] [PubMed] [Google Scholar]
- [4].Winden KD, Oldham MC, Mirnics K, Ebert PJ, Swan CH, Levitt P, Rubenstein JL, Horvath S, and Geschwind DH, “The organization of the transcriptional network in specific neuronal classes.,” Mol. Syst. Biol, vol. 5, no. 291, p. 291, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [5].Cahoy J, Emery B, Kaushal A, Foo L, Zamanian J, Christopherson K, Xing Y, Lubischer J, Krieg P, Krupenko S, Thompson W, and Barres B, “A Transcriptome Database for Astrocytes, Neurons, and Oligodendrocytes: A New Resource for Understanding Brain Development and Function,” J. Neurosci, vol. 28, no. 1, pp. 264–278, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [6].Lein ES, Zhao X, and Gage FH, “Defining a molecular atlas of the hippocampus using DNA microarrays and high-throughput in situ hybridization.,” J. Neurosci, vol. 24, no. 15, pp. 3879–89, Apr. 2004. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [7].Bohland JW, Bokil H, Pathak SD, Lee C-K, Ng L, Lau C, Kuan C, Hawrylycz M, and Mitra PP, “Clustering of spatial gene expression patterns in the mouse brain and comparison with classical neuroanatomy.,” Methods, vol. 50, no. 2, pp. 105–12, 2010. [DOI] [PubMed] [Google Scholar]
- [8].Ng L, Bernard A, Lau C, Overly CC, Dong H-W, Kuan C, Pathak S, Sunkin SM, Dang C, Bohland JW, Bokil H, Mitra PP, Puelles L, Hohmann J, Anderson DJ, Lein ES, Jones AR, and Hawrylycz M, “An anatomic gene expression atlas of the adult mouse brain.,” Nat. Neurosci, vol. 12, no. 3, pp. 356–62, Mar. 2009. [DOI] [PubMed] [Google Scholar]
- [9].Mahfouz A, van de Giessen M, van der Maaten L, Huisman S, Reinders M, Hawrylycz MJ, and Lelieveldt BPF, “Visualizing the spatial gene expression organization in the brain through non-linear similarity embeddings,” Methods, vol. 73, pp. 79–89, 2015. [DOI] [PubMed] [Google Scholar]
- [10].Hawrylycz M, Bernard A, Lau C, Sunkin SM, Chakravarty MM, Lein ES, Jones AR, and Ng L, “Areal and laminar differentiation in the mouse neocortex using large scale gene expression data.,” Methods, vol. 50, no. 2, pp. 113–21, 2010. [DOI] [PubMed] [Google Scholar]
- [11].Grange P, Bohland JW, Okaty BW, Sugino K, Bokil H, Nelson SB, Ng L, Hawrylycz M, and Mitra PP, “Cell-type-based model explaining coexpression patterns of genes in the brain.,” Proc. Natl. Acad. Sci. U. S. A, vol. 111, no. 14, pp. 5397–402, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [12].Thompson CL, Pathak SD, Jeromin A, Ng LL, MacPherson CR, Mortrud MT, Cusick A, Riley ZL, Sunkin SM, Bernard A, Puchalski RB, Gage FH, Jones AR, Bajic VB, Hawrylycz MJ, and Lein ES, “Genomic Anatomy of the Hippocampus,” Neuron, vol. 60, no. 6, pp. 1010–1021, 2008. [DOI] [PubMed] [Google Scholar]
- [13].Li Y, Chen H, Jiang X, Li X, Lv J, Peng H, Tsien J, and Liu T, “Discover Mouse Gene Co-expression Landscapes Using Dictionary Learning and Sparse Coding,” Brain Struct. Funct, vol. 11, pp. 1–18, 2017. [DOI] [PubMed] [Google Scholar]
- [14].Li Y, Chen H, Jiang X, Li X, Lv J, Li M, Peng H, Tsien JZ, and Liu T, “Transcriptome Architecture of Adult Mouse Brain Revealed by Sparse Coding of Genome-Wide In Situ Hybridization Images,” Neuroinformatics, vol. 15, no. 3, pp. 285–295, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [15].Gondara L, “Medical image denoising using convolutional denoising autoencoders,” 2016.
- [16].Criminisi A, Perez P, and Toyama K, “Object removal by exemplar-based inpainting,” 2003 IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognition, 2003. Proceedings, vol. 2, p. II-721–II-728. [Google Scholar]
- [17].Sun J, Yuan L, Jia J, and Shum H-Y, “Image completion with structure propagation,” ACM Trans. Graph, vol. 24, no. 3, p. 861, 2005. [Google Scholar]
- [18].Hays J and Efros AA, “Scene completion using millions of photographs,” Commun. ACM, vol. 51, no. 10, p. 87, 2008. [Google Scholar]
- [19].Xie J, Xu L, and Chen E, “Image Denoising and Inpainting with Deep Neural Networks,” Adv. Neural Inf. Process. Syst, pp. 1–9, 2012. [Google Scholar]
- [20].Mairal J, Elad M, and Sapiro G, “Sparse representation for color image restoration,” IEEE Trans. Image Process, vol. 17, no. 1, pp. 53–69, 2008. [DOI] [PubMed] [Google Scholar]
- [21].Pathak D, Krahenbuhl P, Donahue J, Darrell T, and Efros AA, “Context Encoders: Feature Learning by Inpainting,” in Conference on Computer Vision and Pattern Recognition, 2016. [Google Scholar]
- [22].Yeh RA, Chen C, Lim TY, Schwing AG, Hasegawa-Johnson M, and Do MN, “Semantic Image Inpainting with Deep Generative Models,” in Conference on Computer Vision and Pattern Recognition, 2017. [Google Scholar]
- [23].Raychaudhuri S, Stuart JM, and Altman RB, “Principal components analysis to summarize microarray experiments: application to sporulation time series,” Pac Symp Biocomput, vol. 463, pp. 455–466, 2000. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [24].Liebermeister W, “Linear modes of gene expression determined by independent component analysis,” Bioinformatics, vol. 18, no. 1, pp. 51–60, 2002. [DOI] [PubMed] [Google Scholar]
- [25].Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L, and Miller JA, “An anatomically comprehensive atlas of the adult human brain transcriptome,” Nature, vol. 489, no. 7416, pp. 391–399, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [26].Yeung KY and Ruzzo WL, “Principal component analysis for clustering gene expression data,” Bioinformatics, vol. 17, no. 9, pp. 763–774, 2001. [DOI] [PubMed] [Google Scholar]
- [27].Gupta A, Wang H, and Ganapathiraju M, “Learning structure in gene expression data using deep architectures, with an application to gene clustering,” 2015 IEEE Int. Conf. Bioinforma. Biomed, pp. 1328–1335, 2015. [Google Scholar]
- [28].Tan J, Doing G, Lewis KA, Price CE, Chen KM, Cady KC, Perchuk B, Laub MT, Hogan DA, and Greene CS, “Unsupervised Extraction of Stable Expression Signatures from Public Compendia with an Ensemble of Neural Networks,” Cell Syst, vol. 5, no. 1, p. 63–71.e6, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [29].Xu J and Lan Y, “Hierarchical feedback modules and reaction hubs in cell signaling networks,” PLoS One, vol. 10, no. 5, pp. 1–25, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [30].Chen L, Cai C, Chen V, and Lu X, “Trans-species learning of cellular signaling systems with bimodal deep belief networks,” Bioinformatics, vol. 31, no. 18, pp. 3008–3015, 2015. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [31].Chen L, Cai C, Chen V, and Lu X, “Learning a hierarchical representation of the yeast transcriptomic machinery using an autoencoder model,” BMC Bioinformatics, vol. 17, no. S1, p. 9, 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [32].Srivastava N and Salakhutdinov R, “Multimodal Learning with Deep Boltzmann Machines,” Adv. neural Inf Process. Syst, pp. 2222–2230, 2012. [Google Scholar]
- [33].Ngiam J, Khosla A, Kim M, Nam J, Lee H, and Ng AY, “Multimodal Deep Learning,” Proc. 28th Int. Conf. Mach. Learn, pp. 689–696, 2011. [Google Scholar]
- [34].Hjelm RD, Calhoun VD, Salakhutdinov R, Allen EA, Adali T, and Plis SM, “Restricted Boltzmann Machines for Neuroimaging: an Application in Identifying Intrinsic Networks,” Neuroimage, vol. 96, pp. 245–260, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [35].Ronneberger O, Fischer P, and Brox T, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in Medical Image Computing and Computer-Assisted Intervention, 2015, pp. 1–8. [Google Scholar]
- [36].Chollet F, “Keras,” GitHub repository, 2015.
- [37].Kingma DP and Ba J, “Adam: A Method for Stochastic Optimization,” in International Conference on Learning Representations, 2015, pp. 1–15. [Google Scholar]
- [38].Wang Z, Bovik AC, Sheikh HR, and Simoncelli EP, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. Image Process, vol. 13, no. 4, pp. 600–612, 2004. [DOI] [PubMed] [Google Scholar]
- [39].Du B, Xiong W, Wu J, Zhang L, Zhang L, and Tao D, “Stacked Convolutional Denoising Auto-Encoders for Feature Representation,” IEEE Trans. Cybern, vol. 47, no. 4, pp. 1017–1027, 2017. [DOI] [PubMed] [Google Scholar]
- [40].Hinton GE, “Training Products of Experts by Minimizing Contrastive Divergence,” Neural Comput, vol. 14, no. 8, pp. 1771–1800, 2002. [DOI] [PubMed] [Google Scholar]
- [41].Hinton GE, Osindero S, and Teh YW, “A fast learning algorithm for deep belief nets.,” Neural Comput, vol. 18, no. 7, pp. 1527–54, 2006. [DOI] [PubMed] [Google Scholar]
- [42].Hinton G, “A Practical Guide to Training Restricted Boltzmann Machines,” Momentum, vol. 9, p. 1, 2010. [Google Scholar]
- [43].Lein ES, Hawrylycz MJ, Ao N, Ayres M, Bensinger A, Bernard A, Boe AF, Boguski MS, Brockway KS, Byrnes EJ, Chen L, Chen L, Chen TM, Chin MC, Chong J, Crook BE, Czaplinska A, Dang CN, Datta S, Dee NR, Desaki AL, Desta T, Diep E, Dolbeare TA, Donelan MJ, Dong HW, Dougherty JG, Duncan BJ, Ebbert AJ, Eichele G, Estin LK, Faber C, Facer BA, Fields R, Fischer SR, Fliss TP, Frensley C, Gates SN, Glattfelder KJ, Halverson KR, Hart MR, Hohmann JG, Howell MP, Jeung DP, Johnson RA, Karr PT, Kawal R, Kidney JM, Knapik RH, Kuan CL, Lake JH, Laramee AR, Larsen KD, Lau C, Lemon TA, Liang AJ, Liu Y, Luong LT, Michaels J, Morgan JJ, Morgan RJ, Mortrud MT, Mosqueda NF, Ng LL, Ng R, Orta GJ, Overly CC, Pak TH, Parry SE, Pathak SD, Pearson OC, Puchalski RB, Riley ZL, Rockett HR, Rowland SA, Royall JJ, Ruiz MJ, Sarno NR, Schaffnit K, Shapovalova NV, Sivisay T, Slaughterbeck CR, Smith SC, Smith KA, Smith BI, Sodt AJ, Stewart NN, Stumpf KR, Sunkin SM, Sutram M, Tam A, Teemer CD, Thaller C, Thompson CL, Varnam LR, Visel A, Whitlock RM, Wohnoutka PE, Wolkey CK, Wong VY, Wood M, Yaylaoglu MB, Young RC, Youngstrom BL, Yuan XF, Zhang B, Zwingman TA, and Jones AR, “Genome-wide atlas of gene expression in the adult mouse brain,” Nature, vol. 445, no. 7124, pp. 168–176, 2007. [DOI] [PubMed] [Google Scholar]
- [44].Dong HW, The Allen Reference Atlas: A Digital Color Brain Atlas of the C57BL/6J Male Mouse. John Wiley & Sons, 2008. [Google Scholar]
- [45].Bernard A, Lubbers LS, Tanis KQ, Luo R, Podtelezhnikov AA, Finney EM, Mcwhorter MME, Serikawa K, Lemon T, Morgan R, Copeland C, Smith K, Cullen V, Davis-turak J, Sunkin S, Loboda AP, Levine DM, Stone DJ, Roberts CJ, Jones AR, Geschwind DH, and Lein E, “Transcriptional Architecture of the Primate Neocortex,” Neuron, vol. 73, no. 6, pp. 1083–1099, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [46].Sansom SN and Livesey FJ, “Gradients in the Brain: The Control of the Development of Form and Function in the Cerebral Cortex,” Cold Spring Harb. Perspect. Biol, vol. 1, no. 2, p. 16, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [47].French L and Pavlidis P, “Relationships between gene expression and brain wiring in the adult rodent brain,” PLoS Comput. Biol, vol. 7, no. 1, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]