Abstract
Motivation
Cryo-Electron Tomography (cryo-ET) is a 3D imaging technology that enables the visualization of subcellular structures in situ at near-atomic resolution. Cellular cryo-ET images help in resolving the structures of macromolecules and determining their spatial relationship in a single cell, which has broad significance in cell and structural biology. Subtomogram classification and recognition constitute a primary step in the systematic recovery of these macromolecular structures. Supervised deep learning methods have been proven to be highly accurate and efficient for subtomogram classification, but suffer from limited applicability due to scarcity of annotated data. While generating simulated data for training supervised models is a potential solution, a sizeable difference in the image intensity distribution in generated data as compared with real experimental data will cause the trained models to perform poorly in predicting classes on real subtomograms.
Results
In this work, we present Cryo-Shift, a fully unsupervised domain adaptation and randomization framework for deep learning-based cross-domain subtomogram classification. We use unsupervised multi-adversarial domain adaptation to reduce the domain shift between features of simulated and experimental data. We develop a network-driven domain randomization procedure with ‘warp’ modules to alter the simulated data and help the classifier generalize better on experimental data. We do not use any labeled experimental data to train our model, whereas some of the existing alternative approaches require labeled experimental samples for cross-domain classification. Nevertheless, Cryo-Shift outperforms these alternative approaches in cross-domain subtomogram classification in the extensive evaluation studies presented herein, using both simulated and experimental data.
Availability and implementation
Supplementary information
Supplementary data are available at Bioinformatics online.
1 Introduction
Complex macromolecules participate in a large number of biochemical processes that sustain the cellular environment and govern cellular activities. An accurate analysis of the cellular processes enabled by the interaction of these macromolecules requires an in-depth analysis of their spatial organization and native structure within the cell. Such an analysis needs the observation of macromolecules in situ, which has recently been made possible with cryo-Electron Tomography (cryo-ET). Cellular cryo-ET is an imaging technology that visualizes the 3D organization of subcellular structures from a series of 2D projections acquired with an electron microscope at cryogenic temperature, where the native structure and orientation of subcellular components in the cell are preserved (Lučić et al., 2013).
With the advent of cryo-ET, many 3D structures in the form of tomograms can be swiftly generated for the analysis of the macromolecules they contain. The difficulty, however, lies in decoding the structural information of the macromolecular complexes from these tomograms. The reconstructed 3D tomograms suffer from very low Signal-to-Noise Ratio (SNR) and missing wedge effects, making the structural recovery of these macromolecules inherently difficult. Classification of macromolecules in tomograms in such cases goes a long way toward their structural recovery from tomogram-level data. To classify macromolecules in a tomogram, we first extract subtomograms: sub-volumes of the tomogram in which a single macromolecule is most likely present. These subtomograms are then classified based on the macromolecule they contain. CNN-based methods for supervised and semi-supervised classification of subtomograms that excel in accuracy and speed were proposed by Xu et al. (2017), Che et al. (2018), Gao et al. (2020) and Liu et al. (2019). The classified subtomograms can then be aligned and averaged (Briggs, 2013) to obtain a higher-resolution average of the corresponding structure.
While obtaining large-scale experimental data in the form of tomograms for classification is not an issue, annotating this data with the corresponding macromolecule identifiers requires a considerable amount of time and computation. Common annotation methods such as template matching, proposed by Best et al. (2007), Beck et al. (2009) and Kunz et al. (2015), are extremely time-consuming and require quality control in the form of inspection by experts, making the entire process drawn-out and laborious. This lack of scalability in data processing makes large-scale labeled subtomogram data of macromolecular complexes hard to obtain. To get around this issue, unsupervised template-free subtomogram classification methods have been proposed by Xu et al. (2012), Bartesaghi et al. (2008) and Chen et al. (2014), while automatic annotation procedures using neural networks have been proposed by Chen et al. (2017). Active learning-based methods that make use of minimal labeled data have also been proposed by Du et al. (2021).
A potential solution to the scarcity of annotated data is the development of classification algorithms assisted by domain adaptation and randomization. Domain randomization methods work by training a classification algorithm on labeled simulated data, which is easily available, and randomizing this data so that a network trained on it generalizes more readily to real data, as proposed by Che et al. (2019). Contrary to changing (randomizing) the simulated data, domain adaptation-based methods train the deep learning-based classifier in such a way that it can classify both the simulated and the real data, corresponding to the source and target domains. Domain in this context refers to data distributions that are related but different. The difference in data distributions typically manifests as a domain shift in the extracted feature space; the concept of domain shift is illustrated in Figure 1. One of the first adversarial domain adaptation methods for the classification of subtomograms was proposed by Lin et al. (2019), while a few-shot method for adaptation was proposed by Yu et al. (2021). All of these classification methods depend largely on the data simulation procedure used, as the simulated data forms the source dataset on which the base network is trained. While some cryo-ET data simulation methods simulate isolated macromolecules, methods proposed by Liu et al. (2020) and Pei et al. (2016) can dynamically simulate complete tomograms by packing multiple macromolecular complexes with additional factors in the form of simulated noise, contrast transfer function and missing wedge effects.
Fig. 1.
Domain shift illustrated between source and target distributions
In this work, we propose the use of a network-driven domain randomization procedure in conjunction with an unsupervised domain adaptation algorithm for the classification of subtomograms. Fundamentally differing from Che et al. (2019), which focuses on randomizing hyperparameters and then performing simulations, our network-driven domain randomization method, Cryo-Shift, gathers information from the already simulated data to add noise and distortions, allowing the model to generalize to the target domain. Furthermore, we propose the use of a multi-adversarial unsupervised domain adaptation framework inspired by Pei et al. (2018) that takes in minimal unannotated data from the target domain to bridge the domain shift between the simulated and real datasets. We follow the same data generation procedure as Che et al. (2019) in our experiments.
Our primary contributions, thus, can be summed up as:
The development of a network-driven domain randomization algorithm inspired by Zakharov et al. (2019), along with ‘warp’ modules that distort and alter the simulated data to help the model generalize better.
The addition of an adversarial loss function with a discriminator to help the warp network provide ‘realistic’ warps.
The addition of an unsupervised domain adaptation algorithm that helps reduce the domain shift between real and simulated subtomograms, improving cross-domain classification. Being completely unsupervised, the algorithm provides an efficient solution to the problems faced in cryo-ET subtomogram annotation.
2 Methodology
2.1 Overview and notation
Our methods primarily serve to reduce the domain gap between simulated and experimental data via network-guided domain randomization and unsupervised domain adaptation. Our overall pipeline consists of a warp network W, a ‘step zero network’ M, a feature discriminator Fd and a domain discriminator D. Our data simulation module S takes in a set of hyperparameters h to simulate data from the corresponding density maps of n classes (n = 4 in our analysis). As the first step of our pipeline, we train the base network M, consisting of a feature extractor Fe and a classifier C, on the simulated data to predict class labels. The feature extractor Fe denotes the convolutional blocks in M, while the classifier C refers to the linear layers after Fe in M.
As the second step of the pipeline, we train the warp network W, the feature discriminator Fd and the domain discriminator D, and retrain the base network M simultaneously. The warp network W is fed with simulated data, which it distorts to return a warped set. The domain discriminator D and the warp network W are trained adversarially: D tries to classify the outputs of W as simulated, while W tries to fool D into making wrong decisions. The feature extractor Fe from the base network M is fed with a concatenated dataset consisting of the output of the warp network, the simulated data and a subset of the real data DR. While the labels of the simulated data and of the warp-network output are automatically available, the labels of DR are not used here. Additionally, the output from the warp network is detached from the computational graph before being fed to Fe. The output from Fe is fed to the feature discriminator Fd and the classifier C. Here, Fd and Fe are trained adversarially, while C is trained only on the labeled subset of the data from the feature extractor Fe. While Fd tries to classify the features from Fe into real and simulated domains, C tries to classify the features into classes accurately. An important difference between the domain discriminator D and the feature discriminator Fd is that the former works directly on subtomograms while the latter works on features. Thus, while the former pushes the warped subtomograms to look more like real subtomograms, the latter ensures that the features learned by the network are similar across domains. The overall algorithm is described in Algorithm 1. Figure 2 provides a diagrammatic representation of our method architecture.
Fig. 2.
Model overview. (a) Diagrammatic representation of warp modules and domain discriminator. (b) Diagrammatic representation of the multi-adversarial domain adaptation framework used. The Source Filter here refers to a filter that allows the classifier to back-propagate on loss calculated only on the simulated and the warped (source) data
Algorithm 1.
Overall Algorithm
Input: hyperparameters h, density maps, experimental data DR
Output: trained network M
procedure TRAIN(h, density maps, DR)
 Load M ▹ Pre-trained step-zero network
 Split DR ▹ 1:9 split; the smaller part is kept as unlabeled real data
 for i in num_iterations do
  ▹ Domain randomization
  Warp a batch of simulated data with W
  Feed warped and real data to D; update W and D adversarially through the gradient reversal layer
  ▹ Adaptation
  Feed simulated, warped (detached) and real data to Fe
  Update Fe and C with the task loss on the labeled source (simulated and warped) data
  Update Fe and the feature discriminator Fd adversarially with the weighted feature-discriminative loss
 end for
 return M ▹ Trained network
end procedure
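The following is a minimal PyTorch-style sketch of one iteration of the joint training stage outlined in Algorithm 1. The module interfaces (W, M.Fe, M.C, D, Fd), the equal loss weighting and the use of a single feature discriminator (the full method uses N class-weighted discriminators, see Section 2.5) are assumptions; only the data flow (warping, detaching the warped output before Fe, concatenating simulated, warped and real data, and placing the discriminators behind gradient reversal layers) follows the description above.

```python
# Hedged sketch of one joint training iteration (Algorithm 1); interfaces and
# loss weights are assumptions, not the authors' exact implementation.
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass, negated gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad_output):
        return -grad_output

def train_step(M, W, D, Fd, sim_x, sim_y, real_x, optimizers):
    # Domain randomization: W warps simulated data; D (behind a GRL) tries to
    # separate warped from real, so W is pushed to produce realistic warps.
    warped = W(sim_x)
    d_logits = D(GradReverse.apply(torch.cat([warped, real_x])))
    d_true = torch.cat([torch.zeros(len(warped)), torch.ones(len(real_x))]).long()
    loss_domain = F.cross_entropy(d_logits, d_true)

    # Adaptation: Fe sees simulated + (detached) warped + unlabeled real data.
    x_all = torch.cat([sim_x, warped.detach(), real_x])
    feats = M.Fe(x_all)
    logits = M.C(feats)
    n_src = len(sim_x) + len(warped)                     # only source data is labeled
    loss_task = F.cross_entropy(logits[:n_src], torch.cat([sim_y, sim_y]))

    # Feature discriminator (also behind a GRL) pushes Fe toward domain-invariant features.
    t_f = torch.cat([torch.zeros(n_src), torch.ones(len(real_x))]).long()
    loss_feat = F.cross_entropy(Fd(GradReverse.apply(feats)), t_f)

    loss = loss_task + loss_domain + loss_feat           # equal weights assumed
    for opt in optimizers:
        opt.zero_grad()
    loss.backward()
    for opt in optimizers:
        opt.step()
    return float(loss)
```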
2.2 Step zero network
The step zero network, or base network, is a simple classification network that we pre-train on the simulated data. This network forms the baseline upon which further training takes place with the help of the warp modules and the adaptation algorithm. To establish the consistency of our algorithm, we make use of multiple architectures in our ablation experiments (Section 3.3), while proceeding with a single architecture, CB3D from Che et al. (2018), in the main experiment. A list of the model architectures used along with their corresponding parameter counts is provided in Table 1.
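As context, the following is a minimal sketch of this step-zero pre-training on simulated data with a categorical cross-entropy loss. The learning rate, betas and epoch count follow Section 2.7; the batch size, data loading and the callable interface of M are assumptions.

```python
# Hedged sketch of step-zero pre-training of the base network M on labeled
# simulated subtomograms; batch size and data handling are assumptions.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset

def pretrain_step_zero(M, sim_x, sim_y, epochs=120, lr=1e-4, batch_size=32):
    loader = DataLoader(TensorDataset(sim_x, sim_y), batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(M.parameters(), lr=lr, betas=(0.9, 0.999))
    for _ in range(epochs):
        for x, y in loader:
            loss = F.cross_entropy(M(x), y)   # task loss on simulated data
            opt.zero_grad()
            loss.backward()
            opt.step()
    return M
```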
Table 1.
Networks used
| Architecture | Parameter count |
|---|---|
| CB3D (Che et al., 2018) | 72.2 M |
| DSRF3D_v2 (Che et al., 2018) | 18.2 M |
| DenseNet3D (Hara et al., 2018) | 11.2 M |
| ResNet3D (Hara et al., 2018) | 85.1 M |
2.3 Warp network
The warp network W is built from a set of warp modules that distort or alter the simulated data and help the base network generalize better to the experimental data during inference. ‘Domain randomization’ is somewhat of a misnomer in the context of the warp modules we use: whereas domain randomization modules randomize the source domain to help the model generalize to any target domain, we specialize our warp network to work on a particular target domain. Although this restricts the portability of our model to arbitrary target datasets, it helps us enhance the performance of our network on the single target dataset we train it on. Thus, while Zakharov et al. (2019) propose using a gradient reversal layer (GRL) to oppose the classifier when training the randomization network, we propose using a discriminator with a GRL to oppose our warp network. Here, we pass the output of the warp network and randomly sampled data from the validation set in equal proportions to the discriminator, which is tasked with classifying the data as real or fake. The presence of the GRL between the warp network and the discriminator drives the warp network to fool the discriminator into classifying the warped data as real. The concept of the GRL, taken from Ganin and Lempitsky (2015), is that gradients are reversed during backpropagation, thereby allowing the discriminator and the warp network to train in opposing directions.
The warp network consists of an encoder–decoder-based architecture, with the output of a single encoder E fed to multiple decoders, each emulating one warping module. The encoder consists of three convolutional blocks with skip connections that connect to the decoder layers, where they are concatenated. The input to the encoder is batch-wise 3D cryo-ET subtomogram data, with b denoting the batch size. The output of the encoder is the bottleneck Be. This bottleneck is split channel-wise into two equal parts and fed to the two decoders, which act as warp modules (a minimal sketch of the shared encoder and this split follows the module list below). The warping modules we make use of are:
Noise Module (Wn)
Distortion Module (Wd)
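The following sketch illustrates the shared encoder and the channel-wise bottleneck split described above. The channel counts, kernel sizes and strides are assumptions; the paper specifies only three convolutional blocks with skip connections and a bottleneck split between the two warp modules.

```python
# Hedged sketch of the warp-network encoder; channel counts are assumptions.
import torch
import torch.nn as nn

class WarpEncoder(nn.Module):
    def __init__(self, in_ch=1, ch=32):
        super().__init__()
        self.block1 = nn.Sequential(nn.Conv3d(in_ch, ch, 3, padding=1), nn.ReLU())
        self.block2 = nn.Sequential(nn.Conv3d(ch, 2 * ch, 3, stride=2, padding=1), nn.ReLU())
        self.block3 = nn.Sequential(nn.Conv3d(2 * ch, 4 * ch, 3, stride=2, padding=1), nn.ReLU())

    def forward(self, x):                       # x: (b, 1, D, H, W) subtomogram batch
        s1 = self.block1(x)                     # skip connections, concatenated in the decoders
        s2 = self.block2(s1)
        bottleneck = self.block3(s2)            # Be
        # Channel-wise split: one half per warp module (noise / distortion).
        b_noise, b_distort = torch.chunk(bottleneck, 2, dim=1)
        return b_noise, b_distort, (s1, s2)
```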
2.3.1 Noise module
The noise module (Wn) adds noise to the input subtomogram In as the first randomization step. Its share of the encoder bottleneck is passed through a simple decoder with three convolutional blocks, where the skip connections from the encoder are concatenated at their respective layers. The output from the last decoder block is passed through a scaled sigmoid activation function that restricts its values to a bounded range. This output is summed with the module input In to produce the noised subtomogram On.
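A minimal sketch of the noise module under the encoder assumptions above: a small decoder maps its half of the bottleneck back to full resolution, a scaled sigmoid bounds the predicted noise, and the result is added to the input. The noise range eps and the channel counts are assumptions, since the exact range is not stated here.

```python
# Hedged sketch of the noise warp module (Wn); eps and channel sizes are assumptions.
import torch
import torch.nn as nn

class NoiseModule(nn.Module):
    def __init__(self, in_ch=64, eps=0.1):
        super().__init__()
        self.eps = eps
        self.up1 = nn.Sequential(nn.ConvTranspose3d(in_ch, 64, 4, stride=2, padding=1), nn.ReLU())
        self.up2 = nn.Sequential(nn.ConvTranspose3d(64 + 64, 32, 4, stride=2, padding=1), nn.ReLU())
        self.out = nn.Conv3d(32 + 32, 1, 3, padding=1)

    def forward(self, b_noise, skips, x_in):
        s1, s2 = skips                                   # encoder skips (full and half resolution)
        h = self.up1(b_noise)
        h = self.up2(torch.cat([h, s2], dim=1))          # concatenate skip connections
        noise = torch.sigmoid(self.out(torch.cat([h, s1], dim=1)))
        noise = self.eps * (2.0 * noise - 1.0)           # scaled sigmoid, values in (-eps, eps)
        return x_in + noise                              # O_n = I_n + learned noise
```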
2.3.2 Distortion module
The distortion module takes two inputs: On from the noise module and a channel-wise split of the encoder output, Edec. Edec is fed to two subnetworks: the first consists of a single convolutional layer followed by a linear layer, while the second is a decoder with three convolutional blocks with the corresponding skip connections from the encoder concatenated. The first subnetwork outputs a single value α per sample per channel, restricted to (0, 1), while the second outputs a 3D dense grid. This dense grid is convolved with a 3D Gaussian kernel of size 7 with σ = 1 and multiplied channel-wise with α, where the three channels store the values of the x, y and z coordinates. The values in this new dense grid gridd are then used to resample On, creating the corresponding deformed output subtomogram Od (Equations 1 and 2).
| $\mathrm{grid}_d = \alpha \odot (\mathrm{grid} \ast G_{\sigma=1})$ | (1) |
| $O_d(v) = O_n\big(v + \mathrm{grid}_d(v)\big)$ | (2) |
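A minimal sketch of the resampling step of the distortion module, assuming the decoder has already produced a dense 3-channel offset grid of shape (b, 3, D, H, W) and a per-sample, per-channel scale alpha in (0, 1). The kernel normalization, the use of normalized grid_sample coordinates and the additive (displacement) reading of Equation 2 are assumptions.

```python
# Hedged sketch of the distortion warp module (Wd) resampling step.
import torch
import torch.nn.functional as F

def gaussian_kernel3d(size=7, sigma=1.0):
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    g = torch.exp(-ax ** 2 / (2 * sigma ** 2))
    k = g[:, None, None] * g[None, :, None] * g[None, None, :]
    return (k / k.sum())[None, None].repeat(3, 1, 1, 1, 1)   # one kernel per coordinate channel

def distort(o_n, grid, alpha, sigma=1.0):
    b, _, D, H, W = o_n.shape
    # Smooth the predicted offsets with a 3D Gaussian kernel (sigma = 1), Equation 1.
    grid_d = F.conv3d(grid, gaussian_kernel3d(7, sigma).to(grid), padding=3, groups=3)
    grid_d = grid_d * alpha.view(b, 3, 1, 1, 1)               # channel-wise scaling by alpha
    # Identity sampling grid in normalized [-1, 1] coordinates plus the offsets, Equation 2.
    theta = torch.eye(3, 4, device=o_n.device).unsqueeze(0).expand(b, -1, -1)
    base = F.affine_grid(theta, size=(b, 1, D, H, W), align_corners=False)   # (b, D, H, W, 3)
    sample_grid = base + grid_d.permute(0, 2, 3, 4, 1)
    return F.grid_sample(o_n, sample_grid, align_corners=False)              # O_d

# Usage: o_d = distort(o_n, grid, alpha), with grid from the decoder and alpha from the small subnetwork.
```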
2.4 Domain discriminator
The domain discriminator D consists of an encoder module acting as a binary classifier and is pre-trained on simulated data and the unlabeled subset of real data DR. The discriminator predicts whether the data it is supplied with comes from the real dataset or the simulated dataset. While it strives to reduce the discriminative loss and classify the domains correctly, the warp network, trained alongside it adversarially, tries to fool it into classifying the warped output as real.
2.5 Domain adaptation
We propose an unsupervised domain adaptation paradigm inspired by Pei et al. (2018). This paradigm consists of n similar feature discriminators, where n is the number of classes our classifier has. These discriminators are simply binary classifiers that try to assign the features to their respective domains (simulated or real). A concatenated set consisting of the output of the warp network, simulated subtomograms and real subtomogram data (DR) in equal proportions is fed to the feature extractor Fe of the base model M at each iteration. The features from Fe are fed to each of the n feature discriminators. The domain labels for the simulated and warped data are zero, while the domain label for the experimental data is set to one. A weighted gradient reversal layer (GRL) is set up between the discriminators and the feature extractor, forming an adversarial relationship between the networks during training. During the calculation of the mean discriminative loss for training the feature discriminators, a weighting system is used in which the predicted class probabilities from the classifier C (detached from the computational graph), for the same features, act as weights, as demonstrated in Equation 5. The pre-trained classifier, which can already predict real subtomograms with considerable accuracy, becomes useful here, as the discriminative loss for a particular discriminator is multiplied by the probability of the subtomogram belonging to that particular class as predicted by the classifier.
2.6 Objective function
The step-zero network M is trained on the domain-randomized simulated data using a categorical cross-entropy task loss. We make use of this pre-trained model M in the next stage, where we simultaneously train the base network M, the domain discriminator D, the warp network W and the feature discriminator Fd. The discriminative loss is a combination of the domain-discriminative loss used to train D and W and the feature-discriminative loss used to train Fd and Fe. As in the previous stage, a task loss is also used in this stage to train the step-zero network M (Fe and C). The corresponding loss function for the entire algorithm can be expressed as:
| $\mathcal{L} = \mathcal{L}_{task}\big(C(F_e(x)),\, y\big) + \mathcal{L}_{disc}$ | (3) |
| $\mathcal{L}_{disc} = \mathcal{L}_{D}\big(D(W(x)),\, t_d\big) + \mathcal{L}_{F}\big(F_d(F_e(x)),\, t_f\big)$ | (4) |
where y represents the task ground truth. td and tf represent the domain ground truth for the two discriminators.
The feature-discriminative loss $\mathcal{L}_{F}$ and the domain-discriminative loss $\mathcal{L}_{D}$ can be represented as:
| $\mathcal{L}_{F} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{X}\sum_{j=1}^{X} prob_j^i \,\mathcal{L}_{CE}\big(F_i(y_j),\, t_f^{j}\big)$ | (5) |
| $\mathcal{L}_{D} = \frac{1}{X}\sum_{j=1}^{X} \mathcal{L}_{CE}\big(D(x_j),\, t_d^{j}\big)$ | (6) |
where Fi represents the individual discriminators from 1 to N, with N denoting the total number of classes (four in our case). yj represents the one-dimensional feature vector from Fe for the jth sample, with X denoting the total number of samples in a batch, and xj denotes the corresponding input subtomogram fed to D. prob_j^i represents the class confidence of the ith class for the jth sample as predicted by the classifier.
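A minimal sketch of the weighted feature-discriminative loss in Equation 5: each of the N per-class discriminators receives the same features, and its binary domain loss is weighted by the detached class probabilities predicted by C. The discriminator interface (one logit per sample) is an assumption.

```python
# Hedged sketch of Equation 5; discriminator output shapes are assumptions.
import torch
import torch.nn.functional as F

def weighted_feature_disc_loss(discriminators, feats, class_logits, domain_labels):
    # Class probabilities from C, detached so the weighting does not train the classifier here.
    probs = torch.softmax(class_logits.detach(), dim=1)                      # (X, N)
    losses = []
    for i, F_i in enumerate(discriminators):                                 # N per-class discriminators
        per_sample = F.binary_cross_entropy_with_logits(
            F_i(feats).squeeze(1), domain_labels.float(), reduction='none')  # (X,)
        losses.append((probs[:, i] * per_sample).mean())                     # weight by prob of class i
    return torch.stack(losses).mean()                                        # average over discriminators
```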
2.7 Training
We follow the methodology described above and pre-train the step-zero network as a basic classification module. We use Adam as the optimizer with a learning rate of 1e-4 and betas set to 0.9 and 0.999. We make use of a similar optimizer in the second phase of training, with the learning rate set to 1e-5 for the step-zero network and the learning rates for the warp network, the domain discriminator and the feature discriminator set to 5e-3, 1e-6 and 1e-5, respectively. Amongst these, the optimizer of the base network is monitored, with its learning rate reduced when the loss plateaus. The loss checked for this plateau condition is the task loss on the validation set we constructed earlier and use as an unannotated data source. We train the step-zero network for 120 epochs in the first stage and for 50 epochs in the next stage of the pipeline.
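A minimal sketch of the second-stage optimizer setup described above. The learning rates and betas are those stated in the text; the plateau patience, reduction factor and module names are assumptions.

```python
# Hedged sketch of the optimizer and scheduler configuration for stage two.
import torch

def build_optimizers(M, W, D, Fd):
    betas = (0.9, 0.999)
    opt_M = torch.optim.Adam(M.parameters(), lr=1e-5, betas=betas)    # step-zero network
    opt_W = torch.optim.Adam(W.parameters(), lr=5e-3, betas=betas)    # warp network
    opt_D = torch.optim.Adam(D.parameters(), lr=1e-6, betas=betas)    # domain discriminator
    opt_Fd = torch.optim.Adam(Fd.parameters(), lr=1e-5, betas=betas)  # feature discriminator
    # Only the base network's learning rate is reduced when the validation task loss plateaus.
    sched_M = torch.optim.lr_scheduler.ReduceLROnPlateau(opt_M, mode='min', factor=0.1, patience=5)
    return (opt_M, opt_W, opt_D, opt_Fd), sched_M

# Usage once per epoch: sched_M.step(validation_task_loss)
```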
3 Experiments
3.1 Simulated dataset
The simulated data we use in our work is generated using a data generation process similar to Che et al. (2019). The process starts with density maps obtained directly from EMDB (Electron Microscopy Data Bank) accessions or by running the Situs package (Wriggers et al., 1999) on RCSB PDB (Protein Data Bank) files. In our experiments, density maps of four distinct classes (Ribosome, Proteasome, TRiC and Membrane) were obtained from EMDB accessions. These maps share a common shape, with a voxel spacing of 1.368 nm. To simulate the missing wedge effect, the 3D structural data is projected onto a series of 2D projection images with varying tilt angles based on the specified Maximum Wedge Angle hyperparameter. This projection data is then convolved with the Contrast Transfer Function (CTF) and Modulation Transfer Function (MTF), adjusted by hyperparameters such as defocus and spherical aberration, to simulate the respective optical effects. At this stage, Gaussian noise is added to the data to simulate the corresponding SNR level, and the data is back-projected to obtain the 3D subtomogram. This 3D data is randomized by sampling hyperparameters uniformly from the following ranges; in the specific case of the SNR, the sampling is done on a logarithmic scale.
SNR: 0.03 to 10
MWA: 0° to 50°
Dz: −12 to 0
Cs: 1.5 to 3.0
where MWA stands for Maximum Wedge Angle, and Cs and Dz are the hyperparameters for spherical aberration and defocus, respectively. A sample from the simulated data is provided in Figure 4. Further samples demonstrating the qualitative effect of these hyperparameters on the simulation are available in the Supplementary Material.
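As an illustration of the sampling scheme above, the snippet below draws the SNR log-uniformly and the remaining hyperparameters uniformly from the stated ranges; the simulate() call in the final comment is a hypothetical placeholder, not a real API.

```python
# Hedged sketch of per-subtomogram hyperparameter sampling for the simulator.
import numpy as np

def sample_sim_params(rng=None):
    rng = rng or np.random.default_rng()
    return {
        "snr": float(10 ** rng.uniform(np.log10(0.03), np.log10(10.0))),  # sampled on a log scale
        "mwa": float(rng.uniform(0.0, 50.0)),   # maximum wedge angle, degrees
        "dz": float(rng.uniform(-12.0, 0.0)),   # defocus
        "cs": float(rng.uniform(1.5, 3.0)),     # spherical aberration
    }

# Example: params = sample_sim_params(); subtomogram = simulate(density_map, **params)  # simulate() is hypothetical
```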
Fig. 4.
2D slice visualization of simulated subtomograms
3.2 Experimental dataset
The real dataset we use in our experiments consists of 1051 subtomograms, which have been processed by template search and filtered manually from rat neuron tomograms (Guo et al., 2018). The data has a tilt angle range of −50° to +70° and is divided into four classes, namely ribosome, mitochondrial membrane, TRiC and single-capped proteasome. A summary of the class-wise distribution of the data in terms of the number of subtomograms is available in Table 2. Slices of a subtomogram belonging to each class are shown in Figure 3. This data is further split in a 1:9 ratio, where the smaller part of the split is fed to the network as unlabeled real data for randomization and adaptation. Since this smaller subset of the data is passed as unlabeled and does not actively participate in reducing the classification loss, we reuse it as a validation set in our experiments.
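A minimal sketch of the 1:9 split described above, using a stratified split so that the class proportions in Table 2 are preserved. The use of scikit-learn and the fixed seed are assumptions; only the 1:9 ratio and the reuse of the smaller part as an unlabeled validation set come from the text.

```python
# Hedged sketch of the 1:9 experimental data split; stratification and seed are assumptions.
from sklearn.model_selection import train_test_split

def split_experimental(subtomograms, labels, seed=0):
    # 10% unlabeled "validation" subset fed to the network, 90% held-out test set.
    val_x, test_x, val_y, test_y = train_test_split(
        subtomograms, labels, test_size=0.9, stratify=labels, random_state=seed)
    return (val_x, val_y), (test_x, test_y)
```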
Table 2.
Experimental data

| Class label | Subtomograms | Validation set | Test set |
|---|---|---|---|
| Ribosome (C0) | 80 | 8 | 72 |
| Proteasome (C1) | 386 | 38 | 348 |
| TRiC (C2) | 125 | 12 | 113 |
| Membrane (C3) | 460 | 46 | 414 |
| Total | 1051 | 104 | 947 |
Fig. 3.
2D slice visualization of experimental subtomograms
3.3 Ablation study
We perform ablation experiments for the warp network, as the feature discriminator part of our algorithm does not involve multiple modules. For our ablation studies, we use multiple neural networks of varying depth to demonstrate the efficacy of our methods on more than one network architecture. The network architectures we use, along with their trainable parameter counts, are listed in Table 1. Table 3 presents the performance of the warp modules along with the performance of the corresponding step-zero networks (in terms of macro-average accuracy across four classes), while Figure 6 presents a visual demonstration of the effect of these modules. The presence of ‘(W)’ next to the neural network architecture denotes that the model has been trained with the warp network, while the absence of this notation implies the performance recorded is that of the pre-trained step-zero network. An increasing average accuracy while using the warp modules is expected here, as we are not only using the domain randomization algorithm to fool the discriminator but are also training the classifier to keep up with it. Table 4 presents more ablation studies using CB3D as the base network, with modules zeroed out in places to demonstrate their individual contributions to the network. Table 4 also includes a study where we remove the domain discriminator and force the warp modules to train against the classification loss, as proposed in Zakharov et al. (2019). Our ablation studies demonstrate the efficacy of each module in the warp network along with individual components of the loss function we use. Additional ablation experiments on simulated data only are provided in the Supplementary Material.
Table 3.
Classwise accuracy comparison with step-zero networks
| Architecture | C 0 | C 1 | C 2 | C 3 | Avg. |
|---|---|---|---|---|---|
| CB3D | 0.70 | 0.63 | 0.61 | 0.91 | 0.71 |
| CB3D (W) | 0.71 | 0.75 | 0.69 | 0.92 | 0.77 |
| DSRF3D_v2 | 0.69 | 0.64 | 0.59 | 0.91 | 0.71 |
| DSRF3D_v2 (W) | 0.73 | 0.66 | 0.67 | 0.89 | 0.74 |
| DenseNet3D | 0.70 | 0.54 | 0.53 | 0.97 | 0.69 |
| DenseNet3D (W) | 0.84 | 0.73 | 0.45 | 0.96 | 0.74 |
| ResNet3D | 0.77 | 0.50 | 0.54 | 0.91 | 0.68 |
| ResNet3D (W) | 0.70 | 0.57 | 0.59 | 0.96 | 0.70 |
Bold entries represent the methodology described in the paper while the other entries in the corresponding tables either represent ablation experiments with different methodologies or represent comparisons with relevant algorithms.
Fig. 6.
Individual effects of the warp modules on a subtomogram slice
Table 4.
Ablation studies in domain randomization on CB3D

| Modules used | Loss function | Macro Avg. |
|---|---|---|
| | | 0.72 |
| Wn | | 0.72 |
| Wd | | 0.75 |
| | | 0.77 |

The warp network was trained on reverse grad with the classifier. Bold entries represent the methodology described in the paper while the other entries in the corresponding tables either represent ablation experiments with different methodologies or represent comparisons with relevant algorithms.
4 Results
The extensive results of randomization and adaptation on the CB3D model are presented in Table 5. We compare our results with relevant domain adaptation methods, FSDA (Yu et al., 2021) and FADA (Motiian et al., 2017), in Table 6. While these methods make use of a limited amount of labeled target data, our procedure is completely unsupervised and still outperforms them by a large margin (7%). To further ensure a fair comparison with these methods, we train all of them on the same architecture as proposed by Yu et al. (2021). We also include t-SNE plots in Figure 5 of the feature spaces extracted from M to visualize the reduction of the domain shift between simulated and experimental data brought about by our training methodology. The t-SNE plots show the domain shift present between the features of the real and the simulated data and demonstrate how this shift is reduced in the features extracted from our network.
Table 5.
Adaptation results on CB3D
| Label | Precision | Recall | F1_score | Support |
|---|---|---|---|---|
| Ribosome | 0.65 | 0.99 | 0.78 | 72 |
| Proteasome | 0.90 | 0.88 | 0.89 | 348 |
| TRiC | 0.83 | 0.78 | 0.80 | 113 |
| Membrane | 0.99 | 0.94 | 0.96 | 414 |
| Accuracy | | | 0.90 | 947 |
| Macro Avg. | 0.84 | 0.90 | 0.86 | 947 |
| Weighted Avg. | 0.91 | 0.90 | 0.90 | 947 |
Bold entries represent the methodology described in the paper while the other entries in the corresponding tables either represent ablation experiments with different methodologies or represent comparisons with relevant algorithms.
Table 6.
Accuracy comparison with other domain adaptation methods
| Methods | C 0 | C 1 | C 2 | C 3 | Avg. |
|---|---|---|---|---|---|
| FADA 3-shot (Motiian et al., 2017) | 0.62 | 0.70 | 0.66 | 0.91 | 0.72 |
| FADA 5-shot | 0.61 | 0.76 | 0.64 | 0.91 | 0.73 |
| FADA 7-shot | 0.78 | 0.64 | 0.62 | 0.91 | 0.73 |
| FSDA 3-shot (Yu et al., 2021) | 0.61 | 0.70 | 0.67 | 0.87 | 0.71 |
| FSDA 5-shot | 0.64 | 0.78 | 0.66 | 0.92 | 0.75 |
| FSDA 7-shot | 0.65 | 0.78 | 0.70 | 0.94 | 0.77 |
| Cryo-Shift a | 0.75 | 0.85 | 0.83 | 0.95 | 0.84 |
a Trained with our methodology on the architecture from FSDA. Bold entries represent the methodology described in the paper while the other entries in the corresponding tables either represent ablation experiments with different methodologies or represent comparisons with relevant algorithms.
Fig. 5.
Unsupervised domain adaptation results
5 Conclusion
The study of in situ macromolecular complexes enabled with cryo-electron tomography is essential for the analysis of various cellular processes. Shortcomings in the imaging procedure in conjunction with high structural complexity in tomograms make this analysis difficult to perform. The classification of subtomograms as sub-volumes of these tomograms forms an integral step in the structural recovery of macromolecules along with aiding in their analysis. Supervised classification methods, however, require large-scale annotated data that is extremely scarce. In this work, we propose an unsupervised domain adaptation and randomization framework which can help train neural networks completely on simulated data and unlabeled experimental data. We first train a base network on simulated data and then use a network-driven domain randomization framework to help the network make better predictions on the experimental dataset. Additionally, we make use of a multi-adversarial domain adaptation module where we help the classifier generate similar features for both real and simulated data, thus mitigating the domain shift between the real and the simulated data features and helping the model classify better in both of these domains. Our work can be further used for automated annotation of subtomograms after being trained on similar simulated data, thus forming an important step in the completely autonomous recognition and structural recovery of subtomograms with the help of neural networks for better analysis of macromolecules and their interactions.
Our work can be further improved by future development of simulation algorithms that narrow the domain gap between real and simulated data. Unsupervised deep learning-based algorithms for the analysis of genomic data (Liu et al., 2021b) have made considerable progress with the help of generative networks. A possibility lies in the use of novel conditional generative networks (Liu et al., 2021a; Vahdat et al., 2021) to generate class-specific cryo-ET data for improving upon the algorithms demonstrated in this work.
Supplementary Material
Acknowledgements
The authors thank Dr Qiang Guo for sharing the experimental rat-neuron tomograms that are used as real data in this work.
Funding
This work was supported in part by U.S. National Institutes of Health (NIH) [R01GM134020 and P41GM103712], U.S. National Science Foundation (NSF) [DBI-1949629 and IIS-2007595], AMD COVID-19 HPC Fund and the Mark Foundation For Cancer Research [19-044-ASP]. X.Z. was supported in part by a fellowship from Center of Machine Learning and Health at Carnegie Mellon University.
Conflict of Interest: none declared.
Contributor Information
Hmrishav Bandyopadhyay, Department of Electronics and Telecommunication Engineering, Jadavpur University, Kolkata 700032, India.
Zihao Deng, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Leiting Ding, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Sinuo Liu, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Mostofa Rafid Uddin, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Xiangrui Zeng, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Sima Behpour, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
Min Xu, Computational Biology Department, Carnegie Mellon University, Pittsburgh, PA 15213, USA.
References
- Bartesaghi A. et al. (2008) Classification and 3D averaging with missing wedge correction in biological electron tomography. J. Struct. Biol., 162, 436–450. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Beck M. et al. (2009) Visual proteomics of the human pathogen leptospira interrogans. Nat. Methods, 6, 817–823. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Best C. et al. (2007) Localization of protein complexes by pattern recognition. Methods Cell Biol., 79, 615–638. [DOI] [PubMed] [Google Scholar]
- Briggs J.A. (2013) Structural biology in situ—the potential of subtomogram averaging. Curr. Opin. Struct. Biol., 23, 261–267. [DOI] [PubMed] [Google Scholar]
- Che C. et al. (2018) Improved deep learning-based macromolecules structure classification from electron cryo-tomograms. Mach. Vis. Appl., 29, 1227–1236. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Che C. et al. (2019) Domain randomization for macromolecule structure classification and segmentation in electron cryo-tomograms. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), San Diego, CA, USA. IEEE, pp. 6–11.
- Chen M. et al. (2017) Convolutional neural networks for automated annotation of cellular cryo-electron tomograms. Nat. Methods, 14, 983–985. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Chen Y. et al. (2014) Autofocused 3D classification of cryoelectron subtomograms. Structure, 22, 1528–1537. [DOI] [PubMed] [Google Scholar]
- Du X. et al. (2021) Active learning to classify macromolecular structures in situ for less supervision in cryo-electron tomography. Bioinformatics, 37, 2340–2346. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Ganin Y., Lempitsky V. (2015) Unsupervised domain adaptation by backpropagation. In: International Conference on Machine Learning (ICML). PMLR, pp. 1180–1189.
- Gao S. et al. (2020) Dilated-densenet for macromolecule classification in cryo-electron tomography. In: International Symposium on Bioinformatics Research and Applications, Moscow, Russia. Springer, pp. 82–94. [DOI] [PMC free article] [PubMed]
- Guo Q. et al. (2018) In situ structure of neuronal C9ORF72 poly-GA aggregates reveals proteasome recruitment. Cell, 172, 696–705. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Hara K. et al. (2018) Can spatiotemporal 3D CNNs retrace the history of 2D CNNs and imagenet? In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah. pp. 6546–6555.
- Kunz M. et al. (2015) M-free: mask-independent scoring of the reference bias. J. Struct. Biol., 192, 307–311. [DOI] [PubMed] [Google Scholar]
- Lin R. et al. (2019) Adversarial domain adaptation for cross data source macromolecule in situ structural classification in cellular electron cryo-tomograms. Bioinformatics, 35, i260–i268. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu Q. et al. (2021a) Density estimation using deep generative neural networks. Proc. Natl. Acad. Sci. USA, 118, e2101344118. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu Q. et al. (2021b) Simultaneous deep generative modelling and clustering of single-cell genomic data. Nat. Mach. Intell., 3, 536–544. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Liu S. et al. (2019) Semi-supervised macromolecule structural classification in cellular electron cryo-tomograms using 3D autoencoding classifier. In: BMVC, Cardiff University, UK.
- Liu S. et al. (2020) Efficient cryo-electron tomogram simulation of macromolecular crowding with application to SARS-COV-2. In: 2020 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE. pp. 80–87.
- Lučić V. et al. (2013) Cryo-electron tomography: the challenge of doing structural biology in situ. J. Cell Biol., 202, 407–419. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Motiian S. et al. (2017) Few-shot adversarial domain adaptation. Adv. Neural Inf. Process. Syst., 30, 6670–6680. [Google Scholar]
- Pei L. et al. (2016) Simulating cryo electron tomograms of crowded cell cytoplasm for assessment of automated particle picking. BMC Bioinformatics, 17, 1–13. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Pei Z. et al. (2018) Multi-adversarial domain adaptation. Proc. AAAI Conf. Artif. Intell., 32. [Google Scholar]
- Vahdat A. et al. (2021) Score-based generative modeling in latent space. arXiv preprint arXiv:2106.05931.
- Wriggers W. et al. (1999) Situs: a package for docking crystal structures into low-resolution maps from electron microscopy. J. Struct. Biol., 125, 185–195. [DOI] [PubMed] [Google Scholar]
- Xu M. et al. (2012) High-throughput subtomogram alignment and classification by Fourier space constrained fast volumetric matching. J. Struct. Biol., 178, 152–164. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Xu M. et al. (2017) Deep learning-based subdivision approach for large scale macromolecules structure recovery from electron cryo tomograms. Bioinformatics, 33, i13–i22. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Yu L. et al. (2021) Few shot domain adaptation for in situ macromolecule structural classification in cryoelectron tomograms. Bioinformatics, 37, 185–191. [DOI] [PMC free article] [PubMed] [Google Scholar]
- Zakharov S. et al. (2019) Deceptionnet: network-driven domain randomization. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, South Korea. pp. 532–541.