Abstract
Advances in biological and medical technologies have been providing us explosive volumes of biological and physiological data, such as medical images, electroencephalography, genomic and protein sequences. Learning from these data facilitates the understanding of human health and disease. Developed from artificial neural networks, deep learning-based algorithms show great promise in extracting features and learning patterns from complex data. The aim of this paper is to provide an overview of deep learning techniques and some of the state-of-the-art applications in the biomedical field. We first introduce the development of artificial neural network and deep learning. We then describe two main components of deep learning, i.e., deep learning architectures and model optimization. Subsequently, some examples are demonstrated for deep learning applications, including medical image classification, genomic sequence analysis, as well as protein structure classification and prediction. Finally, we offer our perspectives for the future directions in the field of deep learning.
Keywords: Deep learning, Big data, Bioinformatics, Biomedical informatics, Medical image, High-throughput sequencing
Introduction
Deep learning is a recent and fast-growing field of machine learning. It attempts to model abstraction from large-scale data by employing multi-layered deep neural networks (DNNs), thus making sense of data such as images, sounds, and texts [1]. Deep learning in general has two properties: (1) multiple layers of nonlinear processing units, and (2) supervised or unsupervised learning of feature presentations on each layer [1]. The early framework for deep learning was built on artificial neural networks (ANNs) in the 1980s [2], while the real impact of deep learning became apparent in 2006 [3], [4]. Since then, deep learning has been applied to a wide range of fields, including automatic speech recognition, image recognition, natural language processing, drug discovery, and bioinformatics [5], [6], [7].
The past decades have witnessed a massive growth in biomedical data, such as genomic sequences, protein structures, and medical images, due to the advances of high-throughput technologies. This deluge of biomedical big data necessitates effective and efficient computational tools to store, analyze, and interpret such data [5], [8]. Deep learning-based algorithmic frameworks shed light on these challenging problems. The aim of this paper is to provide the bioinformatics and biomedical informatics community an overview of deep learning techniques and some of the state-of-the-art applications of deep learning in the biomedical field. We hope this paper will provide readers an overview of deep learning, and how it can be used for analyzing biomedical data.
The development of ANNs
As a basis for deep learning, ANNs were inspired by biological processes in the 1960s, when it was discovered that different visual cortex cells were activated when cats visualized different objects [9], [10]. These studies illustrated that there were connections between the eyes and the cells of the visual cortex, and that the information was processed layer by layer in the visual system. ANNs mimicked the perception of objects by connecting artificial neurons within layers that could extract the features of objects [11], [12], [13], [14], [15], [16]. However, ANN research stagnated after the 1960s, due to the low capability resulting from its shallow structures and the limited computational capacity of computers at that time [17].
Thanks to the improvement in computer capabilities and methodologies [18], ANNs with efficient backpropagation (BP) facilitated studies on pattern recognition [19], [20], [21], [22], [23]. In a neural network with BP, classifications were first processed by the ANN model, and weights were then modified by evaluating the difference between the predicted and the true class labels. Although BP helped to minimize errors through gradient descent, it seemed to work only for certain types of ANNs [24]. Through improving the steeper gradients with BP, several learning methods were proposed, such as momentum [25], adaptive learning rate [26], [27], [28], least-squares methods [29], [30], quasi-Newton methods [31], [32], [33], [34], and conjugate gradient (CG) [35], [36]. However, due to the complexity of ANNs, other simple machine learning algorithms, such as support vector machines (SVMs) [37], random forest [38], [39], and k-nearest neighbors algorithms (k-NN) [40], gradually overtook ANNs in popularity (Figure 1).
Figure 1.
Timeline of the development of deep learning and commonly-used machine learning algorithmsThe development of deep learning and neural networks is shown in the top panel, and several commonly-used machine learning algorithms are shown in the bottom panel. NN, neural network; BP, backpropagation; DBN, deep belief network; SVM, support vector machine; AE: auto-encoder; VAE: variational AE; GAN: generative adversarial network; WGAN: Wasserstein GAN.
The development of deep learning
An ANN with more hidden layers offers much higher capacity for feature extraction [4]. However, an ANN often converges to the local optimum, or encounters gradient diffusion when it contains deep and complex structures [41]. A gradient propagated backwards rapidly diminishes in magnitude along the layers, resulting in slight modification to the weights in the layers near the input (http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial) [42]. Subsequently, a layer-wise pre-training deep auto-encoder (AE) network was proposed, bringing ANNs to a new stage of development [3], [4], [43], [44], [45] (Figure 1). In this network, each layer is trained by minimizing the discrepancy between the original and the reconstructed data [4]. The layer-wise pre-training breaks the barrier of gradient diffusion [4], and also results in a better choice of weights for deep neural networks (DNNs), thereby preventing the reconstructed data from reaching a local optimum where the local optimum is usually caused by the random selection of initial weights. In addition, the employment of graphic processing units (GPUs) also renews the interest of researchers in deep learning [46], [47].
With the focus of more attention and efforts, deep learning has burgeoned in recent years and has been applied broadly in industry. For instance, deep belief networks (DBNs) and stacks of restricted Boltzmann machines (RBMs) [3], [48], [49] have been applied in speech and image recognition [3], [45], [50] and natural language processing [51]. Proposed to better mimick animals’ perceptions of objects [52], convolutional neural networks (CNN) have been widely applied in image recognition [53], [54], [55], image segmentation [56], video recognition [57], [58], and natural language processing [59]. Recurrent neural networks (RNNs) are another class of ANNs that exhibit dynamic behavior, with artificial neurons that are associated with time steps [25], [60], [61]. RNNs have become the primary tool for handling sequential data [62], and have been applied in natural language processing [63] and handwriting recognition [64]. Later on, variants of AEs, including sparse AEs, stacked AEs (SAEs), and de-noising AEs, have also gained popularity in pre-training deep networks [49], [65], [66], [67].
Although applications of deep learning have been primarily focused on image recognition, video and sound analyses, as well as natural language processing, it also opens doors in life sciences, which will be discussed in detail in the next sections.
Brief description of deep learning
Although the underlying assumptions and theories are different, the basic idea and processes for feature extraction in most deep NN (DNN) architectures are similar. In the forward pass, the network is activated by an input to the first layer, which then spreads the activation to the final layer along the weighted connections, and generates the prediction or reconstruction results. In the backward pass, the weights of connections are tuned by minimizing the difference between the predicted and the real data.
Basic concepts
Activation functions
Activation functions form the non-linear layers in all deep learning frameworks; and their combinations with other layers are used to simulate the non-linear transformation from the input to the output [62]. Therefore, better feature extraction can be achieved by selecting appropriate activation functions [7], [68], [69]. Here, we introduce several commonly-used activation functions, represented by g.
-
•
Sigmoid function: , where a is the input from the front layer. A sigmoid function transforms variables to values ranging from 0 to 1 and is commonly used to produce a Bernoulli distribution. For example:
-
•
Hyperbolic tangent: . Here, the derivative of g is calculated as , making it easy to work with in BP algorithms.
-
•
Softmax: . The softmax output, which an be considered as a probability distribution over the categories, is commonly used in the final layer.
-
•
Rectified linear unit (ReLU): . This activation function and its variants show superior performance in many cases and are the most popular activation function in deep learning so far [68], [70], [71], [72]. ReLU can also solve the gradient diffusion problem [73], [74].
-
•
Softplus: . This is one of the variants of ReLU, representing a smooth approximation of ReLU (in this article, the log always represents the natural logarithm).
-
•
Absolute value rectification: . This function is useful when the pooling layer takes the average value in CNNs [75], thus preventing otherwise the negative features and the positive features from diminishing.
-
•
Maxout: . The weight matrix in this function is a three-dimensional array, where the third array corresponds to the connection of the neighboring layers [76].
Optimization objective
An optimization objective is often composed of a loss function and a regularization term. The loss function measures the discrepancy between the output of the network depend on model parameters (θ) and the expected result y, e.g., the true class labels in classification tasks, or the true level in prediction tasks. However, a good learning algorithm performs well not only on the training data, but also on the test data. A collection of strategies designed to reduce the test error is called regularization [62]. Some regularization terms apply penalties to parameters to prevent overly complex models. Here, we briefly introduce the commonly used loss function and regularization term . The optimization objective is usually defined as:
| (1) |
where is a balance of these two components, and in practice, the loss function is usually calculated across randomly-sampled training samples rather than the data-generating distribution, since the latter is unknown.
Loss function
Most DNNs use cross entropy between the training data and the model distribution as the loss function. The most commonly used form of cross entropy is the negative conditional log-likelihood: . This is a collection of loss functions corresponding to the distribution of y given the value of input variable x. Here, we introduce several commonly used loss functions that follow this pattern:
Suppose y is continuous and has a Gaussian distribution over a given variable x. The loss function would be:
| (2) |
Which is equivalently described as the squared error. The squared error was the most commonly used loss function in the 1980s [62]. However, it often tends to penalize outliers excessively, leading to slower convergence rates [77].
If y follows the Bernoulli distribution, then the loss function will be:
| (3) |
When y is discrete and has only two values, for instance, , we can take the softmax value (see commonly-used activation functions) as the probability over the categories. Then the loss function will be:
| (4) |
Regularization term
parameter regularization is the most common form of regularization term and contributes to the convexity of the optimization objective, leading to an easy solution for the minimum using the Hessian matrix [78], [79]. parameter regularization can be defined as
| (5) |
where represents weights of connecting units in the network (the same as in the following context).
Compared to parameter regularization, parameter regularization results in a sparser solution of ω and tends to learn small groups of features. parameter regularization can be defined as
| (6) |
Frobenius parameter regularization is induced by the inner product and is block decomposable, therefore it is easier to compute [80], [81]. Frobenius parameter regularization can be defined as
| (7) |
where is the i-th largest singular value. Frobenius parameter regularization has a function similar to nuclear norm in terms of regularization.
Nuclear norm has been widely used as regularization in recent years [82], [83], [84]. Nuclear norm regularization measures the sum of the singular values of and can be defined as
| (8) |
Optimization methods
A learning task is transformed to an optimization problem, to achieve the minima of the objective function by selecting appropriate hyperparameters. The basic processes of different optimization methods are similar. First, the output and the optimization objective of the model are computed using the initial parameters . The network parameters are then tuned to decrease the objective function value from the final layer to the first layer [18]. This process is repeated until the proper model and a small fit error, i.e., loss function value, are obtained (http://deeplearning.stanford.edu/wiki/index.php/UFLDL_Tutorial).
However, different optimization methods have different advantages and disadvantages on different architectures and loss functions [62], [85]. Stochastic gradient descent (SGD) and its variants are the most-used methods, which update the parameters by a gap corresponding to the Jacobian matrix. The computation time per update does not grow too much even with a large training set [86], [87], [88]. AdaGrad updates parameters according to the accumulation of squared gradients, which can converge rapidly when applied to convex functions, but performs worse in certain models [62]. RMSProp, an AdaGrad algorithm, has been an effective and popular method for parameter optimization. Another type of algorithm makes use of second order derivatives to improve optimization. For instance, limited-memory Broyden–Fletcher–Goldfarb–Shanno algorithm (BFGS) is one type of quasi-Newton method, which iteratively refines the approximation of the inverse of the Hessian matrix and avoids storing the matrix. BFGS is good at dealing with low dimensionality problems, particularly for convolutional models [85]. In addition, conjugate gradient combines conjugacy and gradient descent in the update direction decision for parameters, efficiently avoiding the calculation of the inverse Hessian [4], [35], [36], while contrastive divergence is usually used in RBM model [89], [90], [91]. With the help of a GPU [47], many algorithms can be accelerated significantly [85].
The proper architecture and objective function should be selected according to data considered. As a type of machine learning, deep learning can also encounter “overfitting,” that is, low error on training data but high error on test data. In addition to the regularization terms, other methods for regularization are also important for reducing test error. Adding noise to the input or to the weights are efficient regularization strategies [41], [92], as in the case of a denoising AE [93]. Stopping the optimization early by setting an iteration number is another commonly used strategy to prevent the network from overfitting [62]. Parameter sharing, just like in CNN, can also contribute to regularization [94]. Dropout can force units to independently evolve, and randomly remove portions of units in ANN on each iteration, and can therefore achieve better results with inexpensive computation [73], [95], [96].
Deep learning architectures
AEs
Different from ordinary ANNs, AEs extract features from unlabeled data and set target values to be equal to the inputs [4], [49], [97]. Given the input vector , the AE tries to learn the model:
| (9) |
where W and b are the parameters of the model, g is the activation function (same definition applied in the following context), and represents the hidden units. When the number of hidden units, which represents the dimension of features, is smaller than the input dimension, the AE performs a reduction of data dimensionality similar to principal component analysis [98]. Besides pattern recognition, an AE with a classifier in the final layer can perform classification tasks as well.
RBMs and DBNs
RBMs are generative graphical models that aim to learn the distribution of training data. Since we do not know which distribution the data obeys, we cannot directly compute model parameters using the maximum likelihood principle. Boltzmann machines (BMs) use an energy function to generate the probability distribution (see Equations (12), (13) below), and then optimize parameters until the model learns the true distribution of the data. The original BMs have not been demonstrated to be useful for practical problems, while RBMs are commonly used in deep learning.
RBMs restrict the BMs to a bipartite graph, i.e., there are no connections within visible units or hidden units . This restriction ensures the conditional independency of hidden units and visible units [91], i.e.,
| (10) |
Furthermore, most RBMs rely on the assumption that all units in the network take only one of the two possible values 0 or 1, i.e., . Provided with the activation function, the conditional distribution of hidden and visible units can be expressed in the following form:
| (11) |
According to the Boltzmann distribution, probability distributions over hidden and visible vectors are defined as:
| (12) |
where is the normalizing constant and is the energy function [99]. The conditional probability distribution can also be computed by integral, and the parameters can then be optimized by minimizing the Kullback-Leibler divergence.
Overall, given the network architectures and optimized parameters, the distribution of the visible units could be computed as:
| (13) |
A DBN can be viewed as a stack of RBMs [6], [24], [100] or AEs [66], [101]. Similar to RBMs, DBNs can learn the distribution of the samples, or learn to classify the inputs given class labels [3]. However, the in the formula is replaced by a better model after the weight of connections W is learned by an RBM [3], [100].
In addition to feature extraction, RBMs can also learn distributions of unlabeled data as generative models, and classify labeled data as discriminative models (regard the hidden units as labels). Similar to AEs, RBMs can also pre-train parameters for a complex network.
Convolutional neural networks
Different from other deep learning structures, artificial neurons in convolutional neural networks (CNNs) extract features of small portions of input images, which are called receptive fields. This type of feature extraction was inspired by the visual mechanisms in living organisms, where cells in the visual cortex are sensitive to small regions of the visual field [52], [102].
Besides the activation function, there are two particular types of layers in CNNs: the convolutional layer and the pooling layer (Figure 2). In the convolutional layer, the image is convolved by different convolutional filters via shifting the receptive fields step by step [87] (Figure 2A). The convolutional filters share the same parameters in every small portion of the image, largely reducing the number of hyperparameters in the model. A pooling layer, taking advantage of the “stationarity” property of images, takes the mean, the max, or other statistics of the features at various locations in the feature maps, thus reducing the variance and capturing essential features (http://deeplearning.net/tutorial/lenet.html) (Figure 2B).
Figure 2.
Illustration of convolutional neural network
A. In the convolution layer, fields (different color blocks in the table) of the input patch (represented by a) are multiplied by matrices (convolution kernel, represented by k). B. In the pooling layer, the results of convolution are summarized (the max pooling is taken as example here). aij, cij, kij represent the number located in line i and column j in the corresponding matrix.
Recurrent neural networks
Recurrent neural networks (RNNs) outperform other deep learning approaches in dealing with the sequential data. Based on the property of sequential data, parameters across different time steps of the RNN model are shared. Taking speech as an example: some vowels may last longer than other sounds; the difference makes absolute time steps meaningless and demands that the model parameters be the same among the time steps [62].
Beside the parameter sharing, RNNs are different from other multilayer networks by virtue of having a circuit, which represents hidden-to-hidden recurrence. A simple recurrent network corresponds to the following equation:
| (14) |
where t is the label for time, W and V represent the weights connecting hidden and input units, and hidden and output units, respectively, b and c are the offsets of the visible and hidden layers, respectively, g is the activation function, and U represents the weights connecting hidden units at time to hidden units at time t (Figure 3).
Figure 3.
Illustration of recurrent neural network
A. The unfold form of common neural networks (top) and schema (bottom). B. An illustration of recurrent neural networks (top) and their unfold form (bottom). The red square represents one time step delay. Different from panel A, the arrows in panel B represent sets of connections. W and B represent the weight matrix and bias vector, respectively. x and y represent the input and output of the network, respectively; h indicates the hidden units of network; L consists of couples of transformations, such as densely-connected layers or dropout layers; U indicates the transformation between two neighbor time points; and t represents the time point.
Similar to other deep learning architectures, RNNs can also be trained using the BP method. A variant of the BP method called back propagation through time (BPTT) is the standard optimization method for RNNs [25], [103], and some alternative methods have also been proposed to speed up the optimization or to extend its capacity [63], [104], [105], [106], [107].
Applications in biomedicine
Owing to advances in high-throughput technologies, a deluge of biological and medical data has been obtained in recent decades, including data related to medical images, biological sequences, and protein structures. Some successful applications of deep learning in biomedical fields are reviewed in this section and a summary of applications is shown in Table 1.
Table 1.
Applications of deep learning frameworks in biomedical informatics
| Topic | DL architecture | Brief description | Refs. |
|---|---|---|---|
| Medical images analysis | CNN | Brain tumor segmentation, won top 2 in BRATS | [108] |
| Segmentation of pancreas in CT | [113] | ||
| Knee cartilage segmentation | [114] | ||
| Segmentation of hippocampus | [117] | ||
| Predict semantic descriptions from medical images | [118] | ||
| Segmentation of MR brain images | [121] | ||
| Anatomy-specific classification of medical images | [123] | ||
| Cerebral microbleeds from MR images | [125] | ||
| Coronary artery calcium scoring in CT images | [126] | ||
| Nuclei detection in routine colon cancer histology images | [129] | ||
| Histopathological cancer classification | [130] | ||
| Invasive ductal carcinoma segmentation in WSI | [132] | ||
| Mammographic lesions detection | [133] | ||
| Haemorrhages detection in fundus images | [137] | ||
| Exudates detection in fundus images | [138] | ||
| SAE | Segmentation of hippocampus from infant brains | [116] | |
| Organ detection in 4D patient data | [122] | ||
| Histological characterization healthy skin and healing wounds | [124] | ||
| Scoring of percentage mammographic density and mammographic texture related to breast cancer risk | [134] | ||
| Optic disc detection from fundus photograph | [135] | ||
| DBN | Segmentation of left ventricle of the heart from MR data | [112] | |
| Discriminate retinal-based diseases | [139] | ||
| DNN | Brain tumor segmentation in MR images, won 2nd place in BRATS | [109] | |
| Prostate MR segmentation | [115] | ||
| Gland instance segmentation | [119] | ||
| Semantic segmentation of tissues in CT images | [120] | ||
| Mitosis detection in breast cancer histological images | [131] | ||
| RNN | EEG-based prediction of epileptic seizures propagation using time-delayed NN | [141] | |
| Classification of patterns of EEG synchronization for seizure prediction | [142] | ||
| EEG-based lapse detection | [143] | ||
| Prediction of epileptic seizures | [144] | ||
| Genomic sequencing and gene expression analysis | DNN | Gene expression inference | [145] |
| Identification of cis-regulatory regions and replication timing domains | [151] | ||
| Prediction of enhancer | [152] | ||
| Prediction of splicing patterns in individual tissues and differences in splicing patterns across tissues | [159] | ||
| Annotation of the pathogenicity of genetic variants | [161] | ||
| DBN | Modeling structural binding preferences and predicting binding sites of RNA-binding proteins | [146] | |
| Prediction of splice junction at DNA level | [156] | ||
| Prediction of transcription factor binding sites | [148], [149] | ||
| Annotation and interpretation of the noncoding genome | [151] | ||
| Prediction of the noncoding variant effects de novo from sequence | [162] | ||
| RNN | Prediction of miRNA precursor and miRNA targets | [153], [154] | |
| Detection of splice junctions from DNA sequences | [157] | ||
| Prediction of non-coding function de novo from sequence | [163] | ||
| Analysis of human splicing codes and their determination of diseases | [161] | ||
| Protein structure prediction | DBN | Modeling structural binding preferences and predicting binding sites of RBPs | [146] |
| Ab initio prediction of the protein secondary structures | [171] | ||
| Prediction of protein disorder | [184] | ||
| Prediction of secondary structures, local backbone angles, and solvent accessible surface area of proteins | [170] | ||
| CNN | Prediction of protein order/disorder regions | [183] | |
| Prediction of protein secondary structures | [179], [180], [182] | ||
| Prediction of protein structure properties, including secondary structure, solvent accessibility, and disorder regions | [185] | ||
| SAE | Sequence-based prediction of backbone Cα angles and dihedrals | [169] | |
| RNN | Prediction of protein secondary structure | [172], [173], [174], [178] | |
| Prediction of protein contact map | [175], [176], [177] | ||
Note: NN, neural networks; CNN, convolutional NN; SAE, stacked auto-encoder; DBN, deep belief network; RNN, recurrent NN.
Medical image classification and segmentation
Machine learning for medical images has long been a powerful tool in the diagnosis or assessment of diseases. Traditionally, discriminative features referring to medical image interpretation are manually designed for classification (detection of lesions or abnormalities) and segmentation of regions of interest (tissues and organs) in different medical applications. This requires the participation of physicians with expertise. Nonetheless, the complexity and ambiguity of medical images, limited knowledge for medical image interpretation, and the requirement of large amounts of annotated data have hindered the wide use of machine learning in the medical image domain. Notably, deep learning methods have attained success in a variety of computer vision tasks such as object recognition, localization, and segmentation in natural images. These have soon brought about an active field of machine learning in medical image analysis.
Segmentation of tissues and organs is crucial for qualitative and quantitative assessment of medical images. Pereira et al. used data augmentation, small convolutional kernels, and a pre-processing stage to achieve accurate brain tumor segmentation [108]. Their CNN-based segmentation method won first place in the Brain Tumor Segmentation (BRATS) Challenge in 2013, and second place in 2015. Havaei et al. presented a fully automatic brain tumor segmentation method based on DNNs in magnetic resonance (MR) images with a two-phase training procedure [109], which obtained second place in the 2013 BRATS. Their methodology was tested on the publicly available datasets INbreast [110] and Digital Database for Screening Mammography (DDSM) [111], outperforming in terms of accuracy and efficiency several state-of-the-art methods when tested on DDSM. Additional medical applications employing a deep learning architecture have been demonstrated in segmenting the left ventricle of the heart from the MR data [112], the pancreas through computed tomography (CT) [113], tibial cartilage through magnetic resonance imaging (MRI) [114], the prostate through MRI [115], and the hippocampus through MR brain images [116], [117]. The differentiation of tissues or organs in medical images has been termed semantic segmentation [118], [119] in which each pixel of an image is assigned to a class or a label. The skeletal muscles, organs, and fat in CT images are well delineated through semantic segmentation based on a DNN architecture [120]. Similarly, the semantic segmentation of MR images also attained accurate segmentation results [121], [122], [123].
Detection of lesion and abnormality is the major issue in medical image analysis. Deep learning methods learn the representations directly instead of using hand-crafted features from training data. A classifier is then used to assign the representations to a probability that indicates whether or not the image contains lesions. In other words, the deep learning schemas classify each pixel to be a lesion point or not, which can be done in two ways: (1) classifying the mini patch around the pixel with a deep network, and (2) using a fully convolutional network to classify each pixel.
Sheet et al. [124] applied a DNN to histologically characterize healthy skin and healing wounds to reduce clinical reporting variability. Two unsupervised pre-trained layers of denoising AEs (DAEs) were used to learn features in their hybrid architecture, and subsequently the whole network was learned using labelled tissues for characterization. Detection of cerebral microbleeds [125] and coronary artery calcification [126] also produced better results when using deep learning-based approaches. In addition, brain tumor progression prediction implemented with a deep learning architecture [127] has also shown a more robust tumor progression model in comparison with a high-precision manifold learning approach [128].
Detection of pathologies on stained histopathology images [129], [130], [131] exemplify the high precision of deep learning-based approaches. For breast cancer detection in histopathology images, Cruz-Roa et al. [132] established a deep learning model to precisely delineate the invasive ductal carcinoma (IDC) regions to distinguish the invasive tumor tissue and non-invasive or healthy tissue. Their 3-layer CNN architecture, composed of two cascading convolutional and pooling layers, a full-connected layer, and a logistic regression classifier for prediction, attained a better F-measure (71.8%) and higher balanced accuracy (BAC; 84.23%) in comparison with an approach using handcrafted image features and a machine learning classifier.
The mammogram is one of the most effective imaging modalities in early diagnosis and risk prediction of breast cancer. A deep learning model [133] trained on a large dataset of 45,000 images attained performance similar to that of certified screening radiologists in mammographic lesion detection. Kallenberg et al. [134] investigated the scoring of percentage mammographic density (PMD) and mammographic texture (MT) related to prediction of breast cancer risk. They employed a sparse AE to learn deep hierarchical features from unlabeled mammograms. Multinomial logistic regression or softmax regression was then used as a classifier in the supervised training. As a result, the performance of their approach was comparable with that of the subjective and expensive manual PMD and MT scorings.
Color fundus photography is an important diagnostic tool for ophthalmic diseases. Deep learning-based methods with fundus images have recently gained considerable interest as a key to developing automated diagnosis systems. A DNN architecture was proposed by Srivastava et al. [135] to distinguish optic disc (OD) from parapapillary atrophy (PPA). A DNN consisting of SAEs followed by a refined active shape model attained accurate OD segmentation. For image registration, deep learning in combination with a multi-scale Hessian matrix [136] was used to detect vessel landmarks in the retinal image, whereas convolutional neural networks have also produced excellent results in the detection of hemorrhages [137] and exudates [138] in color fundus images. It is difficult to design an automatic screening system for retinal-based diseases such as age-related molecular degeneration, diabetic retinopathy, retinoblastoma, retinal detachment, and retinitis pigmentosa, because these diseases share similar characteristics. Through deep learning methods, Arunkumar et al. [139] successfully built a system to discriminate retina-based diseases only using fundus images. First, a DBN composed of a stack of RBMs was designed for feature extraction. Then a generalized regression neural network (GRNN) was employed to reduce dimensionality. Finally, a multi-class SVM was used for classification. Interestingly, Kaggle organized a competition on the staging of diabetic retinopathy from 35,126 training and 53,576 test color fundus images in 2015. Using convolutional neural networks, the top model outperformed other machine learning methods with a kappa score of 0.8496 (https://www.kaggle.com/c/diabetic-retinopathy-detection/leaderboard).
In addition to static images, time-series medical records such as signal maps from electro-encephalography and magnetoencephalography can also be analyzed using deep learning methods [140], [141]. These deep learning schemas take coded features of signals [142], [143] or raw signals [144] as input, and extract features from the data for anomaly classification or understanding emotions.
All the aforementioned applications illustrate that as a frontier of machine learning, deep learning has made substantial progress in medical image segmentation and classification. We expect that more clinical trials and systematic medical image analytic applications will emerge to help achieve better performance when applying deep learning in medicine.
Genomic sequencing and gene expression analysis
Deep learning also plays an important role in genomic sequencing and gene expression analyses. To infer the expression profiles of target genes based on approximately 1000 landmark genes from the NIH Integrated Network-based Cellular Signatures (LINCS) program, Chen et al. presented D-GEX, a deep learning method with dropout as regularization, which significantly outperformed linear regression (LR) in terms of prediction accuracy on both microarray and RNA-seq data [145]. By applying a multimodal DBN to model structural binding preferences and to predict binding sites of RNA-binding proteins (RBPs) using the primary sequence as well as the secondary and tertiary structural profiles, Zhang et al. achieved an AUC of 0.98 for some proteins [146]. To predict binding sites of DNA- and RNA-binding proteins, Alipanahi et al. developed DeepBind, a CNN-based method, which surpassed other state-of-the-art methods, even when trained with in vitro data and tested with in vivo data [147]. Subsequently, Lanchantin et al. [148] and Zeng et al. [149] also applied CNN to predict transcription factor binding sites (TFBSs), and both studies demonstrated an improvement over the performance of DeepBind (AUC of 0.894). The input of these deep CNNs is encoded sequence characters obtained through protein binding microarrays or other assays, and the output is a real value indicating whether the sequence is a binding site or not. The deeper model can make more accurate classification by extracting higher-level features from the raw nucleotide sequences [148]. In addition, Kelley et al. presented Basset, an open source package to apply deep CNNs to learn the chromatin accessibility code, enabling annotation and interpretation of the noncoding genome [150]. Other applications include that of Li et al. [134] and Liu et al. [151], [152], who proposed deep learning approaches for the identification of cis-regulatory regions and replication timing domains, respectively. In addition, Yoon and his collaborators employed RNNs to predict miRNA precursors and targets. As a result, they achieved 25% increase in F-measure compared to existing alternative methods [153], [154].
Genetic variation can influence the transcription of DNA and the translation of mRNA [155]. Understanding the effects of sequence variants on pre-mRNA splicing facilitates not only whole genome annotation but also an understanding of genome function. To predict splice junction at the DNA level, Yoon and his collaborators developed a novel DBN-based method that was trained on the RBMs by boosting contrastive divergence with categorical gradients [156]. Their method not only achieved better accuracy and robustness but also discovered subtle non-canonical splicing patterns [156]. Furthermore, by exploiting RNNs to model and detect splice junctions from DNA sequences, the same authors also achieved a better performance than the previous DBN-based method [157].
Frey et al. formulated the assembly of a splicing code as a statistical inference problem [158], and proposed a Bayesian method to predict tissue-regulated splicing using RNA sequences and cellular context. Subsequently, they developed a DNN model with dropout to learn and predict alternative splicing (AS) [159]. This model took both the genomic features and tissue context as inputs, and predicted splicing patterns in individual tissues and differences in splicing patterns across tissues. They showed that their method surpassed the previous Bayesian methods and other common machine learning algorithms, such as multinomial logistic regression (MLR) and SVMs, in terms of AS prediction. Furthermore, they built a computational model using a Bayesian deep learning algorithm to predict the effects of genetic variants on AS [160]. This model took DNA sequences alone as input without using disease annotations or population data, and then scored the effects that variants had on AS, providing valuable insights into the genetic determinants of spinal muscular atrophy, nonpolyposis colorectal cancer, and autism spectrum disorder.
To annotate the pathogenicity of genetic variants, Quang et al. developed a DNN algorithm named DANN, which outperforms logistic regression (LR) and SVMs, with the AUC metric increased by 14% over SVMs [161]. Zhou et al. proposed a CNN-based algorithmic framework, DeepSEA, to predict the functional effects of noncoding variants de novo from sequences [162]. DeepSEA directly learns a regulatory sequence code from large-scale chromatin-profiling data, and can then predict the chromatin effects of sequence alterations with single-nucleotide sensitivity, and further prioritize functional variants based on the predicted chromatin effect signals. Subsequently, DanQ, a novel hybrid framework that combines CNN and bi-directional long short-term memory (BLSTM) RNNs, was presented to predict non-coding function de novo from sequences alone [163]. DanQ achieved an AUC 50% higher than other models, including the aforementioned DeepSEA.
Prediction of protein structure
The 3D structure of proteins is determined by their comprising amino acid sequence [164]. However, the computational prediction of 3D protein structure from the 1D sequences remains challenging [165]. The correct 3D structure of a protein is crucial to its function, and improper structures could lead to a wide range of diseases [166], [167], [168]. Deep learning technologies have shown great capabilities in the area of protein structure prediction, which aims to predict the secondary structure or contact map of a protein.
Lyons et al. reported the first SAE for sequence-based prediction of backbone Cα angles and dihedrals [169]. Heffernan et al. also employed SAEs to predict secondary structure, local backbone angles, and solvent-accessible surface area (ASA) of proteins from amino acid sequences [170]; they achieved an accuracy of 82% for secondary structure prediction. Spencer et al. proposed DNSS, an ab initio approach to predicting the secondary structure of proteins using deep learning network architectures [171]. DNSS was trained using a position-specific scoring matrix of the protein sequence and Atchley’s factors of residues, and was optimized to accelerate the computation using the GPU and compute unified device architecture (CUDA). Baldi and his colleagues successfully applied various RNN-based algorithms to predict protein secondary structure [172], [173], [174] and protein contact map [175], [176], [177], with accuracies of 84% and 30%, respectively. Sønderby et al. used a bidirectional RNN (BRNN) with long short-term memory cells to improve the prediction of secondary structure, with better accuracy (0.671) than that using state of the art (0.664) [178]. Compared with SAEs, DBNs, and RNNs, CNNs were seldom used for protein structure prediction until recently. Li et al. developed Malphite, a CNN and ensemble learning-based method for predicting protein secondary structures, which achieved an accuracy of 82.6% for a dataset containing 3000 proteins [179]. Additionally, Lin et al. proposed MUST-CNN, a multilayer shift-and-stitch convolutional neural network architecture to predict protein secondary structure from primary amino acid sequences [180]. Besides classical deep learning architectures, some other architectures were also employed to predict protein secondary structure. For example, Lena et al. introduced a deep spatio-temporal learning architecture, achieved an accuracy roughly 10% higher than other methods [181], and Zhou et al. presented a deep supervised and convolutional generative stochastic network, achieving an accuracy of 66.4% [182].
In addition to the secondary structure prediction, deep learning was also employed in protein region prediction [183], [184]. For instance, sequenced-based predictor of protein disorder using boosted ensembles of deep networks (DNdisorder), a deep neural network with multi-layers of RBMs [184], achieved an average balanced accuracy of 0.82 and an AUC of 0.90. Incorporated with predicted secondary structure and predicted ASA, a weighted deep convolutional neural fields (DeepCNF) was proposed to predict protein order/disorder regions, obtains an AUC of 0.898 on the Critical As-sessment of Techniques for Protein Structure Prediction (CASP10) dataset [183]. All of these methods surpassed other state-of-the-art predictors in accuracy while still maintaining an extremely high computing speed. Recently, RaptorX-Property, a web server employing DeepCNF, was also presented to predict protein structure properties, including secondary structure, solvent accessibility, and disorder regions [185]. RaptorX-Property can be easily used and offer good performance (an AUC of 0.89 on its test data).
Conclusion and perspective
Deep learning is moving toward its original goal: artificial intelligence. The state-of-the-art feature extraction capacity of deep learning enables its application in a wide range of fields. Many deep learning frameworks are open source, including commonly-used frameworks like Torch, Caffe, Theano, MXNet, DMTK, and TensorFlow. Some of them are designed as high-level wrappers for easy use, such as Keras, Lasagne, and Blocks. The applications of deep learning algorithms is further facilitated by the freely available sources. Figure 4 summarizes commonly-used frameworks in Github (https://github.com/) where the number of stars reflects the popularity of the frameworks.
Figure 4.
Popularity of deep learning frameworks in Github
The distributions of stars in Github of deep learning frameworks written in C++, Lua, Python, Matlab, Julia, and Java are shown in the pie chart. More stars in Github indicate higher popularity. Font size of the frameworks in the pie chart reflects the number of stars.
Breakthroughs in technologies, particularly next-generation sequencing, are producing a large quantity of genomic data. Efficient interpretation of these data has been attracting much attention in recent years. In this scenario, uncovering the relationship between genomic variants and diseases, and illustrating the regulatory process of genes in cells have been important research areas. In this review, we introduced the way deep learning gets involved in these areas using examples. With deep architecture, these models can simulate more complex transformations and discover hierarchical data representations. On the other hand, almost all of these models can be trained in parallel on GPUs for fast processing. Furthermore, deep learning can extract data-driven features and deal with high-dimensional data, while machine learning usually depends on hand-crafted features and is suitable only to low-dimensional data. Thus, deep learning is becoming more and more popular in genomic sequence analysis.
Deep learning is represented by a group of technologies (introduced in brief description of deep learning), and has been widely used in biomedical data (introduced in applications in biomedicine). SAEs and RBMs can extract patterns from unlabeled data [186] as well as labeled data when stacked with a classifier [156]. They can also deal with dynamic data [187]. CNNs are most commonly used in the biomedical image analysis domain due to their outstanding capacity in analyzing spatial information. Although relatively few CNNs are used in sequencing data, CNNs have great potential in omics analysis [147] and biomedical signals [142]. On the other hand, RNN-based architectures are tailored for sequential data, and are most often used for sequencing data [154], [157] and in dynamic biomedical signals [144], but less frequently in static biomedical images. Currently, more and more attention is being paid to the usage of deep learning in biomedical information, and new applications of each schema may be discovered in the near future.
Despite the notable advantages of deep learning, challenges in applying deep learning to the biomedical domain still remain. Take biomedical image analysis for instance: we use fundus images to exemplify how deep learning works to define the level of diabetic retinopathy, and to detect lesion areas in different ways. Besides high accuracy and speed, the intelligent use of receptive fields also endows deep learning with overwhelming superiority in terms of image recognition. Furthermore, the development of end-to-end classification methods based on deep learning sheds new light on classifying pixels as lesioned or not. However, the usage of deep learning in medical images is still challenging. For model training, we need large amounts of data with labels, sometimes with labels in terms of pixel classification. Manually labeling these medical images is laborious and requires professional experts. On the other hand, medical images are highly associated with privacy, so collecting and protecting the data is demanding. Furthermore, biomedical data are usually imbalanced because the quantity of data from normal classes is much larger than that from other classes.
In addition to the balancing challenges, the large amount of data required, and the labeling for biomedical data, deep learning also requires technological improvements. Unlike other images, subtle changes in medical images may indicate disease. Therefore, analyzing these images requires high-resolution inputs, high training speed, and a large memory. Additionally, it is difficult to find a uniform assessment metric for biomedical data classification or prediction. Unlike other projects, we can tolerate false positives to some extent, and reject few or no false negatives in disease diagnosis. With different data, it is necessary to assess the model carefully and to tune the model according to characteristics of the data. Fortunately, the deeper networks with inception modules are accelerated [188], [189] and provide higher accuracy in biomedical image analysis [190]. On the other hand, crowdsourcing approaches have begun to pave the way in collecting annotations [191], [192], which may be an important tool in the next few years. These bidirectional drivers would promote the applications of deep learning in biomedical informatics.
As a long-term goal, precision medicine research demands active learning from all biological, biomedical, as well as health data. Together with medical devices and instruments, wearable sensors and smart phones are providing unprecedented amounts of health data. Deep learning is a promising interpreter of these data, serving in disease prediction, prevention, diagnosis, prognosis, and therapy. We expect that more deep learning applications will be available in epidemic prediction, disease prevention, and clinical decision-making.
Competing interests
The authors have declared no competing interests.
Acknowledgments
Acknowledgments
This work was supported by the Center for Precision Medicine, Sun Yat-sen University and the National High-tech R&D Program (863 Program; Grant No. 2015AA020110) of China awarded to YZ.
Handled by Xuegong Zhang
Footnotes
Peer review under responsibility of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China.
Contributor Information
Yiming Zhou, Email: yimingzhou@capitalbio.com.
Xiaochen Bo, Email: boxc@bmi.ac.cn.
Zhi Xie, Email: xiezh8@sysu.edu.cn.
References
- 1.Yu D., Deng L. Deep learning and its applications to signal and information processing. IEEE Signal Process Mag. 2011;28:145–154. [Google Scholar]
- 2.Fukushima K. Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biol Cybern. 1980;36:193–202. doi: 10.1007/BF00344251. [DOI] [PubMed] [Google Scholar]
- 3.Hinton G.E., Osindero S., Teh Y.W. A fast learning algorithm for deep belief nets. Neural Comput. 2006;18:1527–1554. doi: 10.1162/neco.2006.18.7.1527. [DOI] [PubMed] [Google Scholar]
- 4.Hinton G.E., Salakhutdinov R.R. Reducing the dimensionality of data with neural networks. Science. 2006;313:504–507. doi: 10.1126/science.1127647. [DOI] [PubMed] [Google Scholar]
- 5.Cios K.J., Mamitsuka H., Nagashima T., Tadeusiewicz R. Computational intelligence in solving bioinformatics problems. Artif Intell Med. 2005;35:1–8. doi: 10.1016/j.artmed.2005.07.001. [DOI] [PubMed] [Google Scholar]
- 6.Längkvist M., Karlsson L., Loutfi A. A review of unsupervised feature learning and deep learning for time-series modeling. Pattern Recognit Lett. 2014;42:11–24. [Google Scholar]
- 7.Krizhevsky A., Sutskever I., Hinton G.E. ImageNet classification with deep convolutional neural networks. Adv Neural Inform Process Syst. 2012;60:1097–1105. [Google Scholar]
- 8.Asgari E, Mofrad MRK. ProtVec: a continuous distributed representation of biological sequences. arXiv1503.05140v1. [DOI] [PMC free article] [PubMed]
- 9.Hubel D.H., Wiesel T.N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol. 1962;160:106–154. doi: 10.1113/jphysiol.1962.sp006837. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10.Hubel D.H., Wiesel T.N. Receptive fields of single neurones in the cat’s striate cortex. J Physiol. 1959;148:574–591. doi: 10.1113/jphysiol.1959.sp006308. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.Weng J., Ahuja N., Huang T.S. Cresceptron: a self-organizing neural network which grows adaptively. Proc Int Jt Conf Neural Netw. 1992;1:576–581. [Google Scholar]
- 12.Weng J.J., Ahuja N., Huang T.S. Learning recognition and segmentation of 3-D objects from 2-D images. Proc IEEE Int Conf Comput Vis. 1993:121–128. [Google Scholar]
- 13.Weng J., Ahuja N., Huang T.S. Learning recognition and segmentation using the cresceptron. Int J Comput Vis. 1997;25:109–143. [Google Scholar]
- 14.Riesenhuber M., Poggio T. Hierarchical models of object recognition in cortex. Nat Neurosci. 1999;2:1019–1025. doi: 10.1038/14819. [DOI] [PubMed] [Google Scholar]
- 15.Joseph R.D. Cornell University; 1961. Contributions to perceptron theory. Ph.D. thesis. [Google Scholar]
- 16.Viglione S.S. Applications of pattern recognition technology. In: Mendel J.M., Fu K.S., editors. Mathematics in science and engineering. Elsevier B. V; Amsterdam: 1970. pp. 115–162. [Google Scholar]
- 17.Newell A., Papert M.M. Perceptrons An introduction to computational geometry. Science. 1969;165:780–782. [Google Scholar]
- 18.Werbos P. Beyond regression: new tools for prediction and analysis in the behavioral sciences. In: Ph.D. dissertation, Harvard University; 1974, 29:65−78.
- 19.Werbos P. Applications of advances in nonlinear sensitivity analysis. In: Drenick R.F., Kozin F., editors. System modeling and optimization. Springer, Berlin Heidelberg; Berlin: 1982. pp. 762–770. [Google Scholar]
- 20.Werbos P. Backwards differentiation in ad and neural nets: past links and new opportunities. In: Bücker M., Corliss G., Naumann U., Hovland P., Norris B., editors. Automatic differentiation: applications, theory, and implementations. Springer, Berlin Heidelberg; Berlin: 2006. pp. 15–34. [Google Scholar]
- 21.LeCun Y. Une procédure d’apprentissage pour réseau à seuil asymétrique. Proc Cogn. 1985:599–604. [Google Scholar]
- 22.LeCun Y. A theoretical framework for back-propagation. Proc 1988 Connect Model Summer Sch 1988:21–8.
- 23.Lang K.J., Waibel A.H., Hinton G.E. A time-delay neural network architecture for isolated word recognition. Neural Netw. 1990;3:23–43. [Google Scholar]
- 24.Schmidhuber J. Deep learning in neural networks: an overview. Neural Netw. 2015;61:85–117. doi: 10.1016/j.neunet.2014.09.003. [DOI] [PubMed] [Google Scholar]
- 25.Rumelhart DE, McClelland JL, the PDP Research Group. Parallel distributed processing: explorations in the microstructure of cognition. Cambridge: MIT Press; 1986, p. 318–62.
- 26.West AHL, Saad D. Adaptive back-propagation in on-line learning of multilayer networks. NIPS'95 Proc 8th Int Conf Neural Inform Process Syst 1995:323–9.
- 27.Battiti R. Accelerated backpropagation learning: two optimization methods. Complex Syst. 1989;3:331–342. [Google Scholar]
- 28.Almeida L.B. IEEE Press; Piscataway: 1990. Artificial neural networks. [Google Scholar]
- 29.Marquardt D.W. An algorithm for least-squares estimation of nonlinear parameters. J Soc Ind Appl Math. 1963;11:431–441. [Google Scholar]
- 30.Gauss C.F. Cambridge University Press; Cambridge: 1809. Theoria motus corporum coelestium in sectionibus conicis solem ambientium. [Google Scholar]
- 31.Broyden C.G. A class of methods for solving nonlinear simultaneous equations. Math Comput. 1965;19:577–593. [Google Scholar]
- 32.Fletcher R., Powell M.J.D. A rapidly convergent descent method for minimization. Comput J. 1963;6:163–168. [Google Scholar]
- 33.Goldfarb D. A family of variable-metric methods derived by variational means. Math Comput. 1970;24:23–26. [Google Scholar]
- 34.Shanno D.F. Conditioning of quasi-Newton methods for function minimization. Math Comput. 1970;24:647–656. [Google Scholar]
- 35.Møller M. Exact calculation of the product of the hessian matrix of feed-forward network error functions and a vector in 0 (n) time. Daimi Rep 1993:14.
- 36.Hestenes M.R., Stiefel E. Methods of conjugate gradients for solving linear systems. J Res Nat Bur Stand. 1952;49:409–436. [Google Scholar]
- 37.Cortes C., Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–297. [Google Scholar]
- 38.Ho TK. Random decision forests. Proc 3rd Int Conf Doc Anal Recognit 1995;1:278–82.
- 39.Ho T.K. The random subspace method for constructing decision forest. IEEE Trans Pattern Anal Mach Intell. 1998;20:832–844. [Google Scholar]
- 40.Altman N.S. An introduction to kernel and nearest-neighbor nonparametric regression. Am Stat. 1992;46:175–185. [Google Scholar]
- 41.Graves A. Practical variational inference for neural networks. In: Shawe-Taylor J., Zemel R.S., Bartlett P.L., Pereira F., Weinberger K.Q., editors. Advances in neural information processing systems. Curran Associates Inc.; New York: 2011. pp. 2348–2356. [Google Scholar]
- 42.Bengio Y., Simard P., Frasconi P. Learning long-term dependencies with gradient descent is difficult. IEEE Trans Neural Netw Learn Syst. 1994;5:157–166. doi: 10.1109/72.279181. [DOI] [PubMed] [Google Scholar]
- 43.LeCun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539. [DOI] [PubMed] [Google Scholar]
- 44.Ciresan DC, Meier U, Masci J, Gambardella LM, Schmidhuber J. Flexible, high performance convolutional neural networks for image classification. IJCAI'11 Proc 22ed Int Joint Conf Artif Intell 2011;22:1237–42.
- 45.Hinton G., Deng L., Yu D., Dahl G.E., Mohamed A., Jaitly N. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. Signal Process Mag IEEE. 2012;29:82–97. [Google Scholar]
- 46.Cireşan D.C., Meier U., Gambardella L.M., Schmidhuber J. Deep, big, simple neural nets for handwritten digit recognition. Neural Comput. 2010;22:3207–3220. doi: 10.1162/NECO_a_00052. [DOI] [PubMed] [Google Scholar]
- 47.Raina R, Madhavan A, Ng AY. Large-scale deep unsupervised learning using graphics processors. ICML'09 Proc 26th Ann Int Conf Mach Learn 2009:873–80.
- 48.Hinton G.E. Boltzmann machine. Scholarpedia. 2007;2:1668. [Google Scholar]
- 49.Bengio Y. Now Publishers Inc.; Delft: 2009. Learning deep architectures for AI; pp. 1–127. [Google Scholar]
- 50.Sutskever I., Hinton G.E. Learning multilevel distributed representations for high-dimensional sequences. J Mach Learn Res. 2007;2:548–555. [Google Scholar]
- 51.Sarikaya R., Hinton G.E., Deoras A. Application of deep belief networks for natural language understanding. IEEE/ACM Trans Audio Speech Lang Process. 2014;22:778–784. [Google Scholar]
- 52.Matsugu M., Mori K., Mitari Y., Kaneda Y. Subject independent facial expression recognition with robust face detection using a convolutional neural network. Neural Netw. 2003;16:555–559. doi: 10.1016/S0893-6080(03)00115-1. [DOI] [PubMed] [Google Scholar]
- 53.Sermanet P., LeCun Y. Traffic sign recognition with multi-scale convolutional networks. Neural Netw. 2011;42:3809–3813. [Google Scholar]
- 54.Lawrence S., Giles C.L., Tsoi A.C., Back A.D. Face recognition: a convolutional neural-network approach. IEEE Trans Neural Netw Learn Syst. 1997;8:98–113. doi: 10.1109/72.554195. [DOI] [PubMed] [Google Scholar]
- 55.Szegedy C, Liu W, Jia Y, Sermanet P, Reed S, Anguelov D, et al. Going deeper with convolutions. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2015:1−9.
- 56.Long J., Shelhamer E., Darrell T. Fully convolutional networks for semantic segmentation. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2015;79:3431–3440. doi: 10.1109/TPAMI.2016.2572683. [DOI] [PubMed] [Google Scholar]
- 57.Karpathy A., Toderici G., Shetty S., Leung T., Sukthankar R., Li F.F. Large-scale video classification with convolutional neural networks. Proc IEEE Conf Comput Vis Pattern Recognit. 2014:1725–1732. [Google Scholar]
- 58.Simonyan K., Zisserman A. Two-stream convolutional networks for action recognition in videos. In: Ghahramani Z., Welling M., Cortes C., Lawrence N.D., Weinberger K.Q., editors. Advances in neural information processing systems. Curran Associates Inc.; New York: 2014. pp. 568–576. [Google Scholar]
- 59.Collobert R, Weston J. A unified architecture for natural language processing: deep neural networks with multitask learning. ACM Proc Int Conf Mach Learn; 2008:160–7.
- 60.Hochreiter S., Schmidhuber J. Long short-term memory. Neural Comput. 1997;9:1735–1780. doi: 10.1162/neco.1997.9.8.1735. [DOI] [PubMed] [Google Scholar]
- 61.Graves A. Springer-Verlag, Berlin Heidelberg; Berlin: 2012. Supervised sequence labelling with recurrent neural networks. [Google Scholar]
- 62.Goodfellow I., Bengio Y., Courville A. Modern practical deep networks. In: Goodfellow I., Bengio Y., Courville A., editors. Deep learning. MIT Press; Cambridge: 2015. pp. 162–481. [Google Scholar]
- 63.Gers F.A., Schmidhuber J. LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Trans Neural Netw. 2001;12:1333–1340. doi: 10.1109/72.963769. [DOI] [PubMed] [Google Scholar]
- 64.Graves A., Schmidhuber J. Offline handwriting recognition with multidimensional recurrent neural networks. In: Koller D., Schuurmans D., Bengio Y., Bottou L., editors. Advances in neural information processing systems. Curran Associates Inc.; New York: 2009. pp. 545–552. [Google Scholar]
- 65.Ballard D.H. Modular learning in neural networks. Proc Conf AAAI Artif Intell. 1987:279–284. [Google Scholar]
- 66.Schölkopf B., Platt J., Hofmann T. Greedy layer-wise training of deep networks. Adv Neural Inf Process Syst. 2007:153–160. [Google Scholar]
- 67.Schölkopf B., Platt J., Hofmann T. Efficient sparse coding algorithms. Adv Neural Inf Process Syst. 2007:801–808. [Google Scholar]
- 68.Bengio Y. Practical recommendations for gradient-based training of deep architectures. Lect Notes Comput Sci. 2012;7700:437–478. [Google Scholar]
- 69.Singh R.G., Kishore N. The impact of transformation function on the classification ability of complex valued extreme learning machines. Int Conf Control Comput Commun Mater. 2013:1–5. [Google Scholar]
- 70.Toth L. Phone recognition with deep sparse rectifier neural networks. Proc IEEE Int Conf Acoust Speech Signal Process. 2013:6985–6989. [Google Scholar]
- 71.Maas AL, Hannun AY, Ng AY. Rectifier nonlinearities improve neural network acoustic models. Proc 30th Int Conf Mach Learn 2013:30.
- 72.Nair V, Hinton GE. Rectified linear units improve restricted boltzmann machines. ICML'10 Proc 27th Int Conf Mach Learn 2010:807–14.
- 73.Lai M. Deep learning for medical image segmentation. arXiv150502000.
- 74.Glorot X., Bordes A., Bengio Y. Deep sparse rectifier neural networks. J Mach Learn Res. 2011;15:315–323. [Google Scholar]
- 75.Jarrett K., Kavukcuoglu K., Ranzato M., LeCun Y. What is the best multi-stage architecture for object recognition? Proc IEEE Int Conf Comput Vis. 2009:2146–2153. [Google Scholar]
- 76.Goodfellow IJ, Warde-Farley D, Mirza M, Courville A. Maxout Networks. arXiv13024389.
- 77.Rosasco L., De Vito E., Caponnetto A., Piana M., Verri A. Are loss functions all the same? Neural Comput. 2004;16:1063–1076. doi: 10.1162/089976604773135104. [DOI] [PubMed] [Google Scholar]
- 78.Binmore K.G., Davies J. Cambridge University Press; Cambridge: 2002. Calculus: concepts and methods. [Google Scholar]
- 79.Boyd S., Vandenberghe L. Cambridge University Press; Cambridge: 2004. Convex optimization. [Google Scholar]
- 80.Huang J., Dong M., Li S. A new method of regularization parameter estimation for source localization. IEEE CIE Int Conf. 2011;2:1804–1808. [Google Scholar]
- 81.Yu Y., Schuurmans D. Rank/norm regularization with closed-form solutions: application to subspace clustering. Assoc Uncertain Artif Intell. 2002:1–5. [Google Scholar]
- 82.Abernethy J., Bach F., Evgeniou T., Vert J.P. A new approach to collaborative filtering: operator estimation with spectral regularization. J Mach Learn Res. 2009;10:803–826. [Google Scholar]
- 83.Argyriou A., Evgeniou T., Pontil M. Convex multi-task feature learning. Mach Learn. 2008;73:243–272. [Google Scholar]
- 84.Obozinski G., Taskar B., Jordan M.I. Joint covariate selection and joint subspace selection for multiple classification problems. Stat Comput. 2010;20:231–252. [Google Scholar]
- 85.Gauriau R., Cuingnet R., Lesage D., Bloch I. Multi-organ localization with cascaded global-to-local regression and shape prior. Med Image Anal. 2015;23:70–83. doi: 10.1016/j.media.2015.04.007. [DOI] [PubMed] [Google Scholar]
- 86.Bottou L. Stochastic gradient learning in neural networks. Proc Neuro Nımes 1991;91.
- 87.Lecun Y., Bottou L., Bengio Y., Haffner P. Gradient-based learning applied to document recognition. Proc IEEE. 1998;86:2278–2324. [Google Scholar]
- 88.Zinkevich M., Weimer M., Li L., Smola A.J. Parallelized stochastic gradient descent. In: Lafferty J.D., Williams C.K.I., Shawe-Taylor J., Zemel R.S., Culotta A., editors. Advances in Neural Information Processing Systems. Curran Associates Inc.; New York: 2010. pp. 2595–2603. [Google Scholar]
- 89.Hinton G.E. Products of experts. ICANN. 1999:1–6. [Google Scholar]
- 90.Hinton G.E. Training products of experts by contrastive divergence. Neural Comput. 2002:1771–1800. doi: 10.1162/089976602760128018. [DOI] [PubMed] [Google Scholar]
- 91.Carreira-Perpinan M.A., Hinton G.E. On contrastive divergence learning. Proc Artif Intell Stat. 2005:1–17. [Google Scholar]
- 92.Jim K.C., Giles C.L., Horne B.G. An analysis of noise in recurrent neural networks: convergence and generalization. IEEE Trans Neural Netw. 1996;7:1424–1438. doi: 10.1109/72.548170. [DOI] [PubMed] [Google Scholar]
- 93.Vincent P., Larochelle H., Lajoie I., Bengio Y., Manzagol P.A. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J Mach Learn Res. 2010;11:3371–3408. [Google Scholar]
- 94.Lasserre J.A., Bishop C.M., Minka T.P. Principled hybrids of generative and discriminative models. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2006;1:87–94. [Google Scholar]
- 95.Hinton GE, Srivastava N, Krizhevsky A, Sutskever I, Salakhutdinov RR. Improving neural networks by preventing co-adaptation of feature detectors. arXiv12070580.
- 96.Srivastava N., Hinton G.E., Krizhevsky A., Sutskever I., Salakhutdinov R. Dropout : a simple way to prevent neural networks from overfitting. J Mach Learn Res. 2014;15:1929–1958. [Google Scholar]
- 97.Aurelio Ranzato M., Poultney C., Chopra S., Cun Y.L. Efficient learning of sparse representations with an energy-based model. In: Schölkopf B., Platt J.C., Hoffman T., editors. Advances in neural information processing systems. Curran Associates Inc.; New York: 2007. pp. 1137–1144. [Google Scholar]
- 98.Bourlard H., Kamp Y. Auto-association by multilayer perceptrons and singular value decomposition. Biol Cybern. 1988;59:291–294. doi: 10.1007/BF00332918. [DOI] [PubMed] [Google Scholar]
- 99.Hinton G. A practical guide to training restricted boltzmann machines. In: Montavon G., Orr G., Müller K.R., editors. Neural networks: tricks of the Trade. Springer, Berlin Heidelberg; Berlin: 2012. pp. 599–619. [Google Scholar]
- 100.Hinton G.E. Oxford Univ Press; Oxford: 2009. Deep belief networks; p. 5947. [Google Scholar]
- 101.Erhan D., Bengio Y., Courville A., Manzagol P.A., Vincent P., Bengio S. Why does unsupervised pre-training help deep learning? J Mach Learn Res. 2010;11:625–660. [Google Scholar]
- 102.Ciresan D., Meier U., Schmidhuber J. Multi-column deep neural networks for image classification. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2012:3642–3649. [Google Scholar]
- 103.Werbos P. Generalization of backpropagation with application to a recurrent gas market model. Neural Netw. 1988;1:339–356. [Google Scholar]
- 104.Pearlmutter B. Learning state space trajectories in recurrent neural networks. Neural Comput. 2011;1:263–269. [Google Scholar]
- 105.Hochreiter S., Bengio Y., Frasconi P., Schmidhuber J. Gradient flow in recurrent nets: the difficulty of learning long-term dependencies. In: Kolen J.F., Kremer S.C., editors. A field guide to dynamical recurrent neural networks. Wiley-IEEE press; 2001. pp. 237–243. [Google Scholar]
- 106.Syed O. Case Western Reserve University Press; Cleveland: 1995. Applying genetic algorithms to recurrent neural networks for learning network parameters and architecture. [Google Scholar]
- 107.Gomez F., Schmidhuber J., Miikkulainen R. Accelerated neural evolution through cooperatively coevolved synapses. J Mach Learn Res. 2008;9:937–965. [Google Scholar]
- 108.Pereira S., Pinto A., Alves V., Silva C.A. Brain Tumor segmentation using convolutional neural networks in MRI images. IEEE Trans Med Imaging. 2016;35:1240–1251. doi: 10.1109/TMI.2016.2538465. [DOI] [PubMed] [Google Scholar]
- 109.Havaei M., Davy A., Warde-Farley D., Biard A., Courville A., Bengio Y. Brain tumor segmentation with deep neural networks. Med Image Anal. 2017;35:18–31. doi: 10.1016/j.media.2016.05.004. [DOI] [PubMed] [Google Scholar]
- 110.Moreira I.C., Amaral I., Domingues I., Cardoso A., Cardoso M.J., Cardoso J.S. INbreast: Toward a full-field digital mammographic database. Acad Radiol. 2012;19:236–248. doi: 10.1016/j.acra.2011.09.014. [DOI] [PubMed] [Google Scholar]
- 111.Health M., Bowyer K., Kopans D., Moore R., Kegelmeyer W.P., Sallam M. The digital database for screening mammography. In: Yaffe M.J., editor. Detection and characterization of mammographic masses by artificial neural network. Springer, Netherlands; Berlin: 2001. pp. 457–460. [Google Scholar]
- 112.Ngo T.A., Lu Z., Carneiro G. Combining deep learning and level set for the automated segmentation of the left ventricle of the heart from cardiac cine magnetic resonance. Med Image Anal. 2017;35:159–171. doi: 10.1016/j.media.2016.05.009. [DOI] [PubMed] [Google Scholar]
- 113.Roth HR, Farag A, Lu L, Turkbey EB, Summers RM. Deep convolutional networks for pancreas segmentation in CT imaging. ArXiv1504.03967.
- 114.Prasoon A., Petersen K., Igel C., Lauze F., Dam E., Nielsen M. Deep feature learning for knee cartilage segmentation using a triplanar convolutional neural network. Med Image Comput Comput Assist Interv. 2013;8150:246–253. doi: 10.1007/978-3-642-40763-5_31. [DOI] [PubMed] [Google Scholar]
- 115.Liao S., Gao Y., Oto A., Shen D. Representation learning: A unified deep learning framework for automatic prostate MR segmentation. Med Image Comput Comput Assist Interv. 2013;16:254–261. doi: 10.1007/978-3-642-40763-5_32. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 116.Guo Y.R., Wu G.R., Commander L.A., Szary S., Jewells V., Lin W.L. Segmenting hippocampus from infant brains by sparse patch matching with deep-learned features. Med Image Comput Comput Assist Interv. 2014;8674:308–315. doi: 10.1007/978-3-319-10470-6_39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 117.Kim M., Wu G., Shen D. Unsupervised deep learning for hippocampus segmentation in 7.0 tesla MR images. In: Wu G., Zhang D., Shen D., Yan P., Suzuki K., Wang F., editors. Proceedings of the 4th international workshop on machine learning in medical imaging. Springer-Verlag New York Inc.; New York: 2013. pp. 1–8. [Google Scholar]
- 118.Schlegl T., Waldstein S.M., Vogl W.D., Schmidt-Erfurth U., Langs G. Springer International Publishing AG; Basel: 2015. Predicting semantic descriptions from medical images with convolutional neural networks; pp. 437–448. [DOI] [PubMed] [Google Scholar]
- 119.Xu Y, Li Y, Liu M, Wang Y, Fan Y, Lai M, et al. Gland instance segmentation by deep multichannel neural networks. arXiv160704889. [DOI] [PubMed]
- 120.Lerouge J., Herault R., Chatelain C., Jardin F., Modzelewski R. IODA: an input/output deep architecture for image labeling. Pattern Recognit. 2015;48:2847–2858. [Google Scholar]
- 121.Moeskops P., Viergever M.A., Mendrik A.M., de Vries L.S., Benders M.J.N.L., Isgum I. Automatic segmentation of MR brain images with a convolutional neural network. IEEE Trans Med Imaging. 2016;35:1252–1261. doi: 10.1109/TMI.2016.2548501. [DOI] [PubMed] [Google Scholar]
- 122.Shin H.C., Orton M.R., Collins D.J., Doran S.J., Leach M.O. Stacked autoencoders for unsupervised feature learning and multiple organ detection in a pilot study using 4D patient data. IEEE Trans Pattern Anal Mach Intell. 2013;35:1930–1943. doi: 10.1109/TPAMI.2012.277. [DOI] [PubMed] [Google Scholar]
- 123.Roth H.R., Lee C.T., Shin H.C.C., Seff A., Kim L., Yao J. Anatomy-specific classification of medical images using deep convolutional nets. Proc IEEE Int Symp Biomed Imaging. 2015:101–104. [Google Scholar]
- 124.Sheet D., Karri S.P.K., Katouzian A., Navab N., Ray A.K., Chatterjee J. Deep learning of tissue specific speckle representations in optical coherence tomography and deeper exploration for in situ histology. Proc IEEE Int Symp Biomed Imaging. 2015:777–780. [Google Scholar]
- 125.Dou Q., Chen H., Yu L., Zhao L., Qin J., Wang D. Automatic detection of cerebral microbleeds from MR images via 3D convolutional neural networks. IEEE Trans Med Imaging. 2016;35:1182–1195. doi: 10.1109/TMI.2016.2528129. [DOI] [PubMed] [Google Scholar]
- 126.Wolterink J.M., Leiner T., de Vos B.D., van Hamersvelt R.W., Viergever M.A., Išgum I. Automatic coronary artery calcium scoring in cardiac CT angiography using paired convolutional neural networks. Med Image Anal. 2016;34:123–136. doi: 10.1016/j.media.2016.04.004. [DOI] [PubMed] [Google Scholar]
- 127.Zhou D.Q., Tran L., Wang J.H., Li J. A comparative study of two prediction models for brain tumor progression. Image Process Algorithms Syst. 2015:9399. [Google Scholar]
- 128.Tran L., Banerjee D., Wang J., Kumar A.J., Mckenzie F., Li Y. High-dimensional MRI data analysis using a large-scale manifold learning approach. Mach Vis Appl. 2013;24:995–1014. [Google Scholar]
- 129.Sirinukunwattana K., Raza S.E.A., Tsang Y.W., Snead D.R.J., Cree I.A., Rajpoot N.M. Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Trans Med Imaging. 2016;35:1196–1206. doi: 10.1109/TMI.2016.2525803. [DOI] [PubMed] [Google Scholar]
- 130.Xu Y., Zhu J.Y., Chang E., Tu Z. Multiple clustered instance learning for histopathology cancer image classification, segmentation and clustering. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2012:964–971. [Google Scholar]
- 131.Cireşan D.C., Giusti A., Gambardella L.M., Schmidhuber J. Mitosis detection in breast cancer histology images with deep neural networks. Med Image Comput Comput Assist Interv. 2013:411–418. doi: 10.1007/978-3-642-40763-5_51. [DOI] [PubMed] [Google Scholar]
- 132.Cruz-Roa A., Basavanhally A., Gonzalez F., Gilmore H., Feldman M., Ganesan S. Automatic detection of invasive ductal carcinoma in whole slide images with convolutional neural networks. Med Imaging. 2014:9041. [Google Scholar]
- 133.Kooi T., Litjens G., van Ginneken B., Gubern-Mérida A., Sánchez C.I., Mann R. Large scale deep learning for computer aided detection of mammographic lesions. Med Image Anal. 2017;35:303–312. doi: 10.1016/j.media.2016.07.007. [DOI] [PubMed] [Google Scholar]
- 134.Kallenberg M., Petersen K., Nielsen M., Ng A.Y., Diao P.F., Igel C. Unsupervised deep learning applied to breast density segmentation and mammographic risk scoring. IEEE Trans Med Imaging. 2016;35:1322–1331. doi: 10.1109/TMI.2016.2532122. [DOI] [PubMed] [Google Scholar]
- 135.Srivastava R., Cheng J., Wong D.W.K., Liu J. Using deep learning for robustness to parapapillary atrophy in optic disc segmentation. IEEE 12th Int Symp Biomed Imaging. 2015:768–771. [Google Scholar]
- 136.Fang T., Su R., Xie L., Gu Q., Li Q., Liang P. Retinal vessel landmark detection using deep learning and hessian matrix. Proc Int Symp Image Signal Process Anal. 2015:387–392. [Google Scholar]
- 137.Van Grinsven M.J.J.P., van Ginneken B., Hoyng C.B., Theelen T., Sanchez C.I. Fast convolutional neural network training using selective data sampling: application to hemorrhage detection in color fundus images. IEEE Trans Med Imaging. 2016;35:1273–1284. doi: 10.1109/TMI.2016.2526689. [DOI] [PubMed] [Google Scholar]
- 138.Prentašić P., Lončarić S. Detection of exudates in fundus photographs using convolutional neural networks. Proc Int Symp Image Signal Process Anal. 2015:188–192. [Google Scholar]
- 139.Arunkumar R., Karthigaikumar P. Multi-retinal disease classification by reduced deep learning features. Neural Comput Appl. 2015:1–6. [Google Scholar]
- 140.Mirowski P.W., LeCun Y., Madhavan D., Kuzniecky R. Comparing SVM and convolutional networks for epileptic seizure prediction from intracranial EEG. IEEE Int Workshop Mach Learn Signal Process. 2008:244–249. [Google Scholar]
- 141.Mirowski P.W., Madhavan D., Lecun Y. Time-delay neural networks and independent component analysis for Eeg-Based prediction of epileptic seizures propagation. Proc Conf AAAI Artif Intell. 2007:1892–1893. [Google Scholar]
- 142.Mirowski P., Madhavan D., LeCun Y., Kuzniecky R. Classification of patterns of EEG synchronization for seizure prediction. Clin Neurophysiol. 2009;120:1927–1940. doi: 10.1016/j.clinph.2009.09.002. [DOI] [PubMed] [Google Scholar]
- 143.Davidson P.R., Jones R.D., Peiris M.T.R. EEG-based lapse detection with high temporal resolution. IEEE Trans Biomed Eng. 2007;54:832–839. doi: 10.1109/TBME.2007.893452. [DOI] [PubMed] [Google Scholar]
- 144.Petrosian A., Prokhorov D., Homan R., Dasheiff R., Wunsch D. Recurrent neural network based prediction of epileptic seizures in intra- and extracranial EEG. Neurocomputing. 2000;30:201–218. [Google Scholar]
- 145.Chen Y., Li Y., Narayan R., Subramanian A., Xie X. Gene expression inference with deep learning. Bioinfarmatics. 2016;32:1832–1839. doi: 10.1093/bioinformatics/btw074. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 146.Zhang S., Zhou J., Hu H., Gong H., Chen L., Cheng C. A deep learning framework for modeling structural features of RNA-binding protein targets. Nucleic Acids Res. 2016;44:e32. doi: 10.1093/nar/gkv1025. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 147.Alipanahi B., Delong A., Weirauch M.T., Frey B.J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol. 2015;33:1–9. doi: 10.1038/nbt.3300. [DOI] [PubMed] [Google Scholar]
- 148.Lanchantin J, Singh R, Lin Z, Qi Y. Deep Motif: visualizing genomic sequence classifications. arXiv160501133.
- 149.Zeng H., Edwards M.D., Liu G., Gifford D.K. Convolutional neural network architectures for predicting DNA-protein binding. Bioinformatics. 2016;32:i121–i127. doi: 10.1093/bioinformatics/btw255. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 150.Kelley D.R., Snoek J., Rinn J.L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 2016;26:990–999. doi: 10.1101/gr.200535.115. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 151.Liu F., Ren C., Li H., Zhou P., Bo X., Shu W. De novo identification of replication-timing domains in the human genome by deep learning. Bioinformatics. 2015;32:641–649. doi: 10.1093/bioinformatics/btv643. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 152.Liu F., Li H., Ren C., Bo X., Shu W. PEDLA: predicting enhancers with a deep learning-based algorithmic framework. Sci Seq. 2016;6:28517. doi: 10.1038/srep28517. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 153.Park S, Min S, Choi H, Yoon S. deepMiRGene: deep neural network based precursor microrna prediction. arXiv1605.00017.
- 154.Lee B, Baek J, Park S, Yoon S. deepTarget: end-to-end learning framework for microRNA target prediction using deep recurrent neural networks. arXiv1603.09123.
- 155.Guigo R., Valcarcel J. Prescribing splicing. Science. 2015;347:124–125. doi: 10.1126/science.aaa4864. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 156.Lee T, Yoon S. Boosted categorical restricted boltzmann machine for computational prediction of splice junctions. Proc Int Conf Mach Learn 2015;37.
- 157.Lee B, Lee T, Na B, Yoon S. DNA-level splice junction prediction using deep recurrent neural networks. arXiv1512.05135.
- 158.Xiong H.Y., Barash Y., Frey B.J. Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context. Bioinformatics. 2011;27:2554–2562. doi: 10.1093/bioinformatics/btr444. [DOI] [PubMed] [Google Scholar]
- 159.Leung M.K.K., Xiong H.Y., Lee L.J., Frey B.J. Deep learning of the tissue-regulated splicing code. Bioinformatics. 2014;30:i121–i129. doi: 10.1093/bioinformatics/btu277. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 160.Xiong H.Y., Alipanahi B., Lee L.J., Bretschneider H., Merico D., Yuen R.K.C. The human splicing code reveals new insights into the genetic determinants of disease. Science. 2014;347:1254806. doi: 10.1126/science.1254806. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 161.Quang D., Chen Y., Xie X. DANN: A deep learning approach for annotating the pathogenicity of genetic variants. Bioinformatics. 2015;31:761–763. doi: 10.1093/bioinformatics/btu703. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 162.Zhou J., Troyanskaya O.G. Predicting effects of noncoding variants with deep learning–based sequence model. Nat Methods. 2015;12:931–934. doi: 10.1038/nmeth.3547. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 163.Quang D., Xie X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 2016;44:11. doi: 10.1093/nar/gkw226. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 164.Anfinsen C.B. The formation and stabilization of protein structure. Biochem J. 1972;128:737–749. doi: 10.1042/bj1280737. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 165.Gibson K.D., Scheraga H.A. Minimization of polypeptide energy. I. Preliminary structures of bovine pancreatic ribonuclease S-peptide. Proc Natl Acad Sci U S A. 1967;58:420–427. doi: 10.1073/pnas.58.2.420. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 166.Hammarstrom P., Wiseman R.L., Powers E.T., Kelly J.W. Prevention of transthyretin amyloid disease by changing protein misfolding energetics. Science. 2003;299:713–716. doi: 10.1126/science.1079589. [DOI] [PubMed] [Google Scholar]
- 167.Chiti F., Dobson C.M. Protein misfolding, functional amyloid, and human disease. Annu Rev Biochem. 2006;75:333–366. doi: 10.1146/annurev.biochem.75.101304.123901. [DOI] [PubMed] [Google Scholar]
- 168.Selkoe D.J. Folding proteins in fatal ways. Nature. 2003;426:900–904. doi: 10.1038/nature02264. [DOI] [PubMed] [Google Scholar]
- 169.Lyons J., Dehzangi A., Heffernan R., Sharma A., Paliwal K., Sattar A. Predicting backbone Cα angles and dihedrals from protein sequences by stacked sparse auto-encoder deep neural network. J Comput Chem. 2014;35:2040–2046. doi: 10.1002/jcc.23718. [DOI] [PubMed] [Google Scholar]
- 170.Heffernan R., Paliwal K., Lyons J., Dehzangi A., Sharma A., Wang J. Improving prediction of secondary structure, local backbone angles, and solvent accessible surface area of proteins by iterative deep learning. Sci Rep. 2015;5:11476. doi: 10.1038/srep11476. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 171.Spencer M., Eickholt J., Cheng J. A deep learning network approach to ab initio protein secondary structure prediction. IEEE/ACM Trans Comput Biol Bioinform. 2015;12:103–112. doi: 10.1109/TCBB.2014.2343960. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 172.Baldi P., Pollastri G., Andersen C.A.F., Brunak S. Matching protein beta-sheet partners by feedforward and recurrent neural networks. Proc Int Conf Intell Syst Mol Biol. 2000:25–36. [PubMed] [Google Scholar]
- 173.Baldi P., Brunak S., Frasconi P., Soda G., Pollastri G. Exploiting the past and the future in protein secondary structure prediction. Bioinformatics. 1999;15:937–946. doi: 10.1093/bioinformatics/15.11.937. [DOI] [PubMed] [Google Scholar]
- 174.Pollastri G., Przybylski D., Rost B., Baldi P. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles. Proteins. 2002;47:228–235. doi: 10.1002/prot.10082. [DOI] [PubMed] [Google Scholar]
- 175.Pollastri G., Baldi P. Prediction of contact maps by GIOHMMs and recurrent neural networks using lateral propagation from all four cardinal corners. Bioinformatics. 2002;18 doi: 10.1093/bioinformatics/18.suppl_1.s62. [DOI] [PubMed] [Google Scholar]
- 176.Baldi P., Pollastri G. The principled design of large-scale recursive neural network architectures-DAG-RNNs and the protein structure prediction problem. J Mach Learn Res. 2004;4:575–602. [Google Scholar]
- 177.Di Lena P., Nagata K., Baldi P. Deep architectures for protein contact map prediction. Bioinformatics. 2012;28:2449–2457. doi: 10.1093/bioinformatics/bts475. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 178.Sønderby SK, Winther O. Protein secondary structure prediction with long short term memory networks. arXiv1412.7828
- 179.Li Y., Shibuya T. Malphite: a convolutional neural network and ensemble learning based protein secondary structure predictor. Proc IEEE Int Conf Bioinformatics Biomed. 2015:1260–1266. [Google Scholar]
- 180.Lin Z, Lanchantin J, Qi Y. MUST-CNN: a multilayer shift-and-stitch deep convolutional architecture for sequence-based protein structure prediction. Proc Conf AAAI Artif Intell 2016:8.
- 181.Lena P.D., Nagata K., Baldi P.F. Deep spatio-temporal architectures and learning for protein structure prediction. Adv Neural Inf Process Syst. 2012:512–520. [Google Scholar]
- 182.Troyanskaya OG. Deep supervised and convolutional generative stochastic network for protein secondary structure prediction. Proc 31st Int Conf Mach Learn 2014; 32:745–53.
- 183.Wang S., Weng S., Ma J., Tang Q. DeepCNF-D: predicting protein order/disorder regions by weighted deep convolutional neural fields. Int J Mol Sci. 2015;16:17315–17330. doi: 10.3390/ijms160817315. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 184.Eickholt J., Cheng J. DNdisorder: predicting protein disorder using boosting and deep networks. BMC Bioinformatics. 2013;14:88. doi: 10.1186/1471-2105-14-88. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 185.Wang S., Li W., Liu S., Xu J. RaptorX-Property: a web server for protein structure property prediction. Nucleic Acids Res. 2016;44 doi: 10.1093/nar/gkw306. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 186.Shin H.C., Orton M., Collins D.J., Doran S., Leach M.O. Autoencoder in time-series analysis for unsupervised tissues characterisation in a large unlabelled medical image dataset. Proc Int Conf Mach Learn Appl. 2011;1:259–264. [Google Scholar]
- 187.Jia X., Li K., Li X., Zhang A. A novel semi-supervised deep learning framework for affective state recognition on EEG signals. Proc IEEE Int Symp Bioinformatics Bioeng. 2014:30–37. [Google Scholar]
- 188.He K., Zhang X., Ren S., Sun J. Deep residual learning for image recognition. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2015;7:171–180. [Google Scholar]
- 189.Szegedy C, Ioffe S, Vanhoucke V. Inception-v4, InceptionResNet and the impact of residual connections on learning. arXiv1602.07261.
- 190.Yarlagadda DVK, Rao P, Rao D. MitosisNet: a deep learning network for mitosis detection in breast cancer histopathology images. IEEE EMBS Int Conf Biomed Health Inform 2017.
- 191.Irshad H, Oh EY, Schmolze D, Quintana LM, Collins L, Tamimi RM, et al. Crowdsourcing scoring of immunohistochemistry images: evaluating performance of the crowd and an automated computational method. arXiv160606681. [DOI] [PMC free article] [PubMed]
- 192.Albarqouni S., Baur C., Achilles F., Belagiannis V., Demirci S., Navab N. AggNet: Deep learning from crowds for mitosis detection in breast cancer histology images. IEEE Trans Med Imaging. 2016;35:1313–1321. doi: 10.1109/TMI.2016.2528120. [DOI] [PubMed] [Google Scholar]




