2023 Jan 11;3(1):100384. doi: 10.1016/j.crmeth.2022.100384

Table 3.

Proteomic- and phenotype-level deep-learning applications

Method name | Year | Main functionalities | Datasets | Model | Species
Post-translational modification (PTM):

DeepPhos [112] | 2019 | phosphorylation site prediction (general/residue-specific/kinase-specific)
  • phosphorylation sites collection
  • 12,810 protein sequences
  • de-duplication criterion
    ◦ CD-HIT [118] similarity 40%
  • Densely Connected CNN (DC-CNN) [119] (21 aa input)
  • prediction tasks
    ◦ general prediction
    ◦ residue-specific prediction
    ◦ kinase-specific prediction
human
DeepUbi [120] | 2019 | ubiquitination prediction
  • PLMD v3.0 [121]
    ◦ 25,103 proteins
    ◦ 53,999 positives
    ◦ 50,315 negatives
    ◦ CD-HIT similarity 30%
  • CNN (31 aa input)

multiple (176 species)
MusiteDeep [122,123,124] | 2017–2020 | multiple PTM prediction
  • UniProt [13]
    ◦ 13 PTM types used in the final version
  • de-duplication criterion
    ◦ CD-HIT similarity 40% or 50%
  • ensemble and bootstrapping of the following two models (33 aa input)
    ◦ multi-layer CNN
    ◦ Capsule Network [125]
  • prediction results publicly available at https://www.musite.net

multiple animal species
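
The PTM predictors listed above all take a fixed-length window of residues centered on the candidate site (21, 31, and 33 aa for DeepPhos, DeepUbi, and MusiteDeep, respectively). The windowing step can be sketched as follows. This is a minimal illustration only: the function names, the `X` padding character, and the one-hot encoding are assumptions made for the example and are not taken from the tools' own preprocessing code.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def extract_windows(sequence, window=21, residues="STY", pad="X"):
    """Return (1-based position, window string) pairs centered on candidate residues.

    The pad character (an assumption of this sketch) fills positions that
    fall outside the protein near its N- or C-terminus.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so the site sits at the center")
    half = window // 2
    padded = pad * half + sequence + pad * half
    windows = []
    for i, aa in enumerate(sequence):
        if aa in residues:
            # padded[i : i + window] is centered on sequence[i]
            windows.append((i + 1, padded[i:i + window]))
    return windows


def one_hot(window_seq):
    """Encode each residue as a 20-dim 0/1 vector; padding or unknown residues map to all zeros."""
    return [[1 if aa == letter else 0 for letter in ALPHABET] for aa in window_seq]


# Candidate phosphorylation sites (S/T/Y) with a 5 aa window for readability
print(extract_windows("MKSAYT", window=5))
# [(3, 'MKSAY'), (5, 'SAYTX'), (6, 'AYTXX')]
```

Passing `residues="K"` with `window=31` would give lysine-centered windows of the size reported for DeepUbi; the encoded windows would then be fed to a classifier.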

Protein-subcellular localization:

DeepLoc [126] | 2017 | subcellular localization prediction
  • UniProt release 2016_04
    ◦ 40 aa
    ◦ with no more than 1 subcellular location
    ◦ with experimental support
    ◦ CD-HIT similarity ≤30%
  • Höglund et al., 2006 [127]
  • 10 subcellular locations in total
  • CNN + LSTM (max. 1,000 aa input)
  • attention-based decoding
  • hierarchical tree classification and likelihood

multiple eukaryotes
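
Attention-based decoding, as listed for DeepLoc, reduces a variable-length sequence of per-residue hidden states to a single fixed-size vector by weighting each position. The sketch below shows generic dot-product attention pooling in plain Python. The `query` vector stands in for learned parameters and is hypothetical, and DeepLoc's own attention parametrization may differ from this form.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states, query):
    """Pool per-position vectors into one context vector.

    hidden_states: list of equal-length vectors, one per sequence position
    query: vector of the same length (hypothetical learned parameter)
    Returns (attention weights, context vector).
    """
    # Score each position by its dot product with the query
    scores = [sum(h_d * q_d for h_d, q_d in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    # Context vector is the attention-weighted average of the hidden states
    context = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return weights, context


weights, context = attention_pool([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0])
print(weights, context)  # a zero query weights both positions equally
```

The weights sum to 1, so they can also be inspected to see which sequence positions contributed most to a prediction.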

Genotype-to-phenotype inference in animal species:

Zhou et al. [128] | 2019 | prediction of the effect of non-coding variants on autism spectrum disorder
  • Roadmap Epigenomics histone marks and DNase I profiles
    ◦ 2,002 epigenetic features
  • ENCODE and previously published CLIP datasets
    ◦ 231 profiles for a total of 82 RBPs
  • The Simons Simplex Collection of whole-genome sequencing data of 7,097 genomes for 1,790 ASD-affected families
  • transcriptional regulatory effects model
    ◦ DeepSEA with doubled convolution layers
    ◦ model prediction expanded from the original 919 epigenetic targets to 2,002 targets
  • post-transcriptional regulatory effects model
    ◦ architecture similar to DeepSEA
    ◦ prediction of binding affinity of 82 unique RBPs
human
DeepWAS [129] | 2020 | using a genomic deep-learning model to enhance genome-wide association studies
  • KKNMS microarray profiles for multiple sclerosis (MS) [130]
  • MDDC microarray profiles for major depressive disorder (MDD) [131]
  • KORA microarray profiles for body height [132]
  • using the pre-trained DeepSEA model to prioritize variants that affect genomic functional units
  • using the prioritized variants to propose candidate variants for GWAS analysis

human
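
Both entries above rely on a sequence model to score non-coding variants: the model is evaluated on the reference sequence and on the sequence carrying the alternative allele, and the two predictions are compared. The sketch below illustrates this with a log-odds difference, which is one common choice. `toy_model` is a hypothetical stand-in based on GC content, not DeepSEA, and the threshold used for prioritization is arbitrary.

```python
import math


def toy_model(seq):
    """Hypothetical stand-in for a trained model: a probability derived from GC content."""
    gc = sum(1 for base in seq if base in "GC") / len(seq)
    return min(max(gc, 0.01), 0.99)  # keep away from 0 and 1 so the logit is finite


def apply_variant(ref_seq, pos, ref, alt):
    """Return ref_seq with the single-base substitution ref -> alt at 0-based pos."""
    if ref_seq[pos] != ref:
        raise ValueError("reference allele does not match sequence")
    return ref_seq[:pos] + alt + ref_seq[pos + 1:]


def logit(p):
    return math.log(p / (1 - p))


def variant_effect(model, ref_seq, pos, ref, alt):
    """Log-odds difference between alternative and reference predictions."""
    alt_seq = apply_variant(ref_seq, pos, ref, alt)
    return logit(model(alt_seq)) - logit(model(ref_seq))


def prioritize(model, ref_seq, variants, threshold):
    """Keep variants, given as (pos, ref, alt), whose absolute effect reaches the threshold."""
    scored = [(v, variant_effect(model, ref_seq, *v)) for v in variants]
    return [(v, s) for v, s in scored if abs(s) >= threshold]


candidates = [(0, "A", "G"), (0, "A", "T")]
print(prioritize(toy_model, "AAGCAATT", candidates, threshold=0.1))
```

In a DeepWAS-style workflow, the variants that pass such a filter would then be carried forward as candidates for association testing.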

Genotype-to-phenotype inference in plant species:

DeepGP [133] | 2020 | multiple phenotype prediction in polyploid outcrossing species
  • five advanced selection trials of strawberry (University of Florida) [134]
    ◦ evaluation of five yield and fruit quality traits
      • soluble solid content
      • average fruit weight
      • total marketable yield
      • early marketable yield
      • percentage of culled fruit
    ◦ microarray genotyping of 1,233 individuals
  • one cycle of a blueberry breeding program (University of Florida) [135]
    ◦ evaluation of yield and fruit quality traits
      • firmness
      • fruit size
      • weight
      • yield
      • scar
    ◦ genotyping by Rapid Genomics Capture-seq [136]
  • using both CNNs and Bayesian penalized linear regression for phenotype prediction

strawberry
blueberry
Shook et al. [137] | 2021 | crop yield prediction based on genotype and environmental factors
  • Uniform Soybean Tests data [138]
    ◦ soybean yield in USA and Canada during 2003–2015
  • LSTM and temporal attention model

soybean
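
DeepGP, listed above, pairs CNNs with Bayesian penalized linear regression for phenotype prediction. The idea of a penalized linear model on marker data can be illustrated with ordinary ridge regression, used here as a simple non-Bayesian stand-in and not as the method of the original study. The marker coding (allele dosages 0/1/2), the toy data, and all hyperparameters are assumptions of the sketch.

```python
def ridge_fit(X, y, lam=1.0, lr=0.05, epochs=5000):
    """Fit y ~ b + X.w by gradient descent on (1/n) * (sum of squared errors + lam * ||w||^2).

    X: list of marker rows (allele dosages), y: list of phenotype values.
    The intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(epochs):
        preds = [b + sum(w_j * x_j for w_j, x_j in zip(w, row)) for row in X]
        errs = [pred - target for pred, target in zip(preds, y)]
        grad_w = [
            (2.0 / n) * (sum(e * row[j] for e, row in zip(errs, X)) + lam * w[j])
            for j in range(p)
        ]
        grad_b = (2.0 / n) * sum(errs)
        w = [w_j - lr * g_j for w_j, g_j in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy data: six individuals, two markers, phenotype = 1 + 2*m1 - 1*m2
markers = [[0, 2], [1, 1], [2, 0], [2, 2], [0, 0], [1, 2]]
phenotype = [-1, 2, 5, 3, 1, 1]

w_unpenalized, b_unpenalized = ridge_fit(markers, phenotype, lam=0.0)
w_shrunk, b_shrunk = ridge_fit(markers, phenotype, lam=10.0)
print(w_unpenalized, b_unpenalized)
print(w_shrunk, b_shrunk)
```

With `lam=0.0` the fit recovers the generating marker effects, while a larger `lam` shrinks them toward zero. That shrinkage is what keeps such models stable when markers far outnumber individuals.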