Table 1: Breakdown of the top-performing models into key components
Participant team name | NN architecture type | Input encoding and channels | Input flanking region length (bp) | Usage of reverse strand during model training | Train–validation split (%) | Parameters (millions) | Optimizer | Loss function | Learning rate scheduler | Metric |
---|---|---|---|---|---|---|---|---|---|---|
Autosome.org | CNN (EfficientNetV2 (32)) | OHE [6:bases/NC*/RC*] | 70 | Data aug. (additional channel) + Model (additional channel) | 100–0 | 1.9 | AdamW (37) | Kullback–Leibler divergence | One Cycle LR | r, ρ# |
BHI | CNN + RNN (Bi-LSTM) (34) | OHE [4:bases] | 30 | Post-hoc conjoined setting (41) | 100–0 | 6.8 | AdamW (37) | Huber | Cosine Anneal LR | r, ρ# |
Unlock_DNA | Transformer | OHE [6:bases/N*/M*] | 20 | Input to model (concat. with forward strand) | 95–5 | 47.4 | Adam (36) | MSE + custom | One Cycle LR | r |
Camformers | CNN (ResNet (33)) | OHE [4:bases] | 30 | None | 90–10 | 16.6 | AdamW (37) | L1 | Reduce LR On Plateau | r, ρ |
NAD | CNN + Transformer | GloVe (38) [128] | 0 | None | 90–10 | 15.5 | AdamW (37) + GSAM (42) | smooth L1 | Linear LR | r |
wztr | CNN (ResNet (33)) | OHE [4:bases] | 62 | Input to model (concat. with forward strand) | 99–1 | 4.8 | Adam (36) | MSE | Reduce LR On Plateau | r |
High Schoolers Are All You Need (High Schoolers) | CNN + Transformer + MLP | OHE [4:bases] | 31 | Model (RC parameter sharing) (41) | 98–2 | 4.7 | Adam (36) + SWA (43) | MSE | Multi Step LR | r |
BioNML | Vision Transformer (44) | OHE [4:bases] | 30 | Model (RC parameter sharing) (41) | 86–14 | 78.7 | Adamax (36) + L2 regularizer | Huber | Multi Step LR | r, CoL |
BUGF | Transformer | OHE [6:bases/N*/P*] | 32 | None | 94–6 | 4.5 | RAdam (45) | Multi-label focal loss (46) + custom | None | r |
mt | GRU (47) + CNN | OHE [6:bases/N*/P*] | 62 | Model (RC parameter sharing) (41) | 99.8–0.2 | 3.1 | Adam (36) | binary cross-entropy | None | r, CoD# |
*NC: If the sequence was present in more than one cell, 0 for all bases, otherwise 1; RC: If the sequence is reverse-complemented, 1 for all bases, otherwise 0; N: If a base is unknown, 1 for that base, otherwise 0; P: If a base has been padded to maintain fixed input length, 1 for that base, otherwise 0; M: If a base is masked, 1 for that base, otherwise 0.
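The extra channels above extend a standard 4-channel one-hot encoding with per-sequence flags broadcast across all positions. A minimal sketch of the 6-channel variant with the NC and RC channels (as used by Autosome.org); the function name and flag arguments are illustrative, not taken from any team's code:

```python
import numpy as np

BASES = "ACGT"

def encode_sequence(seq: str, is_rc: bool = False, singleton: bool = True) -> np.ndarray:
    """One-hot encode a DNA sequence into 6 channels:
    channels 0-3 mark A/C/G/T (unknown bases stay all-zero),
    channel 4 is the NC flag (1 for all bases if the sequence
    appeared in only one cell, else 0), and channel 5 is the
    RC flag (1 for all bases if the sequence is
    reverse-complemented, else 0)."""
    x = np.zeros((len(seq), 6), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in BASES:
            x[i, BASES.index(base)] = 1.0
    x[:, 4] = 1.0 if singleton else 0.0  # NC channel, constant over positions
    x[:, 5] = 1.0 if is_rc else 0.0      # RC channel, constant over positions
    return x
```

The N, P, and M channels listed above work the same way but are set per position (unknown, padded, or masked bases) rather than per sequence.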
#: These teams employed the metric in a cross-validation setting to determine the optimal number of training epochs, then saved the model weights after training for that many epochs, without relying on validation metric scores. In contrast, the other teams used validation metric scores to select their best-performing model.