Table 1: Breakdown of the top-performing models into key components
Participant team name | NN architecture type | Input encoding and channels | Input flanking region length (bp) | Usage of reverse strand during model training | Train–validation split (%) | Parameters (millions) | Optimizer | Loss function | Learning rate scheduler | Metric |
---|---|---|---|---|---|---|---|---|---|---|
Autosome.org | CNN (EfficientNetV2 (32)) | OHE [6:bases/NC*/RC*] | 70 | Data aug. (additional channel) + Model (additional channel) | 100–0 | 1.9 | AdamW (37) | Kullback–Leibler divergence | One Cycle LR | r, ρ# |
BHI | CNN + RNN (Bi-LSTM) (34) | OHE [4:bases] | 30 | Post-hoc conjoined setting (41) | 100–0 | 6.8 | AdamW (37) | Huber | Cosine Anneal LR | r, ρ# |
Unlock_DNA | Transformer | OHE [6:bases/N*/M*] | 20 | Input to model (concat. with forward strand) | 95–5 | 47.4 | Adam (36) | MSE + custom | One Cycle LR | r |
Camformers | CNN (ResNet (33)) | OHE [4:bases] | 30 | None | 90–10 | 16.6 | AdamW (37) | L1 | Reduce LR On Plateau | r, ρ |
NAD | CNN + Transformer | GloVe (38) [128] | 0 | None | 90–10 | 15.5 | AdamW (37) + GSAM (42) | smooth L1 | Linear LR | r |
wztr | CNN (ResNet (33)) | OHE [4:bases] | 62 | Input to model (concat. with forward strand) | 99–1 | 4.8 | Adam (36) | MSE | Reduce LR On Plateau | r |
High Schoolers Are All You Need (High Schoolers) | CNN + Transformer + MLP | OHE [4:bases] | 31 | Model (RC parameter sharing) (41) | 98–2 | 4.7 | Adam (36) + SWA (43) | MSE | Multi Step LR | r |
BioNML | Vision Transformer (44) | OHE [4:bases] | 30 | Model (RC parameter sharing) (41) | 86–14 | 78.7 | Adamax (36) + L2 regularizer | Huber | Multi Step LR | r, CoL |
BUGF | Transformer | OHE [6:bases/N*/P*] | 32 | None | 94–6 | 4.5 | RAdam (45) | Multi-label focal loss (46) + custom | None | r |
mt | GRU (47) + CNN | OHE [6:bases/N*/P*] | 62 | Model (RC parameter sharing) (41) | 99.8–0.2 | 3.1 | Adam (36) | binary cross-entropy | None | r, CoD# |
*NC: If the sequence was present in more than one cell, 0 for all bases, otherwise 1; RC: If the sequence is reverse-complemented, 1 for all bases, otherwise 0; N: If a base is unknown, 1 for that base, otherwise 0; P: If a base has been padded to maintain fixed input length, 1 for that base, otherwise 0; M: If a base is masked, 1 for that base, otherwise 0.
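The extra channels above extend a standard 4-channel one-hot encoding with per-sequence flags broadcast across all positions. A minimal sketch of the 6-channel variant with the NC and RC channels (as used by Autosome.org); the function name and flag arguments are illustrative, not taken from any team's code:

```python
import numpy as np

BASES = "ACGT"

def encode_sequence(seq: str, is_rc: bool = False, singleton: bool = True) -> np.ndarray:
    """One-hot encode a DNA sequence into 6 channels:
    channels 0-3 mark A/C/G/T (unknown bases stay all-zero),
    channel 4 is the NC flag (1 for all bases if the sequence
    appeared in only one cell, else 0), and channel 5 is the
    RC flag (1 for all bases if the sequence is
    reverse-complemented, else 0)."""
    x = np.zeros((len(seq), 6), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        if base in BASES:
            x[i, BASES.index(base)] = 1.0
    x[:, 4] = 1.0 if singleton else 0.0  # NC channel, constant over positions
    x[:, 5] = 1.0 if is_rc else 0.0      # RC channel, constant over positions
    return x
```

The N, P, and M channels listed above work the same way but are set per position (unknown, padded, or masked bases) rather than per sequence.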
#: These teams employed the metric in a cross-validation setting to determine the optimal number of training epochs, then saved the model weights after training for that many epochs, without relying on validation metric scores. In contrast, the other teams used validation metric scores to select their best-performing model.