PeerJ Comput Sci. 2025 Jul 24;11:e3001. doi: 10.7717/peerj-cs.3001

Table 3. Hyperparameter values used in each model.

Model: Parameters

RNN: h, input_size, inference_input_size, loss=MQLoss, scaler_type, encoder_n_layers, encoder_hidden_size, context_size, decoder_hidden_size, decoder_layers, max_steps, full_horizon, model_name
LSTM: h, input_size, loss=DistributionLoss, scaler_type, encoder_n_layers, encoder_hidden_size, context_size, decoder_hidden_size, decoder_layers, max_steps, full_horizon, model_name
GRU: h, input_size, loss=DistributionLoss, scaler_type, encoder_n_layers, encoder_hidden_size, context_size, decoder_hidden_size, decoder_layers, max_steps, full_horizon, model_name
TCN: h, input_size, loss=GMM, learning_rate=5e-4, kernel_size=2, dilations=[1, 2, 4, 8, 16], encoder_hidden_size, context_size, decoder_hidden_size, decoder_layers, scaler_type, max_steps, full_horizon, model_name
DeepAR: h, input_size, lstm_n_layers=3, trajectory_samples=100, loss=DistributionLoss, learning_rate=0.005, max_steps, val_check_steps, early_stop_patience_steps, scaler_type=standard, full_horizon, model_name
DilatedRNN: h, input_size, loss=DistributionLoss, scaler_type=robust, encoder_hidden_size, max_steps, full_horizon, model_name
BiTCN: h, input_size=24, loss=GMM, max_steps=100, scaler_type=standard, full_horizon, model_name
TFT: h, input_size=tune.choice([horizon]), hidden_size=tune.choice([8, 32]), n_head=tune.choice([2, 8]), learning_rate=tune.loguniform(1e-4, 1e-1), scaler_type=tune.choice([robust, standard]), max_steps=tune.choice([500, 1000]), windows_batch_size=tune.choice([8, 32]), check_val_every_n_epoch=tune.choice([100]), random_seed=tune.randint(1, 20), num_samples=10, freq='H', save_dataset=True, overwrite=True, full_horizon, model_name
VanillaTransformer: h, input_size=horizon, hidden_size=16, conv_hidden_size=32, n_head=2, loss=MAE, scaler_type=robust, learning_rate=1e-3, max_steps=500, full_horizon, model_name
Informer: h, input_size=horizon, hidden_size=16, conv_hidden_size=32, n_head=2, learning_rate=1e-3, scaler_type=robust, max_steps=500, full_horizon, model_name
Autoformer: h, input_size=horizon, hidden_size=16, conv_hidden_size=32, n_head=2, learning_rate=1e-3, scaler_type=robust, max_steps=500, full_horizon, model_name
FEDformer: h, input_size=24, modes=64, hidden_size=64, conv_hidden_size=128, n_head=8, learning_rate=1e-3, scaler_type=robust, max_steps=500, batch_size=2, windows_batch_size=32, val_check_steps=50, early_stop_patience_steps=2, full_horizon, model_name
PatchTST: h, input_size=104, patch_len=24, stride=24, revin=False, hidden_size=16, n_heads=4, scaler_type=robust, learning_rate=1e-3, max_steps=500, val_check_steps=50, early_stop_patience_steps=2, full_horizon, model_name
iTransformer: h, input_size=24, n_series=2, hidden_size=128, n_heads=2, e_layers=2, d_layers=1, d_ff=4, factor=1, dropout=0.1, use_norm=True, loss=MSE, valid_loss=MAE, early_stop_patience_steps=3, batch_size=32, full_horizon, model_name
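As a worked illustration, each row of Table 3 can be read as the keyword arguments of the corresponding model constructor in Nixtla's neuralforecast library. The sketch below (an assumption about how the table maps onto the API, not code from the paper) encodes the PatchTST and FEDformer rows as plain dictionaries; the value of h (the forecast horizon) is illustrative, since the table leaves it unset, and the commented lines show the assumed constructor calls.

```python
# Hypothetical sketch: two Table 3 rows as keyword-argument dicts.
# h is an illustrative value; the table does not fix it.

patchtst_params = dict(
    h=24,                        # forecast horizon (illustrative)
    input_size=104,
    patch_len=24,
    stride=24,
    revin=False,
    hidden_size=16,
    n_heads=4,
    scaler_type="robust",
    learning_rate=1e-3,
    max_steps=500,
    val_check_steps=50,
    early_stop_patience_steps=2,
)

fedformer_params = dict(
    h=24,                        # forecast horizon (illustrative)
    input_size=24,
    modes=64,
    hidden_size=64,
    conv_hidden_size=128,
    n_head=8,
    learning_rate=1e-3,
    scaler_type="robust",
    max_steps=500,
    batch_size=2,
    windows_batch_size=32,
    val_check_steps=50,
    early_stop_patience_steps=2,
)

# Assumed usage (requires neuralforecast; shown for illustration only):
# from neuralforecast import NeuralForecast
# from neuralforecast.models import PatchTST, FEDformer
# nf = NeuralForecast(models=[PatchTST(**patchtst_params),
#                             FEDformer(**fedformer_params)], freq="H")

print(patchtst_params["patch_len"], fedformer_params["modes"])
```

Keeping the hyperparameters in dictionaries like these makes it straightforward to log, compare, and sweep the configurations across the models in the table.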