| features (inputs) | 4 |
| responses (outputs) | 1 |
| hidden units | 50 |
| max epochs | 256 |
| mini-batch size | 128 |
| gradient threshold | 0.2000 |
| initial learn rate | 0.3000 |
| verbose (indicator to display training progress information) | 1 |
| training options | sgdm (stochastic gradient descent with momentum) |
| momentum | 0.9000 |
| L2 Regularization (factor for L2 regularization) | 0.0001 |
| OutputMode (format of output) | sequence |
| StateActivationFunction (activation function to update the cell and hidden state) | tanh |
| GateActivationFunction (activation function to apply to the gates) | sigmoid |
| shuffle | once |
| InputWeightsInitializer (function to initialize input weights) | Glorot (Glorot initializer) |
| LearnRateSchedule (option for dropping the learning rate during training) | none (the learning rate remains constant throughout the training) |
| RecurrentWeightsInitializer (function to initialize recurrent weights) | orthogonal |
| BiasInitializer (function to initialize bias) | unit-forget-gate |
| GradientThresholdMethod (gradient threshold method) | l2norm |
| SequenceLength (option to pad, truncate, or split input sequences) | longest |
| SequencePaddingDirection (direction of padding or truncation) | right |
| ExecutionEnvironment (hardware resource for the training network) | auto |
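
The settings listed above correspond to options of `lstmLayer` and `trainingOptions` in the MATLAB Deep Learning Toolbox. The following is a minimal configuration sketch, not the original setup: the layer stack (a sequence input layer, one LSTM layer, a fully connected layer, and a regression output layer) is an assumption, since the table gives only the numbers of inputs, hidden units, and outputs, and the variables `XTrain` and `YTrain` are hypothetical placeholders for the training data.

```matlab
% Sizes taken from the table.
numFeatures    = 4;    % features (inputs)
numResponses   = 1;    % responses (outputs)
numHiddenUnits = 50;   % hidden units

% Assumed layer stack: the table does not state the full architecture,
% and the regression output layer is an assumption.
layers = [
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits, ...
        'OutputMode', 'sequence', ...
        'StateActivationFunction', 'tanh', ...
        'GateActivationFunction', 'sigmoid', ...
        'InputWeightsInitializer', 'glorot', ...
        'RecurrentWeightsInitializer', 'orthogonal', ...
        'BiasInitializer', 'unit-forget-gate')
    fullyConnectedLayer(numResponses)
    regressionLayer];

% Training options with the values from the table.
options = trainingOptions('sgdm', ...
    'MaxEpochs', 256, ...
    'MiniBatchSize', 128, ...
    'GradientThreshold', 0.2, ...
    'GradientThresholdMethod', 'l2norm', ...
    'InitialLearnRate', 0.3, ...
    'LearnRateSchedule', 'none', ...
    'Momentum', 0.9, ...
    'L2Regularization', 1e-4, ...
    'Shuffle', 'once', ...
    'SequenceLength', 'longest', ...
    'SequencePaddingDirection', 'right', ...
    'ExecutionEnvironment', 'auto', ...
    'Verbose', true);

% Hypothetical training call; XTrain and YTrain are not defined here.
% net = trainNetwork(XTrain, YTrain, layers, options);
```

Several of the listed values (for example `tanh`, `sigmoid`, `glorot`, `orthogonal`, `unit-forget-gate`, `l2norm`, `longest`, `right`, and `auto`) are the toolbox defaults, so they could be omitted from the calls; they are spelled out here only to mirror the table.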