Table A4:
Parameter | Query function selection (Table A3) | Active learning evaluation (Figure 7) |
---|---|---|
batch size | 20 | 100 |
learning rate | 0.001 | 0.005 |
maximum gradient L2 norm | 1.0 | 1.0 |
maximum length | 200 | 200 |
number of epochs | 500 | 500 |
LSTM hidden size | 100 | 100 |
dropout, input to LSTM | 0.7 | 0.4 |
dropout, output of LSTM | 0.0 | 0.4 |
dropout, self-attention | 0.7 | 0.4 |
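
For readers who want to reproduce the two settings, the values in Table A4 can be written down as a pair of configuration objects. The following is a minimal sketch only: the class name `TrainConfig`, its field names, and the helper `differing_fields` are hypothetical and chosen for illustration; they are not taken from the original implementation, and no particular deep learning framework is assumed.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters listed in Table A4."""
    batch_size: int
    learning_rate: float
    max_grad_l2_norm: float
    max_length: int
    num_epochs: int
    lstm_hidden_size: int
    dropout_lstm_input: float
    dropout_lstm_output: float
    dropout_self_attention: float


# Column 1 of Table A4: query function selection (Table A3).
QUERY_FUNCTION_SELECTION = TrainConfig(
    batch_size=20,
    learning_rate=0.001,
    max_grad_l2_norm=1.0,
    max_length=200,
    num_epochs=500,
    lstm_hidden_size=100,
    dropout_lstm_input=0.7,
    dropout_lstm_output=0.0,
    dropout_self_attention=0.7,
)

# Column 2 of Table A4: active learning evaluation (Figure 7).
ACTIVE_LEARNING_EVALUATION = TrainConfig(
    batch_size=100,
    learning_rate=0.005,
    max_grad_l2_norm=1.0,
    max_length=200,
    num_epochs=500,
    lstm_hidden_size=100,
    dropout_lstm_input=0.4,
    dropout_lstm_output=0.4,
    dropout_self_attention=0.4,
)


def differing_fields(a: TrainConfig, b: TrainConfig) -> dict:
    """Return {field: (value_in_a, value_in_b)} for fields that differ."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if da[k] != db[k]}


if __name__ == "__main__":
    diff = differing_fields(QUERY_FUNCTION_SELECTION, ACTIVE_LEARNING_EVALUATION)
    for name, (left, right) in diff.items():
        print(f"{name}: {left} vs. {right}")
```

Comparing the two objects makes explicit that the settings differ only in batch size, learning rate, and the three dropout rates; the gradient norm limit, maximum length, number of epochs, and LSTM hidden size are shared.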