
Table 7:

Exact D-Cal, Soft D-Cal, and NLL at the end of training, evaluated on the training data for models trained with λ = 10 and batch size 1,000. The approximation improves as γ increases, but gradients vanish when γ becomes too large. All settings are better calibrated than the λ = 0 MLE model, which has an exact D-Cal of 0.09.

γ            10       10²      10³      10⁴      10⁵      10⁶      10⁷      5 × 10⁷
Exact D-Cal  0.2337   0.0095   0.0079   0.0039   0.0025   0.0014   0.0015   0.0048
Soft D-Cal   0.4599   0.0604   0.0074   0.0039   0.0025   0.0014   0.0015   0.0048
NLL          2.1180   1.1362   1.0793   1.2508   1.6993   2.3873   2.6940   3.4377
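To make the role of γ concrete, the sketch below shows one common way a soft D-Calibration statistic can be built: the hard bin indicator over the model's CDF values is replaced by a difference of sigmoids with sharpness γ. This is only an illustrative NumPy sketch under assumptions (function names soft_dcal/exact_dcal, 10 equal-width bins, the sigmoid-difference indicator, and the uncensored-only case are choices made here, not taken from the paper's implementation).

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid


def exact_dcal(u, n_bins=10):
    """Exact D-Calibration statistic for uncensored points.

    u: model CDF values F(t_i) at the observed event times; under perfect
    calibration these are Uniform(0, 1), so each bin should hold 1/n_bins
    of the mass.
    """
    counts, _ = np.histogram(u, bins=np.linspace(0.0, 1.0, n_bins + 1))
    props = counts / len(u)
    return float(np.sum((props - 1.0 / n_bins) ** 2))


def soft_dcal(u, n_bins=10, gamma=1e4):
    """Differentiable surrogate for exact_dcal (illustrative sketch).

    The hard indicator 1[lo <= u < hi] is replaced by
    sigmoid(gamma*(u - lo)) - sigmoid(gamma*(u - hi)), so every point
    contributes a soft membership to every bin. As gamma grows, the soft
    counts approach the hard histogram counts.
    """
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    lo, hi = edges[:-1], edges[1:]
    membership = expit(gamma * (u[:, None] - lo)) - expit(gamma * (u[:, None] - hi))
    props = membership.sum(axis=0) / len(u)
    return float(np.sum((props - 1.0 / n_bins) ** 2))


# Example: for calibrated (uniform) predictions the exact statistic is near
# zero, and the soft statistic approaches it as gamma increases.
rng = np.random.default_rng(0)
u = rng.uniform(size=1000)
print(exact_dcal(u), soft_dcal(u, gamma=1e1), soft_dcal(u, gamma=1e4))
```

In this construction, a small γ smears each point's membership across neighboring bins, which is consistent with the gap between Soft and Exact D-Cal at γ = 10 in the table; with a very large γ the sigmoid saturates away from the bin edges, so its gradient is close to zero almost everywhere, matching the vanishing-gradient behavior noted in the caption.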