Appendix Table A1.
Architecture | Config. | Width | Capacity (parameters) | Dev. (%) |
---|---|---|---|---|
Encoder | = | 72 | 252 k | |
Encoder | = | 45 | 250 k | |
Encoder | = | 27 | 247 k | |
Encoder | = | 16 | 255 k | |
Encoder | = | 9 | 260 k | |
Decomposer | = 0 | 63 | 253 k | |
Decomposer | = 1 | 59 | 254 k | |
Decomposer | = 2 | 55 | 248 k | |
Decomposer | = 3 | 52 | 246 k | |
Detector | = | 62 | 248 k | |
Detector | = | 62 | 252 k | |
Detector | = | 61 | 253 k | |
Detector | = | 58 | 246 k | |
Encoder | = | 110 | 498 k | |
Decomposer | = 0 | 89 | 504 k | |
Detector | = | 87 | 503 k | |
Each model configuration uses its own network width, held constant across all of its convolutional layers and chosen so that the model reaches a target capacity of 250 k or 500 k parameters up to a small deviation.
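The caption describes picking one constant width per configuration so that the parameter count lands close to a fixed budget. The sketch below illustrates that idea under loudly stated assumptions: it models each network as a hypothetical plain stack of `depth` 2D convolutions with 3 × 3 kernels, biases, and a single input channel, which is not the actual Encoder, Decomposer, or Detector architecture. The helper names `conv_stack_params`, `width_for_target`, and `deviation_pct` are illustrative only. Because the table reports capacities rounded to the nearest thousand, deviations computed from those rounded values are approximate.

```python
def conv_stack_params(width: int, depth: int, in_channels: int = 1, kernel: int = 3) -> int:
    """Parameter count of a HYPOTHETICAL plain stack of `depth` 2D conv layers,
    each with `width` output channels, kernel x kernel filters and a bias."""
    # First layer maps in_channels -> width; weights plus one bias per output channel
    first = kernel * kernel * in_channels * width + width
    # Remaining depth - 1 layers map width -> width
    rest = (depth - 1) * (kernel * kernel * width * width + width)
    return first + rest


def width_for_target(target: int, depth: int, in_channels: int = 1, kernel: int = 3) -> int:
    """Constant width whose parameter count is closest to `target` for the hypothetical stack."""
    # Parameter count grows monotonically with width: scan upward until it exceeds the target
    best, w = 1, 1
    while conv_stack_params(w, depth, in_channels, kernel) <= target:
        best = w
        w += 1
    # Compare the last width at or below the target with the first width above it
    below = conv_stack_params(best, depth, in_channels, kernel)
    above = conv_stack_params(w, depth, in_channels, kernel)
    return best if target - below <= above - target else w


def deviation_pct(capacity: int, target: int) -> float:
    """Signed relative deviation of a capacity from its target, in percent."""
    return 100.0 * (capacity - target) / target


if __name__ == "__main__":
    # Deeper hypothetical stacks need a narrower width to stay near the same budget
    for depth in (2, 4, 8):
        w = width_for_target(250_000, depth)
        print(depth, w, conv_stack_params(w, depth))

    # Approximate deviations from a few rounded capacities in Table A1
    rows = [("Encoder", 252_000, 250_000), ("Encoder", 260_000, 250_000), ("Encoder", 498_000, 500_000)]
    for name, capacity, target in rows:
        print(f"{name}: {deviation_pct(capacity, target):+.1f} %")
```

Scanning widths one at a time is deliberately simple; since the count is quadratic in width, a closed-form root or bisection would also work, but the linear scan keeps the "closest width to the budget" rule easy to check.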