Table 2.
Quantitative characteristics and stability of the identified multivariate linear mixture models, tested on the full and undersampled datasets.
| Model | Dataset size | 100% | 90% | 80% | 70% | 60% | 50% |
|---|---|---|---|---|---|---|---|
| Model1 | Model detection rate [%] | *66.0 | *20.8 | 11.2 | 6.4 | 3.2 | 2.2 |
| | Total number of identified models | 6 | 61 | 90 | 121 | 142 | 160 |
| | Height regression coefficient | 0.0049 ± 0.0002 | 0.0051 ± 0.0007 | 0.0052 ± 0.0008 | 0.0055 ± 0.0009 | 0.0059 ± 0.0011 | 0.0067 ± 0.0012 |
| | Fe regression coefficient | −0.0715 ± 0.0005 | −0.0702 ± 0.0025 | −0.0687 ± 0.0035 | −0.0684 ± 0.0049 | −0.0664 ± 0.0053 | −0.0654 ± 0.0074 |
| | Fer regression coefficient | 0.0025 ± 0.0001 | 0.0028 ± 0.0003 | 0.0031 ± 0.0004 | 0.0033 ± 0.0005 | 0.0037 ± 0.0007 | 0.0040 ± 0.0007 |
| | UIBC regression coefficient | 0.0119 ± 0.0004 | 0.0129 ± 0.0010 | 0.0136 ± 0.0015 | 0.0145 ± 0.0017 | 0.0153 ± 0.0023 | 0.0165 ± 0.0028 |
| | F-statistic | 38.41 ± 0.66 | 35.00 ± 2.43 | 31.54 ± 2.92 | 28.96 ± 3.31 | 25.45 ± 3.41 | 23.17 ± 3.83 |
| | Root mean square error | 0.6239 ± 0.0024 | 0.6221 ± 0.0094 | 0.6206 ± 0.0128 | 0.6162 ± 0.0163 | 0.6144 ± 0.0197 | 0.6012 ± 0.0247 |
| | Explained variance R² [%] | 46.04 ± 0.42 | 46.61 ± 1.62 | 46.75 ± 2.20 | 47.95 ± 2.75 | 48.61 ± 3.28 | 50.92 ± 4.04 |
| | Pearson correlation (y1 vs yp1) | 0.643 ± 0.000 | 0.646 ± 0.012 | 0.648 ± 0.017 | 0.656 ± 0.021 | 0.663 ± 0.026 | 0.677 ± 0.031 |
| | Non-seizure/seizure separating threshold | 0.5744 ± 0.0317 | 0.6853 ± 0.0793 | 0.7355 ± 0.1091 | 0.8442 ± 0.1372 | 0.9531 ± 0.1674 | 1.0655 ± 0.2197 |
| | Training: sensitivity [%] | 95.49 ± 1.61 | 93.90 ± 4.98 | 95.51 ± 4.60 | 92.68 ± 5.92 | 91.39 ± 6.18 | 93.40 ± 6.53 |
| | Training: specificity [%] | 69.43 ± 1.15 | 70.90 ± 4.15 | 68.95 ± 5.25 | 72.24 ± 6.39 | 73.91 ± 7.31 | 71.25 ± 8.07 |
| | Testing: sensitivity [%] | | 87.32 ± 15.00 | 89.65 ± 12.12 | 84.72 ± 12.09 | 80.26 ± 13.38 | 83.27 ± 12.29 |
| | Testing: specificity [%] | | 67.36 ± 18.25 | 65.29 ± 14.61 | 66.71 ± 10.79 | 67.45 ± 10.67 | 66.43 ± 9.29 |
| Model2 | Model detection rate [%] | *100.0 | *72.0 | *50.9 | *24.9 | 14.2 | 6.8 |
| | Total number of identified models | 1 | 44 | 64 | 127 | 160 | 209 |
| | Age regression coefficient | −0.0050 ± 0.0002 | −0.0052 ± 0.0007 | −0.0056 ± 0.0009 | −0.0060 ± 0.0010 | −0.0064 ± 0.0011 | −0.0072 ± 0.0014 |
| | Height regression coefficient | 0.0036 ± 0.0002 | 0.0036 ± 0.0005 | 0.0038 ± 0.0006 | 0.0040 ± 0.0007 | 0.0043 ± 0.0008 | 0.0048 ± 0.0010 |
| | satFe regression coefficient | −3.2236 ± 0.0455 | −3.1911 ± 0.2123 | −3.1108 ± 0.2796 | −3.0829 ± 0.3491 | −2.9630 ± 0.3964 | −2.8432 ± 0.4344 |
| | UIBC regression coefficient | 0.0093 ± 0.0003 | 0.0094 ± 0.0011 | 0.0098 ± 0.0015 | 0.0100 ± 0.0016 | 0.0108 ± 0.0019 | 0.0113 ± 0.0020 |
| | F-statistic | 28.82 ± 0.63 | 25.12 ± 2.25 | 22.85 ± 2.59 | 21.00 ± 3.00 | 19.20 ± 3.24 | 17.28 ± 3.37 |
| | Root mean square error | 0.3620 ± 0.0019 | 0.3638 ± 0.0076 | 0.3642 ± 0.0097 | 0.3601 ± 0.0123 | 0.3558 ± 0.0150 | 0.3509 ± 0.0175 |
| | Explained variance R² [%] | 47.38 ± 0.54 | 47.47 ± 2.10 | 48.01 ± 2.69 | 49.13 ± 3.41 | 51.02 ± 4.08 | 53.10 ± 4.60 |
| | Pearson correlation (y2 vs yp2) | 0.660 ± 0.000 | 0.662 ± 0.015 | 0.667 ± 0.020 | 0.674 ± 0.026 | 0.691 ± 0.031 | 0.704 ± 0.034 |
| | Non-seizure/seizure separating threshold | 0.3495 ± 0.0261 | 0.3657 ± 0.0899 | 0.3779 ± 0.1170 | 0.4135 ± 0.1332 | 0.4652 ± 0.1607 | 0.4906 ± 0.1730 |
| | Training: sensitivity [%] | 83.53 ± 1.04 | 81.15 ± 4.19 | 83.30 ± 4.35 | 80.72 ± 5.91 | 82.28 ± 6.42 | 85.87 ± 6.79 |
| | Training: specificity [%] | 82.89 ± 0.92 | 86.06 ± 4.14 | 84.81 ± 4.94 | 88.90 ± 5.16 | 89.72 ± 5.66 | 88.30 ± 7.02 |
| | Testing: sensitivity [%] | | 75.60 ± 18.66 | 75.50 ± 13.62 | 71.20 ± 12.15 | 70.69 ± 10.81 | 72.76 ± 10.10 |
| | Testing: specificity [%] | | 81.14 ± 15.07 | 78.56 ± 12.53 | 81.53 ± 9.97 | 79.77 ± 10.16 | 77.35 ± 11.19 |
| Model3 | Model detection rate [%] | *51.5 | 28.4 | 10.4 | 4.6 | 2.0 | 1.1 |
| | Total number of identified models | 15 | 73 | 203 | 293 | 383 | 506 |
| | Height regression coefficient | −0.0072 ± 0.0005 | −0.0070 ± 0.0007 | −0.0079 ± 0.0011 | −0.0080 ± 0.0012 | −0.0083 ± 0.0012 | −0.0088 ± 0.0012 |
| | HGB regression coefficient | 0.0129 ± 0.0009 | 0.0136 ± 0.0013 | 0.0153 ± 0.0022 | 0.0158 ± 0.0024 | 0.0171 ± 0.0028 | 0.0179 ± 0.0028 |
| | satFe regression coefficient | 6.1796 ± 0.5323 | 6.1236 ± 0.8455 | 6.8889 ± 1.3212 | 7.0798 ± 1.4197 | 7.8790 ± 1.7529 | 9.1360 ± 2.7797 |
| | F-statistic | 8.24 ± 0.83 | 7.41 ± 1.30 | 8.79 ± 2.07 | 8.48 ± 2.31 | 8.60 ± 2.54 | 10.17 ± 3.67 |
| | Root mean square error | 0.4182 ± 0.0055 | 0.4130 ± 0.0095 | 0.4068 ± 0.0148 | 0.3917 ± 0.0178 | 0.3799 ± 0.0218 | 0.3615 ± 0.0300 |
| | Explained variance R² [%] | 26.04 ± 1.93 | 26.37 ± 3.35 | 32.33 ± 4.90 | 35.13 ± 5.75 | 40.08 ± 6.62 | 47.23 ± 8.58 |
| | Pearson correlation (y3 vs yp3) | 0.441 ± 0.005 | 0.457 ± 0.033 | 0.495 ± 0.046 | 0.533 ± 0.053 | 0.577 ± 0.057 | 0.630 ± 0.067 |
| | Non-recurrent/recurrent seizure separating threshold | 1.4001 ± 0.0999 | 1.4947 ± 0.1410 | 1.6849 ± 0.2406 | 1.7372 ± 0.2945 | 1.9210 ± 0.3366 | 2.0830 ± 0.3144 |
| | Training: sensitivity [%] | 83.86 ± 7.67 | 86.15 ± 10.19 | 88.11 ± 11.57 | 91.45 ± 10.93 | 92.50 ± 8.92 | 92.03 ± 9.59 |
| | Training: specificity [%] | 58.44 ± 6.69 | 60.50 ± 6.80 | 64.23 ± 9.26 | 66.54 ± 10.81 | 69.81 ± 10.25 | 76.00 ± 10.77 |
| | Testing: sensitivity [%] | | 73.33 ± 44.24 | 69.80 ± 30.16 | 74.70 ± 28.62 | 74.29 ± 26.08 | 69.81 ± 24.57 |
| | Testing: specificity [%] | | 45.18 ± 27.23 | 44.85 ± 19.98 | 42.13 ± 17.47 | 46.57 ± 14.52 | 49.61 ± 12.24 |
All values were averaged over 5000 iterations with randomized initial conditions and are reported as mean ± standard deviation across the iterations. For most of the listed quantitative measures, the mean values are fairly stable, while the standard deviation increases as the dataset is undersampled more heavily.
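The relationship between undersampling and the spread of the estimates can be illustrated with a minimal sketch. It is not the modeling pipeline behind the table: it assumes a plain single-predictor least-squares fit, and the function names (`ols_slope`, `undersampled_slopes`), the default iteration count, and the seed are hypothetical choices made only for illustration.

```python
import random
import statistics

def ols_slope(xs, ys):
    """Least-squares slope of y on a single predictor x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def undersampled_slopes(xs, ys, fraction, n_iter=500, seed=0):
    """Refit the slope on random subsets; return (mean, SD) over the iterations."""
    rng = random.Random(seed)
    n = len(xs)
    k = round(fraction * n)  # number of samples kept in each iteration
    slopes = []
    for _ in range(n_iter):
        idx = rng.sample(range(n), k)  # draw k samples without replacement
        slopes.append(ols_slope([xs[i] for i in idx], [ys[i] for i in idx]))
    return statistics.fmean(slopes), statistics.stdev(slopes)
```

Calling `undersampled_slopes(xs, ys, 0.9)` and `undersampled_slopes(xs, ys, 0.5)` on the same data would typically return similar means but a larger standard deviation for the smaller fraction, since each fit sees fewer samples.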
*A bold-highlighted "Model detection rate" indicates that the model with the listed regression coefficients was the one most often identified as the best model characterizing the data across the iterations.
Adaptive synthetic sampling matched the number of female samples in the case groups to minimize the risk of imbalanced learning within each modeling iteration.
The separating threshold was identified by maximizing the sum of sensitivity and specificity. The classification sensitivity and specificity were then evaluated on the training dataset itself and on the testing dataset (i.e., the samples excluded from training due to dataset undersampling).
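The threshold selection described above can be sketched as follows. This is a minimal illustration, not the original implementation: it assumes that higher model outputs indicate the case class, that labels are coded 0 (control) and 1 (case), that both classes are present, and that candidate cut-offs are the observed scores. The function names `sens_spec` and `best_threshold` are hypothetical.

```python
def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity when score >= threshold is classified as a case."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg  # requires at least one sample of each class

def best_threshold(scores, labels):
    """Scan the observed scores and keep the cut-off maximizing sensitivity + specificity."""
    candidates = sorted(set(scores))
    # On ties, max() keeps the first (lowest) candidate.
    return max(candidates, key=lambda t: sum(sens_spec(scores, labels, t)))
```

The threshold would be chosen on the training scores with `best_threshold`, and `sens_spec` would then be applied with that fixed threshold to both the training samples and the held-out samples.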