2023 Oct 13;13:17372. doi: 10.1038/s41598-023-43599-5

Table 2.

Quantitative characteristics and stability of the identified multivariate linear mixture models, tested on the full and undersampled datasets.

| Model | Dataset size | 100% | 90% | 80% | 70% | 60% | 50% |
|---|---|---|---|---|---|---|---|
| Model1 | Model detection rate [%] | *66.0 | *20.8 | 11.2 | 6.4 | 3.2 | 2.2 |
| | Total number of identified models | 6 | 61 | 90 | 121 | 142 | 160 |
| | Height regression coefficient | 0.0049 ± 0.0002 | 0.0051 ± 0.0007 | 0.0052 ± 0.0008 | 0.0055 ± 0.0009 | 0.0059 ± 0.0011 | 0.0067 ± 0.0012 |
| | Fe regression coefficient | −0.0715 ± 0.0005 | −0.0702 ± 0.0025 | −0.0687 ± 0.0035 | −0.0684 ± 0.0049 | −0.0664 ± 0.0053 | −0.0654 ± 0.0074 |
| | Fer regression coefficient | 0.0025 ± 0.0001 | 0.0028 ± 0.0003 | 0.0031 ± 0.0004 | 0.0033 ± 0.0005 | 0.0037 ± 0.0007 | 0.0040 ± 0.0007 |
| | UIBC regression coefficient | 0.0119 ± 0.0004 | 0.0129 ± 0.0010 | 0.0136 ± 0.0015 | 0.0145 ± 0.0017 | 0.0153 ± 0.0023 | 0.0165 ± 0.0028 |
| | F-statistics | 38.41 ± 0.66 | 35.00 ± 2.43 | 31.54 ± 2.92 | 28.96 ± 3.31 | 25.45 ± 3.41 | 23.17 ± 3.83 |
| | Root mean square error | 0.6239 ± 0.0024 | 0.6221 ± 0.0094 | 0.6206 ± 0.0128 | 0.6162 ± 0.0163 | 0.6144 ± 0.0197 | 0.6012 ± 0.0247 |
| | Explained variance R² [%] | 46.04 ± 0.42 | 46.61 ± 1.62 | 46.75 ± 2.20 | 47.95 ± 2.75 | 48.61 ± 3.28 | 50.92 ± 4.04 |
| | Pearson correlation (y1 vs yp1) | 0.643 ± 0.000 | 0.646 ± 0.012 | 0.648 ± 0.017 | 0.656 ± 0.021 | 0.663 ± 0.026 | 0.677 ± 0.031 |
| | Non-seizure/seizure separating threshold | 0.5744 ± 0.0317 | 0.6853 ± 0.0793 | 0.7355 ± 0.1091 | 0.8442 ± 0.1372 | 0.9531 ± 0.1674 | 1.0655 ± 0.2197 |
| | Training: sensitivity | 95.49 ± 1.61 | 93.90 ± 4.98 | 95.51 ± 4.60 | 92.68 ± 5.92 | 91.39 ± 6.18 | 93.40 ± 6.53 |
| | Training: specificity | 69.43 ± 1.15 | 70.90 ± 4.15 | 68.95 ± 5.25 | 72.24 ± 6.39 | 73.91 ± 7.31 | 71.25 ± 8.07 |
| | Testing: sensitivity | | 87.32 ± 15.00 | 89.65 ± 12.12 | 84.72 ± 12.09 | 80.26 ± 13.38 | 83.27 ± 12.29 |
| | Testing: specificity | | 67.36 ± 18.25 | 65.29 ± 14.61 | 66.71 ± 10.79 | 67.45 ± 10.67 | 66.43 ± 9.29 |
| Model2 | Model detection rate [%] | *100.0 | *72.0 | *50.9 | *24.9 | 14.2 | 6.8 |
| | Total number of identified models | 1 | 44 | 64 | 127 | 160 | 209 |
| | Age regression coefficient | −0.0050 ± 0.0002 | −0.0052 ± 0.0007 | −0.0056 ± 0.0009 | −0.0060 ± 0.0010 | −0.0064 ± 0.0011 | −0.0072 ± 0.0014 |
| | Height regression coefficient | 0.0036 ± 0.0002 | 0.0036 ± 0.0005 | 0.0038 ± 0.0006 | 0.0040 ± 0.0007 | 0.0043 ± 0.0008 | 0.0048 ± 0.0010 |
| | satFe regression coefficient | −3.2236 ± 0.0455 | −3.1911 ± 0.2123 | −3.1108 ± 0.2796 | −3.0829 ± 0.3491 | −2.9630 ± 0.3964 | −2.8432 ± 0.4344 |
| | UIBC regression coefficient | 0.0093 ± 0.0003 | 0.0094 ± 0.0011 | 0.0098 ± 0.0015 | 0.0100 ± 0.0016 | 0.0108 ± 0.0019 | 0.0113 ± 0.0020 |
| | F-statistics | 28.82 ± 0.63 | 25.12 ± 2.25 | 22.85 ± 2.59 | 21.00 ± 3.00 | 19.20 ± 3.24 | 17.28 ± 3.37 |
| | Root mean square error | 0.3620 ± 0.0019 | 0.3638 ± 0.0076 | 0.3642 ± 0.0097 | 0.3601 ± 0.0123 | 0.3558 ± 0.0150 | 0.3509 ± 0.0175 |
| | Explained variance R² [%] | 47.38 ± 0.54 | 47.47 ± 2.10 | 48.01 ± 2.69 | 49.13 ± 3.41 | 51.02 ± 4.08 | 53.10 ± 4.60 |
| | Pearson correlation (y2 vs yp2) | 0.660 ± 0.000 | 0.662 ± 0.015 | 0.667 ± 0.020 | 0.674 ± 0.026 | 0.691 ± 0.031 | 0.704 ± 0.034 |
| | Non-seizure/seizure separating threshold | 0.3495 ± 0.0261 | 0.3657 ± 0.0899 | 0.3779 ± 0.1170 | 0.4135 ± 0.1332 | 0.4652 ± 0.1607 | 0.4906 ± 0.1730 |
| | Training: sensitivity | 83.53 ± 1.04 | 81.15 ± 4.19 | 83.30 ± 4.35 | 80.72 ± 5.91 | 82.28 ± 6.42 | 85.87 ± 6.79 |
| | Training: specificity | 82.89 ± 0.92 | 86.06 ± 4.14 | 84.81 ± 4.94 | 88.90 ± 5.16 | 89.72 ± 5.66 | 88.30 ± 7.02 |
| | Testing: sensitivity | | 75.60 ± 18.66 | 75.50 ± 13.62 | 71.20 ± 12.15 | 70.69 ± 10.81 | 72.76 ± 10.10 |
| | Testing: specificity | | 81.14 ± 15.07 | 78.56 ± 12.53 | 81.53 ± 9.97 | 79.77 ± 10.16 | 77.35 ± 11.19 |
| Model3 | Model detection rate [%] | *51.5 | 28.4 | 10.4 | 4.6 | 2.0 | 1.1 |
| | Total number of identified models | 15 | 73 | 203 | 293 | 383 | 506 |
| | Height regression coefficient | −0.0072 ± 0.0005 | −0.0070 ± 0.0007 | −0.0079 ± 0.0011 | −0.0080 ± 0.0012 | −0.0083 ± 0.0012 | −0.0088 ± 0.0012 |
| | HGB regression coefficient | 0.0129 ± 0.0009 | 0.0136 ± 0.0013 | 0.0153 ± 0.0022 | 0.0158 ± 0.0024 | 0.0171 ± 0.0028 | 0.0179 ± 0.0028 |
| | satFe regression coefficient | 6.1796 ± 0.5323 | 6.1236 ± 0.8455 | 6.8889 ± 1.3212 | 7.0798 ± 1.4197 | 7.8790 ± 1.7529 | 9.1360 ± 2.7797 |
| | F-statistics | 8.24 ± 0.83 | 7.41 ± 1.30 | 8.79 ± 2.07 | 8.48 ± 2.31 | 8.60 ± 2.54 | 10.17 ± 3.67 |
| | Root mean square error | 0.4182 ± 0.0055 | 0.4130 ± 0.0095 | 0.4068 ± 0.0148 | 0.3917 ± 0.0178 | 0.3799 ± 0.0218 | 0.3615 ± 0.0300 |
| | Explained variance R² [%] | 26.04 ± 1.93 | 26.37 ± 3.35 | 32.33 ± 4.90 | 35.13 ± 5.75 | 40.08 ± 6.62 | 47.23 ± 8.58 |
| | Pearson correlation (y3 vs yp3) | 0.441 ± 0.005 | 0.457 ± 0.033 | 0.495 ± 0.046 | 0.533 ± 0.053 | 0.577 ± 0.057 | 0.630 ± 0.067 |
| | Non-recurrent/recurrent seizure separating threshold | 1.4001 ± 0.0999 | 1.4947 ± 0.1410 | 1.6849 ± 0.2406 | 1.7372 ± 0.2945 | 1.9210 ± 0.3366 | 2.0830 ± 0.3144 |
| | Training: sensitivity | 83.86 ± 7.67 | 86.15 ± 10.19 | 88.11 ± 11.57 | 91.45 ± 10.93 | 92.50 ± 8.92 | 92.03 ± 9.59 |
| | Training: specificity | 58.44 ± 6.69 | 60.50 ± 6.80 | 64.23 ± 9.26 | 66.54 ± 10.81 | 69.81 ± 10.25 | 76.00 ± 10.77 |
| | Testing: sensitivity | | 73.33 ± 44.24 | 69.80 ± 30.16 | 74.70 ± 28.62 | 74.29 ± 26.08 | 69.81 ± 24.57 |
| | Testing: specificity | | 45.18 ± 27.23 | 44.85 ± 19.98 | 42.13 ± 17.47 | 46.57 ± 14.52 | 49.61 ± 12.24 |

All values were averaged over 5000 iterations with randomized initial conditions and are reported as mean ± standard deviation across the iterations. For most of the listed quantitative measures, the mean values are fairly stable, while the standard deviation increases as the dataset is more heavily undersampled.
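The repeated-undersampling summary described in this note can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, a plain sample mean stands in for the regression coefficients and other model statistics, and the function name `undersampling_summary`, the sample count of 200, and the reduced iteration count of 1000 are all assumptions made for illustration.

```python
import random
import statistics


def undersampling_summary(values, fraction, iterations, rng):
    """Return (mean, standard deviation) of a statistic across random subsamples.

    The statistic here is the sample mean, a hypothetical stand-in for the
    regression coefficients, F-statistics, etc. listed in the table.
    """
    size = round(fraction * len(values))
    estimates = []
    for _ in range(iterations):
        subsample = rng.sample(values, size)  # draw without replacement
        estimates.append(statistics.fmean(subsample))
    return statistics.fmean(estimates), statistics.stdev(estimates)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Synthetic stand-in data (assumption): 200 values from a standard normal.
data = [rng.gauss(0.0, 1.0) for _ in range(200)]

summary = {}
for fraction in (0.9, 0.8, 0.7, 0.6, 0.5):
    summary[fraction] = undersampling_summary(data, fraction, 1000, rng)
    mean, sd = summary[fraction]
    print(f"{fraction:.0%}: {mean:.4f} ± {sd:.4f}")
```

In this toy setting the mean stays close to the full-data value while the spread across iterations grows as the retained fraction shrinks, which is the qualitative pattern the note describes.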

*A bold-highlighted “Model detection rate” indicates that the model with the listed regression coefficients was the one most often identified as the best model characterizing the data across the iterations.

Adaptive synthetic sampling matched the number of female samples in the case groups to minimize the risk of imbalanced learning within each modeling iteration.

The separating threshold was identified by maximizing the sum of sensitivity and specificity. The classification sensitivity and specificity were then tested on the training dataset itself and on the testing dataset (i.e., the samples excluded from training due to dataset undersampling).
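As a rough illustration of the threshold selection described above, the following sketch picks the cut-off that maximizes sensitivity + specificity on a training set and then evaluates it on held-out samples. The function names, the toy scores and labels, the use of midpoints between sorted scores as candidate cut-offs, and the convention that a score above the threshold is classified as a case are all assumptions; the source does not specify these details.

```python
def sensitivity_specificity(scores, labels, threshold):
    """Classify score > threshold as a case (label 1); return (sensitivity, specificity).

    The direction of the comparison is an assumption made for this sketch.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s <= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s <= threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > threshold)
    return tp / (tp + fn), tn / (tn + fp)


def best_threshold(scores, labels):
    """Return the candidate cut-off that maximizes sensitivity + specificity."""
    values = sorted(set(scores))
    # Candidate cut-offs: midpoints between consecutive distinct scores.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates,
               key=lambda t: sum(sensitivity_specificity(scores, labels, t)))


# Hypothetical model outputs and group labels (0 = control, 1 = case).
train_scores = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
train_labels = [0, 0, 0, 0, 1, 0, 1, 1, 1]
# Hypothetical samples excluded from training by undersampling.
test_scores = [2.0, 4.0, 5.0, 8.0]
test_labels = [0, 1, 0, 1]

threshold = best_threshold(train_scores, train_labels)
print(threshold)                                                       # 4.5
print(sensitivity_specificity(train_scores, train_labels, threshold))  # (1.0, 0.8)
print(sensitivity_specificity(test_scores, test_labels, threshold))    # (0.5, 0.5)
```

Because the threshold is tuned on the training samples, the held-out figures are the less optimistic estimate, consistent with the lower testing values reported in the table.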