Table 2:

Performance of TopLapGBT compared with existing state-of-the-art models on the independent blind test set for classification.

		Independent Test
Performance Metric	Class	PON-Sol [5]	SODA	SODA (5 as threshold)	SODA (10 as threshold)	SODA (17 as threshold)	PON-Sol2 [11]	TopGBT	TopLapGBT
PPV	−	0.593/0.428	0.427/0.258	0.606/0.428	0.673/0.468	0.742/0.585	0.804/0.643	0.781/0.649	0.789/0.645
	N	0.427/0.385	NaN/NaN	0.425/0.365	0.397/0.357	0.383/0.350	0.600/0.475	0.617/0.462	0.624/0.475
	+	0.151/0.373	0.080/0.229	0.047/0.149	0.060/0.184	0.098/0.284	0.233/0.472	0.524/0.761	0.476/0.718
NPV	−	0.514/0.691	0.373/0.537	0.508/0.684	0.502/0.677	0.501/0.677	0.794/0.887	0.843/0.920	0.842/0.918
	N	0.685/0.700	0.642/0.667	0.761/0.739	0.797/0.782	0.797/0.782	0.804/0.793	0.816/0.795	0.826/0.809
	+	0.881/0.693	0.832/0.605	0.848/0.633	0.858/0.649	0.858/0.649	0.879/0.684	0.881/0.692	0.880/0.688
Sensitivity	−	0.263/0.263	0.488/0.488	0.195/0.195	0.098/0.098	0.068/0.068	0.802/0.802	0.867/0.867	0.864/0.864
	N	0.456/0.456	0.000/0.000	0.759/0.759	0.886/0.886	0.954/0.954	0.671/0.671	0.692/0.692	0.713/0.713
	+	0.448/0.448	0.253/0.253	0.069/0.069	0.057/0.057	0.046/0.046	0.161/0.161	0.126/0.126	0.115/0.115
Specificity	−	0.812/0.824	0.318/0.297	0.867/0.869	0.951/0.944	0.975/0.976	0.796/0.777	0.747/0.765	0.759/0.763
	N	0.659/0.636	1.000/1.000	0.426/0.340	0.249/0.204	0.144/0.116	0.751/0.630	0.760/0.597	0.760/0.606
	+	0.617/0.623	0.558/0.573	0.786/0.802	0.863/0.872	0.936/0.942	0.920/0.910	0.983/0.980	0.981/0.977
CPR		0.356/0.389	0.282/0.247	0.381/0.341	0.375/0.347	0.382/0.356	0.671/0.545	0.707/0.562	0.711/0.564
GC²		0.010/0.011	NaN/NaN	0.041/0.045	0.022/0.022	0.016/0.016	0.181/0.157	0.205/0.184	0.206/0.185

Samples with a negative solubility change are denoted "−", samples with a positive solubility change are denoted "+", and samples with no solubility change are denoted "N". Performance metrics include the positive predictive value (PPV), negative predictive value (NPV), sensitivity, specificity, correct prediction ratio (CPR) and generalised correlation (GC²). For each solubility class, PPV is the proportion of samples predicted to be in that class that truly belong to it, while NPV is the proportion of samples predicted not to be in that class that truly do not belong to it. CPR is the fraction of correctly classified samples, while GC² measures the correlation between the predicted and actual classes. Each entry reports two values: the first is computed without normalization and the second with normalization.
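The metrics in this table can all be derived from a three-class confusion matrix, and a minimal Python sketch may make the definitions concrete. Several parts are assumptions that the table does not spell out. The normalization is assumed to rescale each true-class row to the same total, which would explain why sensitivity is identical in both values of every entry. GC² is assumed to take the chi-square based form $\sum_{ij}(z_{ij}-e_{ij})^2/e_{ij}$ divided by $N(K-1)$, with $e_{ij}$ the expected count under independence. The function names and the example counts are hypothetical.

```python
import math

def safe_div(num, den):
    """Return num / den, or NaN when the denominator is zero."""
    return num / den if den else math.nan

def normalize_rows(cm):
    """Rescale every true-class row to the same total.
    Assumed normalization; requires each class to have at least one sample."""
    target = sum(map(sum, cm)) / len(cm)
    return [[v * target / sum(row) for v in row] for row in cm]

def per_class_metrics(cm):
    """One-vs-rest metrics per class.
    cm[i][j] = number of samples of true class i predicted as class j."""
    k = len(cm)
    total = sum(map(sum, cm))
    out = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # true c, predicted other
        fp = sum(cm[r][c] for r in range(k)) - tp     # predicted c, true other
        tn = total - tp - fn - fp
        out.append({
            "ppv": safe_div(tp, tp + fp),
            "npv": safe_div(tn, tn + fn),
            "sensitivity": safe_div(tp, tp + fn),
            "specificity": safe_div(tn, tn + fp),
        })
    return out

def cpr(cm):
    """Correct prediction ratio: fraction of samples on the diagonal."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))

def gc2(cm):
    """Chi-square based generalised correlation (assumed form)."""
    k = len(cm)
    n = sum(map(sum, cm))
    rows = [sum(r) for r in cm]
    cols = [sum(cm[r][c] for r in range(k)) for c in range(k)]
    chi2 = 0.0
    for i in range(k):
        for j in range(k):
            e = rows[i] * cols[j] / n                 # expected count
            chi2 += safe_div((cm[i][j] - e) ** 2, e)  # NaN if e == 0
    return chi2 / (n * (k - 1))

# Hypothetical counts; rows are true classes (-, N, +), columns are predictions.
cm = [[8, 1, 1], [2, 6, 2], [1, 1, 3]]
raw = per_class_metrics(cm)
norm = per_class_metrics(normalize_rows(cm))
```

Under these assumptions, a method that never predicts a given class yields a zero denominator for that class's PPV and a zero expected count in GC², so both come out as NaN. This is consistent with the SODA column, where class N has a sensitivity of 0.000 and a specificity of 1.000.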