Front Pharmacol. 2025 Jun 26;16:1605722. doi: 10.3389/fphar.2025.1605722

TABLE 1.

Performance metrics of the best-performing models developed for the NS3 and NS5 proteins using various machine-learning techniques (MLTs) and feature-selection methods on the TT and IV datasets.

| Algorithm | Feature selection | Model parameters | Dataset | MAE | MSE | RMSE | R² | PCC |
|---|---|---|---|---|---|---|---|---|
| NS3 |  |  |  |  |  |  |  |  |
| SVM | SVR | gamma: 0.0001, C: 100 | T1092 | 0.191 | 0.065 | 0.255 | 0.733 | 0.857 |
|  |  |  | V121 | 0.195 | 0.074 | 0.272 | 0.756 | 0.870 |
|  | Perceptron | gamma: 0.001, C: 10 | T1092 | 0.239 | 0.099 | 0.315 | 0.593 | 0.771 |
|  |  |  | V121 | 0.212 | 0.087 | 0.295 | 0.713 | 0.844 |
|  | DT | gamma: 0.005, C: 1 | T1092 | 0.255 | 0.118 | 0.343 | 0.517 | 0.719 |
|  |  |  | V121 | 0.215 | 0.097 | 0.311 | 0.680 | 0.833 |
| RF | SVR | n: 400 | T1092 | 0.264 | 0.124 | 0.352 | 0.492 | 0.703 |
|  |  |  | V121 | 0.213 | 0.093 | 0.305 | 0.693 | 0.851 |
|  | Perceptron | n: 500 | T1092 | 0.275 | 0.130 | 0.361 | 0.467 | 0.684 |
|  |  |  | V121 | 0.225 | 0.102 | 0.319 | 0.663 | 0.832 |
|  | DT | n: 400 | T1092 | 0.266 | 0.124 | 0.350 | 0.475 | 0.705 |
|  |  |  | V121 | 0.218 | 0.103 | 0.321 | 0.66 | 0.825 |
| kNN | SVR | k: 3 | T1092 | 0.251 | 0.119 | 0.346 | 0.511 | 0.725 |
|  |  |  | V121 | 0.250 | 0.131 | 0.363 | 0.565 | 0.756 |
|  | Perceptron | k: 7 | T1092 | 0.267 | 0.127 | 0.356 | 0.480 | 0.695 |
|  |  |  | V121 | 0.227 | 0.105 | 0.325 | 0.651 | 0.811 |
|  | DT | k: 5 | T1092 | 0.270 | 0.131 | 0.362 | 0.464 | 0.690 |
|  |  |  | V121 | 0.233 | 0.109 | 0.330 | 0.639 | 0.799 |
| ANN | SVR | solver: lbfgs, activation: identity | T1092 | 0.192 | 0.064 | 0.253 | 0.738 | 0.862 |
|  |  |  | V121 | 0.190 | 0.070 | 0.265 | 0.781 | 0.894 |
|  | Perceptron | activation: logistic | T1092 | 0.249 | 0.108 | 0.329 | 0.556 | 0.751 |
|  |  |  | V121 | 0.216 | 0.086 | 0.293 | 0.716 | 0.844 |
|  | DT | activation: logistic | T1092 | 0.275 | 0.135 | 0.365 | 0.426 | 0.682 |
|  |  |  | V121 | 0.241 | 0.111 | 0.332 | 0.634 | 0.798 |
| XGBoost | SVR | n_estimators: 300, max_depth: 3, learning_rate: 0.141 | T1092 | 0.249 | 0.111 | 0.334 | 0.544 | 0.738 |
|  |  |  | V121 | 0.222 | 0.087 | 0.296 | 0.710 | 0.849 |
|  | Perceptron | n_estimators: 300, max_depth: 7, learning_rate: 0.058 | T1092 | 0.272 | 0.132 | 0.362 | 0.448 | 0.678 |
|  |  |  | V121 | 0.212 | 0.089 | 0.298 | 0.705 | 0.842 |
|  | DT | n_estimators: 200, max_depth: 3, learning_rate: 0.104 | T1092 | 0.259 | 0.119 | 0.344 | 0.514 | 0.718 |
|  |  |  | V121 | 0.225 | 0.096 | 0.31 | 0.681 | 0.830 |
| NS5 |  |  |  |  |  |  |  |  |
| SVM | SVR | gamma: 0.0001, C: 400 | T140 | 0.135 | 0.049 | 0.197 | 0.954 | 0.982 |
|  |  |  | V15 | 0.138 | 0.044 | 0.210 | 0.94 | 0.970 |
|  | Perceptron | gamma: 0.0005, C: 400 | T140 | 0.222 | 0.105 | 0.310 | 0.884 | 0.953 |
|  |  |  | V15 | 0.24 | 0.137 | 0.370 | 0.814 | 0.904 |
|  | DT | gamma: 0.005, C: 10 | T140 | 0.429 | 0.399 | 0.591 | 0.632 | 0.802 |
|  |  |  | V15 | 0.420 | 0.446 | 0.668 | 0.395 | 0.713 |
| RF | SVR | n: 400, depth: 12 | T140 | 0.399 | 0.324 | 0.544 | 0.659 | 0.840 |
|  |  |  | V15 | 0.340 | 0.211 | 0.46 | 0.713 | 0.863 |
|  | Perceptron | n: 300 | T140 | 0.360 | 0.294 | 0.513 | 0.680 | 0.852 |
|  |  |  | V15 | 0.288 | 0.180 | 0.425 | 0.755 | 0.873 |
|  | DT | n: 200, depth: None, leaf: 1 | T140 | 0.424 | 0.388 | 0.601 | 0.560 | 0.799 |
|  |  |  | V15 | 0.367 | 0.308 | 0.555 | 0.582 | 0.771 |
| kNN | SVR | k: 3 | T140 | 0.343 | 0.235 | 0.468 | 0.727 | 0.889 |
|  |  |  | V15 | 0.292 | 0.144 | 0.380 | 0.804 | 0.901 |
|  | Perceptron | k: 3 | T140 | 0.335 | 0.235 | 0.468 | 0.753 | 0.895 |
|  |  |  | V15 | 0.360 | 0.232 | 0.481 | 0.686 | 0.833 |
|  | DT | k: 3 | T140 | 0.508 | 0.499 | 0.687 | 0.446 | 0.739 |
|  |  |  | V15 | 0.563 | 0.609 | 0.781 | 0.173 | 0.663 |
| ANN | SVR | solver: lbfgs, activation: identity, learning: invscaling | T140 | 0.159 | 0.073 | 0.271 | 0.928 | 0.964 |
|  |  |  | V15 | 0.160 | 0.048 | 0.219 | 0.935 | 0.977 |
|  | Perceptron | solver: lbfgs, activation: logistic, learning: adaptive | T140 | 0.255 | 0.119 | 0.345 | 0.884 | 0.942 |
|  |  |  | V15 | 0.337 | 0.238 | 0.488 | 0.710 | 0.854 |
|  | DT | solver: lbfgs, activation: tanh, learning: adaptive | T140 | 0.532 | 0.505 | 0.711 | 0.508 | 0.762 |
|  |  |  | V15 | 0.507 | 0.592 | 0.769 | 0.197 | 0.708 |
| XGBoost | SVR | n_estimators: 300, max_depth: 3, learning_rate: 0.078 | T140 | 0.334 | 0.22 | 0.444 | 0.766 | 0.889 |
|  |  |  | V15 | 0.388 | 0.274 | 0.523 | 0.628 | 0.818 |
|  | Perceptron | n_estimators: 300, max_depth: 3, learning_rate: 0.143 | T140 | 0.335 | 0.243 | 0.458 | 0.678 | 0.846 |
|  |  |  | V15 | 0.313 | 0.200 | 0.447 | 0.729 | 0.870 |
|  | DT | n_estimators: 150, max_depth: 5, learning_rate: 0.038 | T140 | 0.406 | 0.326 | 0.548 | 0.609 | 0.812 |
|  |  |  | V15 | 0.358 | 0.207 | 0.455 | 0.719 | 0.854 |

SVR, support vector regression; DT, decision tree; TT, training or testing dataset; IV, independent validation dataset; PCC, Pearson's correlation coefficient; R², coefficient of determination; MAE, mean absolute error; MSE, mean squared error; RMSE, root mean squared error.
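For reference, the metrics listed above follow their standard definitions, summarized below with y_i the observed activity, ŷ_i the predicted activity of compound i, ȳ and the bar over ŷ the corresponding means, and n the number of compounds in the dataset.

```latex
\begin{align*}
\mathrm{MAE} &= \frac{1}{n}\sum_{i=1}^{n}\left|y_i-\hat{y}_i\right|,
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2,
\qquad
\mathrm{RMSE} = \sqrt{\mathrm{MSE}},\\[4pt]
R^2 &= 1-\frac{\sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2},
\qquad
\mathrm{PCC} = \frac{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)\left(\hat{y}_i-\bar{\hat{y}}\right)}
{\sqrt{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}\;\sqrt{\sum_{i=1}^{n}\left(\hat{y}_i-\bar{\hat{y}}\right)^2}}
\end{align*}
```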

Bold values indicate the best-performing model(s) based on Pearson's correlation coefficient (PCC) and coefficient of determination (R²) for each dataset.
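To illustrate how the entries in Table 1 map onto common tooling, the sketch below shows how regressors with two of the reported hyperparameter sets could be instantiated and scored. This is not the authors' implementation: the RBF kernel for SVM (only gamma and C are reported), the helper name `evaluate`, and the placeholder arrays `X_train`, `y_train`, `X_val`, `y_val` are assumptions for illustration only.

```python
# Minimal sketch (not the authors' code) of fitting and scoring regressors with
# hyperparameters taken from Table 1. Descriptor matrices and activity vectors
# (X_train, y_train, X_val, y_val) are hypothetical placeholders.
import numpy as np
from scipy.stats import pearsonr
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(model, X_train, y_train, X_val, y_val):
    """Fit a regressor and return the metrics reported in Table 1."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_val)
    mse = mean_squared_error(y_val, y_pred)
    return {
        "MAE": mean_absolute_error(y_val, y_pred),
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "R2": r2_score(y_val, y_pred),
        "PCC": pearsonr(y_val, y_pred)[0],
    }


# SVM regressor with the NS5 parameters from the table (gamma = 0.0001, C = 400);
# the RBF kernel is assumed, since only gamma and C are listed.
ns5_svm = SVR(kernel="rbf", gamma=0.0001, C=400)

# ANN regressor with the NS3 parameters from the table (solver = lbfgs,
# activation = identity); all other settings are scikit-learn defaults.
ns3_ann = MLPRegressor(solver="lbfgs", activation="identity")

# Example usage with placeholder data:
# print(evaluate(ns5_svm, X_train, y_train, X_val, y_val))
```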