Materials. 2024 Nov 20;17(22):5675. doi: 10.3390/ma17225675
Algorithm A1: Machine learning pipeline for model-type and aging-type classification

   Input: LSR dataset D_LSR = {X_LSR, y_model_LSR, y_aging_LSR}; EPDM dataset D_EPDM = {X_EPDM, y_model_EPDM, y_aging_EPDM}

   Output: Final accuracy A_final, classification report, and confusion matrix C_final

1 Load datasets D_LSR and D_EPDM from CSV files;

2 Concatenate features: X_model ← [X_LSR, X_EPDM];

3 Concatenate model-type labels: y_model ← [y_model_LSR, y_model_EPDM];

4 Encode model-type labels: y_model_encoded ← LE(y_model);

5 Normalize features: X_model_scaled ← SC(X_model);

6 Concatenate features: X_aging ← [X_LSR, X_EPDM];

7 Concatenate aging-type labels: y_aging ← [y_aging_LSR, y_aging_EPDM];

8 Encode aging-type labels: y_aging_encoded ← LE(y_aging);

9 Normalize features: X_aging_scaled ← SC(X_aging);

10 Split (X_model_scaled, y_model_encoded) and (X_aging_scaled, y_aging_encoded) into 80–20 training and testing sets;
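As a concrete illustration, steps 1–10 can be sketched in Python with scikit-learn. The file contents, feature columns, and label-column names below are assumptions (the algorithm does not list them), so synthetic stand-ins replace the actual CSV files:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler

rng = np.random.default_rng(0)

def make_dataset(model_name: str, n: int = 40) -> pd.DataFrame:
    """Stand-in for pd.read_csv(...); all column names are illustrative."""
    df = pd.DataFrame(rng.normal(size=(n, 3)), columns=["f1", "f2", "f3"])
    df["model_type"] = model_name                                  # LSR or EPDM
    df["aging_type"] = rng.choice(["thermal", "electrical"], size=n)
    return df

# Step 1: load D_LSR and D_EPDM (here: synthetic stand-ins)
df_lsr, df_epdm = make_dataset("LSR"), make_dataset("EPDM")

# Steps 2-3 / 6-7: concatenate features and labels across both materials
df_all = pd.concat([df_lsr, df_epdm], ignore_index=True)
X = df_all[["f1", "f2", "f3"]].to_numpy()
y_model, y_aging = df_all["model_type"], df_all["aging_type"]

# Steps 4 / 8: label encoding (LE); steps 5 / 9: normalization (SC)
y_model_enc = LabelEncoder().fit_transform(y_model)
y_aging_enc = LabelEncoder().fit_transform(y_aging)
X_scaled = StandardScaler().fit_transform(X)

# Step 10: 80-20 train/test split for each classification task
Xm_tr, Xm_te, ym_tr, ym_te = train_test_split(
    X_scaled, y_model_enc, test_size=0.2, random_state=42)
Xa_tr, Xa_te, ya_tr, ya_te = train_test_split(
    X_scaled, y_aging_enc, test_size=0.2, random_state=42)
```

Note that the sketch mirrors the algorithm's ordering, which normalizes before splitting; a leakage-free variant would fit the scaler on the training split only.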

    Input: Model M; training and testing sets for the model-type or aging-type classification task

    Output: Performance metrics: accuracy A, classification report, and confusion matrix C

11 Train model M on the training data (X_train, y_train) to minimize the loss function L;

12 M ← argmin_M L(M(X_train), y_train);

   Predict test labels ŷ_test on X_test;

13 ŷ_test = M(X_test);

   Calculate accuracy A;

14 A = (1/n) Σ_{i=1}^{n} 1(ŷ_i = y_i);

   Output the classification report and confusion matrix C;
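Steps 11–14 apply identically to every model, so they can be wrapped in one helper. This sketch assumes any scikit-learn-style estimator with fit/predict:

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

def train_and_evaluate(model, X_train, y_train, X_test, y_test):
    """Steps 11-14: fit M, predict y_hat_test, and report A, the
    classification report, and the confusion matrix C."""
    model.fit(X_train, y_train)           # M <- argmin_M L(M(X_train), y_train)
    y_pred = model.predict(X_test)        # y_hat_test = M(X_test)
    acc = accuracy_score(y_test, y_pred)  # A = (1/n) * sum 1(y_hat_i == y_i)
    report = classification_report(y_test, y_pred)
    cm = confusion_matrix(y_test, y_pred)
    return acc, report, cm
```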

15 Define and initialize models for both classification tasks as follows:
  • XGBoost: Define with default hyperparameters.
  • Multilayer Perceptron (MLP): Define with 100 hidden units and ReLU activation.
  • Deep Neural Network (DNN): Define with hidden layers [150,100,50].
  • Stacked Model (RF + SVM): Use Random Forest (RF) and Support Vector Machine (SVM) as base estimators, with an RF as the final estimator.
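In scikit-learn terms, step 15 might look like the following. Hyperparameters other than those stated above are left at their defaults, the "DNN" is modeled as a deeper MLP, and XGBoost is shown only as a comment since it lives in a separate package:

```python
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC

# MLP: a single hidden layer of 100 ReLU units
mlp = MLPClassifier(hidden_layer_sizes=(100,), activation="relu")

# "DNN": a deeper MLP with hidden layers [150, 100, 50]
dnn = MLPClassifier(hidden_layer_sizes=(150, 100, 50), activation="relu")

# Stacked model: RF and SVM as base estimators, RF as the final estimator
stacked = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()), ("svm", SVC())],
    final_estimator=RandomForestClassifier(),
)

# XGBoost with default hyperparameters (requires the xgboost package):
# from xgboost import XGBClassifier
# xgb = XGBClassifier()
```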

    Input: Stacked model M_stacked; parameter grid G

    Output: Optimal parameters G* and best model M_best

16 Perform a randomized search on M_stacked over grid G;

17 for each parameter configuration g ∈ G do

18 └ Evaluate M_stacked(g) on the training set and compute its accuracy A(g);

19 Set G* ← argmax_g A(g) and update the best model M_best ← M_stacked(G*);
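Steps 16–19 correspond to scikit-learn's RandomizedSearchCV, which samples configurations g from G and keeps the best-scoring one. The parameter grid and training data below are hypothetical stand-ins, as the actual search space is not reproduced here:

```python
from scipy.stats import randint
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

# Toy training data standing in for (X_train, y_train)
X_train, y_train = make_classification(n_samples=60, n_features=4, random_state=0)

stacked = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()), ("svm", SVC())],
    final_estimator=RandomForestClassifier(),
)

# Hypothetical grid G; nested estimators are addressed by their names
param_grid = {
    "rf__n_estimators": randint(50, 200),
    "svm__C": [0.1, 1.0, 10.0],
    "final_estimator__max_depth": [None, 5, 10],
}

# Steps 17-18: evaluate A(g) for each sampled configuration g
search = RandomizedSearchCV(stacked, param_grid, n_iter=4, cv=3,
                            scoring="accuracy", random_state=0)
search.fit(X_train, y_train)

# Step 19: G* and M_best
G_star, M_best = search.best_params_, search.best_estimator_
```

RandomizedSearchCV scores each configuration by cross-validation on the training data rather than by raw training accuracy, which guards against the search overfitting the grid.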

   Input: Optimized stacked model M_best; testing set (X_test, y_test)

   Output: Final accuracy A_final, classification report, and confusion matrix C_final

20 Use M_best to predict labels ŷ_test on the test data;

21 ŷ_test = M_best(X_test);

   Calculate and output the final accuracy A_final, classification report, and confusion matrix C_final;

22 return A_final, C_final