Effect of capturing chain architecture and monomer stoichiometry information on D-MPNN performance. The average R² values for the prediction of IP, obtained from a 10-fold cross-validation based on random splits, are shown. Uncertainties are not reported explicitly, as the standard error of the mean was <0.005 in all cases. Under the header “Representation”, “monomers” indicates that the model was provided with the graph structures of the separate monomer units; “chain architecture” indicates that the model was provided with information on how the monomer units may connect to one another to form an ensemble of possible sequences, via the definition of edge weights used as shown in Fig. 2b and c; “stoichiometry” indicates that the model was provided with information on monomer stoichiometry, which was used to weight the learnt node representations as shown in Fig. 2d. An extended version of this table, which also includes results for EA and reports RMSE as an additional performance measure, is available in Table S1.
| Datasets | Representation | | | |
|---|---|---|---|---|
| | Monomers | Monomers + chain architecture | Monomers + stoichiometry | Monomers + chain architecture + stoichiometry |
| Original dataset | 0.88 | 0.90 | 0.98 | 1.00 |
| Inflated chain architecture importance | 0.65 | 0.86 | 0.71 | 0.98 |
| Inflated stoichiometry importance | 0.26 | 0.27 | 0.97 | 0.99 |
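
The averaging described in the caption (mean R² over 10 random folds, with the standard error of the mean as the uncertainty) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the least-squares line stands in for the D-MPNN, the data are synthetic, and the helper names (`r2_score`, `kfold_indices`, `cross_validated_r2`, `mean_and_sem`) are hypothetical.

```python
import math
import random
import statistics


def r2_score(y_true, y_pred):
    """Coefficient of determination, R^2 = 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n_samples, n_folds=10, seed=0):
    """Randomly split range(n_samples) into n_folds disjoint test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_folds] for i in range(n_folds)]


def fit_line(xs, ys):
    """Least-squares line; a stand-in model (assumption, not the D-MPNN)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


def cross_validated_r2(xs, ys, n_folds=10, seed=0):
    """Return one test-set R^2 value per fold."""
    scores = []
    for test_idx in kfold_indices(len(xs), n_folds, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        preds = [slope * xs[i] + intercept for i in test_idx]
        scores.append(r2_score([ys[i] for i in test_idx], preds))
    return scores


def mean_and_sem(scores):
    """Mean and standard error of the mean (sample std / sqrt(n))."""
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return statistics.fmean(scores), sem


# Synthetic data for illustration only; not taken from the paper.
rng = random.Random(1)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

scores = cross_validated_r2(xs, ys)
mean_r2, sem_r2 = mean_and_sem(scores)
print(f"R^2 = {mean_r2:.2f} (SEM {sem_r2:.3f}, {len(scores)} folds)")
```

Reporting the mean to two decimal places, as in the table, is consistent with a standard error below 0.005, since the uncertainty is then smaller than the last reported digit.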