A Schematic of the graph neural networks (GNNs) implemented within the geometric deep learning platform. Multi-layer perceptron (MLP) modules are highlighted in gray, and the variable modules (2D/3D convolution), pooling, and outputs are highlighted in green. B Box plots illustrating trends observed for N-hetero arenes (left) and carboxylic acids (right). N-hetero arenes: meta-unsubstituted pyridines are observed with a reaction yield of 44 ± 15%, meta-substituted pyridines with 20 ± 6% (including 27 as an outlier observed at 6%), and five-membered N-heterocyclic ring systems with 2 ± 1%. Carboxylic acids: cyclic ethers are observed with a reaction yield of 40 ± 12% (including c as an outlier observed at 16%), cyclic alkanes with 42 ± 6%, and Boc-protected amines with 8 ± 6%. The error bars on both box plots represent 95% confidence intervals, the bottom and top of each box are the 25th and 75th percentiles, the line inside the box is the 50th percentile (median), and outliers are shown as open circles. C Bar plot illustrating the number of successful and failed reactions from HTE. The substrates selected by the model resulted in 276 successful reaction outcomes. D Bar plot illustrating the number of unique alkylation opportunities identified per substrate. The majority of N-hetero arenes (10/17) allowed for successful transformation with 17–23 carboxylic acids. E Confusion matrix for reaction yield prediction. Reaction yields are divided into four bins, namely, no reaction (≤1%), poor (>1–11%), medium (>11–35%), and high reaction yield (>35%). The model assigns 54.6 (±0.9)% of the reactions to the correct bin and achieves a mean absolute error (MAE) of 18.7 (±0.2)% and a Pearson correlation coefficient (r) of 0.687 (±0.006). F Confusion matrix for binary reaction outcome prediction, achieving an absolute accuracy of 80.8 (±1.2)% and an F-score of 82.7 (±0.6)%.
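The binning and metrics described for panel E can be illustrated with a minimal sketch. Only the bin edges (≤1%, >1–11%, >11–35%, >35%) come from the caption; the function names and the toy yields in the usage note are hypothetical, and this is not the authors' evaluation code.

```python
import math

# Bin edges (percent yield) as stated in the caption for panel E.
# All function names here are hypothetical, chosen for illustration only.
def yield_bin(y):
    """Map a yield in percent to a bin index: 0 none, 1 poor, 2 medium, 3 high."""
    if y <= 1:
        return 0
    if y <= 11:
        return 1
    if y <= 35:
        return 2
    return 3

def confusion_matrix(true_yields, pred_yields, n_bins=4):
    """Rows index the true bin, columns index the predicted bin."""
    m = [[0] * n_bins for _ in range(n_bins)]
    for t, p in zip(true_yields, pred_yields):
        m[yield_bin(t)][yield_bin(p)] += 1
    return m

def bin_accuracy(m):
    """Fraction of reactions on the diagonal, i.e. placed in the correct bin."""
    total = sum(sum(row) for row in m)
    return sum(m[i][i] for i in range(len(m))) / total

def mean_absolute_error(true_yields, pred_yields):
    """Average absolute difference between true and predicted yields."""
    return sum(abs(t - p) for t, p in zip(true_yields, pred_yields)) / len(true_yields)

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

As a usage illustration with made-up yields, `confusion_matrix([0.5, 5, 20, 50], [0.0, 15, 30, 40])` places three of the four reactions on the diagonal, so `bin_accuracy` returns 0.75, while `mean_absolute_error` on the same lists gives 7.625.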