2023 Oct 14;6:222. doi: 10.1038/s42004-023-01019-9

Table 3.

Summarizing the reproducibility of the experimental relative binding free energies and the accuracy of FEP+.

Accuracy metric	Experimental survey	FEP+ benchmark
Pairwise RMSE (kcal mol⁻¹)	0.91 [0.83, 1.11]	1.25 [1.17, 1.33]
Pairwise MUE (kcal mol⁻¹)	0.67 [0.61, 0.83]	0.98 [0.91, 1.05]
Edgewise RMSE (kcal mol⁻¹)	N/A	1.17 [1.08, 1.25]
Edgewise MUE (kcal mol⁻¹)	N/A	0.91 [0.84, 0.98]
R²	0.79 [0.75, 0.82]	0.56 [0.51, 0.60]
Kendall τ	0.71 [0.65, 0.74]	0.51 [0.48, 0.55]

The value of every metric, such as RMSE or R², is a weighted average over the pairs of experimental series (experimental survey) or the FEP+ graphs (FEP+ benchmark). For the pairwise, R², and Kendall τ metrics, the weight is the number of compounds in the assay (experimental survey) or in the FEP graph (FEP+ benchmark). For the edgewise errors, the weight is the number of edges in each FEP graph. Square brackets enclose 95% confidence intervals calculated by bootstrap sampling over the pairs of experimental series or over the FEP+ graphs. Because the edgewise error depends on the topology of an FEP+ graph, the experimental survey has no equivalent metric.
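A minimal Python sketch of how statistics like those in Table 3 could be computed. It assumes per-compound binding free energies (kcal mol⁻¹) for each series and, for an FEP graph, a list of edges with their predicted ΔΔG values. The function names (`pairwise_rmse`, `edgewise_rmse`, `weighted_mean`, `bootstrap_ci`) are hypothetical, and the simple percentile bootstrap is an assumed stand-in, not necessarily the authors' exact procedure. It illustrates why pairwise errors need only per-compound values, whereas edgewise errors depend on which edges the graph contains.

```python
import math
import random
from itertools import combinations


def pairwise_rmse(exp, pred):
    """RMSE over all compound pairs (i, j), i < j, within one series.

    exp, pred: binding free energies for the same compounds, in the same order.
    A constant offset in pred cancels, so only relative values matter.
    """
    sq = [((pred[i] - pred[j]) - (exp[i] - exp[j])) ** 2
          for i, j in combinations(range(len(exp)), 2)]
    return math.sqrt(sum(sq) / len(sq))


def edgewise_rmse(exp, edges):
    """RMSE over only the edges present in an FEP graph.

    edges: (i, j, ddg_pred) tuples, where ddg_pred is the predicted
    free-energy change for transforming compound i into compound j
    (assumed input format). The result therefore depends on graph topology.
    """
    sq = [(ddg - (exp[j] - exp[i])) ** 2 for i, j, ddg in edges]
    return math.sqrt(sum(sq) / len(sq))


def weighted_mean(values, weights):
    """Average of per-series metric values, weighted e.g. by compound or edge count."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def bootstrap_ci(values, weights, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the weighted mean.

    Whole series (or graphs) are resampled with replacement, so each
    bootstrap replicate recomputes the weighted mean over a resampled set.
    """
    rng = random.Random(seed)
    n = len(values)
    stats = []
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)
        stats.append(weighted_mean([values[k] for k in idx],
                                   [weights[k] for k in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


if __name__ == "__main__":
    # Toy data (not from the paper): three series' pairwise RMSEs and compound counts.
    rmses = [1.1, 1.4, 0.9]
    n_compounds = [12, 30, 8]
    print(weighted_mean(rmses, n_compounds), bootstrap_ci(rmses, n_compounds))
```

Resampling whole series rather than individual compounds keeps the errors within each series together, which matches the caption's description of bootstrapping over series or graphs. R² and Kendall τ would be computed per series in the same way and then passed to `weighted_mean` and `bootstrap_ci`.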