Submissions were ranked according to RMSE, MAE, R², and Kendall's τ. Many of the top methods
were statistically indistinguishable once the uncertainties of the error metrics were taken
into account. Moreover, the ranking of methods depended significantly on the choice of metric. We
identified the top 20 methods according to each metric to determine which methods are
consistently among the top 20 by all four statistical metrics. This yielded a set of eight
consistently well-performing methods: four QM-based and four empirical methods. Seven of these
methods are blind submissions to the SAMPL6 Challenge, and one of them (REF13) is a non-blind
reference calculation. Performance statistics are provided as mean and 95% confidence
intervals.
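The procedure of ranking by several metrics and keeping only the methods that stay in the top N under every metric can be sketched as below. This is a minimal illustration under stated assumptions, not the analysis code used for the challenge: R² is taken as the squared Pearson correlation, τ as Kendall's tau-a, methods are ranked on full-set point estimates, and the 95% confidence interval uses a simple bootstrap percentile scheme over molecules. The function names (`consistent_top_n`, `bootstrap_ci`) and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations


def rmse(pred, exp):
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, exp)) / len(exp))


def mae(pred, exp):
    return sum(abs(p - e) for p, e in zip(pred, exp)) / len(exp)


def r_squared(pred, exp):
    # Assumed definition: squared Pearson correlation coefficient.
    n = len(exp)
    mp, me = sum(pred) / n, sum(exp) / n
    cov = sum((p - mp) * (e - me) for p, e in zip(pred, exp))
    vp = sum((p - mp) ** 2 for p in pred)
    ve = sum((e - me) ** 2 for e in exp)
    if vp == 0 or ve == 0:
        return float("nan")  # undefined for constant data
    return cov * cov / (vp * ve)


def kendall_tau(pred, exp):
    # Kendall tau-a: (concordant - discordant) / number of pairs; ties count as neither.
    n = len(exp)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (pred[i] - pred[j]) * (exp[i] - exp[j])
        s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


# metric name -> (function, True if a higher value is better)
METRICS = {
    "RMSE": (rmse, False),
    "MAE": (mae, False),
    "R2": (r_squared, True),
    "tau": (kendall_tau, True),
}


def consistent_top_n(predictions, exp, n=20):
    """Return the methods that rank in the top n under every metric."""
    top_sets = []
    for fn, higher_is_better in METRICS.values():
        scores = {method: fn(pred, exp) for method, pred in predictions.items()}
        ranked = sorted(scores, key=scores.get, reverse=higher_is_better)
        top_sets.append(set(ranked[:n]))
    return set.intersection(*top_sets)


def bootstrap_ci(fn, pred, exp, n_samples=1000, seed=0):
    """Bootstrap mean and 95% percentile interval of a metric, resampling molecules."""
    rng = random.Random(seed)
    n = len(exp)
    stats = []
    for _ in range(n_samples):
        idx = [rng.randrange(n) for _ in range(n)]
        value = fn([pred[i] for i in idx], [exp[i] for i in idx])
        if not math.isnan(value):  # skip degenerate resamples
            stats.append(value)
    stats.sort()
    lo = stats[int(0.025 * (len(stats) - 1))]
    hi = stats[int(0.975 * (len(stats) - 1))]
    return sum(stats) / len(stats), lo, hi
```

With toy data such as `exp = [1.0, 2.0, 3.0, 4.0]` and three hypothetical methods, `consistent_top_n(preds, exp, n=2)` can return two methods while `n=1` returns an empty set, because a different method comes first under R² than under RMSE. This mirrors the observation that the ranking depends on the metric chosen.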