The coupling algorithms sample significantly different energy distributions. The data are reported numerically in Tables C and D in S3 Text. All results were obtained for a system of 900 TIP3P water molecules. The temperature and pressure were controlled either using the VR algorithm in combination with a PR barostat (blue bars), or using the WC algorithm (orange bars). The left plot depicts the results of the non-strict kinetic energy test for simulations at 300 K. The distributions sampled by both algorithms have the correct average temperature. The VR distribution also has the correct width, while the WC algorithm samples a distribution that is significantly too narrow in all cases. The right plot shows the intervals estimated by the ensemble check. The temperature intervals estimated for the VR algorithm all lie within 1.5 standard errors of the analytical value of 8 K (indicated by a dashed line). The WC algorithm, on the other hand, is estimated to have a temperature difference of 12.3 to 16.6 K, which is 11.4 to 34.6 standard errors from the true value. The pressure difference shows similar results. The true interval is 300 bar for the 1d-estimate and 296.1 bar for the 2d-estimate. The VR estimates lie within 2.5 and 1.7 standard errors of the true value, respectively, while the WC estimates are more than 20 standard errors from the true value.
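The deviations quoted above are expressed in units of standard errors. As a minimal sketch of that arithmetic, assuming the deviation is simply the absolute difference between the estimated and the true interval divided by the standard error of the estimate, one could write the following. The function name and the example numbers are hypothetical and are not taken from Tables C and D in S3 Text; only the 8 K analytical value comes from the text.

```python
def deviation_in_standard_errors(estimate, standard_error, true_value):
    """Return |estimate - true_value| / standard_error.

    Assumption: this is the plain definition of "n standard errors
    from the true value"; the exact procedure used for the figure
    is not specified here.
    """
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return abs(estimate - true_value) / standard_error


# Analytical temperature interval stated in the caption (in K).
TRUE_DT = 8.0

# Placeholder estimate and standard error, for illustration only.
example_estimate = 9.0  # K, hypothetical
example_error = 1.0     # K, hypothetical

print(deviation_in_standard_errors(example_estimate, example_error, TRUE_DT))
```

With real data, the estimate and its standard error would be read from the ensemble check output for each simulation pair, and a deviation of more than a few standard errors would indicate that the sampled ensemble differs from the expected one.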