The latent space of the VDE discriminates between three significant states along the folding coordinate of villin: the folded (yellow), unfolded (orange), and a prominent misfolded (purple) state (a). In contrast, an optimized tICA model requires two coordinates to differentiate these states. The autocorrelation loss is crucial for this; without it, the VDE is unable to describe the landscape (b). Comparing the free energies along the VDE coordinate (c) and the first tICA coordinate (d) indicates that the VDE better separates the folded and unfolded states from the misfolded state. When comparing the timescales of MSMs constructed from both models (e), the VDE has a slower first process than an optimal tICA model with 4 tICA components and performs significantly better than a tICA model with a single coordinate, indicating a superior model. Error bars represent the range of 100 bootstrapped replicates.
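
To make two of the quantities in this caption concrete, the following is a minimal sketch, not the authors' implementation. It assumes the autocorrelation loss can be written as the negative Pearson correlation between a one-dimensional latent coordinate and its time-lagged copy, and that the error bars are the minimum and maximum of a statistic over resampled replicates. The function names (`lagged_autocorrelation`, `autocorrelation_loss`, `bootstrap_range`) and the sample-level resampling scheme are hypothetical choices made for illustration.

```python
import numpy as np


def lagged_autocorrelation(z, lag):
    """Pearson correlation between z(t) and z(t + lag) for a 1-D series."""
    z = np.asarray(z, dtype=float)
    if lag <= 0 or lag >= len(z):
        raise ValueError("lag must satisfy 0 < lag < len(z)")
    head, tail = z[:-lag], z[lag:]
    # Center each segment separately before correlating.
    head = head - head.mean()
    tail = tail - tail.mean()
    denom = np.sqrt((head ** 2).sum() * (tail ** 2).sum())
    return float((head * tail).sum() / denom)


def autocorrelation_loss(z, lag):
    """Assumed form of the loss: minimizing it maximizes autocorrelation."""
    return -lagged_autocorrelation(z, lag)


def bootstrap_range(values, statistic, n_replicates=100, seed=0):
    """Min and max of `statistic` over resamples drawn with replacement."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values)
    n = len(values)
    stats = [
        statistic(values[rng.integers(0, n, size=n)])
        for _ in range(n_replicates)
    ]
    return min(stats), max(stats)


# Illustration on synthetic data: a slowly varying coordinate versus noise.
rng = np.random.default_rng(1)
slow = np.sin(np.linspace(0.0, 4.0 * np.pi, 2000))
noise = rng.standard_normal(2000)
slow_corr = lagged_autocorrelation(slow, lag=10)
noise_corr = lagged_autocorrelation(noise, lag=10)
low, high = bootstrap_range(noise, np.mean, n_replicates=100)
```

In this sketch a slowly varying coordinate yields an autocorrelation near one and therefore a lower loss than uncorrelated noise, which is the property the caption attributes to the autocorrelation term. In practice, bootstrapping for MSM timescales would more likely resample whole trajectories rather than individual samples.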