enspara offers a flexible, well-scaling, and multipurpose clustering CLI. (a) A CLI invocation that clusters trajectories sharing a single topology with the k-hybrid algorithm using backbone RMSD, stopping k-centers at 3 Å and then applying 20 rounds of k-medoids refinement. (b) A CLI invocation that clusters trajectories with differing topologies on a small subset of shared atoms, using the k-centers algorithm to discover 1000 states. (c) A CLI invocation that uses k-hybrid to cluster frames by the Euclidean distance between their feature vectors, which are stored in a group of NumPy NPY-format files. (d) An MSM’s ability to predict the results of an experimental measurement of solvent exposure as a function of the number of clusters. Dashed lines are models constructed using the Euclidean distance between vectors of residue sidechain solvent accessible surface area, whereas solid lines use backbone RMSD. Blue traces use k-centers, and red traces use k-hybrid. The experimental measurement is a previously published (ref. 29) biochemical labeling assay that classifies a residue as exposed, buried, or transiently exposed. A residue’s exposure class was predicted as “buried” if no state exists in the MSM where the residue is exposed, “exposed” if no state exists where the residue is buried, and “transient” if the residue populates both exposed and buried states. The y-axis represents the fraction of these residues that were classified correctly. Error bars represent the standard deviation of three trials (k-centers is deterministic and therefore has no error bars).
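The classification rule used for panel (d) can be illustrated with a minimal Python sketch. This is not enspara code: the function names `classify_residues` and `fraction_correct` are hypothetical, and the sketch assumes that each residue has already been labeled exposed or not exposed in every MSM state (the threshold used for that step is not given in the caption). The input is assumed to be a boolean array of shape (number of states, number of residues).

```python
import numpy as np


def classify_residues(exposed_by_state):
    """Assign an exposure class to each residue (hypothetical helper).

    exposed_by_state: boolean array, shape (n_states, n_residues), where
    entry [s, r] is True if residue r is exposed in state s. How "exposed"
    is decided per state is an assumption left outside this sketch.
    """
    exposed_by_state = np.asarray(exposed_by_state, dtype=bool)
    ever_exposed = exposed_by_state.any(axis=0)    # exposed in at least one state
    ever_buried = (~exposed_by_state).any(axis=0)  # buried in at least one state

    labels = np.empty(exposed_by_state.shape[1], dtype=object)
    labels[~ever_exposed & ever_buried] = "buried"     # no state where it is exposed
    labels[ever_exposed & ~ever_buried] = "exposed"    # no state where it is buried
    labels[ever_exposed & ever_buried] = "transient"   # both kinds of state exist
    return labels


def fraction_correct(predicted, experimental):
    """Fraction of residues whose predicted class matches the experimental class."""
    matches = [p == e for p, e in zip(predicted, experimental)]
    return float(np.mean(matches))


# Toy example: two states, three residues (made-up values for illustration only).
toy = [[True, False, True],
       [True, False, False]]
predicted = classify_residues(toy)
score = fraction_correct(predicted, ["exposed", "buried", "buried"])
```

In this toy input the first residue is exposed in both states, the second is buried in both, and the third appears in both an exposed and a buried state, so the three classes are each assigned once. The quantity on the y-axis corresponds to `score` computed against the assay labels.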