Figure - PMC

2019 Dec 21;118(3):765–780. doi: 10.1016/j.bpj.2019.12.016

© 2019 Biophysical Society.

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

(A) Toy model used to benchmark different ML approaches for extracting important features from simulation data. Atomic coordinates are randomly generated, and a subset of the atoms, unique to every state, is displaced linearly from its initial positions. Artificial simulation frames are generated by adding noise to the positions of all atoms. Because only the relative positions of atoms, not the absolute ones, are significant in real biological systems, the system is rotated randomly around the origin. (B) Importance per atom for single instances of the toy model. The indices of the displaced atoms are marked by dashed vertical lines, which coincide with all of the importance peaks for the MLP and with some of them for the RF. (C–E) Box plots of the performance of the different methods, using as input features either Cartesian coordinates (C), the full set of inverse interatomic distances (D), or a reduced set of inverse interatomic distances (E), sampled over different instances of the toy model with linear displacement and a 10% noise level. A high accuracy at finding all atoms signifies that every displaced atom has been identified as important and that other atoms have low importance. A high accuracy at ignoring irrelevant atoms signifies that only displaced atoms, although not necessarily all of them, have been marked as important (Methods and Fig. S10). For every method, the best-performing set of hyperparameters found after benchmarking (Figs. S4–S9) was used. RAND stands for random guessing. The box plots show the median (orange horizontal line), the interquartile range (box), the upper and lower whiskers (vertical lines), and the outliers (circles). To see this figure in color, go online.
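The toy model in (A) and the inverse-distance features in (D) can be illustrated with a minimal NumPy sketch. This is not the authors' code: the function names, the default sizes, the displacement direction, and the interpretation of the noise level as an absolute Gaussian standard deviation are all assumptions made for illustration, as is the choice to draw an independent rotation for each frame.

```python
import numpy as np


def random_rotation(rng):
    """Random proper rotation (det = +1) from the QR factors of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.normal(size=(3, 3)))
    q = q * np.sign(np.diag(r))      # fix the sign ambiguity of the QR factorization
    if np.linalg.det(q) < 0:
        q[:, 0] = -q[:, 0]           # turn a reflection into a proper rotation
    return q


def make_toy_frames(n_atoms=20, n_states=3, n_displaced=2,
                    frames_per_state=50, displacement=1.0, noise=0.1, seed=0):
    """Generate noisy, randomly rotated frames (all defaults are hypothetical)."""
    rng = np.random.default_rng(seed)
    base = rng.uniform(-1.0, 1.0, size=(n_atoms, 3))   # random initial coordinates
    # Disjoint subset of displaced atoms for every state.
    picked = rng.choice(n_atoms, size=n_states * n_displaced, replace=False)
    picked = picked.reshape(n_states, n_displaced)
    frames, labels = [], []
    for state in range(n_states):
        ref = base.copy()
        ref[picked[state]] += displacement   # linear shift (assumed along (1, 1, 1))
        for _ in range(frames_per_state):
            frame = ref + rng.normal(scale=noise, size=ref.shape)  # noise on all atoms
            frames.append(frame @ random_rotation(rng).T)          # rotate about origin
            labels.append(state)
    return np.array(frames), np.array(labels), picked


def inverse_distances(frames):
    """Full set of inverse interatomic distances, one row per frame."""
    diff = frames[:, :, None, :] - frames[:, None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    i, j = np.triu_indices(frames.shape[1], k=1)   # each atom pair once
    return 1.0 / dist[:, i, j]


frames, labels, picked = make_toy_frames()
features = inverse_distances(frames)
```

Under these assumptions, `frames` holds the Cartesian input of panel (C) and `features` the full inverse-distance input of panel (D). The inverse distances are unchanged by the random rotation, whereas the raw Cartesian coordinates are not.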