Scientific Reports. 2025 Oct 23;15:37169. doi: 10.1038/s41598-025-24757-3

Automated parametrization of small molecules within the Martini 3 coarse-grained model guided by experimental log P values

Maria Kelidou, Kai Steffen Stroh, Herre Jelger Risselada
PMCID: PMC12550069  PMID: 41131333

Abstract

Molecular dynamics simulations play an important role in investigating biological systems. However, simulating large-scale systems is computationally expensive, a cost that coarse-grained force fields can reduce. This study focuses on the automated parametrization of small molecules within the CGCompiler framework. This optimization approach uses a mixed-variable particle swarm algorithm to avoid manual tweaking of parameters. Specifically, the optimization focuses on matching experimentally known log P values for water-octanol partitioning, reproducing atomistic density profiles in lipid bilayers, and optimizing the overall shape and volume of the modeled molecules. After the atomistic to coarse-grained mapping, the model’s accuracy is evaluated through a fitness function, which combines structural and dynamic targets, to capture the shape and behavior of the small molecule in question. By enabling the investigation of interactions between small molecules and cellular membranes, this optimization process supports the development of accurate coarse-grained models for small molecules relevant to drug discovery. Our work demonstrates promising results in automating the high-fidelity parametrization of small molecules using the Martini 3 force field guided by experimental log P values.

Subject terms: Biological physics; Surfaces, interfaces and thin films; Biophysical chemistry; Computational biophysics; Membrane biophysics; Single-molecule biophysics

Introduction

Molecular dynamics (MD) simulations are a vital tool in the field of molecular biology and drug discovery, offering highly detailed insight into (bio)molecules at an atomic level1,2. To explore and analyze more complex systems over larger length and longer time scales, the use of coarse-grained strategies becomes essential3. The widespread adoption of coarse-grained force fields like the Martini model for biomolecular simulations stems from their ability to merge common chemical groups consisting of multiple heavy atoms into distinct single interaction sites4–7. This approach has become particularly popular because of its transferability across various applications in biomolecular science, soft matter, and nanoscience. The downside of the building block approach is that the parametrization of molecules within coarse-grained models is a frustrating and tedious task, as chemical groups must be encoded into one of hundreds of predefined bead types.

Beyond ongoing efforts to create databases of already-parametrized molecules based on human parametrization efforts, such as the Martini database6, work is underway to fully automate this pipeline. This automation aims to enable the parametrization of existing small molecule databases widely used in pharmaceutical research for drug development purposes. To this end, multiple automated approaches have been proposed, including machine learning-based methods (e.g., graph neural networks) and artificial intelligence-driven techniques (e.g., evolutionary algorithms and swarm optimization)8–14. These automated approaches streamline molecular parametrization workflows, thereby accelerating drug discovery timelines through the efficient exploration of molecular configurations enabled by coarse-grained modeling methodologies15,16. Additionally, automated parametrization could help address the challenge of keeping up with the rapidly growing number of known compounds and targets in drug discovery. Yet, while automated approaches can generate initial parametrizations quickly, they often lack the nuanced understanding of molecular behavior that comes from careful reproduction of properties derived from atomistic simulations and experiments.

To this end, we are developing the CGCompiler12 approach, which automates high-fidelity (re)parametrization within the Martini 3 model using mixed-variable particle swarm optimization. This method handles the assignment of predefined nonbonded interaction types (discrete variables) while simultaneously optimizing bond lengths (continuous variables). By overcoming the inherent dependency between nonbonded and bonded interactions, CGCompiler performs a multiobjective optimization that matches user-provided targets derived from atomistic simulations as well as from experiments.

The standard parametrization procedure entails manually setting all initial (trial) force-field parameters and subsequently adjusting them to fit the desired properties. The CGCompiler requires only the initial mapping of the atomistic structure and an initial coarse-grained parametrization. This step can greatly benefit from the development of automated mapping schemes13,14, whose crude parametrization also provides a valuable starting point for refinement by CGCompiler. Afterwards, a mixed-variable particle swarm optimization algorithm is employed to optimize the molecule, thereby overcoming the hurdles of tweaking the parameters by hand and facilitating a more accurate and efficient parametrization. The model is evaluated based on a list of properties and their target values provided by the user (fitness function).

Partition coefficients, particularly octanol-water partition coefficients, play a crucial role in small molecule and drug design17–19. They serve as primary indicators of hydrophobicity and membrane permeability, making them essential tools in assessing a compound’s potential as a drug candidate. Given that the octanol-water partition coefficients of common small molecules have been well characterized experimentally, reproducing these coefficients represents the primary goal in guiding the parametrization of small molecules.

In addition to partition coefficients, atomistic density profiles within lipid bilayers provide a complementary and membrane-specific target for parametrization. Unlike bulk partitioning, density profiles directly probe the spatial distribution and orientation of molecules across the heterogeneous membrane interface, capturing interactions with different chemical groups within the lipid and the insertion depth within the bilayer20–23. Incorporating such information allows coarse-grained models to more precisely account for additional structural and electrostatic effects that are often absent when optimizing solely against octanol–water partitioning free energies. Furthermore, the density profiles of individual beads reflect the orientation of molecules in the membrane, enabling more precise parametrization of the local molecular chemistry, which is not uniquely determined by log P values alone.

For this purpose, we extended CGCompiler to optimize molecules based on the free energy of transfer between octanol and water phases, as well as based on the atomistic density profiles within lipid bilayers. We also incorporated a scheme for the bonded parameters to simultaneously match the Solvent Accessible Surface Area (SASA). Our focus is on the parametrization of dopamine and serotonin, two biologically highly relevant neurotransmitters. Their roles in mediating both physiological and psychological processes make them important targets for parametrization24–27. Furthermore, the investigation of the interactions between dopamine and serotonin and cellular membranes, as well as their receptors, is fundamental to understanding and treating a variety of neurological disorders28,29.

We report a significant advance in the automated parametrization of small molecules within the Martini 3 force field by extending CGCompiler to simultaneously optimize against experimental log P values and atomistic density profiles in lipid bilayers. The inclusion of the density profiles of mapped interaction sites provides a direct membrane-specific target alongside bulk partitioning data, ensuring more accurate reproduction of molecular orientation and insertion behavior at biologically relevant interfaces. Incorporating diverse targets improves the accuracy of membrane interaction modeling and enhances the capability of coarse-grained parametrization to account for subtle but biologically relevant effects, such as electrostatic interactions (the presence of net charge) and local molecular chemistry.

Methods

The initial step in the coarse-graining process involved determining the grouping of atoms into beads. For dopamine, the process was carried out by hand and the result can be seen in Fig. 1. For the initial mapping of serotonin, we used Auto-Martini13, but a finer adjustment of the parameters was necessary for Martini 3.

Fig. 1.

Snapshot of the coarse-grained models of dopamine (a) and serotonin (b). Atoms are colored by element type, while the blue coarse-grained bead marks the charged group. The bead labels serve as references to subsequent figures.

CGCompiler

Small molecule parametrization in Martini 3 requires careful adjustment of many parameters to match several goals. Identifying the right parameters to improve specific behaviors, especially in complex interactions, is a difficult and time-consuming task. Automation becomes crucial for handling large molecule databases, as it organizes the parametrization process into a clear, hierarchical system. One automated method is Particle Swarm Optimization (PSO), known for efficiently finding good solutions in complex, multidimensional spaces. PSO is well suited to optimizing continuous variables in coarse-grained models, though it faces challenges with predefined, discrete parameters in building block models like Martini. The CGCompiler Python package12 provides efficient coarse-grained molecule parametrization through mixed-variable particle swarm optimization. This method optimizes both categorical (predefined bead types) and continuous (bonds, angles, dihedrals, etc.) variables simultaneously. Built on the GROMACS simulation engine30–33, CGCompiler substantially simplifies force field parametrization, particularly for building-block approaches.

The parametrization workflow, which can be seen in Fig. 2, involves selecting mapping and bead sizes, assigning chemical bead types, and choosing bonded terms and parameters. The presented algorithm optimizes bead size, chemical bead type, and bonded parameters simultaneously. The workflow includes the user providing the target data and creating a set of CG training systems. The optimization algorithm iteratively generates candidate solutions, runs MD simulations, scores solutions based on how well targets are reproduced, updates solutions using the swarm’s knowledge, and repeats until termination criteria are met.
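The generate, score, update loop described above can be illustrated with a toy mixed-variable swarm. This is only a minimal sketch and not CGCompiler's implementation: the bead-type subset, the `toy_cost` function (a stand-in for running MD and scoring against targets), the swarm coefficients, and the categorical update rule (copy from the global or personal best with fixed probabilities, otherwise occasionally mutate) are all assumptions made for illustration.

```python
import random

BEAD_TYPES = ["C1", "C5", "N4", "P2", "Q1"]  # hypothetical subset of bead types


def toy_cost(bead, bond):
    # Stand-in for "run MD and score against targets". The invented optimum
    # is bead type "P2" with a bond length of 0.30 nm.
    return (0.0 if bead == "P2" else 1.0) + (bond - 0.30) ** 2


def optimize(n_particles=12, n_iter=50, seed=0):
    rng = random.Random(seed)
    # Each particle carries one categorical and one continuous variable.
    swarm = [{"bead": rng.choice(BEAD_TYPES),
              "bond": rng.uniform(0.2, 0.5),
              "vel": 0.0} for _ in range(n_particles)]
    for p in swarm:
        p["best"] = (toy_cost(p["bead"], p["bond"]), p["bead"], p["bond"])
    gbest = min(p["best"] for p in swarm)  # (cost, bead, bond) of the swarm

    for _ in range(n_iter):
        for p in swarm:
            # Continuous variable: standard inertia/cognitive/social update.
            r1, r2 = rng.random(), rng.random()
            p["vel"] = (0.7 * p["vel"]
                        + 1.5 * r1 * (p["best"][2] - p["bond"])
                        + 1.5 * r2 * (gbest[2] - p["bond"]))
            p["bond"] += p["vel"]
            # Categorical variable: follow the swarm, the particle's own
            # memory, or mutate at random; otherwise keep the current type.
            u = rng.random()
            if u < 0.4:
                p["bead"] = gbest[1]
            elif u < 0.7:
                p["bead"] = p["best"][1]
            elif u < 0.8:
                p["bead"] = rng.choice(BEAD_TYPES)
            score = toy_cost(p["bead"], p["bond"])
            if score < p["best"][0]:
                p["best"] = (score, p["bead"], p["bond"])
        # Synchronous update of the swarm's best-known solution.
        gbest = min([gbest] + [p["best"] for p in swarm])
    return gbest


if __name__ == "__main__":
    cost, bead, bond = optimize()
    print(f"best bead type: {bead}, bond length: {bond:.3f} nm, cost: {cost:.4f}")
```

In a real run, each call to the cost function would correspond to a set of coarse-grained MD simulations, which is why the number of particles and iterations dominates the computational budget.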

Fig. 2.

Representation of the CGCompiler framework, adapted from ref. 12. The octanol training system is shown in mint green, while water is shown in light blue.

Parametrizing small molecules required new targets in CGCompiler, together with methods for calculating them. As small molecules are much smaller and more flexible than proteins or lipids, additional metrics are needed to capture their physical properties. One of the relevant targets for small molecules is the Solvent Accessible Surface Area (SASA). It is a widely used metric in molecular biology and computational chemistry that quantifies the extent of a molecule’s surface that is accessible to a solvent34. This measurement is crucial in understanding the interactions, dynamics, and structures of biomolecules in various environments. The SASA target value was obtained with the GROMACS tool gmx sasa, computed as an average over high-sampling atomistic simulations of each small molecule. Due to the reduced resolution of coarse-grained models, perfect agreement with atomistic SASA is not expected. Nonetheless, including SASA as an objective provides a useful guide for capturing the overall molecular shape and solvent-exposed surface during parametrization.

Investigating the behavior and thermodynamic properties of small molecules across different solvents is an important task, and this behavior is often expressed through the partition coefficient or, equivalently, the log P value. Therefore, we implemented the calculation of the partition coefficient in the CGCompiler using the Multistate Bennett Acceptance Ratio (MBAR) method35,36. This approach allowed us to accurately compute the free energy of transfer needed to determine the partition coefficient, as defined by the equation adapted from ref. 37 to account for the free energy of transfer from octanol to water instead of from water to octanol:

$$\log P = \frac{\Delta G_{\mathrm{O \to W}}}{RT \ln 10} \tag{1}$$

where

$$\Delta G_{\mathrm{O \to W}} = \Delta G_{\mathrm{O \to g}} - \Delta G_{\mathrm{W \to g}} \tag{2}$$
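As a numerical illustration of the conversion between a transfer free energy and log P, the sketch below assumes the standard relation log P = ΔG(octanol to water) / (RT ln 10), with ΔG in kJ/mol. The function names and the temperature of 298.15 K are assumptions for illustration; the temperature used by the authors is not stated in this section.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def log_p_from_transfer(dg_oct_to_wat, temperature=298.15):
    """Convert an octanol->water transfer free energy (kJ/mol) to log P.

    A positive free energy means the molecule prefers octanol,
    which gives a positive log P.
    """
    return dg_oct_to_wat / (R * temperature * math.log(10))


def transfer_from_log_p(log_p, temperature=298.15):
    """Inverse conversion: log P to transfer free energy in kJ/mol."""
    return log_p * R * temperature * math.log(10)


if __name__ == "__main__":
    # One log P unit corresponds to roughly 5.7 kJ/mol at room temperature.
    print(f"{transfer_from_log_p(1.0):.3f} kJ/mol per log P unit")
```

Because one log P unit corresponds to only about 5.7 kJ/mol at room temperature, free energy errors of a few kJ/mol translate into sizeable log P errors.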

The calculation of solvation free energies in octanol and water typically employs a thermodynamic cycle involving transfer to the gas phase to establish a system-independent reference state. However, this approach presents significant challenges in accuracy. The transfer free energy ($\Delta G_{\mathrm{O \to W}}$) is computed as the difference between $\Delta G_{\mathrm{O \to g}}$ and $\Delta G_{\mathrm{W \to g}}$, where the subscripts $\mathrm{O \to g}$ and $\mathrm{W \to g}$ denote transfer to the gas phase. These individual terms involve large values of several hundred kJ/mol due to the switching off of non-bonded interactions during the alchemical transformation. As a result, typical sampling errors of several kJ/mol become comparable to the magnitude of $\Delta G_{\mathrm{O \to W}}$ itself, rendering the calculations inherently inaccurate and computationally expensive for high-throughput applications.

To circumvent these limitations, we implement a chemical perturbation scheme utilizing a fixed reference topology with a predetermined $\Delta G_{\mathrm{O \to W}}^{\mathrm{ref}}$ value38. This approach enables the calculation of transfer free energies for newly parametrized molecules according to the equation:

$$\Delta G_{\mathrm{O \to W}} = \Delta\Delta G + \Delta G_{\mathrm{O \to W}}^{\mathrm{ref}} \tag{3}$$

where $\Delta\Delta G$ represents the relative free energy difference between new molecule parameters and reference parameters, obtained through on-the-fly chemical perturbation, and $\Delta G_{\mathrm{O \to W}}^{\mathrm{ref}}$ denotes the known reference free energy of the predefined reference molecule. It is important to emphasize, however, that precise determination of $\Delta G_{\mathrm{O \to W}}^{\mathrm{ref}}$ is crucial, as it establishes the fundamental reference point for all subsequent log P estimations obtained via chemical perturbation and therefore largely determines the systematic error.
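The bookkeeping behind such a reference-based scheme can be sketched with a closed thermodynamic cycle. The decomposition used here is an assumption made for illustration: the relative term is taken as the difference between the reference-to-new perturbation free energies in water and in octanol. The function name and all numbers are hypothetical.

```python
def transfer_from_reference(dg_ref_oct_to_wat, dg_perturb_water, dg_perturb_octanol):
    """Transfer free energy of a new parameter set from a known reference.

    dg_ref_oct_to_wat:  octanol->water transfer free energy of the reference
    dg_perturb_water:   free energy of changing reference -> new parameters in water
    dg_perturb_octanol: free energy of changing reference -> new parameters in octanol
    All values share the same units (e.g. kJ/mol).
    """
    ddg = dg_perturb_water - dg_perturb_octanol  # relative free energy difference
    return dg_ref_oct_to_wat + ddg


if __name__ == "__main__":
    # Invented absolute free energies, used only to check that the cycle closes.
    g = {("ref", "oct"): -10.0, ("ref", "wat"): -4.0,
         ("new", "oct"): -12.0, ("new", "wat"): -3.0}
    direct = g[("new", "wat")] - g[("new", "oct")]
    via_ref = transfer_from_reference(
        g[("ref", "wat")] - g[("ref", "oct")],
        g[("new", "wat")] - g[("ref", "wat")],
        g[("new", "oct")] - g[("ref", "oct")],
    )
    print(direct, via_ref)
```

The two perturbation legs only switch between similar parameter sets, so they involve much smaller free energies than a full decoupling to the gas phase, which is the motivation for the scheme given in the text.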

Optimizing a parametrization solely against octanol-water partitioning free energies may miss several key membrane-specific interactions, including electrostatic effects with lipid headgroups and the ordered structural characteristics of the membrane interface. Therefore, we investigated the interfacial behavior of small molecules at lipid membranes by calculating local density profiles within a coarse-grained POPC membrane as an additional objective in the multi-objective optimization scheme. These profiles were compared against those from atomistic simulations of our small molecules using the CHARMM36 force field, computed with the GROMACS tool gmx density. It is important to symmetrize the density profiles, as the slow binding and unbinding kinetics of small molecules can result in highly asymmetric profiles, skewing the density matching process. As the leaflet affinity is by definition identical for a symmetric bilayer, symmetrizing the density profile is fully justified.
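Symmetrization amounts to averaging a profile with its mirror image about the bilayer center. The sketch below assumes the profile is sampled on evenly spaced bins centered on the membrane midplane; the function name and the example values are hypothetical.

```python
def symmetrize(profile):
    """Average a density profile with its mirror image about the bilayer center.

    Assumes evenly spaced bins arranged symmetrically around z = 0,
    so that reversing the list corresponds to the reflection z -> -z.
    """
    mirrored = profile[::-1]
    return [(a + b) / 2 for a, b in zip(profile, mirrored)]


if __name__ == "__main__":
    # Invented profile with more density bound to one leaflet than the other.
    asymmetric = [0.0, 2.0, 4.0, 1.0, 0.0]
    print(symmetrize(asymmetric))
```

The total density is conserved by this operation, so the symmetrized profile can be compared directly with a coarse-grained profile treated the same way.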

The convergence behavior of swarm optimization algorithms depends directly on problem dimensionality. As the number of dimensions increases, maintaining sufficient population diversity requires proportionally larger swarm sizes. In our implementation, we employed swarm sizes of 72 particles for dopamine-related parameters (six interaction sites) and 48 particles for serotonin-related parameters (five interaction sites). These choices were guided by a balance between computational efficiency and convergence quality: larger swarms tend to improve global search capabilities, but beyond a certain size the additional computational cost no longer yields better convergence. We used the same optimization protocol for both systems, with 50 iterations per convergence cycle. Equal weights (1.0) were assigned to the objective functions within each system to maintain balanced optimization dynamics.

Fitness function

The CGCompiler evaluates parametrization performance using a cost function.

graphic file with name 41598_2025_24757_Article_Equ4.gif 4

The optimization minimizes this function, which consists of multiple normalized objective functions, each assigned a user-defined weight. These weights enable users to prioritize and balance the significance of various parametrization goals. In our parametrization procedure we used four objective functions of equal weight, namely SASA, bond distributions39,40, the octanol-water partition coefficient, and mapped-bead density distributions.
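One simple way to combine weighted, normalized objectives is a weighted sum. The sketch below uses that form purely as an assumption; the exact functional form used by CGCompiler is not recoverable from this section, and the objective names and values are invented.

```python
def total_cost(objectives, weights):
    """Weighted sum of normalized objective values (assumed functional form).

    objectives: mapping from objective name to its normalized value
    weights:    mapping from objective name to its user-defined weight
    """
    return sum(weights[name] * value for name, value in objectives.items())


if __name__ == "__main__":
    # Four objectives with equal weights of 1.0, as in the protocol described above.
    weights = {"sasa": 1.0, "bonds": 1.0, "log_p": 1.0, "density": 1.0}
    objectives = {"sasa": 0.25, "bonds": 0.5, "log_p": 0.125, "density": 0.125}
    print(total_cost(objectives, weights))
```

With such a form, raising one weight makes deviations in that objective more expensive relative to the others, which is how a user would express that, for example, log P matters more than bond distribution width.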

Atomistic simulations

For the generation of target bond and density distribution data, we conducted atomistic simulations of the small molecule in a 3 nm cubic box of water (approximately 900 molecules) and in a (6, 6, 8) nm box of water-POPC (approximately 6400 water and 85 POPC molecules) using the CHARMM36 force field41–43. After energy minimization and an NPT equilibration of 100 ns, we performed a production run of 1 μs. During production, we used a time step of 2 fs, the velocity-rescale thermostat with a coupling time of Inline graphic ps, and the Parrinello-Rahman barostat with a coupling time of Inline graphic ps.

Coarse-grained simulations

In the CGCompiler, we conducted three parallel sets of simulations of the small molecules in GROMACS 2023.2: one in water, one in two separate boxes of water and octanol for the free energy calculations, and one in a water-POPC membrane system. The same models of small molecules were used and evaluated in all training systems. In the first system, the fitness was a function of SASA and the bond distribution data. In the second system, the fitness was based on the log P value, which we compared to the experimental value listed in Table 1. In the third training system, the fitness was a function of the mapped-bead density distribution. In the first and third training systems, we obtained the reference data from our high-sampling atomistic simulations.

Table 1.

Comparison of experimental and computational partition coefficient values. Experimental values were taken from refs. 44,45. The errors represent the standard deviation of the free energy difference, propagated to log P.

logP | Dopamine | Serotonin
Experimental value | -0.99 | 0.21
CGCompiler value | Inline graphic | Inline graphic
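Propagating a free energy uncertainty to log P is a linear rescaling, assuming the standard relation log P = ΔG / (RT ln 10). The sketch below illustrates this; the temperature of 298.15 K, the function name, and the example uncertainty are assumptions, not values from the study.

```python
import math

R = 8.314462618e-3  # molar gas constant in kJ/(mol K)


def log_p_error(sigma_dg, temperature=298.15):
    """Propagate the standard deviation of a transfer free energy (kJ/mol) to log P.

    Because log P is proportional to the free energy, the uncertainty
    scales by the same factor 1 / (RT ln 10).
    """
    return sigma_dg / (R * temperature * math.log(10))


if __name__ == "__main__":
    # Hypothetical uncertainty of 0.5 kJ/mol on the transfer free energy.
    print(f"sigma(log P) = {log_p_error(0.5):.3f}")
```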

For the first and third training systems, we performed energy minimization and two stages of NPT equilibration with time steps of Inline graphic fs and 20 fs, followed by a production run of 400 ns with a time step of 20 fs. During production, we used the velocity-rescale thermostat with a coupling time of Inline graphic ps, and the Parrinello-Rahman barostat with a coupling time of Inline graphic ps. For the subsequent free energy calculation simulations, we chose the following lambda states: 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.63, 0.65, 0.68, 0.7, 0.72, 0.73, 0.75, 0.77, 0.8, 0.85, 0.9, 0.95, 1.0. These simulations entailed a production run of 10 ns for each lambda state, using the same settings as in the initial training system.
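The lambda schedule above can be inspected with a few lines of arithmetic. The sketch below counts the states, reports the spacing, and estimates the sampling time per solvent box, assuming one 10 ns run per state; the variable names are hypothetical.

```python
# Lambda states as listed in the text.
LAMBDAS = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.55, 0.6, 0.63, 0.65, 0.68,
           0.7, 0.72, 0.73, 0.75, 0.77, 0.8, 0.85, 0.9, 0.95, 1.0]

NS_PER_STATE = 10  # production length per lambda state, in ns

n_states = len(LAMBDAS)
total_ns_per_box = n_states * NS_PER_STATE

# Spacing between neighbouring states, rounded to suppress floating-point noise.
spacings = [round(b - a, 2) for a, b in zip(LAMBDAS, LAMBDAS[1:])]

if __name__ == "__main__":
    print(f"{n_states} states, {total_ns_per_box} ns per solvent box")
    print(f"spacing ranges from {min(spacings)} to {max(spacings)}")
```

The schedule is densest between roughly 0.6 and 0.8, where neighbouring states are only 0.01 to 0.03 apart, compared with 0.1 near the start of the range.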

Results

The CGCompiler generates optimized dopamine and serotonin parameters in .itp format. Figure 3 shows the convergence behavior of CGCompiler’s total cost function across 50 iterations. This composite metric combines four normalized objective functions: bond distributions, solvent accessible surface area (SASA), octanol-water partition coefficient (log P), and density distributions.

Fig. 3.

Cost function convergence for CG dopamine (a) and serotonin (b). g8best indicates the 8 best candidate solutions of the swarm. The mean value of the cost function of the whole swarm is portrayed in blue scatter points, while the entire range of cost function values is shown as a shaded gray region.

In Fig. 4, we compare the bond distributions of the best candidate solutions from the CGCompiler with the atomistic reference data. Distribution overlap was optimized using the Earth Mover’s Distance (EMD) criterion12. The EMD is a measure of dissimilarity between two probability distributions that captures the minimum amount of work needed to transform one distribution into the other. Using EMD rather than peak fitting has the advantage of capturing both the peak position and the width of a distribution within a single parameter. In our dopamine model, the bonds are between beads C1-C5 and C5-N, which are located in the dopamine tail. In our serotonin model, the bond is between C4-Q1, where Q1 is the serotonin tail. The relevant beads are labeled in Fig. 1. There is good agreement with the means of the atomistic target distributions, though the distributions tend to be somewhat wider in the coarse-grained model, particularly in the case of C4-Q1. Although a stronger force constant would result in a narrower bond distribution, it could compromise the simultaneous optimization of other objectives, including the SASA, which reflects molecular shape and volume, as well as log P values and local density profiles. We note, however, that the optimization outcome represents the best balance among four objectives rather than optimal performance for any single objective.
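For one-dimensional histograms on a shared grid, the EMD reduces to the integrated absolute difference between the two cumulative distributions. The sketch below implements that textbook form for evenly spaced bins; it is not CGCompiler's code, and the function name and example histograms are hypothetical.

```python
def emd_1d(p, q, bin_width):
    """Earth Mover's Distance between two histograms on the same evenly spaced bins.

    Both histograms are normalized to unit mass, then the absolute difference
    of their cumulative distributions is summed and scaled by the bin width.
    """
    sum_p, sum_q = sum(p), sum(q)
    cdf_diff = 0.0
    total = 0.0
    for a, b in zip(p, q):
        cdf_diff += a / sum_p - b / sum_q
        total += abs(cdf_diff)
    return total * bin_width


if __name__ == "__main__":
    # A peak shifted by two bins of width 0.01 nm costs 0.02 nm of "work".
    print(emd_1d([0, 1, 0, 0], [0, 0, 0, 1], bin_width=0.01))
    # Same mean but a broader distribution still gives a nonzero distance.
    print(emd_1d([0, 1, 0], [0.25, 0.5, 0.25], bin_width=1.0))
```

The second example shows why a single EMD value penalizes both a shifted peak and an overly broad distribution, which is the property the text relies on.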

Fig. 4.

Bond distribution comparison between AA target data and CGCompiler output of the beads in the dopamine (a) and serotonin tail (b). The bonds are between the beads that are labeled in Fig. 1. gbest portrays the best candidate solution. The next best candidate solutions are portrayed in gray.

The log P values presented in Table 1 demonstrate strong agreement between experimental and calculated values, indicating that the models are expected to effectively capture the overall oil-water partitioning tendencies of these small molecules, including their insertion behavior within biological lipid membranes. The composition of the optimized molecules and their relative hydrophobicity can be seen in Fig. 5.

Fig. 5.

Visualization of the hydrophobicity scale for bead types in the optimized CG models of dopamine (a) and serotonin (b), adapted from ref. 46.

In biology, dopamine and serotonin tend to bind strongly to lipid membranes47,48. In our simulations, as can be seen in Fig. 6, the density profile of dopamine shows peak positions at slightly shallower insertion depths compared to the atomistic reference. Serotonin exhibits a closer matching of insertion depths overall, although there is still a noteworthy shift toward shallower insertion depths. Both molecules show slightly elevated binding energies compared to the atomistic reference.

Fig. 6.

Direct density comparison of mapped beads in atomistic and coarse-grained dopamine (a) and serotonin (b) as a function of the distance from the center of the POPC membrane. gbest portrays the best candidate solution. The next best candidate solutions are portrayed in gray.

To validate our automated parametrization approach against human efforts, we parametrized small molecules that are already available in the Martini 3 database6. For dopamine and serotonin, no corresponding human-made model exists yet. Pyrrolidine and phenol were selected because they are small, interact favorably with lipid membranes, and somewhat resemble dopamine and serotonin. As a starting point of the optimization we used the corresponding models from the Martini 3 small molecule database. We followed the same parametrization protocol as with dopamine and serotonin, maintaining the same components of the cost function, with the exception of phenol, whose ring structure is based on bond constraints of a fixed length (no bond distribution). Target partition coefficients for both molecules were obtained from the PubChem XLogP3 3.0 tool49. Figure 7 shows the convergence behavior of CGCompiler’s total cost function across 50 iterations.

Fig. 7.

Cost function convergence for CG pyrrolidine (a) and phenol (b). g8best indicates the 8 best candidate solutions of the swarm. The mean value of the cost function of the whole swarm is portrayed in blue scatter points, while the entire range of cost function values is shown as a shaded gray region.

The CGCompiler log P values presented in Table 2 are sufficiently close to the predicted values, indicating that the models are expected to effectively capture the overall oil-water partitioning tendencies of pyrrolidine and phenol. The composition of the optimized molecules and their relative hydrophobicity can be seen in Fig. 8.

Table 2.

Comparison of predicted, CGCompiler, and Martini 3 database partition coefficient values. Predicted data were taken from ref. 49. The errors represent the standard deviation of the free energy difference, propagated to log P.

logP | Pyrrolidine | Phenol
Predicted value | 0.5 | 1.5
CGCompiler value | Inline graphic | Inline graphic
M3 database value | Inline graphic | Inline graphic

Fig. 8.

Visualization of the hydrophobicity scale for bead types in the optimized CG models of pyrrolidine (a) and phenol (b), adapted from ref. 46.

In Fig. 9, pyrrolidine membrane insertions match well with the atomistic reference, although the human-made coarse-grained reference (CG reference) exhibits elevated values. However, the insertion depth of bead SC5 shows slightly poorer agreement than the human-made coarse-grained reference when judged by peak position. Judged by peak height and the concomitant distribution width, however, performance is better. This is because the EMD criterion considers the distribution as a whole, not just the peak position. For phenol, both insertion depths and binding energies display closer agreement overall. The human-made phenol model is clearly too hydrophilic, as evidenced by a log P value that is too small and membrane insertion that is too shallow.

Fig. 9.

Direct density comparison of mapped beads in atomistic, CGCompiler output and coarse-grained Martini 3 database pyrrolidine (a) and phenol (b) as a function of the distance from the center of the POPC membrane. The next best candidate solutions are portrayed in gray.

For both molecules, a consistent tendency toward shallower insertion depths compared to atomistic simulation remains, similar to what was observed for dopamine and serotonin. This suggests that matching log P values inherently results in a somewhat shallower membrane insertion, so that molecules behave as effectively too hydrophilic when interacting with lipid membranes. Interestingly, this tendency aligns with recent reports of overly hydrophilic protein-membrane interactions in Martini 3 (refs. 50–52), indicating that this issue may extend beyond amino acids. Some care must be taken, as our log P value simulations were based on dry octanol, in accordance with Ref.37, whereas the human-based model used hydrated octanol containing a 0.3 mole fraction of water6. For small molecules with log P values close to 0 (e.g. pyrrolidine with a log P value of 0.5), the difference in solvation free energy between wet and dry octanol is expected to be negligible.

Finally, in Fig. 10, we plot the bond distribution comparison between the best candidate solutions from CGCompiler and the atomistic reference data. The overlap of the distributions was optimized using the Earth Mover’s Distance criterion12. As can be seen, there is good agreement with the mean of the atomistic target distribution.

Fig. 10.

Bond distribution comparison between AA target data and CGCompiler output of the beads in pyrrolidine. gbest portrays the best candidate solution. The next best candidate solutions are portrayed in gray.

Discussion

The parametrization of molecules within building block coarse-grained models is a laborious and tedious task, as chemical groups must be encoded into one of hundreds of predefined bead types. Recent advances in computational chemistry have led to the development of several automated approaches for molecular parametrization, including machine learning-based methods and artificial intelligence-driven techniques8–14. Building upon our CGCompiler framework12, we have enhanced the parametrization capabilities within the Martini 3 force field using mixed-variable particle swarm optimization. This advancement specifically targets high-fidelity parametrization of small molecules by incorporating experimental partitioning data, atomistic density profiles, and molecular volume/shape considerations into the optimization process. The current implementation demonstrates strong potential for automated parametrization of small molecules in the Martini 3 force field, offering significant advantages over ongoing manual parametrization efforts6. It enables precise molecular characterization through systematic integration of experimental partitioning data with structural and dynamical information from atomistic simulations regarding molecular flexibility, volume, and shape. This simultaneous optimization of multiple competing objectives can exceed human capabilities.

The octanol-water partition coefficient represents a fundamental metric in molecular characterization, providing essential insights into solubility properties across different solvents and interfacial behaviors. As a cornerstone of building-block coarse-grained force field methodology, the log P value delivers a comprehensive measure of molecular partitioning. While this metric offers valuable predictions regarding membrane permeation and insertion properties, solely parametrizing molecules based on reproducing log P values faces two critical limitations: (i) Chemical locality: The log P value contains limited information about chemical locality effects across the molecule, particularly concerning hydrophobicity distribution around interaction sites. (ii) Effect of charge: Charged molecules such as serotonin and dopamine exhibit amphiphilic nature at the octanol-water interface, yet the explicit effect of charge itself is not captured in the parametrization due to the absence of partial charges within the coarse-grained model.

To address these limitations while maintaining accurate log P values, we have implemented a comparison of local density profiles of mapped beads within lipid membranes as an additional objective function in CGCompiler. The lipid membrane interface provides a more physiologically relevant environment and features additional interactions with zwitterionic head groups as well as a distinct liquid crystalline ordering. Although experimental measurements of membrane-molecule interactions remain more challenging to obtain than octanol-water partitioning data, this gap can be bridged with atomistic simulations. In our optimization framework, experimental octanol–water partitioning free energies are reproduced alongside atomistic density profiles of membrane interactions, ensuring accurate parametrization of both bulk partitioning and membrane-specific behaviors. This dual-target strategy enhances predictive accuracy while maintaining computational tractability by leveraging the complementary strengths of experimental and atomistic references.

Our computational analysis shows that combining log P matching with the accurate reproduction of local density profiles for individually mapped beads in coarse-grained simulations provides valuable insight into the overall molecular orientation and behavior at membrane interfaces. This serves as a benchmark for model quality. However, the question remains as to which matched features are most important for the quality of the model, as well as how to define model quality. For now, this is still determined by human judgment. In our current simulations, we assigned the same weight to matching bond distributions, log P values and membrane density profiles. We observed that matching (dry) octanol log P values results in a tendency for shallower membrane insertion than in atomistic simulations. This is consistent with an inherently more hydrophilic nature. Conversely, precise matching of density profiles is anticipated to result in molecules that are inherently too hydrophobic, according to their log P value in (dry) octanol. Our log P simulations were based on dry octanol, in accordance with the purist physical-chemical standard, whereas other studies often include a 0.3 mole fraction of water, in line with more common pharmaceutical practice. However, when both are available, we would argue that dry octanol log P values should always be preferred to wet octanol log P values. This is because coarse-grained models are unable to model either the local interfacial structure or the substantial concomitant entropic surfactant effects caused by water-octanol micellization, which significantly affect the solvation of small molecules53.

Ultimately, due to the inherent uncertainty surrounding the accuracy of the modeled reference systems, it is surprisingly difficult to make a fair comparison of model quality. Within our limited frame of reference, the resulting optimized models performed better overall than human-made models, which is not surprising given that optimization aims to improve performance within such a framework. In this study, equal weights (1.0) were assigned to all objective functions. Thoroughly optimizing these weights would require a computationally demanding process that is beyond the scope of this study. As part of the force field’s philosophy, matching some targets, such as log P values, may be deemed more essential than matching others, such as bond distributions, whose width tends to deviate inherently from atomistic simulations. This choice of weighting could be improved in future studies to better align with the philosophy of the force field54. However, the quality of the (automated) parametrization remains inherently restricted by a limited, human-defined target set. The models that provide the best fit within that benchmarking subset are not necessarily the models that perform best in other domains. This is the prevailing problem in force-field parametrization. It is debatable whether a model optimized for most of the domains can be considered optimal when it performs more weakly in an individual domain of interest. Accordingly, we anticipate that our models will perform best in the area of lipid membrane interactions with small molecules, as well as the subsequent change in membrane properties55.
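The role of the objective weights can be illustrated with a short sketch. It assumes, purely for illustration, that the overall fitness is a weighted sum of per-objective costs; the actual functional form used in CGCompiler is not restated here, and the objective names and cost values are hypothetical.

```python
def total_cost(costs, weights=None):
    """Weighted sum of per-objective costs.

    If no weights are given, every objective receives weight 1.0,
    mirroring the equal weighting described in the text.
    """
    if weights is None:
        weights = {name: 1.0 for name in costs}
    missing = set(costs) - set(weights)
    if missing:
        raise KeyError(f"no weight given for: {sorted(missing)}")
    return sum(weights[name] * costs[name] for name in costs)


# Hypothetical per-objective costs of one candidate model
candidate = {"bond_distributions": 0.2, "log_p": 0.5, "density_profiles": 0.3}

equal_weighting = total_cost(candidate)
log_p_emphasis = total_cost(
    candidate,
    {"bond_distributions": 1.0, "log_p": 2.0, "density_profiles": 1.0},
)
print(f"equal weights: {equal_weighting:.2f}, log P emphasized: {log_p_emphasis:.2f}")
```

Under such a scheme, raising the weight of one objective changes the ranking of candidate models only when the objectives disagree, which is the situation described above for log P values and membrane density profiles.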

While the automated high-fidelity parametrization of small molecules using mixed-variable swarm optimization represents a significant technological advance, it remains computationally intensive. Even with access to dedicated computing infrastructure, the parametrization of an individual molecule requires several days of computing time. Consequently, systematic application of this methodology to extensive molecular databases containing millions of compounds is computationally prohibitive. Instead, we envision its primary utility in research contexts requiring highly accurate coarse-grained models for focused studies involving smaller sets of specifically targeted small molecules.

Acknowledgements

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy – EXC-2033 – 390677874 – RESOLV. The authors gratefully acknowledge the Gauss Centre for Supercomputing e.V. (www.gauss-centre.eu) for funding this project by providing computing time through the John von Neumann Institute for Computing (NIC) on the GCS Supercomputer JUWELS56 at Jülich Supercomputing Centre (JSC).

Author contributions

H.J.R. and K.S.S. designed the work. M.K. conducted the simulations. M.K. and K.S.S. performed all analysis. All authors wrote and reviewed the manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Data availability

The itp files for dopamine, serotonin, pyrrolidine and phenol are provided in the appendix. Remaining datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Declarations

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

The online version contains supplementary material available at 10.1038/s41598-025-24757-3.

References

  • 1. De Vivo, M., Masetti, M., Bottegoni, G. & Cavalli, A. Role of molecular dynamics and related methods in drug discovery. J. Med. Chem. 59, 4035–4061. 10.1021/acs.jmedchem.5b01684 (2016) (PMID: 26807648).
  • 2. Durrant, J. D. & McCammon, J. A. Molecular dynamics simulations and drug discovery. BMC Biol. 9, 71. 10.1186/1741-7007-9-71 (2011).
  • 3. Pak, A. J. & Voth, G. A. Advances in coarse-grained modeling of macromolecular complexes. Curr. Opin. Struct. Biol. 52, 119–126 (2018).
  • 4. Marrink, S. J., Risselada, H. J., Yefimov, S., Tieleman, D. P. & de Vries, A. H. The martini force field: Coarse grained model for biomolecular simulations. J. Phys. Chem. B 111, 7812–7824. 10.1021/jp071097f (2007).
  • 5. Souza, P. C. T. et al. Martini 3: A general purpose force field for coarse-grained molecular dynamics. Nat. Methods 18, 382–388. 10.1038/s41592-021-01098-3 (2021).
  • 6. Alessandri, R. et al. Martini 3 coarse-grained force field: Small molecules. Adv. Theory Simul. 5, 2100391. 10.1002/adts.202100391 (2022).
  • 7. Marrink, S. J. et al. Two decades of martini: Better beads, broader scope. WIREs Comput. Mol. Sci. 13, e1620. 10.1002/wcms.1620 (2023).
  • 8. Empereur-Mot, C. et al. Swarm-cg: Automatic parametrization of bonded terms in martini-based coarse-grained models of simple to complex molecules via fuzzy self-tuning particle swarm optimization. ACS Omega 5, 32823–32843 (2020).
  • 9. Empereur-Mot, C. et al. Automatic multi-objective optimization of coarse-grained lipid force fields using swarmcg. J. Chem. Phys. 156, 024801 (2022).
  • 10. Empereur-Mot, C. et al. Automatic optimization of lipid models in the martini force field using swarmcg. J. Chem. Inf. Model. 63, 3827–3838 (2023).
  • 11. Perrone, M., Capelli, R., Empereur-Mot, C., Hassanali, A. & Pavan, G. M. Lessons learned from multiobjective automatic optimizations of classical three-site rigid water models using microscopic and macroscopic target experimental observables. J. Chem. Eng. Data 68, 3228–3241 (2023).
  • 12. Stroh, K. S., Souza, P. C. T., Monticelli, L. & Risselada, H. J. Cgcompiler: Automated coarse-grained molecule parametrization via noise-resistant mixed-variable optimization. J. Chem. Theory Comput. 19, 8384–8400. 10.1021/acs.jctc.3c00637 (2023) (PMID: 37971301).
  • 13. Bereau, T. & Kremer, K. Automated parametrization of the coarse-grained martini force field for small organic molecules. J. Chem. Theory Comput. 11, 2783–2791. 10.1021/acs.jctc.5b00056 (2015).
  • 14. Potter, T. D., Barrett, E. L. & Miller, M. A. Automated coarse-grained mapping algorithm for the martini force field and benchmarks for membrane-water partitioning. J. Chem. Theory Comput. 17, 5777–5791 (2021).
  • 15. Souza, P. C. et al. Protein-ligand binding with the coarse-grained martini model. Nat. Commun. 11, 3714 (2020).
  • 16. Kjølbye, L. R. et al. Towards design of drugs and delivery systems with the martini coarse-grained model. QRB Discov. 3, e19 (2022).
  • 17. Kujawski, J., Popielarska, H., Myka, A., Drabińska, B. & Bernard, M. The log p parameter as a molecular descriptor in the computer-aided drug design: An overview. Comput. Methods Sci. Technol. 18, 81–88. 10.12921/cmst.2012.18.02.81-88 (2012).
  • 18. Arslan, E., Findik, B. K. & Aviyente, V. A blind SAMPL6 challenge: insight into the octanol-water partition coefficients of drug-like molecules via a DFT approach. J. Comput. Aided Mol. Des. 34, 463–470 (2020).
  • 19. Wu, Y. M., Salas, Y. L., Leung, Y. C., Hunter, L. & Ho, J. Predicting octanol-water partition coefficients of fluorinated drug-like molecules: A combined experimental and theoretical study. Aust. J. Chem. 73, 677–685. 10.1071/CH19648 (2020).
  • 20. Gul, G., Faller, R. & Ileri-Ercan, N. Coarse-grained modeling of polystyrene-modified cnts and their interactions with lipid bilayers. Biophys. J. 122, 1748–1761. 10.1016/j.bpj.2023.04.005 (2023).
  • 21. Soleimani, A. & Risselada, H. J. Smartini3 parametrization of multi-scale membrane models via unsupervised learning methods. Sci. Rep. 14, 25714. 10.1038/s41598-024-75490-2 (2024).
  • 22. Centi, A., Dutta, A., Parekh, S. H. & Bereau, T. Inserting small molecules across membrane mixtures: Insight from the potential of mean force. Biophys. J. 118, 1321–1332. 10.1016/j.bpj.2020.01.039 (2020).
  • 23. Wang, K. W., Wang, Y. & Hall, C. K. Development of a coarse-grained lipid model, LIME 2.0, for DSPE using multistate iterative Boltzmann inversion and discontinuous molecular dynamics simulations. Fluid Phase Equilib. 521, 112704 (2020).
  • 24. Volkow, N. D., Fowler, J. S., Wang, G.-J., Swanson, J. M. & Telang, F. Dopamine in drug abuse and addiction: Results of imaging studies and treatment implications. Arch. Neurol. 64, 1575–1579. 10.1001/archneur.64.11.1575 (2007) https://jamanetwork.com/journals/jamaneurology/articlepdf/794743/nnr70005_1575_1579.pdf.
  • 25. Channer, B. et al. Dopamine, immunity, and disease. Pharmacol. Rev. 75, 62–158 (2023).
  • 26. Moncrieff, J. et al. The serotonin theory of depression: A systematic umbrella review of the evidence. Mol. Psychiatry 28, 3243–3256 (2023).
  • 27. de Vries, L. P., van de Weijer, M. P. & Bartels, M. The human physiology of well-being: A systematic review on the association between neurotransmitters, hormones, inflammatory markers, the microbiome and well-being. Neurosci. Biobehav. Rev. 139, 104733. 10.1016/j.neubiorev.2022.104733 (2022).
  • 28. Lolicato, F. et al. Membrane-dependent binding and entry mechanism of dopamine into its receptor. ACS Chem. Neurosci. 11, 1914–1924. 10.1021/acschemneuro.9b00656 (2020) (PMID: 32538079).
  • 29. Kalinichenko, L. S., Kornhuber, J., Sinning, S., Haase, J. & Müller, C. P. Serotonin signaling through lipid membranes. ACS Chem. Neurosci. 15, 1298–1320 (2024).
  • 30. Abraham, M. J. et al. Gromacs: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX 1, 19–25 (2015).
  • 31. Van Der Spoel, D. et al. Gromacs: Fast, flexible, and free. J. Comput. Chem. 26, 1701–1718 (2005).
  • 32. Pronk, S. et al. Gromacs 4.5: A high-throughput and highly parallel open source molecular simulation toolkit. Bioinformatics 29, 845–854 (2013).
  • 33. Lindahl, E., Hess, B. & Van Der Spoel, D. Gromacs 3.0: A package for molecular simulation and trajectory analysis. Mol. Model. Annual 7, 306–317 (2001).
  • 34. Borges-Araújo, L., Souza, P. C. T., Fernandes, F. & Melo, M. N. Improved parameterization of phosphatidylinositide lipid headgroups for the martini 3 coarse-grain force field. J. Chem. Theory Comput. 18, 357–373. 10.1021/acs.jctc.1c00615 (2022) (PMID: 34962393).
  • 35. Shirts, M. R. & Chodera, J. D. Statistically optimal analysis of samples from multiple equilibrium states. J. Chem. Phys. 129, 124105. 10.1063/1.2978177 (2008) https://pubs.aip.org/aip/jcp/article-pdf/doi/10.1063/1.2978177/15418484/124105_1_online.pdf.
  • 36. Wu, Z. et al. alchemlyb: The simple alchemistry library. J. Open Source Softw. 9, 6934. 10.21105/joss.06934 (2024).
  • 37. Bannan, C. C., Calabró, G., Kyu, D. Y. & Mobley, D. L. Calculating partition coefficients of small molecules in octanol/water and cyclohexane/water. J. Chem. Theory Comput. 12, 4015–4024. 10.1021/acs.jctc.6b00449 (2016) (PMID: 27434695).
  • 38. Mey, A. S. J. S. et al. Best practices for alchemical free energy calculations [article v1.0]. Living J. Comput. Mol. Sci. 2 (2020).
  • 39. Michaud-Agrawal, N., Denning, E. J., Woolf, T. B. & Beckstein, O. Mdanalysis: A toolkit for the analysis of molecular dynamics simulations. J. Comput. Chem. 32, 2319–2327. 10.1002/jcc.21787 (2011) https://onlinelibrary.wiley.com/doi/pdf/10.1002/jcc.21787.
  • 40. Gowers, R. J. et al. MDAnalysis: A Python Package for the Rapid Analysis of Molecular Dynamics Simulations. In Proceedings of the 15th Python in Science Conference (eds Benthall, S. & Rostrup, S.) 98–105. 10.25080/Majora-629e541a-00e (2016).
  • 41. Lee, J. et al. Charmm-gui input generator for namd, gromacs, amber, openmm, and charmm/openmm simulations using the charmm36 additive force field. J. Chem. Theory Comput. 12, 405–413. 10.1021/acs.jctc.5b00935 (2016).
  • 42. Brooks, B. R. et al. Charmm: The biomolecular simulation program. J. Comput. Chem. 30, 1545–1614. 10.1002/jcc.21287 (2009) https://onlinelibrary.wiley.com/doi/pdf/10.1002/jcc.21287.
  • 43. Jo, S., Kim, T., Iyer, V. G. & Im, W. Charmm-gui: A web-based graphical user interface for charmm. J. Comput. Chem. 29, 1859–1865. 10.1002/jcc.20945 (2008) https://onlinelibrary.wiley.com/doi/pdf/10.1002/jcc.20945.
  • 44. Mack, F. & Bönisch, H. Dissociation constants and lipophilicity of catecholamines and related compounds. Naunyn Schmiedebergs Arch. Pharmacol. 310, 1–9. 10.1007/BF00499868 (1979).
  • 45. Duffy, E. M. & Jorgensen, W. L. Prediction of properties from simulations: Free energies of solvation in hexadecane, octanol, and water. J. Am. Chem. Soc. 122, 2878–2888. 10.1021/ja993663t (2000).
  • 46. Lütge, S., Krebs, M. & Risselada, H. J. Toward the evolutionary optimisation of small molecules within coarse-grained simulations: Training molecules to hide behind lipid head groups. J. Phys. Chem. B 10.1021/acs.jpcb.4c08200 (2025).
  • 47. Postila, P. A., Vattulainen, I. & Róg, T. Selective effect of cell membrane on synaptic neurotransmission. Sci. Rep. 6, 19345. 10.1038/srep19345 (2016).
  • 48. Sengupta, D. & Huster, D. The dynamic structure of the lipid bilayer and its modulation by small molecules. J. Phys. Chem. B 129, 8639–8640. 10.1021/acs.jpcb.5c04373 (2025).
  • 49. Cheng, T. et al. Computation of octanol-water partition coefficients by guiding an additive model with knowledge. J. Chem. Inf. Model. 47, 2140–2148 (2007).
  • 50. van Hilten, N., Stroh, K. S. & Risselada, H. J. Efficient quantification of lipid packing defect sensing by amphipathic peptides: Comparing martini 2 and 3 with charmm36. J. Chem. Theory Comput. 18, 4503–4514 (2022).
  • 51. Thomasen, F. E., Pesce, F., Roesgaard, M. A., Tesei, G. & Lindorff-Larsen, K. Improving martini 3 for disordered and multidomain proteins. J. Chem. Theory Comput. 18, 2033–2041 (2022).
  • 52. Claveras Cabezudo, A., Athanasiou, C., Tsengenes, A. & Wade, R. C. Scaling protein-water interactions in the martini 3 coarse-grained force field to simulate transmembrane helix dimers in different lipid environments. J. Chem. Theory Comput. 19, 2109–2119 (2023).
  • 53. Soleimani, A. & Risselada, H. J. Pure graphene acts as an “entropic surfactant” at the octanol-water interface. ACS Nano 17, 13554–13562 (2023).
  • 54. Risselada, H. J. Martini 3: A coarse-grained force field with an eye for atomic detail. Nat. Methods 18, 342–343 (2021).
  • 55. Saha Roy, D. et al. Serotonin promotes vesicular association and fusion by modifying lipid bilayers. J. Phys. Chem. B 128, 4975–4985 (2024).
  • 56. Jülich Supercomputing Centre. JUWELS Cluster and Booster: Exascale Pathfinder with Modular Supercomputing Architecture at Juelich Supercomputing Centre. J. Large-Scale Res. Facil. 10.17815/jlsrf-7-183 (2021).

Articles from Scientific Reports are provided here courtesy of Nature Publishing Group
