Scientific Data. 2023 Jul 27;10:495. doi: 10.1038/s41597-023-02369-8

Data scheme and data format for transferable force fields for molecular simulation

Gajanan Kanagalingam, Sebastian Schmitt, Florian Fleckenstein, Simon Stephan
PMCID: PMC10374650  PMID: 37500652

Abstract

A generalized data scheme for transferable classical force fields used in molecular simulations, i.e. molecular dynamics and Monte Carlo simulations, is presented. The data scheme is implemented in an SQL-based data format. Both the data scheme and the data format are machine-readable, reusable, and interoperable. A transferable force field is a chemical construction plan that specifies the intermolecular and intramolecular interactions between different types of atoms or chemical groups and can be used to build a model for a given component. The data scheme proposed in this work (named TUK-FFDat) digitally formalizes these chemical construction plans, i.e. transferable force fields. It can be applied to all-atom as well as united-atom transferable force fields. The general applicability of the data scheme is demonstrated for different types of force fields (TraPPE, OPLS-AA, and Potoff). Furthermore, conversion tools for translating the data scheme between the .xls spreadsheet format and the SQL-based data format are provided. The data format can readily be integrated into existing workflows, simulation engines, and force field databases, and can be used to link them.

Subject terms: Research data, Cheminformatics, Atomistic models, Computational models

Introduction

Molecular simulation is a powerful tool for predicting macroscopic thermophysical properties and for modeling nanoscopic processes. Molecular simulation methods, namely molecular dynamics (MD) and Monte Carlo (MC) simulation, have become indispensable in many scientific disciplines such as computational physics1–4, physical chemistry5–8, molecular biology9–13, and engineering14–17. In MD and MC simulations, matter is modeled on the atomistic level based on molecular interactions, which are described by so-called force fields. A force field is the mathematical description of the molecular interactions. The quality of molecular simulation results primarily depends on the quality of the employed force field18–24. Hence, force field development has been an important focus over the past decades and, accordingly, a large number of force fields is available today25. The development of new force fields also remains a very active field of research. Yet, the electronic availability, transparency, and usability of molecular force fields remain unsatisfactory26. Despite their importance, data science aspects of force fields (databases, data formats, interoperability, ontologies, FAIR principles27, etc.) are still in their infancy.

While molecular interactions can be modeled today using first-principles quantum mechanics, such methods are computationally too expensive for simulating many-particle systems as required, for example, in molecular biology. Therefore, molecular simulations based on Newtonian mechanics and classical force fields are widely used today. In classical force fields, the molecular interactions are modeled by interaction potentials U(r) that describe the potential energy as a function of the distance and orientation r of the interacting sites. These interaction potentials provide a relatively simple approximation of the ‘true’ molecular interactions. Yet, such force fields have proven very powerful and are successfully used across many scientific fields today.

A force field is a collection of parametric equations and corresponding parameter values describing the interaction potentials between interaction sites, which represent atoms or groups of atoms. In molecular dynamics simulations, force fields are used to calculate the forces between interaction sites; based on these forces, the trajectories of the interaction sites are computed. In Monte Carlo simulations, the potential energy is instead used directly to evaluate the probability of a given randomly generated atomistic configuration.
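To make the distinction concrete, the following minimal Python sketch (not taken from any particular simulation engine) uses the 12-6 Lennard-Jones potential in reduced units as an assumed example: MD needs the radial force F(r) = −dU/dr, whereas a Metropolis MC step only needs the energy change ΔU of a trial move, which is accepted with probability min(1, exp(−ΔU/kT)). All function names are illustrative.

```python
import math


def lj_energy(r, epsilon, sigma):
    """12-6 Lennard-Jones pair energy U(r) (reduced units assumed)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_force(r, epsilon, sigma):
    """Radial force F(r) = -dU/dr, as needed in MD; positive means repulsive."""
    sr6 = (sigma / r) ** 6
    return 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / r


def metropolis_accept_probability(delta_u, kT):
    """Acceptance probability min(1, exp(-dU/kT)) of an MC trial move."""
    if delta_u <= 0.0:
        return 1.0
    return math.exp(-delta_u / kT)
```

At the potential minimum r = 2^(1/6)σ the force vanishes and the energy equals −ε, which is a quick sanity check for any implementation of this pair potential.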

Transferable force fields for molecular substances are particularly powerful because they can be used to model a large number of substances. A transferable force field is a generalized chemical construction plan for classes of substances, e.g. it characterizes the interaction between two chlorine atoms or the angle potential in an aromatic ring. Therefore, a transferable force field itself cannot be used directly to carry out molecular simulations. However, based on a transferable force field, component-specific force fields can be uniquely derived by a user and then employed in a simulation. Hence, the strength of transferable force fields lies in their generalized description of molecular interactions, which comes at the cost of a high level of abstraction and challenges in usability.

A large number of transferable force fields, i.e. construction plans, is available today, for example DREIDING28, UFF29, AMBER30, PCFF831, TraPPE-UA32–43, OPLS-AA44–48, Potoff49–52, and CVFF53. They are mostly used for modeling fluid states. The coverage of transferable force fields, i.e. the variety of chemical groups and interactions captured in the construction plan, varies strongly. For example, some force fields are restricted to hydrocarbons or halocarbons49, while others cover a large part of the periodic table44. Hence, transferable force fields can comprise hundreds of parameters. Moreover, these parameter data are heterogeneous, as the potentials of a transferable force field describe different types of interactions, e.g. intermolecular and intramolecular ones.

Different data aspects of molecular simulations have been addressed in recent years to increase the transparency, reproducibility26,54–56, and interoperability of molecular simulations57–65. Yet, these attempts mostly focus on the simulation setup and the simulation results. In this context, several data formats for atomistic configurations, i.e. snapshots of simulations, have become well established, e.g. the .xyz file format or the .pdb file format for proteins66. Data formats for specific individual molecules are also available, including formats for (small) molecules such as the CML67 format, SYBYL Line Notation68, the SMIRNOFF format69, MCDL70, and SMILES71, as well as formats for macromolecules such as proteins, peptides, and polymers, e.g. HELM72 and SPICES73. Moreover, some transferable force fields are electronically accessible to users, e.g. the CHARMM force field in ref. 74, the Amber force field in ref. 75, the AMOEBA force field in ref. 75, the TraPPE force field in refs. 76,77, the Merck force field in ref. 78, and the OPLS force field in refs. 77,79. Yet, most of these use individual data formats designed for the respective force field or computational framework. Also, most of these tools provide component-specific force field files (built from an implemented transferable force field), i.e. they are atom typing tools that generate force fields for a given individual molecule. The OpenKIM80, OpenMM75,81, and MoSDeF59,77,82 platforms provide a digital infrastructure for atom typing and storing force field parameters, which can also be used for various molecular modeling and simulation tasks, e.g. setting up simulation scenarios and coupling with simulation engines.

Building a component-specific force field from a transferable force field construction plan poses multiple challenges. Publications on transferable force fields use many different notations, unit systems, mathematical forms of interaction potentials, etc., which makes it difficult to use different force fields in one workflow. Also, transferable force fields describe the atomistic coordinates of the interaction sites in a molecule only implicitly, through the global minimum of the intramolecular interaction potentials. Moreover, several atomistic configurations, i.e. conformations, of a given molecule are often feasible, and the equilibrium conformation (or distribution of conformations) is usually not known a priori. Furthermore, several force field features, e.g. electrostatic multipoles, long-range forces, and rigidity constraints, are treated and implemented differently in different simulation engines, which can cause deviations in the results54. Moreover, the design concepts of different transferable force fields differ considerably, which makes switching from one transferable force field to another in a workflow tedious and error-prone. Accordingly, only very few force field databases76,79,83 are available today, and they mostly cover the force fields developed by the creators of the respective database.

In this work, a generalized data scheme for transferable force fields is proposed that formalizes the underlying general chemical construction plan and is applicable to a large variety of transferable force fields. Based on the developed data scheme, a concrete SQL-based data format is proposed. The data scheme developed in this work is based on identifiers that are both human-readable and machine-readable. The latter in particular enables the integration into automated workflows. Also, the syntax is chemically consistent, such that, for example, bond order rules are correctly captured. The data scheme is moreover designed to be simple, flexible, and extendable. The applicability of the data scheme and data format is demonstrated for different types of transferable force fields. The data scheme and data format proposed in this work (termed TUK-FFDat) enable an interoperable data exchange between publications of new transferable force fields, users of different molecular simulation engines, and force field databases (cf. Figure 1).

Fig. 1.

Applicability of the TUK-FFDat data scheme and data format for establishing a link between databases, simulation engines, and force field publications.

This paper is organized as follows: First, different classification approaches and features of transferable force fields are introduced. Based on this ontology, the novel data scheme is built. Then, the implementation of the data scheme in an SQL-based data format is presented, followed by an exemplary application of the data format to three transferable force fields. Conversion tools that translate the data scheme information from a user-friendly .xls spreadsheet format to the SQL database format are described in the Methods section.

Results

Classification of force fields

Force fields can be classified using different attributes. Figure 2 shows a systematic classification of force fields regarding the modeling approach, the model detail level, the interaction potential types, and the parametrization approach. Blue highlights in the ontology (Figure 2) indicate the coverage of the data scheme developed in this work.

Fig. 2.

Force field ontology and classification used in this work. Blue indicates attributes covered by the TUK-FFDat data scheme and data format.

There are two main modeling approaches for molecular force fields. (i) In the component-specific approach, the layout of the interaction sites, the choice of the potential functions, and the parametrization procedure are tailored to a specific substance, e.g. ethanol. This usually results in a relatively accurate model, since the development focused on that substance alone. The downside of this approach is that the developed model is only valid for that substance, and in general no parts of the model can be transferred and reused for modeling other substances. (ii) In the transferable approach, molecular features and interactions are modeled in a generalized way based on building blocks, e.g. single atoms or groups of atoms. For a given substance, these force fields are usually (but not necessarily) less accurate than component-specific force fields, since the objective during their development was broader. Yet, transferable force fields are more widely applicable, since the molecular features are captured in building blocks.

Different modeling levels can be used for developing force fields, namely (i) all-atom, (ii) united-atom, and (iii) coarse-grained. Figure 3 shows these approaches using n-butane as an example. Going from (i) to (iii), the degree of abstraction of the molecular model increases, which also increases the computational efficiency, as fewer details are included. However, the accuracy for predicting macroscopic thermophysical properties does not necessarily depend on the degree of abstraction19,84. The ability to extrapolate to state regions that were not considered in the fit usually decreases with increasing degree of abstraction. In all-atom force fields, each atom in a molecule, including the small hydrogen atoms, is explicitly modeled by an interaction site. In united-atom force fields, small groups of atoms are modeled as one interaction site; usually, chemical groups, e.g. methyl or methylene groups, are fused into a single interaction site, cf. Figure 3. In particular, hydrogen atoms are often merged into their nearest heavier neighbor atom. In coarse-grained force fields, larger sections of molecules (or even multiple molecules) are modeled as one interaction site, cf. Figure 3. On every modeling level, an interaction site is represented by a geometrical point. In visualizations, however, interaction sites are usually drawn as spheres, cf. Figure 3, which represent (in a simplified way) the extent of the repulsive interactions of the respective potential.

Fig. 3.

Classification of force fields according to the modeling level used to model molecules based on interaction sites (spheres).

The mathematical form of the interaction potentials is an important force field attribute (cf. Figure 2). Interaction potentials are parametric functions that describe the potential energy between interaction sites. Both intramolecular interaction potentials (between sites of the same molecule) and intermolecular interaction potentials (between sites of different molecules) exist, cf. Figure 2. The intramolecular interaction potentials establish the flexibility of the molecule and allow molecular vibrations. Different types of intramolecular interactions can be applied in a force field: a molecule can be fully flexible, meaning that all interaction sites have three independent translational degrees of freedom. Force fields that have intramolecular potentials but certain fixed bond lengths, bond angles, or torsion angles are called semi-flexible. Often, the bond length between directly neighboring interaction sites is constrained to be rigid (this allows the use of a larger time step and a faster exploration of the phase space25). In the limiting case where all intramolecular degrees of freedom are constrained, the force field is rigid, and no intramolecular motion, i.e. no change of the molecular geometry and no vibrations, occurs. This is usually only meaningful for relatively small molecules. Reactive force fields are a special type of flexible force fields. In reactive force fields85, bonds are modeled by bond order potentials, which describe the state of a bond between two interaction sites. This allows the bonding between interaction sites to change dynamically during a simulation and thereby enables the modeling of chemical reactions. Most available transferable force fields are of the flexible or semi-flexible type.

Force fields consist of different types of intramolecular and intermolecular interaction potentials, cf. Figure 4. For fully flexible force fields, different types of intramolecular potentials can occur. Interaction potentials describing the potential energy between two bonded interaction sites are called bond potentials; they model a strongly localized chemical bond86. Bond potentials are parametric functions that usually depend on the length of the bond between the interaction sites under consideration. Intramolecular potentials describing the potential energy between three directly neighboring interaction sites are called angle potentials; they are a function of the angle spanned by the three sites. Intramolecular potentials describing the potential energy between four directly neighboring interaction sites (for example, the four carbon atoms in n-butane, cf. Figure 3) are called torsion (or dihedral) potentials. Torsion potentials have an important impact on the molecular configurations and the macroscopic thermophysical properties. In force fields describing branched molecules, so-called improper torsion potentials are sometimes used. These potentials describe the potential energy between four directly neighboring interaction sites, where three interaction sites are bonded to a fourth, central interaction site. Improper torsion (or improper dihedral) potentials are usually formulated as a function of the ‘out-of-plane’ angle, cf. Figure 4. Intramolecular potentials describing the potential energy between two interaction sites that belong to the same molecule and are separated by n−1 bonds are called 1, n interaction potentials (where n > 1). The 1, n potentials model dispersive and repulsive interactions between interaction sites in a molecule that are not close neighbors. This is particularly relevant for large, coiled molecules. Usually, the 1, n interactions are described by scaled intermolecular potentials (see below), where the van der Waals and the electrostatic interactions are usually scaled individually.
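Since all of these intramolecular structures follow from the bond graph of a molecule, they can be enumerated mechanically. The Python sketch below is an illustration under our own assumptions (hypothetical function name, sites indexed 0, 1, …, no scaling factors applied, 1, n distances measured along the shortest bond path); it lists angles, torsions, improper quadruples, and 1, n pairs (sites separated by n−1 bonds), e.g. for the united-atom n-butane chain of Figure 3.

```python
from collections import deque
from itertools import combinations


def enumerate_topology(n_sites, bonds):
    """Derive angles, torsions, improper quadruples and 1, n pairs from a bond list."""
    neighbors = {s: set() for s in range(n_sites)}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Angles (i, j, k): each pair of neighbors around a central site j.
    angles = [(i, j, k) for j in range(n_sites)
              for i, k in combinations(sorted(neighbors[j]), 2)]

    # Torsions (i, j, k, m): extend every bond j-k by one neighbor on each side.
    torsions = []
    for j, k in bonds:
        for i in sorted(neighbors[j] - {k}):
            for m in sorted(neighbors[k] - {j}):
                if i != m:  # skip three-membered rings
                    torsions.append((i, j, k, m))

    # Improper quadruples (center, a, b, c): three neighbors of one central site.
    impropers = [(j, *trio) for j in range(n_sites)
                 for trio in combinations(sorted(neighbors[j]), 3)]

    # 1, n pairs: sites separated by n - 1 bonds (shortest path via breadth-first search).
    pairs_by_n = {}
    for start in range(n_sites):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        for other, d in dist.items():
            if other > start:
                pairs_by_n.setdefault(d + 1, []).append((start, other))

    return angles, torsions, impropers, pairs_by_n
```

For n-butane (bonds 0–1, 1–2, 2–3) this yields two angles, one torsion (0, 1, 2, 3), no improper quadruple, and a single 1, 4 pair (0, 3); for a site with three bonded neighbors, one improper quadruple is produced instead of a torsion.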

Fig. 4.

Classification of force fields based on the potential types.

Intermolecular interactions comprise (in practically all cases) electrostatic, dispersive (attractive), and repulsive interactions. The latter two model attractive forces at moderate distances (a.k.a. van der Waals forces) and repulsive forces at short distances (mimicking the overlap of electron orbitals)25,86. In most cases, effective pair potentials are used for describing intermolecular interactions; mostly, the Lennard-Jones87–89 or the Mie90 potential is used. The electrostatic interactions are mostly modeled by simple point charges, but higher-order multipole sites are also used in force fields at times. These relatively simple electrostatic interactions model the molecular charge distribution (which is in reality much more complex), e.g. the charge distribution in alcohol groups and the π-orbitals in aromatic components. To describe the potential energy between different types of interaction sites (kinds of atoms or groups of atoms), the same mathematical functions are used within a given transferable force field in practically all cases, and the cross-interaction parameters are determined using combination rules.
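Combination rules can be illustrated with a short Python sketch. It assumes the Lorentz–Berthelot rule (arithmetic mean for σ, geometric mean for ε), which corresponds to the combination rule listed for ID1 = 1 in Table 3, and, as an alternative, the geometric-mean rule for both parameters (used, for example, by OPLS). Function names, site names, and parameter values are made up for illustration.

```python
import math


def lorentz_berthelot(eps_i, sig_i, eps_j, sig_j):
    """Geometric mean for epsilon, arithmetic mean for sigma."""
    return math.sqrt(eps_i * eps_j), 0.5 * (sig_i + sig_j)


def geometric_mean(eps_i, sig_i, eps_j, sig_j):
    """Geometric mean for both epsilon and sigma."""
    return math.sqrt(eps_i * eps_j), math.sqrt(sig_i * sig_j)


def cross_parameters(sites, rule=lorentz_berthelot):
    """Map every unordered pair of site types to (eps_ij, sigma_ij).

    sites: {name: (eps_ii, sigma_ii)}, i.e. like-site parameters only.
    """
    names = sorted(sites)
    table = {}
    for idx, a in enumerate(names):
        for b in names[idx:]:
            table[(a, b)] = rule(*sites[a], *sites[b])
    return table
```

The design point is that only like-site parameters need to be stored; all unlike-pair parameters follow from the chosen rule, which is why a single identifier can encode the rule for a whole force field.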

Both the intermolecular and the intramolecular potential functions have parameters that – together – describe the chemical and physical nature of the interactions. For the development of force fields, different strategies for determining the parameter values have been applied in the literature (cf. Figure 2). Two main routes are established today: (i) a bottom-up approach and (ii) a top-down approach.

In the bottom-up approach, the ‘true’ molecular interactions are determined using quantum mechanical simulations91–94. Based on the results, both the intermolecular and the intramolecular interactions of a force field can in general be determined. The parameter values of the intramolecular potentials are often fitted to first-principles quantum chemical results for the potential energy surface (PES). Yet, using quantum mechanical simulations for fitting intermolecular potential parameters is conceptually and computationally challenging, e.g. because multi-body interactions have to be mapped onto pair interactions.

In the top-down approach, the parameter values of the potential functions are determined using macroscopic thermophysical property data. The parameters are tuned such that the force field describes a given set of macroscopic properties well. For force fields for fluids, mostly vapor–liquid equilibrium properties and self-diffusion data are used for the parametrization. In many cases, the top-down and the bottom-up approach are combined such that intramolecular interactions are determined from quantum chemical results and intermolecular interactions from macroscopic thermophysical property data.

Furthermore, force fields can be sub-classified based on the mathematical functions employed in a force field. Also, machine learning force fields have been developed in recent years as a novel class95. In machine learning force fields, the potential functions and their parameters are determined using machine learning (mostly using large PES data sets). Machine learning force fields can be considered a sub-type of the bottom-up parametrization strategy.

The generalized data scheme proposed in this work captures a large variety of transferable force field types (blue highlighting in Figure 2). Based on the ontology and terminology introduced in Figure 2, the new data scheme is presented in the following.

Definition of data scheme

The data scheme proposed in this work consists of seven sections that formalize the definition of a transferable force field construction plan. Figure 5 gives an overview of the data scheme. In the sections i = 1, …, 7, the interaction potentials constituting a transferable force field are stored as follows: (1) intermolecular interactions; (2) bond intramolecular interactions; (3) angle intramolecular interactions; (4) torsion intramolecular interactions; (5) improper intramolecular interactions; (6) 1, n interactions; and (7) special case interactions.

Fig. 5.

Schematic overview of TUK-FFDat data scheme for transferable force fields.

A ‘tag’ notation is introduced that defines the interaction site type, i.e. an atom or a group of atoms (in the case of a united-atom force field). Tag tuples are used in the different sections to indicate the combination of interaction site types defining a specific interaction, e.g. a bond between a hydrogen atom and a carbon atom. Using the tag notation and the bond orders between the interaction sites, the interaction potentials acting between a given set of sites are defined in a generalized way.

A tag consists of four parts separated by hyphens ‘-’. The first two parts are strings, and the third and fourth parts are integer values. Details are given in Table 1. Figure 6 shows a united-atom model of 3-methyl-1-butene (C5H10) illustrating the definition of the tag. The first part of the tag is an abbreviation representing the functional group to which the interaction site is assigned. Table 2 lists the chemical groups and their abbreviations used in the data scheme. The second part of the tag indicates the type of atom or group of atoms modeled by the interaction site under consideration. For atoms, the classical periodic table notation is used96. For sites modeling a group of atoms (in a united-atom force field), fused hydrogen and carbon atoms are indicated by a ‘C’. Hence, in this part of the tag, hydrogen atoms are neglected in united-atom models unless a site explicitly models a single hydrogen atom. The third part of the tag is the number of bonds the interaction site forms with other (non-hydrogen) interaction sites. The fourth part of the tag indicates the highest bond order among the bonds of the interaction site under consideration. For example, the tag ‘A-C-1-2’, cf. Figure 6, indicates a carbon atom C (fused with its hydrogen atoms) in an alkane group A that forms one (‘1’) bond with another (non-hydrogen) interaction site, and this bond has a bond order of ‘2’, i.e. it is a double bond. The tag notation also makes it possible to directly distinguish an atom type that is modeled differently (i.e. with different parameters) in different chemical environments. Details on the tag notation are given in the Supplementary Material.
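The tag construction can be expressed compactly in code. The Python sketch below assumes the four-part definition of Table 1 and derives parts 3 and 4 from a list of bonds between heavy (non-hydrogen) sites with their bond orders; the class and function names are hypothetical and not part of TUK-FFDat. As a worked case, it takes a united-atom 3-methyl-1-butene, CH2=CH–CH(CH3)–CH3, with the sites numbered along the main chain and the methyl branch last.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tag:
    """Four-part site tag part1-part2-part3-part4 (cf. Table 1)."""
    group: str      # part1: functional group abbreviation (Table 2), e.g. "A"
    site: str       # part2: atom or fused group of atoms, e.g. "C"
    n_bonds: int    # part3: number of bonds to non-hydrogen interaction sites
    max_order: int  # part4: highest bond order of the site

    def __str__(self):
        return f"{self.group}-{self.site}-{self.n_bonds}-{self.max_order}"

    @classmethod
    def parse(cls, text):
        parts = text.split("-")
        if len(parts) != 4:
            raise ValueError(f"expected four hyphen-separated parts: {text!r}")
        return cls(parts[0], parts[1], int(parts[2]), int(parts[3]))


def tags_for_molecule(sites, bonds):
    """sites: [(group, site)] per heavy site; bonds: [(i, j, bond_order)]."""
    n_bonds = [0] * len(sites)
    max_order = [0] * len(sites)
    for i, j, order in bonds:
        for s in (i, j):
            n_bonds[s] += 1
            max_order[s] = max(max_order[s], order)
    return [Tag(g, name, n_bonds[s], max_order[s]) for s, (g, name) in enumerate(sites)]
```

With the bonds (0, 1, 2), (1, 2, 1), (2, 3, 1), and (2, 4, 1), this gives ‘A-C-1-2’ for the terminal CH2 site, ‘A-C-2-2’ for the adjacent CH site, ‘A-C-3-1’ for the branching CH site, and ‘A-C-1-1’ for both methyl sites.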

Table 1.

Definition of tag notation part1-part2-part3-part4 characterizing a given interaction site and data type of the individual tag entries.

part | data type | description
part1 | string | functional group of which the interaction site is part (cf. Table 2)
part2 | string | atom or group of atoms modeled by the interaction site
part3 | integer | number of bonds of the interaction site (with non-hydrogen atoms)
part4 | integer | highest bond order of the interaction site

Fig. 6.

Example of the tag identifier notation (cf. Table 1) for interaction sites (atoms or groups of atoms) using 3-methyl-1-butene: (a) the last two parts of the tag, specifying the bond structure in the molecule (details given in the text); (b) the first two parts of the tag, specifying the atom type and site structure of the model.

Table 2.

Functional groups included in the data scheme (first part of the tag, cf. Table 1).

abbreviation | structure | functional group
A* | CHx–CHx (a), CHx=CHx (b), CHx≡CHx (c) | alkane
Ac | CHx–O–C(=O)–CH=CH2 (a) | acrylate
Ace | CHx–O–C(–X)2–O–CHx (a, b) | acetal
Ad | CHx–C(=O)–N–X2 (a, d) | amide
Ak | CHx–O–H (a) | alcohol
Al | X–C(–H)=O (a, b) | aldehyde
Am | CHx–N–X2 (a, d) | amine
B** | CH–CH (arom.) | benzene
CA** | CH2–CH2 (cyc.) | cycloalkane with 6 < ring size < 18
CA5** | CH2–CH2 (cyc.) | cycloalkane with ring size 5
CA6** | CH2–CH2 (cyc.) | cycloalkane with ring size 6
Cac | CHx–C(=O)–O–H (a) | carboxylic acid
DS | CHx–S–S–CHx (a) | disulfide
E | CHx–O–CHx (a) | ether
Es | CHx–C(=O)–O–CHx (a) | ester
K | CHx–C(=O)–CHx (a) | ketone
mAc | CHx–O–C(=O)–C(–CH3)=CHx (a) | methacrylate
Nl | CHx–C≡N (a) | nitrile
No | CHx–N–O2 (a) | nitro
Sd | CHx–S–CHx (a) | sulfide
Tl | CHx–S–H (a) | thiol

(a) x ∈ {0, 1, 2, 3}; (b) x ∈ {0, 1, 2}; (c) x ∈ {0, 1}; (d) X ∈ {H, CHx}.

*Alkenes (sp2) and alkynes (sp) are also abbreviated ‘A’ in the first part of the tag.

**Functional groups inside cycloalkanes or aromatic benzene rings are also abbreviated ‘CA’ and ‘B’, respectively, in the first part of the tag.

In the seven sections of the data scheme (cf. Figure 5), chemical sub-structures (i.e. formations of two sites (bonds), three sites (angles), etc.) are characterized by tuples of tags indicating the participating interaction sites. This constitutes the chemical construction plan. Each of the seven sections of the data scheme contains a list of entries defining the interaction potentials and their parameters assigned to a given chemical structure, i.e. a combination of interaction site types. The interaction potentials are represented by parametric functions with the parameters p0, p1, …, pn (cf. Figure 5). The mathematical function used to describe a given interaction is encoded by an identifier IDi, with i = 1, …, 7. Each section has its own ID and its own list of interaction potentials. For example, for the bond potential (i = 2), the classical harmonic function has ID2 = 1. Moreover, metadata indicating the origin of the data (in most cases, of the parameter values) are appended to each structural entry. For this purpose, DOIs are used as references, which provide a unique link to the respective sources97.
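How such entries might look in an SQL-based format can be sketched with Python's built-in sqlite3 module. The table and column names below are hypothetical (the actual TUK-FFDat table layout is not reproduced here), and the parameter values and DOI are placeholders rather than data from any force field. The lookup matches the two tags in either order, which is our assumption about how a bond between two site types would be queried.

```python
import sqlite3

# Hypothetical layout for the bond section (second section of the data scheme);
# this is NOT the published TUK-FFDat schema.
SCHEMA = """
CREATE TABLE bond (
    tag1       TEXT    NOT NULL,
    tag2       TEXT    NOT NULL,
    bond_order INTEGER NOT NULL,
    id2        INTEGER NOT NULL,  -- potential function ID (1 = harmonic)
    p1 REAL, p2 REAL, p3 REAL, p4 REAL,
    doi        TEXT               -- source of the parameter values
)
"""


def find_bond(conn, tag_a, tag_b, order):
    """Return (id2, p1, p2, p3, p4, doi) for a bond, matching the tags in either order."""
    return conn.execute(
        "SELECT id2, p1, p2, p3, p4, doi FROM bond "
        "WHERE bond_order = ? AND ((tag1 = ? AND tag2 = ?) OR (tag1 = ? AND tag2 = ?))",
        (order, tag_a, tag_b, tag_b, tag_a),
    ).fetchone()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Placeholder numbers and DOI, for illustration only.
    conn.execute(
        "INSERT INTO bond VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        ("A-C-1-2", "A-C-2-2", 2, 1, 1000.0, 1.33, None, None, "<DOI>"),
    )
    print(find_bond(conn, "A-C-2-2", "A-C-1-2", 2))
```

Keeping one row per entry with generic columns p1, p2, … mirrors the idea that the function ID, not the column layout, determines how the parameters are interpreted.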

In the following, the structure and syntax of each of the seven sections are introduced in detail. It should be noted that the equilibrium structure (bond lengths, bond angles, …) of a given molecule is implicitly given by the global minimum of its total potential energy and is therefore not explicitly described by the data scheme.

The first section of the data scheme, termed intermolecular, contains the information on the intermolecular interaction potentials between interaction sites. The assignment of the individual intermolecular potential functions to the corresponding IDs is given in Table 3. The intermolecular section explicitly lists potential functions with their corresponding parameters and a combination rule. Each interaction site in the first section of the data scheme is defined by a single tag. The potential functions used to model the interactions between given site types are encoded in ID1 (cf. Table 3). The type of combination rule describing the interaction potential between unlike interaction sites is also encoded in ID1. For a given transferable force field, ID1 is constant. The list of intermolecular interaction potential functions (cf. Table 3) also specifies the meaning of the parameter values.

Table 3.

Intermolecular potential functions and their parameters (first section of data scheme, cf. Figure 5), where rij indicates the distance between the considered interaction sites i and j, ε0 the electric constant, kB the Boltzmann constant, q the charge, ε the dispersion energy, σ the size parameter, and n the potential exponent.

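ID1 | function | p1 | p2 | p3 | p4
1 | $u_{ij} = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{1}{4\pi\varepsilon_0}\frac{q_{ij}}{r_{ij}}$ | $q_{ii}$ | $\varepsilon_{ii}$ | $\sigma_{ii}$
  with: $q_{ij} = q_{ii}q_{jj}$, $\varepsilon_{ij} = \sqrt{\varepsilon_{ii}\varepsilon_{jj}}$, $\sigma_{ij} = \frac{\sigma_{ii}+\sigma_{jj}}{2}$
2 | $u_{ij} = C_n\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{n_{ij}} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + \frac{1}{4\pi\varepsilon_0}\frac{q_{ij}}{r_{ij}}$ | $q_{ii}$ | $\varepsilon_{ii}$ | $\sigma_{ii}$ | $n_{ii}$
  with: $n_{ij} = \frac{n_{ii}+n_{jj}}{2}$, $C_n = \frac{n_{ij}}{n_{ij}-6}\left(\frac{n_{ij}}{6}\right)^{\frac{6}{n_{ij}-6}}$, $q_{ij} = q_{ii}q_{jj}$, $\varepsilon_{ij} = \sqrt{\varepsilon_{ii}\varepsilon_{jj}}$, $\sigma_{ij} = \frac{\sigma_{ii}+\sigma_{jj}}{2}$
3 | $u_{ij} = 4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right] + e^{2}\frac{q_{ij}}{r_{ij}}$ | $q_{ii}$ | $\varepsilon_{ii}$ | $\sigma_{ii}$
  with: $q_{ij} = q_{ii}q_{jj}$, $\varepsilon_{ij} = \sqrt{\varepsilon_{ii}\varepsilon_{jj}}$, $\sigma_{ij} = \frac{\sigma_{ii}+\sigma_{jj}}{2}$
4 | $u_{ij} = \varepsilon_{ij}\left[\left(\frac{r_{\min,ij}}{r_{ij}}\right)^{12} - \left(\frac{r_{\min,ij}}{r_{ij}}\right)^{6}\right] + \frac{1}{\varepsilon_l}\frac{q_{ij}}{r_{ij}}$ | $q_{ii}$ | $\varepsilon_{ii}$ | $r_{\min,ii}$
  with: $q_{ij} = q_{ii}q_{jj}$, $\varepsilon_{ij} = \sqrt{\varepsilon_{ii}\varepsilon_{jj}}$, $r_{\min,ij} = \frac{r_{\min,ii}+r_{\min,jj}}{2}$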
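As a sketch of how ID1 selects both the functional form and the combination rule, the Python function below evaluates the intermolecular energy of a site pair for ID1 = 1 and ID1 = 2 as listed in Table 3. Units are left to the caller: the argument coulomb_k stands for 1/(4πε0) in whatever unit system is used, which is our simplification, and ID1 = 3 and 4 are deliberately left out. Function and argument names are hypothetical.

```python
import math


def mie_prefactor(n):
    """C_n = n/(n - 6) * (n/6)**(6/(n - 6)); equals 4 for n = 12."""
    return n / (n - 6) * (n / 6) ** (6 / (n - 6))


def pair_energy(id1, site_i, site_j, r, coulomb_k=1.0):
    """Intermolecular energy of two sites at distance r for ID1 = 1 or 2.

    site_i, site_j: like-site parameters (p1, p2, p3[, p4]) = (q, eps, sigma[, n]).
    coulomb_k: stands for 1/(4 pi eps0) in the caller's unit system.
    """
    q_ij = site_i[0] * site_j[0]                 # product of charges
    eps_ij = math.sqrt(site_i[1] * site_j[1])    # geometric mean
    sig_ij = 0.5 * (site_i[2] + site_j[2])       # arithmetic mean
    if id1 == 1:
        sr6 = (sig_ij / r) ** 6
        vdw = 4.0 * eps_ij * (sr6 * sr6 - sr6)
    elif id1 == 2:
        n_ij = 0.5 * (site_i[3] + site_j[3])     # arithmetic mean of exponents
        vdw = mie_prefactor(n_ij) * eps_ij * ((sig_ij / r) ** n_ij - (sig_ij / r) ** 6)
    else:
        raise NotImplementedError(f"ID1 = {id1} is not covered by this sketch")
    return vdw + coulomb_k * q_ij / r
```

A quick consistency check is that C_n equals 4 for n = 12, so ID1 = 2 with n_ii = n_jj = 12 reproduces the energies of ID1 = 1.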

The second section of the data scheme, termed bond, contains the specifications of the bond potentials for different combinations of two directly neighboring interaction sites. Hence, all information on intramolecular bond potentials within the given transferable force field is stored in the second section. A bond interaction is specified by the tags of the two involved interaction sites, ‘tag 1’ and ‘tag 2’, as well as the bond ‘order’ between them (cf. Figure 5). The bond potential specification for two interaction sites consists of a bond potential function and its parameters, analogously to the intermolecular potential section. The bond potential function is encoded by ID2. Details on the potential functions are given in Table 4.

Table 4.

Bond potential functions and their parameters (second section of data scheme, cf. Figure 5), where rij is the distance between the considered interaction sites i and j, and k denotes the potential parameters.

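ID2 | function | p1 | p2 | p3 | p4
1 | $\frac{k_2}{2}\left(r_{ij}-r_0\right)^2$ | $k_2$ | $r_0$
2 | $k_2\left(r_{ij}-r_0\right)^2 + k_3\left(r_{ij}-r_0\right)^3 + k_4\left(r_{ij}-r_0\right)^4$ | $k_2$ | $k_3$ | $k_4$ | $r_0$
3 | $\frac{k_4}{4}\left(r_{ij}^2-r_0^2\right)^2$ | $k_4$ | $r_0$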
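The three bond functions of Table 4 can be dispatched on ID2 as in the following Python sketch; the parameter tuple follows the column order p1, p2, … of the table, and the function name is hypothetical. Units are not handled.

```python
def bond_energy(id2, params, r):
    """Bond energy for the ID2 functions of Table 4; params = (p1, p2, ...)."""
    if id2 == 1:                    # harmonic: k2/2 (r - r0)^2
        k2, r0 = params
        return 0.5 * k2 * (r - r0) ** 2
    if id2 == 2:                    # quartic polynomial in (r - r0)
        k2, k3, k4, r0 = params
        d = r - r0
        return k2 * d**2 + k3 * d**3 + k4 * d**4
    if id2 == 3:                    # k4/4 (r^2 - r0^2)^2
        k4, r0 = params
        return 0.25 * k4 * (r**2 - r0**2) ** 2
    raise ValueError(f"unknown ID2: {id2}")
```

All three forms vanish at r = r0, so they differ only in how the energy grows for stretched or compressed bonds.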

The third section of the data scheme, termed angle, contains the specifications of the angle potentials for different combinations of three directly neighboring interaction sites. An angle interaction potential is specified by the tags of the three involved types of interaction sites, ‘tag 1’, ‘tag 2’, and ‘tag 3’, and the two bond orders ‘order 1’ and ‘order 2’. ‘Order 1’ indicates the bond order between the central interaction site (‘tag 2’) and the first interaction site (‘tag 1’); ‘order 2’ indicates the bond order between the ‘tag 2’ and ‘tag 3’ interaction sites. The interaction potential functions are encoded by ID3. The list of mathematical functions and the corresponding parameters is given in Table 5.
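Because an angle i–j–k describes the same chemical structure as k–j–i, a lookup in the angle section has to find an entry regardless of the direction in which the structure is read. The Python sketch below assumes that entries are matched by mapping both reading directions to one canonical key (the lexicographically smaller tuple); this is an illustration, not a statement of how TUK-FFDat stores its entries. The same idea is shown for torsions, whose tag and bond-order sequences are reversed together. Function names are hypothetical.

```python
def canonical_angle(tag1, order1, tag2, order2, tag3):
    """One key for an angle i-j-k and its mirror image k-j-i."""
    forward = (tag1, order1, tag2, order2, tag3)
    backward = (tag3, order2, tag2, order1, tag1)
    return min(forward, backward)


def canonical_torsion(tags, orders):
    """One key for a torsion (tag 1..4, order 1..3) read in either direction."""
    forward = (tuple(tags), tuple(orders))
    backward = (tuple(reversed(tags)), tuple(reversed(orders)))
    return min(forward, backward)


if __name__ == "__main__":
    # Store an entry once and find it from the opposite reading direction.
    section = {canonical_angle("A-C-1-2", 2, "A-C-2-2", 1, "A-C-3-1"): "placeholder entry"}
    print(section[canonical_angle("A-C-3-1", 1, "A-C-2-2", 2, "A-C-1-2")])
```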

Table 5.

Angle potential functions and their parameters (third section of data scheme, cf. Figure 5), where i and k are the interaction sites bonded to the interaction site j, such that i, j, and k form the bond angle Θ; rij is the distance between the interaction sites i and j, and rjk is the distance between the interaction sites j and k.

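ID3 | function | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9
1 | $\frac{l_2}{2}\left(\Theta-\Theta_0\right)^2$ | $l_2$ | $\Theta_0$
2 | $l_2\left(\Theta-\Theta_0\right)^2 + l_3\left(\Theta-\Theta_0\right)^3 + l_4\left(\Theta-\Theta_0\right)^4 + k_2\left(r_{ij}-r_1\right)\left(r_{jk}-r_2\right) + N_1\left(r_{ij}-r_1\right)\left(\Theta-\Theta_0\right) + N_2\left(r_{jk}-r_2\right)\left(\Theta-\Theta_0\right)$ | $l_2$ | $l_3$ | $l_4$ | $\Theta_0$ | $k_2$ | $r_1$ | $r_2$ | $N_1$ | $N_2$
3 | $\frac{c}{2}\left(\cos\Theta-\cos\Theta_0\right)^2$ | $\Theta_0$ | $c$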
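For ID3 = 1 and ID3 = 3, the angle energy follows directly from the bond angle Θ at the central site j. The Python sketch below computes Θ from Cartesian coordinates and evaluates these two forms with the parameter order of Table 5 (note that ID3 = 3 lists Θ0 before c). Θ is taken in radians, ID3 = 2 is not covered, and the function names are hypothetical.

```python
import math


def bond_angle(ri, rj, rk):
    """Angle i-j-k (radians) at the central site j from 3D coordinates."""
    a = [x - y for x, y in zip(ri, rj)]
    b = [x - y for x, y in zip(rk, rj)]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] to guard against rounding just outside acos's domain.
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def angle_energy(id3, params, theta):
    """ID3 = 1: l2/2 (theta - theta0)^2; ID3 = 3: c/2 (cos theta - cos theta0)^2."""
    if id3 == 1:
        l2, theta0 = params
        return 0.5 * l2 * (theta - theta0) ** 2
    if id3 == 3:
        theta0, c = params
        return 0.5 * c * (math.cos(theta) - math.cos(theta0)) ** 2
    raise NotImplementedError(f"ID3 = {id3} is not covered by this sketch")
```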

The fourth section of the data scheme, termed torsion, contains the specifications of the torsion potentials for different combinations of four directly neighboring, in-line (unbranched) interaction sites. This type of interaction is also often called dihedral. A torsion potential is specified by the tags of the four involved types of interaction sites, ‘tag 1’, ‘tag 2’, ‘tag 3’, and ‘tag 4’, and the three bond orders ‘order 1’, ‘order 2’, and ‘order 3’. The interaction sites indicated by ‘tag 1’ and ‘tag 4’ are the tail interaction sites of a torsion structure; the interaction sites indicated by ‘tag 2’ and ‘tag 3’ are the central interaction sites. Accordingly, ‘order 1’ and ‘order 3’ specify the bond orders of the tail bonds of a torsion structure, and ‘order 2’ specifies the bond order of the central bond. The potential function types are encoded by ID4. The list of mathematical functions and the corresponding parameters is given in Table 6. Details on the specification of special cis/trans isomerism-dependent torsion potentials are given in the Supplementary Material.

Table 6.

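Torsion potential functions and their parameters (fourth section of the data scheme, cf. Figure 5). Here, Φ is the torsion angle formed by the interaction sites under consideration, and $c_i$, $n_i$, $\Phi_0$, and $\Phi_i$ are potential parameters.

ID4 | function | p1 | p2 | p3 | p4 | p5 | p6 | p7 | p8 | p9 | p10 | p11 | p12
1 | $c_0 + c_1\left[1+\cos\Phi\right] + c_2\left[1-\cos(2\Phi)\right] + c_3\left[1+\cos(3\Phi)\right]$ | $c_0$ | $c_1$ | $c_2$ | $c_3$
2 | $\frac{c}{2}(\Phi-\Phi_0)^2$ | $c$ | $\Phi_0$
3 | $\sum_{i=0}^{6} c_i\cos^{i}\Phi$ | $c_0$ | $c_1$ | $c_2$ | $c_3$ | $c_4$ | $c_5$ | $c_6$
4 | $c_0\left[1-\cos(2\Phi+\Phi_0)\right]$ | $c_0$ | $\Phi_0$
5 | $\sum_{i=0}^{7} c_i\cos^{i}\Phi$ | $c_0$ | $c_1$ | $c_2$ | $c_3$ | $c_4$ | $c_5$ | $c_6$ | $c_7$
6 | $\sum_{i=1}^{4} c_i\left[1+\cos(n_i\Phi-\Phi_i)\right]$ | $c_1$ | $n_1$ | $\Phi_1$ | $c_2$ | $n_2$ | $\Phi_2$ | $c_3$ | $n_3$ | $\Phi_3$ | $c_4$ | $n_4$ | $\Phi_4$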
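The following Python sketch evaluates two torsion forms of Table 6: the cosine series of ID4 = 1 and the periodic sum of ID4 = 6. It is illustrative only and the function names are hypothetical. Φ is in radians. The trans conformation is placed at Φ = 180°, which is the convention of the TraPPE cosine series. The usage line reads the n-alkane row of Table 15 (p1 to p4) in the energy unit of that table.

```python
import math


def torsion_cosine_series(phi, c0, c1, c2, c3):
    """ID4 = 1: c0 + c1[1 + cos phi] + c2[1 - cos 2phi] + c3[1 + cos 3phi]."""
    return (c0
            + c1 * (1.0 + math.cos(phi))
            + c2 * (1.0 - math.cos(2.0 * phi))
            + c3 * (1.0 + math.cos(3.0 * phi)))


def torsion_periodic(phi, *params):
    """ID4 = 6: sum of c_i [1 + cos(n_i phi - phi_i)] over (c_i, n_i, phi_i) triples."""
    if len(params) % 3:
        raise ValueError("expected (c_i, n_i, phi_i) triples in p1..p12 order")
    triples = zip(params[0::3], params[1::3], params[2::3])
    return sum(c * (1.0 + math.cos(n * phi - phi_i)) for c, n, phi_i in triples)


# n-alkane row of Table 15 (CHx-CH2-CH2-CHy): p1..p4 = c0..c3.
alkane = (0.0, 355.03, -68.19, 791.32)
u_trans = torsion_cosine_series(math.pi, *alkane)  # every bracket vanishes at 180 deg
u_cis = torsion_cosine_series(0.0, *alkane)        # 2*c1 + 2*c3
```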

The fifth section of the data scheme is termed improper. It contains the specifications of improper torsion potentials for a branching point of four directly neighboring interaction sites. As for the in-line torsion potential (see above), an improper torsion potential is specified by the four involved types of interaction sites ‘tag 0’, ‘tag 1’, ‘tag 2’, and ‘tag 3’ and by the three bond orders ‘order 1’, ‘order 2’, and ‘order 3’. In a branched structure modeled by an improper torsion, one interaction site is the central one, indicated by ‘tag 0’ in the data scheme. The three remaining interaction sites ‘tag 1’, ‘tag 2’, and ‘tag 3’ are directly bonded to the central one. Accordingly, ‘order 1’, ‘order 2’, and ‘order 3’ specify the bond orders from the central interaction site to the respective neighboring interaction site. The three interaction sites indicated by ‘tag 0’, ‘tag 1’, and ‘tag 2’ span a plane, which is relevant for some improper torsion potential functions. In most cases, the potential functions used for modeling improper torsions differ from those used for in-line torsions. The improper torsion potential function types are encoded by ID5. The mathematical functions and their corresponding parameters are listed in Table 7.

Table 7.

Improper torsion potential functions and their parameters (fifth section of the data scheme, cf. Figure 5). Here, Ψ is the out-of-plane angle formed by the interaction sites under consideration, and $l_2$ and $\Psi_0$ are potential parameters.

ID5 | function | p1 | p2
1 | $\frac{l_2}{2}(\Psi-\Psi_0)^2$ | $l_2$ | $\Psi_0$
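The out-of-plane angle Ψ is defined with respect to the plane spanned by the sites ‘tag 0’, ‘tag 1’, and ‘tag 2’. The Python sketch below assumes one common convention: Ψ is the angle between the bond from the central site (‘tag 0’) to ‘tag 3’ and that plane. The exact definition used by a given force field may differ and should be checked. The helper names are hypothetical. The harmonic form follows ID5 = 1 of Table 7.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def out_of_plane_angle(r0, r1, r2, r3):
    """Angle (radians) between bond tag0->tag3 and the plane through tag0, tag1, tag2."""
    normal = _cross(_sub(r1, r0), _sub(r2, r0))
    bond = _sub(r3, r0)
    s = _dot(normal, bond) / math.sqrt(_dot(normal, normal) * _dot(bond, bond))
    return math.asin(max(-1.0, min(1.0, s)))  # clamp against rounding errors


def improper_harmonic(psi, l2, psi0):
    """ID5 = 1: (l2 / 2) * (psi - psi0)**2."""
    return 0.5 * l2 * (psi - psi0) ** 2
```

A planar branch yields Ψ = 0. The sign of Ψ depends on the order of ‘tag 1’ and ‘tag 2’, which is one reason the data scheme fixes which three sites span the plane.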

The sixth section of the data scheme is termed 1,n. It contains the information on the 1,n intramolecular interaction potentials, i.e. the potentials acting between an interaction site and its nth neighbor. These intramolecular interactions are modeled by scaled intermolecular potentials. The van der Waals and the electrostatic contributions of the intermolecular potential are scaled individually. Hence, the mathematical functions are adopted from the first section but multiplied by a scaling factor. Each entry of the 1,n section contains three values: n, indicating the distance of two sites in a molecule, and the two corresponding scaling factors. ‘scaling 1’ is applied to the van der Waals interactions and ‘scaling 2’ to the electrostatic interactions. Unless specified otherwise, the scaling factor is taken to be 0 for n ≤ 4 and 1 for n > 4 for both the van der Waals and the electrostatic potentials.
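The default rule and the per-n overrides can be sketched as a small lookup. In the Python example below, the function names are hypothetical. The `table` argument stands in for the rows of the 1,n table as a mapping from n to (scaling 1, scaling 2). The override value 0.5 used in the usage lines is a hypothetical entry, not taken from any force field file.

```python
def one_n_scaling(n, table=None):
    """Return (scaling1, scaling2) for a 1,n pair.

    table: optional mapping n -> (scaling1, scaling2) holding the rows of the
    1,n section; pairs without an entry fall back to the default rule.
    """
    if table is not None and n in table:
        return table[n]
    return (0.0, 0.0) if n <= 4 else (1.0, 1.0)


def scaled_pair_energy(n, u_vdw, u_elec, table=None):
    """Scale the van der Waals and electrostatic parts of a pair energy separately."""
    s_vdw, s_elec = one_n_scaling(n, table)
    return s_vdw * u_vdw + s_elec * u_elec


# Usage with a hypothetical override for n = 4:
overrides = {4: (0.5, 0.5)}
print(one_n_scaling(4), one_n_scaling(4, overrides), one_n_scaling(6))
```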

The seventh section of the data scheme is termed special. It contains special interaction potentials that occur in specific transferable force fields and are not covered by sections one to six. Special potentials act between two specific interaction sites that are separated by a given number of bonds. Their syntax is therefore similar to that of the 1,n interactions introduced above, while their information structure resembles that of the bond section. A special interaction is specified by the tags of the two involved types of interaction sites ‘tag 1’ and ‘tag 2’ and by ‘dist’ (cf. Figure 5). The latter specifies the distance between the two sites as the number of direct bonds between ‘tag 1’ and ‘tag 2’. The potential function types are encoded by ID7. The mathematical functions and their corresponding parameters are listed in Table 8. The dimensions of the parameters used in Tables 3–8 are given in Table 9.

Table 8.

Special potential functions and their parameters (seventh section of the data scheme, cf. Figure 5). Here, $r_{ij}$ is the distance between the considered interaction sites i and j, and $k_{12}$ is a potential parameter.

ID7 | function | p1
1 | $\frac{k_{12}}{r_{ij}^{12}}$ | $k_{12}$
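The only special potential in Table 8 is a purely repulsive $r^{-12}$ term. The Python sketch below (function name hypothetical) evaluates it with $k_{12}$ = 75000000, the p1 value of the TraPPE-UA rows in Table 16, and with $r_{ij}$ in Å.

```python
def special_r12(r_ij, k12):
    """ID7 = 1: purely repulsive potential k12 / r_ij**12."""
    if r_ij <= 0.0:
        raise ValueError("distance must be positive")
    return k12 / r_ij ** 12


K12_TRAPPE = 75_000_000.0  # p1 of the TraPPE-UA rows in Table 16
u_close = special_r12(2.0, K12_TRAPPE)
u_far = special_r12(4.0, K12_TRAPPE)  # doubling the distance divides the energy by 2**12
```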

Table 9.

Force field parameters (cf. Tables 3–8), their physical dimensions, and the units used in the TUK-FFDat data format.

parameter | dimension | unit
$\varepsilon_{ii}$, $c$ | energy | eV
$\sigma$, $r$ | length | Å
$n$ | – (dimensionless) | 1
$q$ | charge | e
$k_i$ | energy/length$^i$ | eV/Å$^i$
$l_i$ | energy/angle$^i$ | eV/deg$^i$
Θ, Φ, Ψ | angle | deg
$N$ | energy/(angle · length) | eV/(Å deg)

The seven sections of the data scheme generalize and formalize a transferable force field construction plan. For a given transferable force field, the ID vector ID = {ID1, ID2, …, ID7} specifies the mathematical structure of the model. The data scheme can be applied to all-atom and united-atom force fields. Force fields parameterized by bottom-up as well as top-down approaches can also be described. Regarding the molecular architecture and potentials, rigid, flexible, and semi-flexible force fields can be described by the data scheme. For semi-flexible force fields, individual bond lengths, bond angles, or torsion angles may be constrained. Details are given in the Supplementary Material.

The tag notation, combined with the bond order and the systematic classification of the potential types, formalizes transferable force field construction plans. The proposed data scheme can be used for electronically documenting and defining a large variety of transferable force fields, cf. Figure 2. To this end, the data scheme is implemented in an SQL-based data format.

SQL-based data format

The data scheme introduced above is implemented as an SQL-based data format to make it interoperable and directly usable in automated workflows, e.g. in simulation engines, databases, and for publishing new transferable force fields.

The information contained in each of the seven sections of the data scheme is translated into an SQL table structure in the data format. The data comprised in each of the sections of the data scheme (cf. Figure 5) are translated to the columns of the tables. The tag notation (cf. Table 1) introduced above is used for specifying interaction sites within the tables.

The syntax and data types of the seven tables are specified in Tables 10 and 11. For each table, these give the name of each column and the data type (string, real number, integer, etc.) stored in it.

Table 10.

Data structure of TUK-FFDat data format (Part A).

column | value | description
First table: intermolecular
tag | tag | tag of atom or group of atoms of interaction site (cf. Table 1)
ID1 | integer | identifier for the potential function for intermolecular interactions and the combining rule, encoded in ID1 (cf. Table 3)
p1 | real number | parameter of intermolecular potential function
p2, p3, … | real number | further parameters of intermolecular potential function
ref | string | DOI of the reference in which the potential parameters were published
Second table: bond
tag1 | tag | tag of interaction site (cf. Table 1) involved in the considered bond
order | integer | bond order of considered bond
tag2 | tag | tag of interaction site (cf. Table 1) involved in the considered bond
ID2 | integer or “none” | identifier for bond potential function encoded in ID2, cf. Table 4 (“none” indicating a fixed bond length)
p1 | real number | if ID2 == ‘none’: bond length, else: parameter of bond potential function
p2, … | real number | further parameters of bond potential function
ref | string | DOI of the reference in which the potential parameters were published
Third table: angle
tag1 | tag | tag of interaction site (cf. Table 1) involved in the considered angle
order1 | integer | bond order of the bond between the sites represented by tag1 and tag2
tag2 | tag | tag of central interaction site (cf. Table 1) involved in the considered angle
order2 | integer | bond order of the bond between the sites represented by tag2 and tag3
tag3 | tag | tag of interaction site (cf. Table 1) involved in the considered angle
ID3 | integer or “none” | identifier for angle potential function encoded in ID3, cf. Table 5 (“none” indicating a fixed bond angle)
p1 | real number | if ID3 == ‘none’: bond angle, else: parameter of angle potential function
p2, … | real number | further parameters of angle potential function
ref | string | DOI of the reference in which the potential parameters were published
Fourth table: torsion
tag1 | tag | tag of interaction site (cf. Table 1) involved in the considered torsion angle
order1 | integer | bond order of the bond between the sites represented by tag1 and tag2
tag2 | tag | tag of interaction site (cf. Table 1) involved in the considered torsion angle
order2 | integer | bond order of the bond between the sites represented by tag2 and tag3
tag3 | tag | tag of interaction site (cf. Table 1) involved in the considered torsion angle
order3 | integer | bond order of the bond between the sites represented by tag3 and tag4
tag4 | tag | tag of interaction site (cf. Table 1) involved in the considered torsion angle
ID4 | integer or “none” | identifier for torsion angle potential function encoded in ID4, cf. Table 6 (“none” indicating a fixed torsion angle)
p1 | real number | if ID4 == ‘none’: torsion angle, else: parameter of torsion potential function
p2, … | real number | further parameters of torsion potential function
ref | string | DOI of the reference in which the potential parameters were published

Table 11.

Data structure of TUK-FFDat data format (Part B).

column | value | description
Fifth table: improper
tag0 | tag | tag of central interaction site (cf. Table 1) involved in the considered improper torsion angle
order1 | integer | bond order of the bond between the sites represented by tag0 and tag1
tag1 | tag | tag of interaction site (cf. Table 1) involved in the considered improper torsion angle
order2 | integer | bond order of the bond between the sites represented by tag0 and tag2
tag2 | tag | tag of interaction site (cf. Table 1) involved in the considered improper torsion angle
order3 | integer | bond order of the bond between the sites represented by tag0 and tag3
tag3 | tag | tag of interaction site (cf. Table 1) involved in the considered improper torsion angle
ID5 | integer or “none” | identifier for improper torsion angle potential function encoded in ID5, cf. Table 7 (“none” indicating a fixed improper torsion angle)
p1 | real number | if ID5 == ‘none’: improper torsion angle, else: parameter of improper torsion potential function
p2, … | real number | further parameters of improper torsion potential function
ref | string | DOI of the reference in which the potential parameters were published
Sixth table: 1n_potential
n | integer | distance between the two sites involved in the 1,n potential, given as the number of bonds between them
scaling1 | real number | scaling factor applied to the potential modeling van der Waals interactions
scaling2 | real number | scaling factor applied to the potential modeling electrostatic interactions
ref | string | DOI of the reference in which the potential parameters were published
Seventh table: special
tag1 | tag | tag of first interaction site (cf. Table 1)
dist | integer | distance between the two sites involved in the special potential, given as the number of bonds between them
tag2 | tag | tag of second interaction site (cf. Table 1)
ID7 | integer or “none” | identifier for special potential function encoded in ID7, cf. Table 8
p1 | real number | parameter of the special potential function
p2, … | real number | further parameters of the special potential function
ref | string | DOI of the reference in which the potential parameters were published
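Tables 10 and 11 translate almost one-to-one into SQL table definitions. The following sketch uses SQLite through Python's built-in `sqlite3` module. It is our own illustration and not the schema shipped in the online repository98. `N_PARAMS` is a hypothetical upper bound on the parameter columns, chosen because ID4 = 6 in Table 6 needs twelve. Two details are easy to miss. `order` is an SQL keyword, and `1n_potential` starts with a digit, so both identifiers must be double-quoted. Also, an ID column declared `INTEGER` can still hold the text value ‘none’, because SQLite stores any value that is not a valid integer literal as TEXT.

```python
import sqlite3

N_PARAMS = 12  # hypothetical: enough columns for ID4 = 6 (Table 6)
PARAMS = [f"p{i} REAL" for i in range(1, N_PARAMS + 1)]

# Column lists transcribed from Tables 10 and 11.
TABLES = {
    "intermolecular": ["tag TEXT", "ID1 INTEGER", *PARAMS, "ref TEXT"],
    "bond": ["tag1 TEXT", '"order" INTEGER', "tag2 TEXT", "ID2 INTEGER",
             *PARAMS, "ref TEXT"],
    "angle": ["tag1 TEXT", "order1 INTEGER", "tag2 TEXT", "order2 INTEGER",
              "tag3 TEXT", "ID3 INTEGER", *PARAMS, "ref TEXT"],
    "torsion": ["tag1 TEXT", "order1 INTEGER", "tag2 TEXT", "order2 INTEGER",
                "tag3 TEXT", "order3 INTEGER", "tag4 TEXT", "ID4 INTEGER",
                *PARAMS, "ref TEXT"],
    "improper": ["tag0 TEXT", "order1 INTEGER", "tag1 TEXT", "order2 INTEGER",
                 "tag2 TEXT", "order3 INTEGER", "tag3 TEXT", "ID5 INTEGER",
                 *PARAMS, "ref TEXT"],
    "1n_potential": ["n INTEGER", "scaling1 REAL", "scaling2 REAL", "ref TEXT"],
    "special": ["tag1 TEXT", "dist INTEGER", "tag2 TEXT", "ID7 INTEGER",
                *PARAMS, "ref TEXT"],
}


def create_schema(con):
    """Create the seven tables; table names are always double-quoted."""
    for name, columns in TABLES.items():
        con.execute(f'CREATE TABLE "{name}" ({", ".join(columns)})')


con = sqlite3.connect(":memory:")
create_schema(con)
# A rigid bond: ID2 = 'none' and the fixed length in p1 (reference number as in Table 13).
con.execute('INSERT INTO bond (tag1, "order", tag2, ID2, p1, ref) VALUES (?, ?, ?, ?, ?, ?)',
            ("X-C-X-1", 1, "X-C-X-1", "none", 1.54, "32"))
```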

To avoid redundant or duplicate entries within a section and to keep the tables compact, a short-hand notation is introduced. An ‘X’ can replace a part of a tag or a bond order and serves as a placeholder (wildcard) for an arbitrary entry. For example, the bond identifier (tag 1, order, tag 2) = (A-C-X-X, 1, A-C-X-X) covers all bonds between alkane interaction sites. Hence, all of these bonds are modeled by the same mathematical function and parameters.
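The wildcard rule can be implemented as a field-wise comparison. The Python sketch below rests on two assumptions of ours, since the tag notation itself is defined in Table 1. First, each dash-separated field of a tag is compared individually, and ‘X’ matches any value. Second, a bond entry applies in both directions. All function names are hypothetical.

```python
def field_matches(pattern, value):
    """A single field matches if the pattern is the wildcard 'X' or equals the value."""
    return str(pattern) == "X" or str(pattern) == str(value)


def tag_matches(pattern, tag):
    """Compare dash-separated tag fields one by one, e.g. 'A-C-X-X' vs 'A-C-2-1'."""
    p_fields, t_fields = pattern.split("-"), tag.split("-")
    return len(p_fields) == len(t_fields) and all(
        field_matches(p, t) for p, t in zip(p_fields, t_fields))


def bond_matches(row, tag1, order, tag2):
    """Match a bond row (tag1, order, tag2) against a concrete bond in either direction."""
    p_tag1, p_order, p_tag2 = row
    if not field_matches(p_order, order):
        return False
    return ((tag_matches(p_tag1, tag1) and tag_matches(p_tag2, tag2))
            or (tag_matches(p_tag1, tag2) and tag_matches(p_tag2, tag1)))


alkane_bonds = ("A-C-X-X", 1, "A-C-X-X")
print(bond_matches(alkane_bonds, "A-C-1-1", 1, "A-C-2-1"))   # CH3-CH2 bond
print(bond_matches(alkane_bonds, "A-C-2-1", 1, "Ak-O-2-1"))  # not an alkane bond
```

When several wildcard rows match the same bond, a tool also needs a precedence rule, e.g. preferring the most specific row. This sketch leaves that policy open.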

Application of data format

The TUK-FFDat format proposed in this work is applied to three transferable force fields of different types:

  • the TraPPE-UA force field32–43 (semi-flexible, united-atom),

  • the OPLS-AA force field44–48 (flexible, all-atom), and

  • the Potoff force field49–52 (semi-flexible, united-atom).

The TraPPE-UA and the Potoff transferable force fields were developed within the chemical engineering community. They are widely used for predicting thermodynamic properties, in particular of hydrocarbons32,33,49,50. The OPLS-AA transferable force field was developed within the molecular biology community and is accordingly mostly used for modeling biological systems, e.g. for predicting structural properties of proteins13.

The TUK-FFDat implementations of all three transferable force fields (TraPPE-UA, OPLS-AA, and Potoff) are available on Zenodo98. In the main body of this work, a representative part of the TraPPE-UA transferable force field is presented and discussed as an example, namely its alkane and alcohol part (cf. Tables 12–16). In Tables 12–16, the ‘ref’ column gives the reference numbers of this article instead of the DOIs used in the online repository98.

Table 14.

Third table (angles) of the data format, cf. Tables 10, 11, for the TraPPE-UA force field for alkanes and alcohols.

tag1 order1 tag2 order2 tag3 ID3 p1 p2 ref
X-C-X-X 1 X-C-2-1 1 X-C-X-X 1 62500 114 32
X-C-X-X 1 X-C-3-1 1 X-C-X-X 1 62500 112 33
X-C-X-X 1 X-C-4-1 1 X-C-X-X 1 62500 109.47 33
X-C-X-X 1 Ak-C-X-1 1 Ak-O-2-1 1 50400 109.47 35
Ak-C-X-1 1 Ak-O-2-1 1 Ak-H-1-1 1 55400 108.5 35
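The following Python sketch shows how a simulation tool might resolve a concrete angle against the wildcard rows of Table 14. Each row is tried with the angle read forwards and backwards, and the first matching row wins. Both the direction handling and the first-match policy are our assumptions. The energy uses ID3 = 1 and treats $l_2$ as energy per rad², which is how the TraPPE publications report the bending constant. This should be checked against the units of Table 9 before use. All names are hypothetical.

```python
import math

# Rows of Table 14: (tag1, order1, tag2, order2, tag3, ID3, p1 = l2, p2 = theta0 [deg], ref)
ANGLE_ROWS = [
    ("X-C-X-X", 1, "X-C-2-1", 1, "X-C-X-X", 1, 62500.0, 114.0, 32),
    ("X-C-X-X", 1, "X-C-3-1", 1, "X-C-X-X", 1, 62500.0, 112.0, 33),
    ("X-C-X-X", 1, "X-C-4-1", 1, "X-C-X-X", 1, 62500.0, 109.47, 33),
    ("X-C-X-X", 1, "Ak-C-X-1", 1, "Ak-O-2-1", 1, 50400.0, 109.47, 35),
    ("Ak-C-X-1", 1, "Ak-O-2-1", 1, "Ak-H-1-1", 1, 55400.0, 108.5, 35),
]


def tag_matches(pattern, tag):
    """Field-wise comparison in which 'X' matches any field."""
    p, t = pattern.split("-"), tag.split("-")
    return len(p) == len(t) and all(a in ("X", b) for a, b in zip(p, t))


def find_angle(tag1, order1, tag2, order2, tag3):
    """Return the first row matching the angle read forwards or backwards, else None."""
    for row in ANGLE_ROWS:
        p1, o1, p2, o2, p3 = row[:5]
        forward = (tag_matches(p1, tag1) and o1 == order1 and tag_matches(p2, tag2)
                   and o2 == order2 and tag_matches(p3, tag3))
        backward = (tag_matches(p1, tag3) and o1 == order2 and tag_matches(p2, tag2)
                    and o2 == order1 and tag_matches(p3, tag1))
        if forward or backward:
            return row
    return None


def angle_energy(theta_deg, row):
    """ID3 = 1, with l2 read as energy per rad**2 (assumption, see text)."""
    l2, theta0_deg = row[6], row[7]
    return 0.5 * l2 * math.radians(theta_deg - theta0_deg) ** 2


# Ethanol-like C-C-O angle: CH3 - CH2 - O.
row = find_angle("Ak-C-1-1", 1, "Ak-C-2-1", 1, "Ak-O-2-1")
```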

Table 15.

Fourth table (torsion) of the data format, cf. Tables 10, 11, for the TraPPE-UA force field for alkanes and alcohols.

tag1 order1 tag2 order2 tag3 order3 tag4 ID4 p1 p2 p3 p4 ref
X-C-X-X 1 X-C-2-1 1 X-C-2-1 1 X-C-X-X 1 0 355.03 −68.19 791.32 32
X-C-X-X 1 X-C-2-1 1 X-C-3-1 1 X-C-X-X 1 −251.06 428.73 −111.85 441.27 33
X-C-X-X 1 X-C-2-1 1 X-C-4-1 1 X-C-X-X 1 0 0 0 461.29 33
X-C-X-X 1 X-C-3-1 1 X-C-3-1 1 X-C-X-X 1 −251.06 428.73 −111.85 441.27 33
X-C-X-X 1 X-C-2-1 1 X-C-3-2 1 X-C-X-X 1 0 0 0 461.29 33
X-C-X-X 1 Ak-C-2-1 1 Ak-O-2-1 1 Ak-H-1-1 1 0 209.82 −29.17 187.93 35
X-C-X-X 1 Ak-C-3-1 1 Ak-O-2-1 1 Ak-H-1-1 1 215.96 197.33 31.46 −173.92 35
X-C-X-X 1 Ak-C-4-1 1 Ak-O-2-1 1 Ak-H-1-1 1 0 0 0 163.56 35
X-C-X-X 1 X-C-2-X 1 X-C-2-1 1 X-O-2-1 1 0 176.62 −53.34 769.93 35
X-C-X-X 1 X-C-X-1 1 X-O-2-1 1 X-C-X-1 1 0 725.35 −163.75 558.2 36
X-O-2-1 1 X-C-2-1 1 X-C-2-1 1 X-O-2-1 1 503.24 0 −251.62 1006.47 36

Table 12.

First table (intermolecular) of the data format, cf. Tables 10, 11, for the TraPPE-UA force field for alkanes and alcohols.

tag ID1 p1 p2 p3 ref
A-C-0-0 1 0 148 3.73 32
A-C-1-1 1 0 98 3.75 32
A-C-2-1 1 0 46 3.95 32
A-C-3-1 1 0 10 4.68 33
A-C-4-1 1 0 0.5 6.4 33
Ak-O-2-1 1 −0.7 93 3.02 35
Ak-H-1-1 1 0.435 0 0 35
Ak-C-1-1 1 0.265 98 3.75 35
Ak-C-2-1 1 0.265 46 3.95 35
Ak-C-3-1 1 0.265 10 4.33 35
Ak-C-4-1 1 0.265 0.5 5.8 35
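The intermolecular parameters of Table 12 can be combined into unlike-pair interactions. The Python sketch below makes three assumptions. First, ID1 = 1 denotes a 12-6 Lennard-Jones potential with Lorentz–Berthelot combining rules, as used by TraPPE-UA; Table 3 is the authority here. Second, p1, p2, and p3 are read as charge, ε, and σ, which is our reading of the values. Third, the Coulomb term is omitted. All names are hypothetical.

```python
import math

# Subset of Table 12, read as (p1 = q [e], p2 = epsilon, p3 = sigma [Å]).
SITES = {
    "A-C-1-1": (0.0, 98.0, 3.75),
    "A-C-2-1": (0.0, 46.0, 3.95),
    "Ak-O-2-1": (-0.7, 93.0, 3.02),
}


def lorentz_berthelot(eps_i, sigma_i, eps_j, sigma_j):
    """Geometric mean for epsilon, arithmetic mean for sigma."""
    return math.sqrt(eps_i * eps_j), 0.5 * (sigma_i + sigma_j)


def lennard_jones(r, eps, sigma):
    """12-6 Lennard-Jones potential 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 * sr6 - sr6)


def lj_pair(tag_i, tag_j, r):
    """Unlike-pair Lennard-Jones energy; the Coulomb contribution is left out."""
    _, eps_i, sigma_i = SITES[tag_i]
    _, eps_j, sigma_j = SITES[tag_j]
    eps, sigma = lorentz_berthelot(eps_i, sigma_i, eps_j, sigma_j)
    return lennard_jones(r, eps, sigma)
```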

Table 16.

Seventh table (special) of the data format, cf. Tables 10, 11, for the TraPPE-UA force field for alkanes and alcohols.

tag1 dist tag2 ID7 p1 ref
Ak-O-X-X 4 X-H-1-1 1 75000000 36
Ak-O-X-X 5 X-H-1-1 1 75000000 36

The TraPPE-UA transferable force field is a semi-flexible united-atom force field in which all bonds between interaction sites are rigid. In the data format, this is expressed by ‘none’ entries in the ID2 column of the second table, with the fixed bond length given in p1, cf. Table 13. The TraPPE-UA transferable force field does not contain improper torsion potentials. Accordingly, the fifth table of the data format remains empty (not shown). Although TraPPE-UA is a united-atom force field, hydrogen atoms are modeled explicitly in some chemical structures, e.g. in specific polar functional groups such as the hydroxyl group of alcohols (Ak-H-1-1 in Table 12). Details are given in the Supplementary Material.

Table 13.

Second table (bonds) of the data format, cf. Tables 10, 11, for the TraPPE-UA force field for alkanes and alcohols.

tag1 order tag2 ID2 p1 ref
X-C-X-1 1 X-C-X-1 none 1.54 32
Ak-C-X-X 1 Ak-O-2-1 none 1.43 35
Ak-H-1-1 1 Ak-O-2-1 none 0.945 35
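In the SQL-based format, the rigid TraPPE-UA bonds can be turned into constraints with a single query. The following sketch loads the rows of Table 13 into an in-memory SQLite table using Python's built-in `sqlite3`. It then collects every bond whose ID2 is ‘none’ into a mapping from tag pair to fixed bond length. The reduced column set (only p1) is a simplification for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE bond (tag1 TEXT, "order" INTEGER, tag2 TEXT, '
            'ID2 INTEGER, p1 REAL, ref TEXT)')
# Rows of Table 13; ref holds the reference number printed there instead of a DOI.
con.executemany(
    "INSERT INTO bond VALUES (?, ?, ?, ?, ?, ?)",
    [("X-C-X-1", 1, "X-C-X-1", "none", 1.54, "32"),
     ("Ak-C-X-X", 1, "Ak-O-2-1", "none", 1.43, "35"),
     ("Ak-H-1-1", 1, "Ak-O-2-1", "none", 0.945, "35")],
)

# 'none' in ID2 marks a rigid bond whose fixed length is stored in p1.
constraints = {
    (tag1, tag2): length
    for tag1, tag2, length in con.execute(
        "SELECT tag1, tag2, p1 FROM bond WHERE ID2 = 'none'")
}
```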

Discussion

A generalized data scheme for transferable force fields was presented that can be applied to various types of force fields, such as rigid and flexible as well as all-atom and united-atom force fields. The data scheme, called TUK-FFDat, is implemented in an SQL-based file format. This makes it fully machine readable and provides uniquely defined data structures. The TUK-FFDat data scheme and data format is specifically designed for transferable force fields (as opposed to component-specific force fields), i.e. it provides data structures for generalized chemical construction plans that define model building blocks for substance classes. Three applications of the data scheme and data format were given (the TraPPE-UA, OPLS-AA, and Potoff transferable force fields). These three force fields differ in important respects, which demonstrates the general applicability of the data scheme. The data scheme and data format proposed in this work can help increase force field interoperability in the molecular simulation community. They can be used for sharing transferable force field data between different actors, e.g. database developers, force field developers, and simulators.

The data scheme and data format presented here can readily be extended in different directions. New interaction potentials can be added to the corresponding potential lists (cf. Tables 3–8) by assigning a new IDi value. New chemical groups can be added to the functional group list, cf. Table 2. If the topology of the transferable force field is to be extended, new sections can be added to the data scheme. Moreover, the ongoing development of a given transferable force field can be carried out based on the data scheme by adding entries to the section tables. If new interaction site types are added to a transferable force field, the entries specifying their potential interactions can simply be appended to the lists of the seven sections. In future work, the data scheme proposed here can be extended to coarse-grained, reactive, and machine-learned force fields.
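Because potential functions are identified only by their IDi value, a tool that consumes the format can make extension cheap by keeping a registry of functions. The following Python sketch shows this design for the angle section. The registry, the decorator, and the quartic potential filed under ID3 = 4 are hypothetical and not part of Table 5. Registration refuses to reuse an ID, mirroring the requirement that each IDi value identify exactly one function.

```python
import math
from typing import Callable, Dict

# Hypothetical registry mapping ID3 values to Python callables.
ANGLE_POTENTIALS: Dict[int, Callable[..., float]] = {}


def register_angle(id3):
    """Decorator that files a function under a new, unused ID3 value."""
    def decorator(func):
        if id3 in ANGLE_POTENTIALS:
            raise ValueError(f"ID3 = {id3} is already assigned")
        ANGLE_POTENTIALS[id3] = func
        return func
    return decorator


@register_angle(1)
def harmonic(theta, l2, theta0):
    return 0.5 * l2 * (theta - theta0) ** 2


@register_angle(3)
def cosine_harmonic(theta, theta0, c):
    return 0.5 * c * (math.cos(theta) - math.cos(theta0)) ** 2


# Extending the scheme: a hypothetical new potential claims the next free ID3.
@register_angle(4)
def quartic(theta, l4, theta0):
    return l4 * (theta - theta0) ** 4


def angle_energy(id3, theta, *params):
    """Dispatch on the ID3 value stored in an angle row."""
    return ANGLE_POTENTIALS[id3](theta, *params)
```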

Methods

Conversion tools

The SQL-based data format presented here is well suited for process automation. For human interaction and for creating the tables, however, the classical .xls spreadsheet format can be more convenient. Therefore, two Python scripts for converting the data scheme from the .xls format to the SQL-based format and vice versa are provided in the online repository98. Example .xls and SQL transferable force field files are also provided for testing. The script named xlsx2SQL.py reads an .xls spreadsheet file in which a transferable force field is defined and creates an SQL database containing the corresponding transferable force field. The second script reads a transferable force field from an SQL database and creates the corresponding .xls spreadsheet files. The handling of these scripts is described in detail in the Supplementary Material. The .xls spreadsheet files are intended for constructing the actual SQL-based data format files of a given transferable force field.
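The round trip between sheets and SQL tables can be illustrated without the repository scripts. The following sketch is not xlsx2SQL.py and does not reproduce its behavior. It substitutes CSV text for spreadsheet sheets, because reading .xls/.xlsx files from Python requires a third-party library. Each sheet becomes one SQLite table whose columns are taken from the sheet's header row. All values stay text, so no type inference is attempted. Table names are interpolated into SQL, so the sketch assumes trusted input.

```python
import csv
import io
import sqlite3


def sheets_to_sqlite(sheets, con):
    """Create one table per sheet; each sheet is CSV text with a header row."""
    for name, text in sheets.items():
        rows = list(csv.reader(io.StringIO(text)))
        header, body = rows[0], rows[1:]
        columns = ", ".join(f'"{c}"' for c in header)  # quote names such as "order"
        marks = ", ".join("?" for _ in header)
        con.execute(f'CREATE TABLE "{name}" ({columns})')
        con.executemany(f'INSERT INTO "{name}" VALUES ({marks})', body)


def sqlite_to_sheets(con):
    """Write every table back out as CSV text, keyed by table name."""
    sheets = {}
    names = [r[0] for r in con.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for name in names:
        cur = con.execute(f'SELECT * FROM "{name}"')
        out = io.StringIO()
        writer = csv.writer(out, lineterminator="\n")
        writer.writerow([d[0] for d in cur.description])
        writer.writerows(cur.fetchall())
        sheets[name] = out.getvalue()
    return sheets


con = sqlite3.connect(":memory:")
sheets_to_sqlite({"bond": "tag1,order,tag2,ID2,p1,ref\nX-C-X-1,1,X-C-X-1,none,1.54,32\n"}, con)
```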

Supplementary information

Supplementary Information

Acknowledgements

The authors gratefully acknowledge funding of the present work by the BMBF under the grant WindHPC and financial support by the DFG within IRTG 2057 “Physical Modeling for Virtual Manufacturing Systems and Processes”. The calculations were carried out at the Regional University Computing Center Kaiserslautern (RHRZ) under the grant RPTU-MTD. The present research was conducted under the auspices of the Boltzmann-Zuse Society of Computational Molecular Engineering (BZS). The authors gratefully acknowledge housing by TU Kaiserslautern (TUK) in the past years.

Author contributions

G.K., Se.S., F.F. and Si.S. developed the data scheme and data format. The example force fields were implemented by G.K., Se.S. and F.F. The manuscript was written by G.K., Se.S. and Si.S. All authors reviewed the manuscript.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Data availability

The implemented force field files are publicly available in an online repository98.

Code availability

The code used for converting the data format files and building the SQL-based format is publicly available in an online repository98.

Competing interests

The authors declare no competing interests.

Footnotes

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

The online version contains supplementary material available at 10.1038/s41597-023-02369-8.

References

  • 1. Szlufarska I, Chandross M, Carpick RW. Recent advances in single-asperity nanotribology. Journal of Physics D: Applied Physics. 2008;41:123001. doi: 10.1088/0022-3727/41/12/123001.
  • 2. Bitzek E, Kermode JR, Gumbsch P. Atomistic aspects of fracture. International Journal of Fracture. 2015;191:13–30. doi: 10.1007/s10704-015-9988-2.
  • 3. Ruestes CJ, Alhafez IA, Urbassek HM. Atomistic studies of nanoindentation–a review of recent advances. Crystals. 2017;7:293. doi: 10.3390/cryst7100293.
  • 4. Ewen JP, Spikes HA, Dini D. Contributions of molecular dynamics simulations to elastohydrodynamic lubrication. Tribology Letters. 2021;69:24. doi: 10.1007/s11249-021-01399-w.
  • 5. Getman RB, Bae Y-S, Wilmer CE, Snurr RQ. Review and Analysis of Molecular Simulations of Methane, Hydrogen, and Acetylene Storage in Metal–Organic Frameworks. Chemical Reviews. 2012;112:703–723. doi: 10.1021/cr200217c.
  • 6. Stephan S, Hasse H. Enrichment at vapour-liquid interfaces of mixtures: Establishing a link between nanoscopic and macroscopic properties. Int. Rev. Phys. Chem. 2020;39:319–349. doi: 10.1080/0144235X.2020.1777705.
  • 7. van Gunsteren WF, Berendsen HJC. Computer simulation of molecular dynamics: Methodology, applications, and perspectives in chemistry. Angewandte Chemie International Edition in English. 1990;29:992–1023. doi: 10.1002/anie.199009921.
  • 8. Tuckerman ME, Martyna GJ. Understanding modern molecular dynamics: Techniques and applications. The Journal of Physical Chemistry B. 2000;104:159–178. doi: 10.1021/jp992433y.
  • 9. Sponer J, et al. RNA Structural Dynamics as Captured by Molecular Simulations: A Comprehensive Overview. Chemical Reviews. 2018;118:4177–4338. doi: 10.1021/acs.chemrev.7b00427.
  • 10. Salo-Ahen OMH, et al. Molecular Dynamics Simulations in Drug Discovery and Pharmaceutical Development. Processes. 2021;9:71. doi: 10.3390/pr9010071.
  • 11. Levitt M. The birth of computational structural biology. Nature Structural Biology. 2001;8:392–393. doi: 10.1038/87545.
  • 12. Hollingsworth SA, Dror RO. Molecular dynamics simulation for all. Neuron. 2018;99:1129–1143. doi: 10.1016/j.neuron.2018.08.011.
  • 13. Mackerell AD. Empirical force fields for biological macromolecules: Overview and issues. Journal of Computational Chemistry. 2004;25:1584–1604. doi: 10.1002/jcc.20082.
  • 14. Prausnitz JM, Tavares FW. Thermodynamics of fluid-phase equilibria for standard chemical engineering operations. AIChE Journal. 2004;50:739–761. doi: 10.1002/aic.10069.
  • 15. Bedrov D, et al. Molecular Dynamics Simulations of Ionic Liquids and Electrolytes Using Polarizable Force Fields. Chemical Reviews. 2019;119:7940–7995. doi: 10.1021/acs.chemrev.8b00763.
  • 16. Vrabec J, et al. Skasim–scalable HPC software for molecular simulation in the chemical industry. Chemie Ingenieur Technik. 2018;90:295–306. doi: 10.1002/cite.201700113.
  • 17. Maginn EJ, Elliott JR. Historical perspective and current outlook for molecular dynamics as a chemical engineering tool. Industrial & Engineering Chemistry Research. 2010;49:3059–3078. doi: 10.1021/ie901898k.
  • 18. Oliveira MP, et al. Comparison of the United- and All-Atom Representations of (Halo)alkanes Based on Two Condensed-Phase Force Fields Optimized against the Same Experimental Data Set. Journal of Chemical Theory and Computation. 2022;18:6757–6778. doi: 10.1021/acs.jctc.2c00524.
  • 19. Schmitt S, Fleckenstein F, Hasse H, Stephan S. Comparison of force fields for the prediction of thermophysical properties of long linear and branched alkanes. J. Phys. Chem. B. 2023. doi: 10.1021/acs.jpcb.2c07997.
  • 20. Ewen J, et al. A comparison of classical force-fields for molecular dynamics simulations of lubricants. Materials. 2016;9:1–17. doi: 10.3390/ma9080651.
  • 21. Vega C, Abascal JLF. Simulating water with rigid non-polarizable models: a general perspective. Phys. Chem. Chem. Phys. 2011;13:19663–19688. doi: 10.1039/C1CP22168J.
  • 22. Guvench O, MacKerell AD. Comparison of Protein Force Fields for Molecular Dynamics Simulations. Totowa, NJ: Springer-Humana Press; 2008. pp. 63–88.
  • 23. Levitt M, Hirshberg M, Sharon R, Daggett V. Potential energy function and parameters for simulations of the molecular dynamics of proteins and nucleic acids in solution. Computer Physics Communications. 1995;91:215–231. doi: 10.1016/0010-4655(95)00049-L.
  • 24. Albaugh A, et al. Advanced potential energy surfaces for molecular simulation. The Journal of Physical Chemistry B. 2016;120:9811–9832. doi: 10.1021/acs.jpcb.6b06414.
  • 25. Allen MP, Tildesley DJ. Computer Simulation of Liquids. 2nd edn. Oxford, United Kingdom: Oxford University Press; 2017.
  • 26. Maginn EJ. From discovery to data: What must happen for molecular simulation to become a mainstream chemical engineering tool. AIChE Journal. 2009;55:1304–1310. doi: 10.1002/aic.11932.
  • 27. Wilkinson MD, et al. The FAIR guiding principles for scientific data management and stewardship. Scientific Data. 2016;3:1–9. doi: 10.1038/sdata.2016.18.
  • 28. Mayo SL, Olafson BD, Goddard WA. DREIDING: A Generic Force Field for Molecular Simulations. Journal of Physical Chemistry. 1990;94:8897–8909. doi: 10.1021/j100389a010.
  • 29. Rappé AK, Casewit CJ, Colwell K, Goddard WA, III, Skiff WM. UFF, a Full Periodic Table Force Field for Molecular Mechanics and Molecular Dynamics Simulations. Journal of the American Chemical Society. 1992;114:10024–10035. doi: 10.1021/ja00051a040.
  • 30. Cornell WD, et al. A Second Generation Force Field for the Simulation of Proteins, Nucleic Acids, and Organic Molecules. Journal of the American Chemical Society. 1995;117:5179–5197. doi: 10.1021/ja00124a002.
  • 31.Sun H, Mumby SJ, Maple JR, Hagler AT. An Ab Initio CFF93 All-Atom Force Field for Polycarbonates. Journal of the American Chemical Society. 1994;116:2978–2987. doi: 10.1021/ja00086a030. [DOI] [Google Scholar]
  • 32.Martin MG, Siepmann JI. Transferable Potentials for Phase Equilibria. 1. United-Atom Description of n-Alkanes. Journal of Physical Chemistry B. 1998;102:2569–2577. doi: 10.1021/jp972543+. [DOI] [Google Scholar]
  • 33.Martin MG, Siepmann JI. Novel Configurational-Bias Monte Carlo Method for Branched Molecules. Transferable Potentials for Phase Equilibria. 2. United-Atom Description of Branched Alkanes. Journal of Physical Chemistry B. 1999;103:4580–4517. doi: 10.1021/jp984742e. [DOI] [Google Scholar]
  • 34.Wick CD, Martin MG, Siepmann JI. Transferable Potentials for Phase Equilibria. 4. United-Atom Description of Linear and Branched Alkenes and Alkylbenzenes. Journal of Physical Chemistry B. 2000;104:8008–8016. doi: 10.1021/jp001044x. [DOI] [Google Scholar]
  • 35.Chen B, Potoff JJ, Siepmann JI. Monte Carlo Calculations for Alcohols and Their Mixtures with Alkanes. Transferable Potentials for Phase Equilibria. 5. United-Atom Description of Primary, Secondary, and Tertiary Alcohols. Journal of Physical Chemistry B. 2001;105:3093–3104. doi: 10.1021/jp003882x. [DOI] [Google Scholar]
  • 36.Strubbs JM, Potoff JJ, Siepmann JI. Transferable Potentials for Phase Equilibria. 6. United-Atom Description for Ethers, Glycols, Ketones, and Aldehydes. Journal of Physical Chemistry B. 2004;108:17596–17605. doi: 10.1021/jp049459w. [DOI] [Google Scholar]
  • 37.Wick CD, Strubb JM, Rai N, Siepmann JI. Transferable Potentials for Phase Equilibria. 7. Primary, Secondary, and Tertiary Amines, Nitroalkanes and Nitrobenzene, Nitriles, Amides, Pyridine, and Pyrimidine. Journal of Physical Chemistry B. 2005;109:18974–18982. doi: 10.1021/jp0504827. [DOI] [PubMed] [Google Scholar]
  • 38.Lubna N, Kamath G, Potoff JJ, Raij N, Siepmann JI. Transferable Potentials for Phase Equilibria. 8. United-Atom Description for Thiols, Sulfides, Disulfides, and Thiophene. Journal of Physical Chemistry B. 2005;109:24100–24107. doi: 10.1021/jp0549125. [DOI] [PubMed] [Google Scholar]
  • 39.Maerzke KA, Schultz NE, Ross RB, Siepmann JI. TraPPE-UA Force Field for Acrylates and Monte Carlo Simulations for Their Mixtures with Alkanes and Alcohols. Journal of Physical Chemistry B. 2009;113:6415–6425. doi: 10.1021/jp810558v. [DOI] [PubMed] [Google Scholar]
  • 40.Zhang L, Siepmann JI. Pressure dependence of the vapor-liquid-liquid phase behavior in ternary mixtures consisting of n-alkanes, n-perfluoroalkanes, and carbon dioxide. The Journal of Physical Chemistry B. 2004;109:2911–2919. doi: 10.1021/jp0482114. [DOI] [PubMed] [Google Scholar]
  • 41.Lee J-S, Wick CD, Stubbs JM, Siepmann JI. Simulating the vapour-liquid equilibria of large cyclic alkanes. Molecular Physics. 2005;103:99–104. doi: 10.1080/00268970412331303341. [DOI] [Google Scholar]
  • 42.Keasler SJ, Charan SM, Wick CD, Economou IG, Siepmann JI. Transferable potentials for phase equilibria-united atom description of five- and six-membered cyclic alkanes and ethers. The Journal of Physical Chemistry B. 2012;116:11234–11246. doi: 10.1021/jp302975c. [DOI] [PubMed] [Google Scholar]
  • 43.Wick CD, Siepmann J, Klotz WL, Schure MR. Temperature effects on the retention of n-alkanes and arenes in helium–squalane gas–liquid chromatography. Journal of Chromatography A. 2002;954:181–190. doi: 10.1016/s0021-9673(02)00171-1. [DOI] [PubMed] [Google Scholar]
  • 44.Jorgensen WL, Maxwell DS, Tirado-Rives J. Development and Testing of the OPLS All-Atom Force Field on Conformational Energetics and Properties of Organic Liquids. Journal of the American Chemical Society. 1996;118:11225–11236. doi: 10.1021/ja9621760. [DOI] [Google Scholar]
  • 45.Weiner SJ, Kollman PA, Nguyen DT, Case DA. An all Atom Force Field for Simulations of Proteins and Nucleic Acids. Journal of Computational Chemistry. 1986;7:230–252. doi: 10.1002/jcc.540070216. [DOI] [PubMed] [Google Scholar]
  • 46.Cornell WD, et al. A second generation force field for the simulation of proteins, nucleic acids, and organic molecules. Journal of the American Chemical Society. 1996;118:2309–2309. doi: 10.1021/ja955032e. [DOI] [Google Scholar]
  • 47.Damm W, Frontera A, Tirado-Rives J, Jorgensen WL. OPLS All-Atom Force Field for Carbohydrates. Journal of Computational Chemistry. 1997;18:1955–1970. doi: 10.1002/(SICI)1096-987X(199712)18:16<1955::AID-JCC1>3.0.CO;2-L.
  • 48.Jorgensen WL, McDonald NA. Development of an All-Atom Force Field for Heterocycles. Properties of Liquid Pyridine and Diazenes. Journal of Molecular Structure: THEOCHEM. 1998;424:145–155 (A Faithful Couple: Qualitative and Quantitative Understanding of Chemistry). doi: 10.1016/S0166-1280(97)00237-6.
  • 49.Potoff JJ, Bernard-Brunel DA. Mie potentials for phase equilibria calculations: Application to alkanes and perfluoroalkanes. The Journal of Physical Chemistry B. 2009;113:14725–14731. doi: 10.1021/jp9072137.
  • 50.Mick JR, Soroush Barhaghi M, Jackman B, Schwiebert L, Potoff JJ. Optimized Mie Potentials for Phase Equilibria: Application to Branched Alkanes. Journal of Chemical & Engineering Data. 2017;62:1806–1818. doi: 10.1021/acs.jced.6b01036.
  • 51.Potoff JJ, Kamath G. Mie Potentials for Phase Equilibria: Application to Alkenes. Journal of Chemical & Engineering Data. 2014;59:3144–3150. doi: 10.1021/je500202q.
  • 52.Barhaghi MS, Mick JR, Potoff JJ. Optimised Mie Potentials for Phase Equilibria: Application to Alkynes. Molecular Physics. 2017;115:1378–1388. doi: 10.1080/00268976.2017.1297862.
  • 53.Dauber-Osguthorpe P, et al. Structure and Energetics of Ligand Binding to Proteins: Escherichia coli Dihydrofolate Reductase-Trimethoprim, a Drug-Receptor System. Proteins: Structure, Function, and Bioinformatics. 1988;4:31–47. doi: 10.1002/prot.340040106.
  • 54.Schappals M, et al. Round Robin Study: Molecular Simulation of Thermodynamic Properties from Models with Internal Degrees of Freedom. Journal of Chemical Theory and Computation. 2017;13:4270–4280. doi: 10.1021/acs.jctc.7b00489.
  • 55.Hocquet A, Wieber F. Epistemic issues in computational reproducibility: software as the elephant in the room. European Journal for Philosophy of Science. 2021;11:38. doi: 10.1007/s13194-021-00362-9.
  • 56.Loeffler HH, et al. Reproducibility of free energy calculations across different molecular simulation software packages. Journal of Chemical Theory and Computation. 2018;14:5567–5582. doi: 10.1021/acs.jctc.8b00544.
  • 57.Abraham M, et al. Sharing data from molecular simulations. Journal of Chemical Information and Modeling. 2019;59:4093–4099. doi: 10.1021/acs.jcim.9b00665.
  • 58.Yong CW. Descriptions and Implementations of DL_F Notation: A Natural Chemical Expression System of Atom Types for Molecular Simulations. Journal of Chemical Information and Modeling. 2016;56:1405–1409. doi: 10.1021/acs.jcim.6b00323.
  • 59.Thompson MW, et al. Towards molecular simulations that are transparent, reproducible, usable by others, and extensible (TRUE). Molecular Physics. 2020;118:e1742938. doi: 10.1080/00268976.2020.1742938.
  • 60.Gygli G, Pleiss J. Simulation foundry: Automated and F.A.I.R. molecular modeling. Journal of Chemical Information and Modeling. 2020;60:1922–1927. doi: 10.1021/acs.jcim.0c00018. PMID: 32240586.
  • 61.Horsch MT, Chiacchiera S, Cavalcanti WL, Schembera B. Data Technology in Materials Modelling. Cham, Switzerland: Springer Nature; 2021.
  • 62.Horsch MT, et al. Semantic interoperability and characterization of data provenance in computational molecular engineering. Journal of Chemical & Engineering Data. 2020;65:1313–1329. doi: 10.1021/acs.jced.9b00739.
  • 63.Kanza S, Willoughby C, Bird CL, Frey JG. eScience infrastructures in physical chemistry. Annual Review of Physical Chemistry. 2022;73:97–116. doi: 10.1146/annurev-physchem-082120-041521.
  • 64.Hildebrand PW, Rose AS, Tiemann JK. Bringing molecular dynamics simulation data into view. Trends in Biochemical Sciences. 2019;44:902–913. doi: 10.1016/j.tibs.2019.06.004.
  • 65.Grunzke R, et al. Standards-based metadata management for molecular simulations. Concurrency and Computation: Practice and Experience. 2014;26:1744–1759. doi: 10.1002/cpe.3116.
  • 66.Berman HM, et al. The Protein Data Bank. Nucleic Acids Research. 2000;28:235–242. doi: 10.1093/nar/28.1.235.
  • 67.Murray-Rust P, Rzepa HS, Wright M. Development of Chemical Markup Language (CML) as a System for Handling Complex Chemical Content. New Journal of Chemistry. 2001;25:618–634. doi: 10.1039/B008780G.
  • 68.Ash S, Cline MA, Homer RW, Hurst T, Smith GB. SYBYL Line Notation (SLN): A Versatile Language for Chemical Structure Representation. Journal of Chemical Information and Computer Sciences. 1997;37:71–79. doi: 10.1021/ci960109j.
  • 69.Mobley DL, et al. Escaping atom types in force fields using direct chemical perception. Journal of Chemical Theory and Computation. 2018;14:6076–6092. doi: 10.1021/acs.jctc.8b00640.
  • 70.Gakh AA, Burnett MN. Modular Chemical Descriptor Language (MCDL): Composition, Connectivity, and Supplementary Modules. Journal of Chemical Information and Computer Sciences. 2001;41:1494–1499. doi: 10.1021/ci000108y.
  • 71.Weininger D. SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules. Journal of Chemical Information and Computer Sciences. 1988;28:31–36. doi: 10.1021/ci00057a005.
  • 72.Zhang T, Li H, Xi H, Stanton RV, Rotstein SH. HELM: A Hierarchical Notation Language for Complex Biomolecule Structure Representation. Journal of Chemical Information and Modeling. 2012;52:2796–2806. doi: 10.1021/ci3001925.
  • 73.van den Broek K, et al. SPICES: A Particle-Based Molecular Structure Line Notation and Support Library for Mesoscopic Simulation. Journal of Cheminformatics. 2018;10:1–10. doi: 10.1186/s13321-018-0294-7.
  • 74.Yesselman JD, Price DJ, Knight JL, Brooks CL III. MATCH: An atom-typing toolset for molecular mechanics force fields. Journal of Computational Chemistry. 2012;33:189–202. doi: 10.1002/jcc.21963.
  • 75.Eastman P, et al. OpenMM 7: Rapid development of high performance algorithms for molecular dynamics. PLOS Computational Biology. 2017;13:1–17. doi: 10.1371/journal.pcbi.1005659.
  • 76.Eggimann BL, Sunnarborg AJ, Stern HD, Bliss AP, Siepmann JI. An online parameter and property database for the TraPPE force field. Molecular Simulation. 2014;40:101–105. doi: 10.1080/08927022.2013.842994.
  • 77.Klein C, et al. Formalizing atom-typing and the dissemination of force fields with foyer. Computational Materials Science. 2019;167:215–227. doi: 10.1016/j.commatsci.2019.05.026.
  • 78.Zoete V, Cuendet MA, Grosdidier A, Michielin O. SwissParam: A fast force field generation tool for small organic molecules. Journal of Computational Chemistry. 2011;32:2359–2368. doi: 10.1002/jcc.21816.
  • 79.Dodda LS, Cabeza de Vaca I, Tirado-Rives J, Jorgensen WL. LigParGen web server: an automatic OPLS-AA parameter generator for organic ligands. Nucleic Acids Research. 2017;45:W331–W336. doi: 10.1093/nar/gkx312.
  • 80.Tadmor EB, Elliott RS, Sethna JP, Miller RE, Becker CA. The potential of atomistic simulations and the knowledgebase of interatomic models. JOM–The Journal of The Minerals, Metals & Materials Society. 2011;63:17. doi: 10.1007/s11837-011-0102-6.
  • 81.Eastman P, et al. OpenMM 4: A reusable, extensible, hardware independent library for high performance molecular simulation. Journal of Chemical Theory and Computation. 2013;9:461–469. doi: 10.1021/ct300857j.
  • 82.Cummings PT, et al. Open-source molecular modeling software in chemical engineering focusing on the molecular simulation design framework. AIChE Journal. 2021;67:e17206. doi: 10.1002/aic.17206.
  • 83.Stephan S, Horsch MT, Vrabec J, Hasse H. MolMod – An Open Access Database of Force Fields for Molecular Simulations of Fluids. Molecular Simulation. 2019;45:806–814. doi: 10.1080/08927022.2019.1601191.
  • 84.da Silva GCQ, Silva GM, Tavares FW, Fleming FP, Horta BAC. Are all-atom any better than united-atom force fields for the description of liquid properties of alkanes? Journal of Molecular Modeling. 2020;26:296. doi: 10.1007/s00894-020-04548-5.
  • 85.Van Duin AC, Dasgupta S, Lorant F, Goddard WA. ReaxFF: A Reactive Force Field for Hydrocarbons. The Journal of Physical Chemistry A. 2001;105:9396–9409. doi: 10.1021/jp004368u.
  • 86.Atkins PW, de Paula J. Atkins’ Physical Chemistry. Oxford University Press; 2014.
  • 87.Jones JE. On the Determination of Molecular Fields.–I. From the Variation of the Viscosity of a Gas with Temperature. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character. 1924;106:441–462. doi: 10.1098/rspa.1924.0081.
  • 88.Jones JE. On the Determination of Molecular Fields.–II. From the Equation of State of a Gas. Proceedings of the Royal Society of London. Series A, Containing Papers of a Mathematical and Physical Character. 1924;106:463–477. doi: 10.1098/rspa.1924.0082.
  • 89.Stephan S, Thol M, Vrabec J, Hasse H. Thermophysical properties of the Lennard-Jones fluid: Database and data assessment. Journal of Chemical Information and Modeling. 2019;59:4248–4265. doi: 10.1021/acs.jcim.9b00620.
  • 90.Mie G. Zur kinetischen Theorie der einatomigen Körper. Annalen der Physik. 1903;316:657–697. doi: 10.1002/andp.19033160802.
  • 91.Leach AR. Molecular Modelling: Principles and Applications. Pearson; 2001.
  • 92.Maple JR, Dinur U, Hagler AT. Derivation of force fields for molecular mechanics and dynamics from ab initio energy surfaces. Proceedings of the National Academy of Sciences. 1988;85:5350–5354. doi: 10.1073/pnas.85.15.5350.
  • 93.Deiters UK, Sadus RJ. Fully a priori prediction of the vapor-liquid equilibria of Ar, Kr, and Xe from ab initio two-body plus three-body interatomic potentials. The Journal of Chemical Physics. 2019;151:034509. doi: 10.1063/1.5109052.
  • 94.Ströker P, Hellmann R, Meier K. Thermodynamic properties of argon from Monte Carlo simulations using ab initio potentials. Physical Review E. 2022;105:064129. doi: 10.1103/PhysRevE.105.064129.
  • 95.Unke OT, et al. Machine Learning Force Fields. Chemical Reviews. 2021;121:10142–10186. doi: 10.1021/acs.chemrev.0c01111.
  • 96.Brown TL, et al. Chemistry: The Central Science. 15th global edition in SI units. Harlow: Pearson; 2022.
  • 97.Paskin N. Toward unique identifiers. Proceedings of the IEEE. 1999;87:1208–1227. doi: 10.1109/5.771073.
  • 98.Kanagalingam G, Schmitt S, Fleckenstein F, Stephan S. TUK-FFDat - Data scheme and data format for transferable force fields for molecular simulation. Zenodo. 2023.

Supplementary Materials

Supplementary Information

Data Availability Statement

The implemented force field files are publicly available in an online repository98.

The code used for converting the data format files and for building the SQL-based format is publicly available in an online repository98.


Articles from Scientific Data are provided here courtesy of Nature Publishing Group