Data in Brief. 2021 Mar 15;35:106948. doi: 10.1016/j.dib.2021.106948

Dataset of AMBER force field parameters of drugs, natural products and steroids for simulations using GROMACS

Jennifer Loschwitz a,b, Anna Jäckering a, Monika Keutmann a,b, Maryam Olagunju a, Olujide O. Olubiyi a,c, Birgit Strodel a,b
PMCID: PMC8027721  PMID: 33855133

Abstract

We provide general AMBER force field (GAFF) parameters for 160 organic molecules, including drugs, natural products, and steroids, which can be employed without further processing in molecular dynamics (MD) simulations using GROMACS. We determined these parameters based on quantum mechanical (QM) calculations involving geometry optimization at the HF/6-31G* level of theory. For each molecule we provide a coordinate file of the three-dimensional molecular structure, the topology file, and the parameter file. The applicability of these parameters was demonstrated by MD simulations of these molecules bound to the active site of the main protease of the coronavirus SARS-CoV-2, 3CLpro, which is a key player in the replication of the virus that causes COVID-19.

Keywords: force field parameterization, AMBER force field, MD simulations, GROMACS, Quantum mechanics, drugs, natural products

Specifications Table

Subject: Physical and Theoretical Chemistry
Specific subject area: Computational biochemistry, Drug discovery, Computer-aided drug design
Type of data: PDB files, topology and parameter files in GROMACS format, Gaussian 09 and GROMACS code used for generating the data
How data were acquired: Quantum mechanics (QM) at the HF/6-31G* level of theory, explicit-solvent molecular dynamics (MD) simulations
Data format: Raw
Parameters for data collection: Software used: Gaussian 09 for QM, GROMACS 2018 for MD
Description of data collection: Force field parameters were derived from QM calculations and assembled in the required files for MD simulations with GROMACS.
Data source location: Institute of Biological Information Processing: Structural Biochemistry (IBI-7), Forschungszentrum Jülich, 52428 Jülich, Germany
Data accessibility: Dataset is uploaded on Mendeley Data: https://doi.org/10.17632/phxtv76n5s.3
Related research article: Olubiyi et al., Molecules 25, 3193 (2020) [1]

Value of the Data

  • GAFF parameters of 160 organic molecules ready for use in MD simulations employing GROMACS.

  • The parameters given here are compatible with AMBER force fields, allowing the interactions of these molecules with proteins to be studied.

  • Easy identification of the molecules via their ZINC or PubChem accession identifiers and, if available, their trivial names.

1. Data Description

Table 1 lists the 160 molecules for which GAFF parameters were derived. The compounds include 62 drugs approved by the FDA (U.S. Food and Drug Administration), 44 drugs approved by other countries’ national regulatory agencies (non-FDA) and investigational drugs, 39 natural products, 10 steroids, and 5 other molecules. Most of these molecules are included in the ZINC database [2], [3], [4], which is a curated collection of more than 230 million commercially available chemical compounds prepared for virtual screening. The molecules in Table 1 are therefore denoted by their ZINC database accession identifier (ID). For the few cases where a ZINC accession ID is not available, we provide the one from PubChem (starting with CID), which is a database of chemical molecules and their activities in biological assays. For the five molecules that are not yet found in the ZINC or the PubChem database, we provide the reference where information about the molecule in question can be found. In addition to the respective database accession ID we provide, if available, the trivial names of the compounds. For easy identification of the molecules in MD simulations, we assigned a 3-letter code to each molecule, which is also shown in Table 1 and is used as the molecular identifier in the PDB and GROMACS files provided here.

Table 1.

Identification details of the 160 molecules parameterized in this work.

Accession ID Trivial Name 3-Letter Code

FDA
ZINC000072318121 Abemaciclib AMB
ZINC000003976838 Afatinib AFB
ZINC000011677837 Apixaban APX
ZINC000000897240 Azelastine ALT
ZINC000014210642 Azilsartan AZT
ZINC000003782818 Candesartan CDT
ZINC000085537017 Cangrelor CGL
ZINC000001552174 Cilostazol CLT
ZINC000060325170 Cobimetinib COB
ZINC000012503187 Conivaptan CVT
ZINC000035902489 Crizotinib CZB
ZINC000001530788 Cromolyn CML
ZINC000003986735 Dasatinib DSB
ZINC000001481815 Deferasirox DFX
ZINC000003827556 Delafloxacin DFC
ZINC000001529266 Disulfiram DSR
ZINC000058581064 Dolutegravir DLV
ZINC000003932831 Dutasteride DUS
ZINC000222731806 Enasidenib ESB
ZINC000052955754 Ergotamine ETM
ZINC000003918453 Ertapenem EPN
ZINC000003938684 Etoposide ETP
ZINC000003860453 Fluorescein FRC
ZINC000100001976 Glimepiride GLP
ZINC000035328014 Ibrutinib IRB
ZINC000003920266 Idarubicin IRC
ZINC000013986658 Idelalisib IDB
ZINC000008101127 Indocyanine IDC
ZINC000022448696 Indinavir IDV
ZINC000019632618 Imatinib IMB
ZINC000027990463 Lomitapide LTP
ZINC000064033452 Lumacaftor LMC
ZINC000003927822 Lurasidone LRD
ZINC000100003902 Maraviroc MVC
ZINC000003831151 Montelukast MTL
ZINC000100378061 Naldemedine NMD
ZINC000005844788 Nebivolol NBL
ZINC000006716957 Nilotinib NLB
ZINC000043206370 Niraparib NPB
ZINC000040430143 Olaparib OPB
ZINC000003812865 Olsalazine OSZ
ZINC000003938686 Palbociclib PBB
ZINC000004214700 Paliperidone PLP
ZINC000011617039 Pazopanib PZB
ZINC000030691797 Perampanel PRP
ZINC000004175630 Pimozide PMZ
ZINC000013831130 Raltegravir RTV
ZINC000013818943 Regadenoson RDS
ZINC000003944422 Ritonavir RNV
ZINC000003816514 Rolapitant RLT
ZINC000029416466 Saquinavir SQV
ZINC000019796168 Sildenafil SDF
ZINC000253632968 Simeprevir SPV
ZINC000001489478 Sitagliptin STG
ZINC000049036447 Suvorexant SVX
ZINC000003993855 Tadalafil TDF
ZINC000001530886 Telmisartan TMT
ZINC000004099008 Teniposide TNP
ZINC000001530948 Thalidomide THD
ZINC000100016058 Tipranavir TPV
ZINC000043100709 Trametinib TMB
ZINC000018324776 Vardenafil VDF

Non-FDA and Investigational
ZINC000003922429 Adozelesin AZL
ZINC000003780800 Amrubicin ARC
ZINC000006717782 BMS-599626 BMS
ZINC000001542916 Carmofur CMF
ZINC000254071113 Ciluprevir CPV
ZINC000001714738 Cinanserin CNS
ZINC000004215648 Dihydroergocornine DHC
ZINC000014880002 Dihydroergotoxine DHE
CID3194 Ebselen EBS
ZINC000004215770 Elsamitrucin ETC
ZINC000098208742 Entospletinib EPB
ZINC000001494900 Enzastaurin EZS
ZINC000019899628 Fenoverine FNV
ZINC000059185874 GDC-0834 GDC
ZINC000003780340 Hypericin HPC
ZINC000003781738 Lestaurtinib LTB
ZINC000003950115 Lonafarnib LFB
ZINC000003817327 Ly2090314 LY2
ZINC000043203371 MK-3207 MK3
ZINC000100001820 PF-00477736 PF0
ZINC000013209429 PX-12 P12
ZINC000038576002 R-343 NI3
ZINC000059749972 Radotinib RDB
ZINC000063933734 Rebastinib RBB
CID121304016 Remdesivir RDV
ZINC000003812168 Ruboxistaurin RXS
ZINC000095535868 Rwj-58259 RWJ
ZINC000003973984 Sotrastaurin STS
ZINC000003975327 Telomestatin TMS
ZINC000028827350 Telcagepant TCG
ZINC000013985228 Tideglusib TDG
ZINC000043133316 Tirilazad TAD
ZINC000084726167 TMC647055 TMC
ZINC000003978083 Tubocurarine TBC
ZINC000068250462 Tucatinib TCB
ZINC000095539256 UK-432,097 UK4
ZINC000001490807 - NI5
ZINC000001539348 - NI4
ZINC000003930598 - NI7
ZINC000018710085 - TFB
ZINC000021290045 - NI1
ZINC000049888572 - NI2
ZINC000095092808 - NI6
ZINC000100029945 Zosuquidar ZSQ

Natural Products
ZINC000003984030 Amentoflavone AMF
CID5321811 Bavacoumestan A BCA
ZINC000004098612 Corilagin CRG
ZINC000018847034 Daidzein DDZ
CID12443227 Epitaraxerol ETX
ZINC000003870412 Epigallocatechin gallate EGC
ZINC000001531664 Ginkgetin GKT
ZINC000100777667 Glabrolide GBL
ZINC000004098322 Homoeriodictyol HMR
CID10077799 Isocorilagin ICL
ZINC000003197535 Isoginkgetin IGK
ZINC000100828606 Neodiosmin NDS
ZINC000044351169 Proanthocyanidin A1 PA1
ZINC000004098619 Proanthocyanidin A2 PA2
ZINC000095619717 Proanthocyanidin A5’ PA5
ZINC000003978800 Rhoifolin RHL
ZINC000002015152 Shikonin SKN
ZINC000150352420 Theacitrin A TCA
ZINC000230071666 Theacitrin C TCC
ZINC000003978446 Theaflavin TFV
ZINC000169372863 Theasinensin A TSA
ZINC000008214976 Theasinensin B TSB
ZINC000169333962 Theasinensin F TSF
ZINC000002107922 - N14
ZINC000002114470 - N09
ZINC000002125422 - N10
ZINC000002147804 - N02
ZINC000002148919 - N01
ZINC000002158857 - N13
ZINC000002161217 - N08
ZINC000004235306 - N15
ZINC000006624329 - N12
ZINC000008297065 - N16
ZINC000008764269 - N11
ZINC000008789992 - N03
ZINC000011865175 - N06
ZINC000012296408 - N04
ZINC000012881832 - N05
ZINC000014887561 Zeylanone ZYL

Steroids
ZINC000003815419 2-Hydroxyestradiol HED
ZINC000004096681 2-Hydroxyestrone HES
CID91451 17-α-hydroxypregnenolone AHP
ZINC000004081043 Allopregnanolone APG
ZINC000004428526 Androstenedione ASD
ZINC000004340309 Cortisol CTS
ZINC000003807917 Dehydroepiandrosterone DHE
CID5757 Estradiol ESD
CID27125 Estetrol ESO
ZINC000118912393 Testosterone TST

Others
PDB 6LU7 [19] N3 N3P
α-Ketoamide [20] Inhibitor 11R 11R
α-Ketoamide [20] Inhibitor 13A 13A
α-Ketoamide [20] Inhibitor 13B 13B
α-Ketoamide [20] Inhibitor 14B 14B

For each of the molecules, we supply four files containing the raw data, which are in GROMACS-compatible formats and allow MD simulations to be run without further processing:

  • 1. A PDB file containing the three-dimensional coordinates of the molecule.

  • 2. A top file containing the topology of the molecule.

  • 3. An itp file containing the force field parameters, including the atomic charges as well as the σ and ε values.

  • 4. An itp file with position restraints involving the heavy atoms as needed by an equilibration MD run.

All files are assembled into one zip file, which is supplied via Mendeley Data, https://doi.org/10.17632/phxtv76n5s.3. Unpacking the zip file yields five folders: FDA, Non-FDA_and_Investigational, Natural_Products, Steroids, and Others. Each of them contains further directories, which are named according to the accession IDs listed in Table 1. Each of these subdirectories holds the four files of the respective molecule, whose names all start with the 3-letter code listed in Table 1.
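The directory layout described above can be navigated programmatically. The following is a minimal sketch, assuming the zip file has been unpacked into a local directory; the helper name `find_molecule_files` and the root path are hypothetical, and nothing is assumed about the file names beyond the stated 3-letter prefix.

```python
from pathlib import Path

# Top-level folders named in the text; everything else here is illustrative.
CATEGORIES = (
    "FDA",
    "Non-FDA_and_Investigational",
    "Natural_Products",
    "Steroids",
    "Others",
)


def find_molecule_files(root, category, accession_id, code):
    """Return the sorted files belonging to one molecule.

    root         -- directory into which the zip file was unpacked (assumed)
    category     -- one of CATEGORIES
    accession_id -- subdirectory name, i.e. the accession ID from Table 1
    code         -- 3-letter code that prefixes every file of the molecule
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category}")
    molecule_dir = Path(root) / category / accession_id
    # Keep only regular files whose name starts with the 3-letter code.
    return sorted(
        p for p in molecule_dir.iterdir()
        if p.is_file() and p.name.startswith(code)
    )
```

For example, `find_molecule_files("dataset", "FDA", "ZINC000011677837", "APX")` would be expected to return the four apixaban files, provided the unpacked archive sits in a directory called `dataset`.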

2. Experimental Design, Materials and Methods

To determine the GAFF parameters of the 160 molecules, we started from the PDB files obtained in our previous study [1] by docking these compounds to the crystal structure of 3CLpro. We isolated each molecule from the protein so that the PDB file contained only the ligand, and processed this file with the GROMACS tool gmx editconf to add the CONECT records specifying the connectivity between atoms. These records are needed by Open Babel [5], which was applied afterwards to add missing hydrogen atoms. We then used Antechamber [6], [7] as available in AmberTools 19 [8] to generate the gcrt input file for Gaussian, which contains the coordinates and net charge of the molecule in question. This format was selected because it guarantees that Gaussian does not change the atom order present in the PDB file. These preparatory steps were followed by the QM calculations at the HF/6-31G* level of theory, comprising a geometry optimization and the determination of the electrostatic potential using Gaussian 09 [9]. Antechamber was then employed to extract the force field parameters from the Gaussian output file (gout), covering bond lengths, bond angles, and torsion angles as well as Lennard-Jones (LJ) interaction parameters. Furthermore, Antechamber allows the partial charges to be determined via a restrained electrostatic potential (RESP) fit [10], [11]. Afterwards, we created a mol2 file containing all necessary parameters, which was processed by ACPYPE [12] to generate the required GROMACS input files with the extensions .gro, .top, and .itp.
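The sequence of tools described above can be sketched as a list of command lines. This is an illustration only, not the authors' scripts (those are part of the Mendeley dataset): the function name, the intermediate file names, and the choice of options are assumptions, and the Gaussian 09 run itself is left out because its invocation is site-specific. The commands are built but never executed.

```python
import shlex


def parameterization_commands(name, net_charge=0):
    """Build one command line per preparatory step for the ligand `name`.pdb.

    All file names are hypothetical. The Gaussian 09 job, which turns the
    gcrt input into the gout output, has to be run between steps 3 and 4.
    """
    charge = str(net_charge)
    return [
        # 1. Add CONECT records to the ligand PDB file.
        ["gmx", "editconf", "-f", f"{name}.pdb",
         "-o", f"{name}_conect.pdb", "-conect"],
        # 2. Add missing hydrogen atoms with Open Babel.
        ["obabel", f"{name}_conect.pdb", "-O", f"{name}_h.pdb", "-h"],
        # 3. Write the Gaussian input in gcrt format (keeps the atom order).
        ["antechamber", "-i", f"{name}_h.pdb", "-fi", "pdb",
         "-o", f"{name}.gcrt", "-fo", "gcrt", "-nc", charge],
        # 4. Read the Gaussian output and fit RESP charges into a mol2 file.
        ["antechamber", "-i", f"{name}.gout", "-fi", "gout",
         "-o", f"{name}.mol2", "-fo", "mol2", "-c", "resp", "-nc", charge],
        # 5. Convert to GROMACS files, keeping the charges from the mol2 file.
        ["acpype", "-i", f"{name}.mol2", "-c", "user"],
    ]


# Print the commands for a hypothetical neutral ligand without running them.
for command in parameterization_commands("APX", net_charge=0):
    print(shlex.join(command))
```

Keeping the commands as argument lists makes it straightforward to pass them to `subprocess.run` later, once the options have been checked against the installed tool versions.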

Two exceptions to this procedure had to be made: (1) If the molecule in question contains an iodine atom, the basis set CEP-31G was used because iodine is not covered by the 6-31G* basis set. This change is made automatically by Antechamber. (2) Since ebselen contains a selenium atom, which is not defined in Antechamber, we had to use a workaround. We performed the parameterization with sulfur, which has properties similar to those of selenium, in place of the selenium atom. After the ACPYPE step, the sulfur atom was converted back to selenium in the affected GROMACS files. In addition, we changed the Se–N bond parameters in the itp file to the ones that were optimized for the MD software AMBER [13], [14], which are Rmin = 2.12 Å and ε = 0.2910 kcal/mol and can be converted into the GROMACS format using

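$$\sigma_{\mathrm{GROMACS}}\,[\mathrm{nm}] = 2 \cdot R_{\min}\,[\mathrm{\AA}] \cdot 2^{-1/6} \cdot 0.1 = 3.77741 \times 10^{-1}\ \mathrm{nm}$$
$$\varepsilon_{\mathrm{GROMACS}}\,[\mathrm{kJ/mol}] = 4.184 \cdot \varepsilon_{\mathrm{AMBER}}\,[\mathrm{kcal/mol}] = 1.21754\ \mathrm{kJ/mol}$$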
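The conversion above can be checked numerically. The following is a minimal sketch that applies the two formulas as written, with the factor 0.1 converting Å to nm and 4.184 converting kcal to kJ; the function name `amber_to_gromacs_lj` is hypothetical.

```python
def amber_to_gromacs_lj(rmin_angstrom, epsilon_kcal_per_mol):
    """Convert AMBER-style LJ parameters to GROMACS sigma (nm) and epsilon (kJ/mol).

    Implements sigma = 2 * Rmin * 2**(-1/6) * 0.1 and epsilon = 4.184 * eps,
    exactly as in the two conversion formulas given in the text.
    """
    sigma_nm = 2.0 * rmin_angstrom * 2.0 ** (-1.0 / 6.0) * 0.1
    epsilon_kj_per_mol = 4.184 * epsilon_kcal_per_mol
    return sigma_nm, epsilon_kj_per_mol


# Selenium values quoted in the text: Rmin = 2.12 Å, epsilon = 0.2910 kcal/mol.
sigma, epsilon = amber_to_gromacs_lj(2.12, 0.2910)
print(f"sigma = {sigma:.6f} nm, epsilon = {epsilon:.5f} kJ/mol")
# prints: sigma = 0.377741 nm, epsilon = 1.21754 kJ/mol
```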

To test the reliability of the resulting force field parameters, we applied them in energy minimizations of the 160 molecules, using their structures as obtained from docking to 3CLpro [15], which had also served as starting structures for the force field parameterization. These calculations were carried out with GROMACS 2018 [16]. The energy minimizations were performed using the steepest descent algorithm until all forces were less than 10 kJ mol⁻¹ nm⁻¹. The resulting energy-minimized structures were compared to the corresponding geometry-optimized conformations from the QM calculations by determining their root mean square deviation (RMSD) after structural superposition using PyMOL [17]. If the RMSD was ≤ 4 Å, no further checks were applied. If this cutoff was exceeded, which happened for only a few of the molecules, the structural reorientations were inspected in more detail. However, in none of these cases had severe structural rearrangements occurred. The increased RMSD values could be explained by local rotations of rings or alkyl groups. Afterwards, we applied the newly derived force field parameters in 20 ns MD simulations of the molecules docked to 3CLpro using GROMACS 2018 and AMBER14SB [18] as the force field for the protein. For 99 of the ligands that fulfilled specific structural requirements for inhibitor design reported in [15], the MD simulations were extended to 100 ns. All MD simulations (whether 20 ns or 100 ns) finished successfully without any stability or incompatibility issues arising.
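The RMSD check described above can be illustrated with a short sketch. It assumes that the two structures have already been superposed (the text uses PyMOL for this step) and that the atoms appear in the same order in both coordinate lists; the function names are hypothetical, and only the 4 Å cutoff is taken from the text.

```python
import math

RMSD_CUTOFF = 4.0  # Å, cutoff stated in the text


def rmsd(coords_a, coords_b):
    """RMSD in Å between two equally ordered, pre-superposed lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared_sum / len(coords_a))


def needs_inspection(coords_a, coords_b, cutoff=RMSD_CUTOFF):
    """True if the RMSD exceeds the cutoff, i.e. a closer look is warranted."""
    return rmsd(coords_a, coords_b) > cutoff
```

A molecule whose minimized and QM-optimized structures differ by exactly 4 Å would not be flagged, matching the "≤ 4 Å" criterion.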

Via the already mentioned Mendeley dataset (https://doi.org/10.17632/phxtv76n5s.3), a zip file is provided that contains all Gaussian and GROMACS input files used for generating the force field parameters, along with bash scripts for automating the parameterization procedure as much as possible.

CRediT Author Statement

Jennifer Loschwitz: Methodology, Software, Data curation, Validation, Writing - original draft; Anna Jäckering: Formal analysis, Visualization, Writing - original draft; Monika Keutmann: Investigation, Data curation, Validation; Maryam Olagunju: Investigation, Data curation; Olujide O. Olubiyi: Conceptualization, Supervision, Writing - review & editing; Birgit Strodel: Conceptualization, Supervision, Project administration, Resources, Writing - review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships which have, or could be perceived to have, influenced the work reported in this article.

Acknowledgments

The authors gratefully acknowledge the computing time granted through JARA-HPC (project COVID19MD) on the supercomputer JURECA at Forschungszentrum Jülich [21], the hybrid computer cluster purchased from funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) project number INST 208/704-1 FUGG, and the Centre for Information and Media Technology at Heinrich Heine University Düsseldorf.

Contributor Information

Olujide O. Olubiyi, Email: olubiyioo@oauife.edu.ng.

Birgit Strodel, Email: b.strodel@fz-juelich.de.

References

  • 1. Olubiyi O., Olagunju M., Keutmann M., Loschwitz J., Strodel B. High throughput virtual screening to discover inhibitors of the main protease of the coronavirus SARS-CoV-2. Molecules. 2020;25:3193. doi: 10.3390/molecules25143193.
  • 2. Irwin J., Shoichet B. ZINC–a free database of commercially available compounds for virtual screening. J. Chem. Inf. Model. 2005;45:177–182. doi: 10.1021/ci049714+.
  • 3. Irwin J., Sterling T., Mysinger M., Bolstad E., Coleman R. ZINC: a free tool to discover chemistry for biology. J. Chem. Inf. Model. 2012;52:1757–1768. doi: 10.1021/ci3001277.
  • 4. Sterling T., Irwin J. ZINC 15 – ligand discovery for everyone. J. Chem. Inf. Model. 2015;55:2324–2337. doi: 10.1021/acs.jcim.5b00559.
  • 5. O’Boyle N., Banck M., James C., Morley C., Vandermeersch T., Hutchison G. Open Babel: an open chemical toolbox. J. Cheminformatics. 2011;3:33. doi: 10.1186/1758-2946-3-33.
  • 6. Wang J., Wolf R., Caldwell J., Kollman P., Case D. Development and testing of a general amber force field. J. Comput. Chem. 2004;25:1157–1174. doi: 10.1002/jcc.20035.
  • 7. Wang J., Wang W., Kollman P., Case D. Automatic atom type and bond type perception in molecular mechanical calculations. J. Mol. Graphics Modell. 2006;25:247–260. doi: 10.1016/j.jmgm.2005.12.005.
  • 8.D. Case, I. Ben-Shalom, S. Brozell, D. Cerutti, T. Cheatham, V. Cruzeiro, III, T. Darden, R. Duke, D. Ghoreishi, G. Giambasu, T. Giese, M. Gilson, H. Gohlke, A. Goetz, D. Greene, R. Harris, N. Homeyer, Y. Huang, S. Izadi, A. Kovalenko, R. Krasny, T. Kurtzman, T. Lee, S. LeGrand, P. Li, C. Lin, J. Liu, T. Luchko, R. Luo, V. Man, D. Mermelstein, K. Merz, Y. Miao, G. Monard, C. Nguyen, H. Nguyen, A. Onufriev, F. Pan, R. Qi, D. Roe, A. Roitberg, C. Sagui, S. Schott-Verdugo, J. Shen, C. Simmerling, J. Smith, J. Swails, R. Walker, J. Wang, H. Wei, L. Wilson, R. Wolf, X. Wu, L. Xiao, Y. Xiong, D. York, P. Kollman, Amber 2019, 2019. University of California, San Francisco.
  • 9.M. Frisch, G. Trucks, H. Schlegel, G. Scuseria, M. Robb, J. Cheeseman, G. Scalmani, V. Barone, B. Mennucci, G. Petersson, H. Nakatsuji, M. Caricato, X. Li, H. Hratchian, A. Izmaylov, J. Bloino, G. Zheng, J. Sonnenberg, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, J. Montgomery, J. Peralta, F. Ogliaro, M. Bearpark, J. Heyd, E. Brothers, K. Kudin, V. Staroverov, R. Kobayashi, J. Normand, K. Raghavachari, A. Rendell, J. Burant, S. Iyengar, J. Tomasi, M. Cossi, N. Rega, J. Millam, M. Klene, J. Knox, J. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. Stratmann, O. Yazyev, A. Austin, R. Cammi, C. Pomelli, J. Ochterski, R. Martin, K. Morokuma, V. Zakrzewski, G. Voth, P. Salvador, J. Dannenberg, S. Dapprich, A. Daniels, O. Farkas, J. Foresman, J. Ortiz, J. Cioslowski, D. Fox, Gaussian 09 Revision E.01, 2009. Gaussian Inc. Wallingford CT.
  • 10. Bayly C., Cieplak P., Cornell W., Kollman P. A well-behaved electrostatic potential based method using charge restraints for deriving atomic charges: the RESP model. J. Phys. Chem. 1993;97:10269–10280.
  • 11. Cornell W., Cieplak P., Bayly C., Kollman P. Application of RESP charges to calculate conformational energies, hydrogen bond energies, and free energies of solvation. J. Am. Chem. Soc. 1993;115:9620–9631.
  • 12. Silva A.S.d., Vranken W. ACPYPE – antechamber PYthon parser interface. BMC Res. Notes. 2012;5:367. doi: 10.1186/1756-0500-5-367.
  • 13. Torsello M., Pimenta A., Wolters L., Moreira I., Orian L., Polimeno A. General amber force field parameters for diphenyl diselenides and diphenyl ditellurides. J. Phys. Chem. A. 2016;120:4389–4400. doi: 10.1021/acs.jpca.6b02250.
  • 14. Fellowes T., White J. Simulating chalcogen bonding using molecular mechanics: a pseudoatom approach to model ebselen. ChemRxiv. 2020. doi: 10.26434/chemrxiv.12345434.v1.
  • 15. Loschwitz J., Jäckering A., Keutmann M., Olagunju M., Eberle R.J., Coronado M.A., Olubiyi O.O., Strodel B. Novel inhibitors of the main protease of SARS-CoV-2 identified via a molecular dynamics simulation-guided in vitro assay. ChemRxiv. 2020. doi: 10.26434/chemrxiv.13200281.v1.
  • 16. Abraham M., Murtola T., Schulz R., Páll S., Smith J., Hess B., Lindahl E. GROMACS: high performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX. 2015;1–2:19–25.
  • 17.L. Schrödinger, The PyMOL molecular graphics system, version 1.8, 2015.
  • 18. Maier J.A., Martinez C., Kasavajhala K., Wickstrom L., Hauser K.E., Simmerling C. ff14SB: improving the accuracy of protein side chain and backbone parameters from ff99SB. J. Chem. Theory Comput. 2015;11:3696–3713. doi: 10.1021/acs.jctc.5b00255.
  • 19. Jin Z., Du X., Xu Y., Deng Y., Liu M., Zhao Y., Zhang B., Li X., Zhang L., Peng C., Duan Y., Yu J., Wang L., Yang K., Liu F., Jiang R., Yang X., You T., Liu X., Yang X., Bai F., Liu H., Liu X., Guddat L.W., Xu W., Xiao G., Qin C., Shi Z., Jiang H., Rao Z., Yang H. Structure of Mpro from COVID-19 virus and discovery of its inhibitors. Nature. 2020. doi: 10.1038/s41586-020-2223-y.
  • 20. Zhang L., Lin D., Kusov Y., Nian Y., Ma Q., Wang J., von Brunn A., Leyssen P., Lanko K., Neyts J., de Wilde A., Snijder E.J., Liu H., Hilgenfeld R. α-ketoamides as broad-spectrum inhibitors of coronavirus and enterovirus replication: structure-based design, synthesis, and activity assessment. J. Med. Chem. 2020;63:4562–4578. doi: 10.1021/acs.jmedchem.9b01828.
  • 21. Krause D., Thörnig P. JURECA: modular supercomputer at Jülich supercomputing centre. JLSRF. 2018;4:A132.

Articles from Data in Brief are provided here courtesy of Elsevier
