Abstract
Preparing molecular coordinate files for molecular dynamics (MD) simulations can be a very time-consuming process. Herein we present the development of a user-friendly program that drastically reduces the time required to prepare these molecular coordinate files for MD software packages such as AmberTools. Our program, known as charge atomtype naming (CAN), creates and uses a library of structures such as amino acid monomers to update the charge, atom type, and name of atoms in any molecular structure (mol2) file. We demonstrate the utility of this new program by rapidly preparing structural files for MD simulations for polypeptides ranging from small molecules to large protein structures. Both native and non-native amino acid residues are easily handled by this new program.
Keywords: Amber, AmberTools, Avogadro, Leap, mol2 file, molecular coordinate file, molecular dynamics simulations, molecular structures, polypeptide, python
1 |. INTRODUCTION
In recent years, molecular dynamics (MD) calculations have become increasingly useful aids for laboratory research efforts in chemistry and biology.1–7 MD simulations can aid in structure determination of polypeptides, polysaccharides, RNA, DNA, and biochemical membrane structures.8–11 Various software packages are routinely used to perform MD simulations, including Amber,9 CHARMM,12 GROMACS,13 LAMMPS,14 NAMD,15 and OpenMM.16 However, many of these tools and software packages can be challenging to navigate and require a significant investment of time to generate and validate meaningful results. Recent reports have sought to simplify the process of running MD simulations by developing software tools that help researchers obtain accurate and realistic insights into biomolecular structures. Software packages that interface with the full MD packages include MDMS,17 CHARMM-GUI,18 QwikMD,19 and the Kepler Workflow for Reproducible AMBER GPU Molecular Dynamics. However, these programs are limited to the AmberTools native library and do not allow for expanding to non-native libraries.20
While performing MD experiments with native residues in many programs has become routine, the introduction of unique or custom residues requires a significant amount of time to prepare molecular coordinate files.21–24 Manual editing of molecular coordinate files is often required because the non-native (nonnatural) residues or molecules that are needed are not included in the software package being used. This manual editing can be the bottleneck for performing MD simulations, as it requires significant direct user input that often introduces errors into the final files. Automating the preparation of correct molecular coordinate files (mol2 files) for MD simulations, including correcting the atom types, charges, and names of atoms in a mol2 file, would eliminate user error and accelerate the preparation of structures for MD simulations.
The AmberTools MD software package requires the user to build molecules using LEap, a subprogram of AmberTools.21 LEap can generate a mol2 file that contains the information needed to build parameter files. However, using residues or molecules outside of the AmberTools library results in erroneous molecular coordinate files unless they are extensively edited by hand. Additionally, LEap is not user-friendly as a molecular builder or editor, and obtaining the desired spatial arrangement of molecules within the program can be challenging.
As alternatives to LEap, molecular builder programs such as Avogadro,25,26 Chem3D (from ChemDraw), Chimera, and Spartan are much more user-friendly and provide a better spatial representation of the molecule of interest. However, using third-party software like Avogadro and integrating the generated molecular coordinate files into LEap to read and build the parameter files requires extensive manual editing. A productive solution to this problem would build a bridge between a modeling program such as Avogadro and AmberTools by embedding forcefield properties into the molecular coordinate file (mol2 file) and then inputting these files directly into LEap to build parameter files for MD simulations.
Our group has created an automated process for preparing molecular coordinate files for MD simulations that incorporates information about atom charges, atom types, and atom names into a molecular coordinate file such as a mol2 file. Other file types, such as pdb files, can also be adopted into our program with minimal editing of the main program. Our new core program was written in Python, a widely used programming language in the scientific community, allowing for custom applications.27 Our results provide a program and method, which we refer to herein as the charge atomtype naming (CAN) program, that will allow any molecular 3D structure to be embedded with atom charges, atom types, and atom names from a source library file created with the same program. While the examples shown here are specifically for polypeptide and protein structures, the same principles can be used to create molecular coordinate files for DNA, RNA, polysaccharides, or other large biostructures for MD simulations. The structural files prepared through CAN are readily imported into AmberTools, as shown here, but can also be used in other MD software packages.
2 |. METHODS
To automate the process of charging, atom typing, and atom naming using our CAN program, the user first needs to build correct library files for each residue of interest (Figure 1, Step 1). For example, a library file can be created for each non-native residue being used in a polypeptide. This can be done using the library script (can lib), which is a part of the CAN program. To build these library residue files, a correct residue mol2 file must be generated first. A correct mol2 file can be generated in several ways. For native residues, LEap can be used to save mol2 files of any residue that AmberTools has in its library. These mol2 files will contain the correct forcefield properties to build parameter files. Non-native residues can be built in a molecular builder like Avogadro and then run through Antechamber, an AmberTools subprogram that assigns correct names, atom types (either GAFF or GAFF2), and charges (such as AM1/BCC charges).22 Charges can also be developed using RED, an automated processing program that calculates RESP charges.1 Alternatively, the integrated Antechamber and Gaussian program can be used to build these charges.1 Once individual corrected mol2 files of both native and non-native residues are built, users should separate them into different folders labeled “native” and “gaff2/gaff” or mark which forcefield each residue will be using. This allows the program to search specific folders for residue library files depending on which residues and forcefields are being used.
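For readers unfamiliar with the mol2 format, the atom name, atom type, residue, and charge discussed above are stored in the atom records of the file. The following is a minimal sketch, not part of CAN itself, of how those records can be read in Python. It assumes the common Tripos layout in which each atom line carries nine whitespace-separated fields (the charge field is optional in the format but is assumed present here); the names `Mol2Atom` and `read_mol2_atoms` and all example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Mol2Atom:
    atom_id: int
    name: str        # atom name, e.g. "CA"
    x: float
    y: float
    z: float
    atom_type: str   # forcefield atom type
    resid: int       # residue (substructure) number
    resname: str     # residue (substructure) name
    charge: float    # partial charge


def read_mol2_atoms(text):
    """Return the atoms found in the @<TRIPOS>ATOM section of a mol2 string."""
    atoms = []
    in_atom_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            # Only lines that follow the ATOM header are atom records.
            in_atom_section = stripped == "@<TRIPOS>ATOM"
            continue
        if not in_atom_section or not stripped:
            continue
        f = stripped.split()
        atoms.append(Mol2Atom(int(f[0]), f[1], float(f[2]), float(f[3]),
                              float(f[4]), f[5], int(f[6]), f[7], float(f[8])))
    return atoms


# Illustrative fragment only; coordinates, types, and charges are placeholders.
EXAMPLE = """@<TRIPOS>MOLECULE
demo
@<TRIPOS>ATOM
1 N   0.000 0.000 0.000 N  4 AIB -0.40
2 CA  1.450 0.000 0.000 CT 4 AIB  0.05
@<TRIPOS>BOND
1 1 2 1
"""

atoms = read_mol2_atoms(EXAMPLE)
print([(a.name, a.atom_type, a.charge) for a in atoms])
```

The name, atom type, and charge fields are the three attributes that need to be correct before a residue file is used to build a library entry.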
FIGURE 1. Charge atomtype naming (CAN) flow chart for file preparation. Step 1 involves the generation of CAN library files and correct mol2 files for individual residues. Step 2 is the workflow for preparing molecular coordinate files for polypeptide or polymeric structures.
Next, the can lib script is run to build residue library files from already embedded mol2 files. Once the residue library files are created, they can then be saved for later reference using the CAN program. This workflow is shown in Figure 1, Step 1.
To create a polypeptide or other linked structure, a user would use a molecular builder like Avogadro to build their polypeptide or download it from a source like the Protein Data Bank (PDB) database and convert any necessary files into the mol2 format (Figure 1, Step 2). The mol2 files from sources such as the PDB must first be cleaned up by removing solvent, ligands, and other molecules so that only the peptide remains. Next, the residue names in the mol2 file must be updated to match the corresponding residue names in the user's CAN library. The CAN script updates the mol2 file based on residue name, connectivity, and residue number. Therefore, each residue name in the mol2 file must match the residue name that the program references in the CAN library created by can lib. Once CAN is run on the mol2 file, the file is updated with the correct information saved in the CAN library residue files. This updated mol2 file can then be read into LEap to generate the appropriate parameter files for MD experiments. An overview of the entire process is shown in the flow chart in Figure 1.
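Because the update depends on residue names matching the library, it can be useful to check a cleaned-up mol2 file before running CAN. The sketch below is one hypothetical way to do this and is not part of the CAN distribution. It assumes the residue number and residue name are the seventh and eighth fields of each atom record, and that the set of library residue names is supplied by the user; the function names are illustrative.

```python
def residues_in_mol2(text):
    """Return (residue number, residue name) pairs in order of first appearance."""
    residues = []
    in_atom_section = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("@<TRIPOS>"):
            in_atom_section = stripped == "@<TRIPOS>ATOM"
            continue
        if in_atom_section and stripped:
            fields = stripped.split()
            residue = (int(fields[6]), fields[7])
            if residue not in residues:
                residues.append(residue)
    return residues


def unmatched_residues(text, library_names):
    """List residues whose name has no counterpart in the library."""
    return [r for r in residues_in_mol2(text) if r[1] not in library_names]


# Placeholder input: two peptide residues and one leftover water molecule.
TARGET = """@<TRIPOS>ATOM
1 N  0.0 0.0 0.0 N  1 ALA 0.0
2 CA 1.4 0.0 0.0 C  1 ALA 0.0
3 N  2.8 0.0 0.0 N  2 AIB 0.0
4 O  9.0 9.0 9.0 O  3 HOH 0.0
"""

problems = unmatched_residues(TARGET, {"ALA", "AIB"})
print(problems)
```

In this example the water residue is reported, which flags both a missing library entry and a solvent molecule that was not removed during cleanup.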
CAN library database files are set up in five columns (Figure 2). The first column is the atom number, which is simply an arbitrary atom numbering system. The second column is the atom name, and the third column is the unique longhand name for each atom in a residue. This longhand name is generated by can lib and encodes the connectivity of each atom in the residue; that is, it states how each atom is connected to the rest of the residue. An example of this unique longhand name for a non-native amino acid residue, 2-aminoisobutyric acid (AIB), is shown in Figure 2.
FIGURE 2. Charge atomtype naming (CAN) library file format and representation of a longhand naming system for identifying unique atoms in a peptide residue or molecule.
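To illustrate the general idea of naming an atom by its connectivity, the sketch below builds a simple descriptor that lists the elements found at each bond distance from an atom. This is an assumed, simplified scheme for illustration only; it is not the longhand format that can lib actually writes, and the function name `connectivity_signature` is hypothetical.

```python
def connectivity_signature(atom, elements, bonds):
    """Describe `atom` by the elements found at each bond distance from it.

    elements : {atom index: element symbol}
    bonds    : iterable of (atom index, atom index) pairs
    """
    neighbors = {a: set() for a in elements}
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = {atom}
    shell = [atom]          # atoms at the current bond distance
    parts = []
    while shell:
        parts.append("".join(sorted(elements[a] for a in shell)))
        next_shell = []
        for a in shell:
            for n in sorted(neighbors[a]):
                if n not in seen:
                    seen.add(n)
                    next_shell.append(n)
        shell = next_shell
    return "-".join(parts)


# Methanol as a small test molecule: C1 bonded to O2 and H3-H5, O2 bonded to H6.
ELEMENTS = {1: "C", 2: "O", 3: "H", 4: "H", 5: "H", 6: "H"}
BONDS = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 6)]

for index in ELEMENTS:
    print(index, connectivity_signature(index, ELEMENTS, BONDS))
```

Note that topologically equivalent atoms, such as the three methyl hydrogens here, receive the same descriptor under this simplified scheme, so a practical implementation needs an additional rule to keep such atoms distinct.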
The fourth column in Figure 2 is the atom-type, which gives the identity of forcefield parameters to be used. The fifth column is the partial charge for each atom. Once the library files are created, they can be changed by hand and updated if needed, or new library files can be created by can lib to replace the old ones.
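As a rough illustration of the five-column layout described above, the following sketch reads library records into a dictionary keyed by longhand name. It assumes the columns are whitespace-separated and appear in the order atom number, atom name, longhand name, atom type, charge; the exact on-disk format of CAN library files may differ, and the longhand strings, atom types, and charges in the example are placeholders rather than real library values.

```python
def parse_library_line(line):
    """Split one library record into its five columns."""
    parts = line.split()
    if len(parts) < 5:
        raise ValueError(f"expected five columns, got: {line!r}")
    return {
        "number": int(parts[0]),
        "name": parts[1],
        # Everything between the name and the last two columns is the longhand
        # name, so a longhand name containing spaces is still read correctly.
        "longhand": " ".join(parts[2:-2]),
        "atom_type": parts[-2],
        "charge": float(parts[-1]),
    }


def load_library(text):
    """Return {longhand name: record} for every non-empty line."""
    records = [parse_library_line(l) for l in text.splitlines() if l.strip()]
    return {r["longhand"]: r for r in records}


# Placeholder records for illustration only.
AIB_LIBRARY = """
1 N   N-CH-CCC   N  -0.40
2 CA  C-CCCN-HO  CT  0.05
"""

library = load_library(AIB_LIBRARY)
print(library["C-CCCN-HO"])
```

Keying the records by longhand name reflects the role that column plays in the workflow: it is the field used to recognize the same atom in a target structure.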
The CAN program works by looking up each residue of the polypeptide file (or targeted molecule file) in the CAN library files and updating the target file with the CAN library residue information. Figure 3 (top) shows a raw mol2 file created from Avogadro for a single residue (AIB, fourth residue) of a long polypeptide. Figure 3 (bottom) shows the mol2 file of the same fourth residue after it has been updated by the CAN program. This resulting mol2 file can now be directly imported into LEap to generate parameter files for Amber MD simulations. The CAN program can be used in a flexible manner by updating all attributes or updating only the desired attributes. Thus, CAN provides flexibility to mix and match changes made to the mol2 file to best suit the user's needs.
FIGURE 3. Top: Molecular coordinate file for a single residue (residue 4, 2-aminoisobutyric acid [AIB]) of a polypeptide before running charge atomtype naming (CAN). Bottom: The same file after running CAN.
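The update step, including the option to change only selected attributes, can be sketched as follows. This is an assumed data model for illustration and not the CAN source code: atoms and library entries are plain dictionaries, atoms are matched by residue name and a connectivity-derived longhand name, and the function name `apply_library` and all values are hypothetical.

```python
def apply_library(atoms, library, update=("name", "atom_type", "charge")):
    """Return copies of `atoms` with the selected attributes taken from the library.

    atoms   : list of dicts with keys resname, longhand, name, atom_type, charge, xyz
    library : {resname: {longhand: {"name": ..., "atom_type": ..., "charge": ...}}}
    update  : which attributes to overwrite; coordinates are never touched
    """
    updated = []
    for atom in atoms:
        # A missing residue or longhand name raises KeyError, which signals
        # that the target file and the library do not match.
        entry = library[atom["resname"]][atom["longhand"]]
        new_atom = dict(atom)
        for key in update:
            new_atom[key] = entry[key]
        updated.append(new_atom)
    return updated


# Placeholder library entry and raw atom; none of these values come from CAN.
LIBRARY = {"AIB": {"C-CCCN-HO": {"name": "CA", "atom_type": "CT", "charge": 0.05}}}
RAW = [{"resname": "AIB", "longhand": "C-CCCN-HO", "name": "C12",
        "atom_type": "C.3", "charge": 0.0, "xyz": (1.45, 0.0, 0.0)}]

full = apply_library(RAW, LIBRARY)
charges_only = apply_library(RAW, LIBRARY, update=("charge",))
print(full[0]["name"], charges_only[0]["name"])
```

Returning modified copies rather than editing the input in place keeps the raw structure available for comparison, and leaving `xyz` untouched mirrors the fact that the geometry of the target structure is preserved.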
Our new CAN method for preparing molecular coordinate files for MD simulations provides significant advantages over current methods. First, it is faster than manually inputting each atom and updating data for each atom name, atom type, and charge, especially for large polypeptides and proteins. Depending on file sizes and quantities, CAN takes anywhere from a few minutes to a few hours to run. Another feature of CAN is that it does not apply any geometry checking or adjust any atom coordinates. This allows users to begin an MD experiment in any molecular geometry they desire based on how the molecule was prepared in the molecular structure builder (like Avogadro) or based on how it was downloaded from a database. With our simple CAN method in hand, any user can take a new structure that needs corrected names, atom types, and charges and update them from their CAN library within minutes. This simple CAN program reduces preparation work for running MD simulations from hours of manual work to just a few minutes of automatic processing.
3 |. RESULTS
To test our CAN program, we first built several non-native residues and created mol2 files using Avogadro. These non-native residues include commonly used amino acids like aminoisobutyric acid (Aib) and unusual residues such as thiourea and enamine catalytic groups that we previously used in the development of peptide-based catalysts.28 We also included a stapled polypeptide containing a non-native squaric acid staple (WW-domain) previously developed by our group (see Supporting Information for detailed structures).29 We next ran each non-native residue through Antechamber to add the correct residue attributes, such as charge, name, and atom type, to each residue mol2 file. We made sure to run these with overlapping atoms in each residue and then removed the overlapping atoms after the attributes were added. Native amino acids were exported from LEap and saved as mol2 files. These files were then used to generate CAN library files for each residue using the can lib script. Once library files were generated, we selected various polypeptide and protein structures that use these native and non-native residues (Figure 4). Our test set includes polypeptides with native and non-native residues (peptide catalyst and WW-domain) and large proteins with multiple domains that contain up to 6000+ residues. Using the CAN program, we updated the mol2 files that were obtained either by building the structure in Avogadro or by downloading the mol2 file from the PDB database. The CAN program successfully updated the mol2 file attributes, including charge, atom type, and name (see Supporting Information for details).
FIGURE 4. Polypeptides and proteins run through the charge atomtype naming (CAN) program. Specific sequences and residue identities can be found in the Supporting Information.
For each of the polypeptides/proteins seen in Figure 4, the original mol2 files and those generated after being run through CAN are included in the Supporting Information. This test of our CAN program includes polypeptides ranging from 11 residues to 6000+ residues that include both non-native and native residues. After each polypeptide was run through CAN, we built the parameter files and tested each peptide in an AmberTools MD simulation for 0.5 ns at room temperature. For these simulations, an OPC water box was used, SHAKE was enabled, an NPT run was performed, and the simulation was run under periodic boundary conditions. Results of these runs are found in the Supporting Information. Importantly, MD simulations using AmberTools were successful in each case based on the mol2 file built using our CAN program. This demonstrates the ability of CAN to enable MD simulations with both small and large polypeptides.
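The step of building parameter files from an updated mol2 file can be scripted. The sketch below generates a LEap input script as text; it is an illustration under stated assumptions, not the input used for the simulations reported here. The forcefield files to source, any frcmod files for non-native residues, the 10 Å solvent buffer, and all file names are assumptions chosen by the caller, and the helper name `build_leap_script` is hypothetical. Only the OPC water box is taken from the description above.

```python
def build_leap_script(mol2_path, prefix, leaprc_files, frcmod_files=(),
                      buffer_angstrom=10.0):
    """Return the text of a LEap input script for one updated mol2 file."""
    lines = [f"source {name}" for name in leaprc_files]
    # Extra parameters for non-native residues, if the user has prepared any.
    lines += [f"loadamberparams {path}" for path in frcmod_files]
    lines += [
        f"mol = loadmol2 {mol2_path}",
        # Rectangular box of OPC water around the solute.
        f"solvatebox mol OPCBOX {buffer_angstrom}",
        f"saveamberparm mol {prefix}.prmtop {prefix}.inpcrd",
        "quit",
    ]
    return "\n".join(lines) + "\n"


# Assumed forcefield choices and file names, for illustration only.
script = build_leap_script(
    "peptide_can.mol2",
    "peptide",
    leaprc_files=("leaprc.protein.ff19SB", "leaprc.gaff2", "leaprc.water.opc"),
    frcmod_files=("aib.frcmod",),
)
print(script)
```

The resulting text can be saved to a file and passed to LEap with `tleap -f`; steps such as adding counter-ions are omitted here and would need to be added as required by the system.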
4 |. CONCLUSIONS
Our group has created a simple-to-use program known as CAN to automate the manual process of preparing molecular coordinate files for MD simulations. This new program correctly assigns atom types, atom names, and charges to individual atoms in a molecular coordinate file (mol2, pdb, etc.) for import into MD simulation programs like AmberTools. Our program greatly streamlines the process for preparing such files, eliminating the need for tedious manual editing of mol2 files. CAN is a powerful toolset for all MD uses and can be easily edited to add different file types and embeddable attributes. CAN is written so that the user does not need to know how to program in Python to apply this new program and accelerate research in MD simulations. Our free, open-source CAN program can be found on sourceforge.net at https://sourceforge.net/projects/canmd/ as a downloadable Python program.
ACKNOWLEDGMENTS
We acknowledge financial support from the National Institutes of Health (R15-GM134476). We also thank Brigham Young University and the Office of Research Computing, especially the Fulton Supercomputing Lab.
Funding information
Brigham Young University; National Institutes of Health, Grant/Award Number: R15-GM134476
Footnotes
SUPPORTING INFORMATION
Additional supporting information may be found in the online version of the article at the publisher's website.
REFERENCES
- [1] Perilla JR, Goh BC, Cassidy CK, Liu B, Bernardi RC, Rudack T, Yu H, Wu Z, Schulten K, Curr. Opin. Struct. Biol. 2015, 31, 64.
- [2] Bernardi RC, Melo MCR, Schulten K, Biochim. Biophys. Acta 2015, 1850, 872.
- [3] Ponder JW, Case DA, Adv. Protein Chem. 2003, 66, 27.
- [4] Maier JA, Martinez C, Kasavajhala K, Wickstrom L, Hauser KE, Simmerling C, J. Chem. Theory Comput. 2015, 11, 3696.
- [5] Case DA, Cheatham III TE., Darden T, Gohlke H, Luo R, Merz KM Jr., Onufriev A, Simmerling C, Wang B, Woods R, J. Comput. Chem. 2005, 26, 1668.
- [6] Pearlman DA, Case DA, Caldwell JW, Ross WS, Cheatham III TE., DeBolt S, Ferguson D, Seibel G, Kollman P, Comput. Phys. Commun. 1995, 91, 1.
- [7] Graf J, Nguyen PH, Stock G, Schwalbe H, J. Am. Chem. Soc. 2007, 129, 1179.
- [8] Cheatham III TE., Case DA, Biopolymers 2013, 99, 969.
- [9] Salomon-Ferrer R, Case DA, Walker RC, WIREs Comput. Mol. Sci. 2013, 3, 198.
- [10] Harvey S, McCammon JA, Dynamics of Proteins and Nucleic Acids, Cambridge University Press, Cambridge: 1987.
- [11] Leach AR, Molecular Modelling. Principles and Applications, 2nd ed., Prentice-Hall, Harlow, England: 2001.
- [12] Brooks BR, Brooks CL, Mackerell AD, Nilsson L, Petrella RJ, Roux B, Won Y, Archontis G, Bartels C, Boresch S, Caflisch A, Caves L, Cui Q, Dinner AR, Feig M, Fischer S, Gao J, Hodoscek M, Im W, Kuczera K, Lazaridis T, Ma J, Ovchinnikov V, Paci E, Pastor RW, Post CB, Pu JZ, Schaefer M, Tidor B, Venable RM, Woodcock HL, Wu X, Yang W, York DM, Karplus M, J. Comput. Chem. 2009, 30, 1545.
- [13] Abraham MJ, Murtola T, Schulz R, Páll S, Smith JC, Hess B, Lindahl E, SoftwareX 2015, 1–2, 19.
- [14] Plimpton S, J. Comput. Phys. 1995, 117, 1.
- [15] Phillips JC, Braun R, Wang W, Gumbart J, Tajkhorshid E, Villa E, Chipot C, Skeel RD, Kalé L, Schulten K, J. Comput. Chem. 2005, 26, 1781.
- [16] Eastman P, Swails J, Chodera JD, McGibbon RT, Zhao Y, Beauchamp KA, Wang L-P, Simmonett AC, Harrigan MP, Stern CD, Wiewiora RP, Brooks BR, Pande VS, PLoS Comput. Biol. 2017, 13, e1005659.
- [17] Żaczek S, J. Comput. Chem. 2020, 41, 266.
- [18] Jo S, Kim T, Iyer VG, Im W, J. Comput. Chem. 2008, 29, 1859.
- [19] Ribeiro JV, Bernardi RC, Rudack T, Stone JE, Phillips JC, Freddolino PL, Schulten K, Sci. Rep. 2016, 6, 26563.
- [20] Purawat S, Ieong PU, Malmstrom RD, Chan GJ, Yeung AK, Walker RC, Altintas I, Amaro RE, Biophys. J. 2017, 112, 2469.
- [21] Case DA, Belfon K, Ben-Shalom IY, Brozell SR, Cerutti DS, Cheatham III TE., Cruzeiro VWD, Darden TA, Duke RE, Giambasu G, Gilson MK, Gohlke H, Goetz AW, Harris R, Izadi S, Izmailov SA, Kasavajhala K, Kovalenko A, Krasny R, Kurtzman T, Lee TS, LeGrand S, Li P, Lin C, Liu J, Luchko T, Luo R, Man V, Merz KM, Miao Y, Mikhailovskii O, Monard G, Nguyen H, Onufriev A, Pan F, Pantano S, Qi R, Roe DR, Roitberg A, Sagui C, Schott-Verdugo S, Shen J, Simmerling CL, Skrynnikov NR, Smith J, Swails J, Walker RC, Wang J, Wilson L, Wolf RM, Wu X, Xiong Y, Xue Y, York DM, Kollman PA, AMBER 2020, University of California, San Francisco: 2020.
- [22] Hornak V, Abel R, Okur A, Strockbine B, Roitberg A, Simmerling C, Proteins 2006, 65, 712.
- [23] Aduri R, Psciuk BT, Saro P, Taniga H, Schlegel HB, SantaLucia J Jr., J. Chem. Theory Comput. 2007, 3, 1465.
- [24] Tian C, Kasavajhala K, Belfon KAA, Raguette L, Huang H, Migues AN, Bickel J, Wang Y, Pincay J, Wu Q, Simmerling C, J. Chem. Theory Comput. 2020, 16, 528.
- [25] Hanwell MD, Curtis DE, Lonie DC, Vandermeersch T, Zurek E, Hutchison GR, J. Cheminform. 2012, 4, 17.
- [26] Avogadro: an open-source molecular builder and visualization tool. Version 1.2.0. http://avogadro.cc/
- [27] van Rossum G, Python tutorial, Technical Report CS-R9526, Centrum voor Wiskunde en Informatica (CWI), Amsterdam, The Netherlands; 1995.
- [28] Kinghorn MJ, Valdivia-Berroeta GA, Chantry DR, Smith MS, Ence CC, Draper SRE, Duval JS, Masino BM, Cahoon SB, Flansburg RR, Conder CJ, Price JL, Michaelis DJ, ACS Catal. 2017, 7, 7704.
- [29] Kinghorn MJ, Wayment AX, Lofgreen GQ, Nielsen PM, Smith SL, Xiao Q, Tretbar JW, Porter MN, Parkman JA, Rodriguez Moreno M, Nygaard JML, Jacobsen SC, Price JL, Michaelis DJ, ACS Chem. Biol. 2021, 16(5), 806.