IMPACT‐4CCS: Integrated Modeling and Prediction Using Ab Initio and Trained Potentials for Collision Cross Sections

Carson Farmer; Hector Medina

doi:10.1002/jcc.70106

2025 Apr 18;46(11):e70106. doi: 10.1002/jcc.70106


PMCID: PMC12008713 PMID: 40251873

ABSTRACT

Collision cross section (CCS) values can enhance the identification and classification of molecular contaminants such as per‐ and polyfluoroalkyl substances (PFAS). However, the computational burden required for large molecules, combined with the increasing number of potential PFAS candidates, can render existing methods incapable of providing sufficiently accurate results in a timely manner. Furthermore, machine learning methods struggle to generalize when the (de)protonated structure undergoes structural changes that are not common in the training dataset. In this study, we introduce IMPACT‐4CCS (Integrated Modeling and Prediction using Ab initio and Trained potentials for Collision Cross Sections), a novel computational workflow that combines ab initio and machine learning tasks to accelerate accurate prediction of CCS for PFAS molecules. IMPACT‐4CCS achieves comparable accuracy to current machine learning approaches, as validated using a test set of 100 molecules. Furthermore, IMPACT‐4CCS exhibits better accuracy when applied to some specific emerging PFAS subclasses, such as the nH‐perfluoroalkyl carboxylic acids (nH‐PFCA) family, for which other methods overestimate the CCS values. As far as the authors know, IMPACT‐4CCS is the only existing method capable of capturing structural dynamics (i.e., hydrogen bridging) present in some large and flexible PFAS molecules. Our work demonstrates that the careful use of machine learning to accelerate traditional methods is likely to be more accurate than relying purely on machine learning on molecular graphs. Recommended future work includes assessing the usefulness of IMPACT‐4CCS for extending nontarget analysis to larger PFAS datasets such as the OECD (Organisation for Economic Co‐operation and Development) PFAS list in PubChem, which could be greater than 7 million molecules with diverse chemistry.

Keywords: collision cross section, machine learning, mass spectrometry, PFAS

For CCS calculations, the molecule is converted from a graph (a) to a structure (b), and the structure is then formed into the adduct (c), typically by deprotonation or protonation. Next, a series of conformers for the adduct is generated (d). Lastly, the trajectory method (e) is used to calculate the CCS in this work. IMPACT‐4CCS modifies the workflow by incorporating machine‐learned potentials to accelerate the structure generation in (b) and the energy and partial charge calculations in (d).

Abbreviations

CASRN: Chemical Abstracts Service Registry Number
CCS: collision cross section
CID: PubChem compound identifier
DFT: density functional theory
ESI: electrospray ionization
ML: machine learning
nH‐PFCA: nH‐perfluoroalkyl carboxylic acids
OECD: Organisation for Economic Co‐operation and Development
PE: percent error
PFAS: per‐ and polyfluoroalkyl substances

1. Introduction

Collision cross sections (CCS) can be very useful in the differentiation, classification, and identification of molecules for various applications, including the analysis of molecular contaminants such as per‐ and polyfluoroalkyl substances (PFAS) [1, 2, 3, 4]. Particularly for PFAS, the diversity of chemicals within the classification poses challenges for nontargeted analysis when confronted with potentially hundreds of (or perhaps many more) chemicals in a single environmental (e.g., water, soil, air) or human (e.g., blood, serum, or urine) matrix [5]. PFAS are a group of contaminants of concern whose regulation has only recently begun [6], and much more work is needed to understand their behavior and completely remove them from the environment [7]. In particular, for suspect and nontargeted analysis, the identification of potential PFAS in complex matrices (e.g., contaminated water matrices) [6] could be enhanced by expanding databases with accurately predicted CCS values. Furthermore, since PFAS can often exist as structural isomers [8], CCS information can help distinguish these isomers based on their three‐dimensional structure and shape, which is difficult using m/z information alone [9].

Modern CCS prediction methods for small molecules utilize support vector machines (CCSbase [10]), graph neural networks (AllCCS [11] and SigmaCCS [12]), and molecular descriptors (LipidCCS [13]). At the core of any CCS machine learning model, the initial structure is paired with a specified adduct type to form a representation, which the model then correlates with the CCS. Because highly fluorinated compounds are underrepresented in CCS studies, predictions for compounds with more than 6 fluorines have been shown to be potentially poor even when the compounds lie within the m/z range on which the model was trained [14].

Recent advances in machine learning potentials such as AimNet2 [15], MACE [16], and NequIP [17] have significantly decreased the computational burden for energy calculations at near ab initio accuracy. Specifically, the recent AimNet2 model allows ωB97M‐D3/def2‐TZVPP level energies to be predicted. Furthermore, the neural charge equilibration (NQE) introduced in [18] provides a method capable of accurately predicting the energies of arbitrarily charged molecules.

As the family of PFAS molecules under consideration grows to more than 7 million compounds, based on the application of the Organisation for Economic Co‐operation and Development (OECD) definition to the PubChem database [19], accurate CCS prediction methods that can account for this chemical diversity and for highly fluorinated structures are required. With new tools being made available to aid nontarget analysis, such as the recent PubChemLite [20], the methods for CCS prediction need to be assessed for accuracy across a wide family of PFAS molecules. In the recent work of Jobst et al. [21], the structural changes that nH‐perfluoroalkyl carboxylic acids (nH‐PFCA) undergo upon deprotonation cause current machine‐learning (ML) approaches to overestimate the CCS value.

In this work, we assemble a customized hybrid algorithm combining ML‐trained with ab initio potentials to improve the prediction of CCS, especially for large and flexible PFAS molecules that exhibit structural changes such as hydrogen bridging. First, we assess the applicability of traditional ab initio CCS prediction workflows on a subset of PFAS molecules. Then, the approach is updated to use machine learning potentials for geometry optimization, conformer filtering, and partial charge calculation. The method is benchmarked against SigmaCCS, AllCCS, and CCSbase. The methods are further compared on the nH‐PFCA family to demonstrate the applicability of the ML potential for families of molecules that undergo significant structural changes when deprotonated.

2. Dataset and Methods

2.1. Data

The “PFAS LC‐IMS‐MS Library” from Baker Lab is used.¹ Only the negative mode electrospray ionization (ESI) results are retained from the dataset, and the [M‐H]‐ adducts are considered for analysis. Additionally, any molecules that are noted to form complexes in the drift tube are removed from the dataset. All duplicate entries are removed from the list, since the experimental errors between the replicates are often less than 1%. For additional data on the nH‐PFCA family of PFAS, CCS values from [21] are included in the dataset. A limited subset of the data was used for verification of the workflow using the full ab initio method; the Chemical Abstracts Service Registry Numbers (CASRN) and PubChem compound identifiers (CIDs) are included in Table A.1.
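
The selection steps described above can be sketched in a few lines of Python. This is a minimal illustration only: the record fields (`compound`, `adduct`, `ion_mode`, `forms_complex`, `ccs`) and the placeholder records are hypothetical and do not reflect the actual column names or values of the library, and the sketch assumes that negative-mode [M-H]- entries are the ones retained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    """One row of a CCS library; all field names here are hypothetical."""
    compound: str
    adduct: str
    ion_mode: str        # "negative" or "positive"
    forms_complex: bool  # flagged as forming a complex in the drift tube
    ccs: float           # experimental CCS in square angstroms


def select_entries(entries):
    """Keep negative-mode [M-H]- entries, drop complexes, keep one row per compound."""
    seen = set()
    kept = []
    for entry in entries:
        if entry.ion_mode != "negative" or entry.adduct != "[M-H]-":
            continue  # wrong ionization mode or adduct type
        if entry.forms_complex:
            continue  # noted to form complexes in the drift tube
        if entry.compound in seen:
            continue  # replicate of a compound that was already kept
        seen.add(entry.compound)
        kept.append(entry)
    return kept


# Placeholder records for illustration only (not values from the library).
records = [
    LibraryEntry("compound A", "[M-H]-", "negative", False, 100.0),
    LibraryEntry("compound A", "[M-H]-", "negative", False, 100.4),
    LibraryEntry("compound B", "[M+H]+", "positive", False, 120.0),
    LibraryEntry("compound C", "[M-H]-", "negative", True, 130.0),
]
selected = select_entries(records)
print([entry.compound for entry in selected])
```

Keeping the first replicate rather than averaging is a simplification chosen for the sketch; with replicate differences below 1%, either choice should change the downstream percent errors only marginally.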

2.2. Methods

For ab initio CCS predictions, structures are generated with ETKDGv3 [22]. Next, the structure is optimized in multiple stages, first with MMFF and then with GFN2‐xTB (semi‐empirical tight binding). GFN2‐xTB is an accurate semi‐empirical method for small molecule optimization. The adducts are formed using the (de)protonation workflows available in CREST [23]. CREST was chosen based on other works demonstrating the effectiveness of the method for sampling small molecule conformational space [24]. The resulting adducts are used for the final structure optimization and energy calculations (see Figure 1a). Adduct optimization and partial charge calculation are carried out with B3LYP/6‐31G* density functional theory (DFT). B3LYP/6‐31G* was chosen based on other studies demonstrating the applicability of the method for estimating CCS values [25, 26]. The settings for each task in the workflow are included in Table A.2.

FIGURE 1. (a) The workflow for performing ab initio‐based CCS calculations. (b) The workflow for performing machine learning‐accelerated CCS calculations (IMPACT‐4CCS). The settings for the ab initio and IMPACT‐4CCS methods are provided in Table A.2 and Table A.3, respectively.

For IMPACT‐4CCS, the CCS values are predicted according to the workflow in Figure 1b. (Note that the settings for the ab initio and IMPACT‐4CCS methods are provided in Table A.2 and Table A.3, respectively.) First, the SMILES structure is queried from PubChem. The SMILES string is then converted to an initial 3D structure using ETKDGv3 in RDKit 2024.09.1 [22]. The structure is then optimized using AimNet2 [15] for energy calculations and the FIRE algorithm in ASE 3.23.0 [27] for optimization. AimNet2 was chosen based on its recent publication showing results comparable to ωB97M‐D3/def2‐TZVPP. Next, conformers are generated using Auto3D [28], which is designed around the ANI models [29, 30] and the AimNet models; Auto3D is used here because CREST is not compatible with the AimNet2 potential. The energy window is set to 40 kcal/mol to provide a large sampling of potential conformers. Next, each conformer is optimized with AimNet2, and partial charges are calculated using the neural charge equilibration model in AimNet2. The CCS values for each conformer are calculated using MOBCAL‐SHM [31]. MOBCAL‐SHM is selected for CCS predictions based on the recent work of Colby et al. on predicting CCS values with ab initio approaches [31]. Finally, the Boltzmann‐weighted average of the CCS values for the conformer ensemble is calculated. For SigmaCCS [12], the pretrained model was used as provided in the original work. For AllCCS [11] and CCSbase [10], SMILES strings were submitted to the online servers to obtain the CCS predictions.
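
The final averaging step can be illustrated with a short sketch. It assumes conformer energies in kcal/mol and a weighting temperature of 300 K (the temperature listed for MOBCAL‐SHM in Table A.3; whether the same temperature is used for the Boltzmann weights is an assumption). The function name and the example numbers are hypothetical.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def boltzmann_weighted_ccs(energies_kcal, ccs_values, temperature=300.0):
    """Boltzmann-weighted mean CCS over a conformer ensemble.

    energies_kcal: conformer energies in kcal/mol (any common reference)
    ccs_values:    per-conformer CCS values in square angstroms
    temperature:   weighting temperature in kelvin (assumed, see lead-in)
    """
    if not energies_kcal or len(energies_kcal) != len(ccs_values):
        raise ValueError("need one CCS value per conformer energy")
    # Shift by the lowest energy so the largest weight is exactly 1.
    e_min = min(energies_kcal)
    kt = K_B_KCAL * temperature
    weights = [math.exp(-(e - e_min) / kt) for e in energies_kcal]
    total = sum(weights)
    return sum(w * c for w, c in zip(weights, ccs_values)) / total


# Hypothetical three-conformer ensemble (illustrative numbers only).
energies = [0.0, 0.5, 3.0]      # kcal/mol
ccs = [150.0, 152.0, 160.0]     # square angstroms
print(round(boltzmann_weighted_ccs(energies, ccs), 2))
```

At 300 K, $k_B T$ is roughly 0.6 kcal/mol, so conformers near the top of a 40 kcal/mol window contribute essentially nothing to the average; the wide window affects which structures are sampled rather than the final weighted value.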

To measure the accuracy of the different methods, the percent error (PE) is calculated for each compound, and the mean and standard deviation of the PE are taken over the dataset. The PE is determined with

$$\mathrm{PE} = \frac{|\hat{y} - y|}{y} \times 100 \qquad (1)$$

where $y$ is the experimental value and $\hat{y}$ is the predicted value.
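
As a worked example of Equation (1), the sketch below evaluates the PE for the four sodium‐bridged 5H‐PFCA complexes using the experimental and predicted values listed in Table 2, and then takes the mean and standard deviation. The use of the sample standard deviation (`statistics.stdev`) is an assumption; the text does not state whether the sample or population form was used.

```python
from statistics import mean, stdev


def percent_error(predicted, experimental):
    """Absolute percent error of a prediction relative to experiment (Equation 1)."""
    return abs(predicted - experimental) / experimental * 100.0


# (experimental, predicted) CCS pairs in square angstroms, taken from Table 2
# for the 2M-2H+Na, 3M-3H+2Na, 4M-4H+3Na, and 5M-5H+4Na complexes of 5H-PFCA.
pairs = [
    (178.4, 175.61),
    (217.4, 223.30),
    (256.1, 260.48),
    (286.2, 296.78),
]

errors = [percent_error(pred, exp) for exp, pred in pairs]
for (exp, pred), pe in zip(pairs, errors):
    print(f"exp={exp:7.2f}  pred={pred:7.2f}  PE={pe:5.2f}%")
print(f"mean PE = {mean(errors):.2f}%, std of PE = {stdev(errors):.2f}%")
```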

3. Results

3.1. PFAS Results

First, an assessment of current ab initio methods for CCS prediction is undertaken for a subset of the complete dataset. The majority of the molecules in the limited test set are below 2% error (Figure 2a). The use of fully ab initio methods resulted in a high computational cost for PFAS with chain lengths longer than 6 (equivalent to approximately > 30 atoms).

FIGURE 2. (a) The percent error for the ab initio approach as applied to the subset of data selected for verification. (b) The parity between the experimental and predicted ab initio CCS values. For most of the compounds in the subset, the traditional workflow predicted the CCS values to within ≈2% error.

3.2. IMPACT‐4CCS Approach

In applying IMPACT‐4CCS to the dataset from Baker Lab, the predictions were within 5% error on average (mean PE of 4.02%; see Table 1). Both SigmaCCS and AllCCS produced inconsistently accurate results. Parity between the different approaches and the experimental data is shown in Figure 3. Furthermore, the $m/z$ to percent error comparison is shown in Figure 4. The results are summarized in Table 1. Notably, 5H‐PFCA was the largest outlier in the ab initio study. CCSbase appears to have a lower overall error than the IMPACT‐4CCS method; however, CCSbase exhibits inaccuracies for specific families of PFAS molecules, for example, nH‐PFCA (see Figure 5).

FIGURE 3. Parity between the different methods, (a) IMPACT‐4CCS, (b) CCSbase, (c) AllCCS, and (d) SigmaCCS, and the experimental dataset. Despite a tendency to overpredict as the molecular weight increases, IMPACT‐4CCS maintains a consistent linearity when compared to the experimental data. The dashed line represents a linear trend‐line fit to the data, the solid line is parity to experimental data, the gray band is ±2% error, and the dotted line is ±5% error.

FIGURE 4. Percent error vs. experimental CCS for the different methods: (a) IMPACT‐4CCS, (b) CCSbase, (c) AllCCS, and (d) SigmaCCS. IMPACT‐4CCS is the only method to show a systematic error, with overpredictions occurring at higher CCS values.

TABLE 1.

With the exception of the ab initio method, the methods were compared to the full dataset to measure the mean and standard deviation of the percent error.

Method	Mean PE (%)	Standard deviation of PE (%)
Ab initio	1.38707	0.913108
IMPACT‐4CCS	4.01664	2.39515
SigmaCCS	8.91411	5.47614
AllCCS	5.00116	5.43964
CCSbase	3.6744	4.04678

FIGURE 5. nH‐PFCA structure for the PFAS family that was isolated to benchmark the different methods.

3.3. nH‐PFCA

The largest outlier from the ab initio workflow was 5H‐PFCA: the experimental value reported by Baker Lab was 143.0 Å², while the predicted value was 124.6 Å². To address this large disparity between the experimental results and the simulation, additional complexes of 5H‐PFCA with Na+ were examined. Additionally, the 8H‐PFCA dimer [2M‐H]‐ was added as a further check for the nH‐PFCA family. Unfortunately, the current iteration of AimNet2 is not parameterized for Na, so it was substituted with CREST at GFN‐FF/GFN2‐xTB (metadynamics/optimization) for the conformer generation portion and r²SCAN‐3c [32] for the geometry optimization and partial charge calculation. r²SCAN‐3c was selected over the B3LYP/6‐31G* used in the ab initio workflow to keep the simulation time manageable given the number of atoms in the complexes. The results for the complexes are shown in Table 2. Except for 5H‐PFCA [M‐H]‐, the PE was below 4%. The low PE for the other complexes seemingly indicates an issue with the reported experimental result for 5H‐PFCA [M‐H]‐. Therefore, 5H‐PFCA is excluded when generating the metrics for the different methods.

TABLE 2.

The additional complexes from the Baker Lab dataset were studied at the CREST (GFN‐FF/GFN2‐xTB) and r²SCAN‐3c levels of theory to assess the accuracy of the approach to 5H‐PFCA.

Molecule	Adduct	Experimental CCS [Å²]	Predicted CCS [Å²]	Percent error (%)
5H‐PFCA	M‐H	143.0	124.36	13.15
5H‐PFCA	2M‐2H+Na	178.4	175.61	1.56
5H‐PFCA	3M‐3H+2Na	217.4	223.30	2.715
5H‐PFCA	4M‐4H+3Na	256.1	260.48	1.710
5H‐PFCA	5M‐5H+4Na	286.2	296.78	3.695
8H‐PFCA	2M‐H	200.8	197.10	1.8


To further study the nH‐PFCA family, the CCS values from Jobst et al. [21] are studied for $n$ from 9 to 24 (see Figure 5 for the compound formula). They identified this family as challenging for machine learning methods due to the formation of hydrogen bridging between the head and tail of the compound, resulting in a cyclic structure. They further highlighted that AllCCS2 failed to accurately predict CCS values for the family. To assess our method in comparison to the other ML methods, the CCS values of $n$ from 2 to 24 are predicted to generate linear trends (see Figure 6). The mean PE and standard deviation of the PE for the different methods are in Table 3.

FIGURE 6. (a) As the chain length, $n$, of the nH‐PFCA molecule increases, all of the purely machine learning methods tend to overpredict when compared to IMPACT‐4CCS. (b) The parity of the predictions with the experimental dataset.

TABLE 3.

The mean and standard deviation of the PE were evaluated for the different methods on nH‐PFCA data from Jobst et al. [21].

Method	Mean PE (%)	Standard deviation of PE (%)
IMPACT‐4CCS	4.69105	1.74586
SigmaCCS	18.6578	2.98039
AllCCS	16.6488	5.01661
CCSbase	11.844	1.74431


4. Discussion

As shown in Figure 2, the traditional approach of combining density functional theory, ab initio conformer searching, and trajectory‐method CCS predictions yields a low average percent error, indicating that the approach is applicable to the broad class of PFAS chemicals. However, as the chain length and, thereby, the overall flexibility of the PFAS molecules increase, the total number of conformers required to adequately estimate the CCS increases exponentially. Based on [33], compounds having at least 8 rotatable bonds would require over 500 conformers to be generated, optimized, and have a CCS value predicted, with the most time‐consuming portions being the geometry optimizations. The molecular flexibility of larger molecules for conformer generation significantly bottlenecks the potential for using high‐throughput ab initio methods to create and maintain up‐to‐date CCS databases for the potentially > 7 million PFAS molecules recognized by the OECD in PubChem.

IMPACT‐4CCS overcomes the computational bottleneck by leveraging ML to serve as an energy predictor for the molecules rather than as a method to predict the CCS values directly. The proposed method led to a mean error comparable with that of CCSbase, with the advantage of a lower standard deviation. The consistent overprediction of the CCS values, as shown in Figure 3a, may be the result of systematic errors that could be reduced in future work. When compared to the other methods, the lower standard deviation of the results (see Table 1) indicates a more consistent performance of the method. When applied to new data outside of the original evaluation sets, the IMPACT‐4CCS approach is expected to perform more consistently than the other methods, leading to increased confidence in the results.

To illustrate this consistency, the nH‐PFCA family was examined to show where the ML methods fail when compared to IMPACT‐4CCS. Since the ML methods fail to capture the hydrogen bridging that forms, they all overpredict, with mean PEs of > 10% (see Table 3). IMPACT‐4CCS remains under 5%; the remaining error is suspected to stem from the previously mentioned tendency to overpredict for larger PFAS molecules. This highlights the ability of IMPACT‐4CCS to capture the formation of the hydrogen bridge, unlike the other methods.

Using an ML interatomic potential improves the timing of the method in comparison to the ab initio workflow while being more accurate than purely ML approaches. Given that the workflow‐based methods can be parallelized after the conformer generation step, the computing resources available can significantly alter the time to compute the CCS for a given molecule. To provide a resource‐independent example for performance analysis, trifluoroacetic acid (TFA) is selected. TFA contains zero rotatable bonds, and a conformer search in CREST yields a single conformer for it. Using a single conformer for both the ab initio and IMPACT‐4CCS workflows, the timings for a serial run of each workflow can be compared directly. The timings for the critical parts of the workflows are shown in Table 4. Since AllCCS and CCSbase were evaluated using their respective webservers, SigmaCCS is the only purely ML method for which timings are reported. All calculations were conducted on an Intel i9‐12900HX with an Nvidia RTX A3000. For this example, IMPACT‐4CCS is approximately six times faster than the ab initio approach overall (21.7 s vs. 137.8 s), and more than three orders of magnitude faster in the adduct geometry optimization step (0.070 s vs. 122.3 s). Moreover, as system size increases, the cost of DFT grows superlinearly with the number of electrons in the system, while AimNet2 scales linearly with the number of atoms. The speedup reported for TFA is therefore expected to grow with the number of atoms in the molecule. Purely ML models are likely to be 2–3 orders of magnitude faster than a workflow‐based approach; however, as shown for the nH‐PFCA family, they may not extrapolate well to cases where unexpected structural changes occur in the adduct.

TABLE 4.

Comparison of computational times for the various steps of the two workflows (IMPACT‐4CCS and ab initio) and for SigmaCCS (ML‐based model). Trifluoroacetic acid (CASRN: 76‐05‐1) was used for this comparison.

Method	Step	Timing (s)
ab initio	Conformer generation	7.443
	Adduct geometry optimization	122.291
	Energy and partial charges	5.112
	Other steps	2.987
	Total	137.833
IMPACT‐4CCS	Conformer generation	17.611
	Adduct geometry optimization	0.070
	Energy and partial charges	0.052
	Other steps	3.957
	Total	21.690
SigmaCCS	Total	0.269
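
The ratios implied by Table 4 can be checked with a few lines of arithmetic. The sketch below simply re-enters the per-step timings from the table; the dictionary layout and variable names are illustrative choices.

```python
# Per-step timings in seconds, copied from Table 4 (TFA, single conformer).
timings = {
    "ab initio": {
        "Conformer generation": 7.443,
        "Adduct geometry optimization": 122.291,
        "Energy and partial charges": 5.112,
        "Other steps": 2.987,
    },
    "IMPACT-4CCS": {
        "Conformer generation": 17.611,
        "Adduct geometry optimization": 0.070,
        "Energy and partial charges": 0.052,
        "Other steps": 3.957,
    },
}
sigmaccs_total = 0.269  # seconds, Table 4

# Sum the steps for each workflow and form the ratios of interest.
totals = {method: sum(steps.values()) for method, steps in timings.items()}
overall_speedup = totals["ab initio"] / totals["IMPACT-4CCS"]
optimization_speedup = (
    timings["ab initio"]["Adduct geometry optimization"]
    / timings["IMPACT-4CCS"]["Adduct geometry optimization"]
)
ml_speedup = totals["IMPACT-4CCS"] / sigmaccs_total

print(f"totals: {totals}")
print(f"ab initio / IMPACT-4CCS (total): {overall_speedup:.1f}x")
print(f"ab initio / IMPACT-4CCS (geometry optimization): {optimization_speedup:.0f}x")
print(f"IMPACT-4CCS / SigmaCCS (total): {ml_speedup:.0f}x")
```

For this single-conformer case, conformer generation dominates the IMPACT‐4CCS total (about 17.6 s of 21.7 s), so the overall ratio understates the gain in the optimization and charge steps, which are the ones repeated for every conformer of a flexible molecule.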

For nontarget analysis of environmental contaminants, accurate and reliable CCS predictions are required. With IMPACT‐4CCS being the only method examined here that captures the hydrogen bridging that forms in nH‐PFCA, the method is recommended for extending nontarget analysis to larger PFAS datasets such as the OECD PFAS list in PubChem, which could be greater than 7 million molecules with diverse chemistry.

5. Conclusion

First, traditional trajectory methods are applicable to the prediction of the CCS values of PFAS molecules. Using DFT methods for geometry optimization and partial charge calculation, the MOBCAL‐SHM program was used to predict CCS values for a subset of a PFAS dataset of CCS values. However, the computational burden increases with both the increasing size of PFAS molecules and the formation of complexes.

To reduce the computational burden, the AimNet2 potential was used to optimize the structures and predict their partial charges. This was shown to produce CCS values with accuracy comparable to modern ML approaches. For the nH‐PFCA family, the structural changes in the deprotonated structures challenged modern ML methods, which implicitly expect the structure to be linear rather than cyclic. In contrast, the AimNet2 potential can optimize the new cyclic structure and accurately predict CCS values.

Despite not being explicitly developed for PFAS compounds, the AimNet2 potential was shown to accelerate CCS predictions and provide accurate estimates for the CCS values when compared to ML methods. With future developments in datasets for training ML potentials, the chemical space covered by small molecule potentials can be expanded. While ML approaches are capable of accurately predicting CCS values, the areas of chemical space where they break down and fail to predict accurately require further investigation and consideration when the methods are applied to nontargeted analysis. Furthermore, improvements to trajectory method approaches could yield improved results if the parameters are fit to larger databases of fluorinated compounds. In the absence of extensive data to train machine learning models for CCS, the time‐consuming portions of the CCS prediction pipeline can be adapted to use machine learning for acceleration.

Acknowledgments

The authors would like to acknowledge support from the School of Engineering at Liberty University for providing high‐performance computing resources for conducting the ab initio study.

Appendix A.

Ab Initio Molecules

TABLE A.1.

Subset of PFAS molecules used for ab initio study.

CASRN	CID	Experimental CCS [Å²]
76‐05‐1	6422	106.3
1493‐13‐6	62406	109.6
422‐64‐0	62356	115.1
354‐88‐1	10219841	117.4
355‐80‐6	9641	124.5
376‐72‐7	120227	142.99
423‐41‐6	9859771	125.1
375‐73‐5	67815	133.62
757124‐22‐4	20734543	150.39
34454‐99‐4	12576037	146.83
2706‐91‐4	75922	142.2
1651215‐26‐7	87556140	173.33
39108‐34‐4	3016044	185.8
68259‐12‐1	86998	177.24
73606‐19‐6	25210512	170.24


Workflow Settings

TABLE A.2.

Settings for ab initio CCS calculations.

Software	Parameter	Value
RDKit	Structure generation method	ETKDGv3
RDKit	Force field for MM‐optimization	MMFF
xTB & CREST	Semi‐empirical method	GFN2‐xTB
NWChem	DFT basis set	6‐31G*
NWChem	DFT exchange correlation functional	B3LYP
NWChem	Partial charge type	Mulliken
CREST	$E_{win}$	10 kcal mol⁻¹
CREST	Temperature	300 K
MOBCAL‐SHM	Number of random rotations per potential calculation (IPR)	1000
MOBCAL‐SHM	Number of complete cycles for average mobility calculation (ITN)	
MOBCAL‐SHM	Number of points in velocity integration (INP)	
MOBCAL‐SHM	Number of points in Monte Carlo integrations of impact parameter and orientation (IMP)	1024
MOBCAL‐SHM	CCS buffer gas	Nitrogen
MOBCAL‐SHM	CCS buffer gas mass	28.014 Da
MOBCAL‐SHM	Temperature	300 K

TABLE A.3.

Settings for the IMPACT‐4CCS calculation workflow.

Software	Parameter	Value
RDKit	Structure generation method	ETKDGv3
Auto3D	Energy window	40 kcal/mol
MOBCAL‐SHM	Temperature	300 K
MOBCAL‐SHM	CCS buffer gas	Nitrogen
MOBCAL‐SHM	CCS buffer gas mass	28.014 Da


Endnotes

¹ https://doi.org/10.25345/C5XW4876Q.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

1. Díaz‐Galiano F. J., Murcia‐Morales M., Monteau F., Le Bizec B., and Dervilly G., “Collision Cross‐Section as a Universal Molecular Descriptor in the Analysis of PFAS and Use of Ion Mobility Spectrum Filtering for Improved Analytical Sensitivities,” Analytica Chimica Acta 1251 (2023): 341026.
2. Hinnenkamp V., Balsaa P., and Schmidt T. C., “Target, Suspect and Non‐Target Screening Analysis From Wastewater Treatment Plant Effluents to Drinking Water Using Collision Cross Section Values as Additional Identification Criterion,” Analytical and Bioanalytical Chemistry 414 (2022): 1–14.
3. Asef C. K., Rainey M. A., Garcia B. M., et al., “Unknown Metabolite Identification Using Machine Learning Collision Cross‐Section Prediction and Tandem Mass Spectrometry,” Analytical Chemistry 95, no. 2 (2023): 1047–1056.
4. Bijlsma L., Bade R., Celma A., et al., “Prediction of Collision Cross‐Section Values for Small Molecules: Application to Pesticide Residue Analysis,” Analytical Chemistry 89, no. 12 (2017): 6583–6589.
5. Megson D., Niepsch D., Spencer J., et al., “Non‐Targeted Analysis Reveals Hundreds of Per‐and Polyfluoroalkyl Substances (PFAS) in UK Freshwater in the Vicinity of a Fluorochemical Plant,” Chemosphere 367 (2024): 143645.
6. Medina H. and Farmer C., “Current Challenges in Monitoring Low Contaminant Levels of Per‐ and Polyfluoroalkyl Substances in Water Matrices in the Field,” Toxics 12, no. 8 (2024): 610.
7. Ware A., Hess S., Gligor D., et al., “Identification of Plant Peroxidases Catalyzing the Degradation of Fluorinated Aromatics Using a Peroxidase Library Approach,” Engineering in Life Sciences 24, no. 11 (2024): e202400054.
8. Benskin J. P., De Silva A. O., and Martin J. W., “Isomer Profiling of Perfluorinated Substances as a Tool for Source Tracking: A Review of Early Findings and Future Applications,” Reviews of Environmental Contamination and Toxicology 208: Perfluorinated alkylated substances (2010): 111–160.
9. Guardian M. G., Antle J. P., Vexelman P. A., Aga D. S., and Simpson S. M., “Resolving Unknown Isomers of Emerging Per‐and Polyfluoroalkyl Substances (PFASs) in Environmental Samples Using COSMO‐RS‐Derived Retention Factor and Mass Fragmentation Patterns,” Journal of Hazardous Materials 402 (2021): 123478.
10. Ross D. H., Cho J. H., and Xu L., “Breaking Down Structural Diversity for Comprehensive Prediction of Ion‐Neutral Collision Cross Sections,” Analytical Chemistry 92, no. 6 (2020): 4548–4557.
11. Zhou Z., Luo M., Chen X., et al., “Ion Mobility Collision Cross‐Section Atlas for Known and Unknown Metabolite Annotation in Untargeted Metabolomics,” Nature Communications 11, no. 1 (2020): 4334.
12. Guo R., Zhang Y., Liao Y., et al., “Highly Accurate and Large‐Scale Collision Cross Sections Prediction With Graph Neural Networks,” Communications Chemistry 6, no. 1 (2023): 139.
13. Zhou Z., Tu J., Xiong X., Shen X., and Zhu Z.‐J., “LipidCCS: Prediction of Collision Cross‐Section Values for Lipids With High Precision to Support Ion Mobility–Mass Spectrometry‐Based Lipidomics,” Analytical Chemistry 89, no. 17 (2017): 9559–9566.
14. de Cripan S. M., Arora T., Olomí A., Canela N., Siuzdak G., and Domingo‐Almenara X., “Predicting the Predicted: A Comparison of Machine Learning‐Based Collision Cross‐Section Prediction Models for Small Molecules,” Analytical Chemistry 96 (2024): 9088–9096.
15. Anstine D., Zubatyuk R., and Isayev O., “AIMNet2: A Neural Network Potential to Meet Your Neutral, Charged, Organic, and Elemental‐Organic Needs,” 2024.
16. Batatia I., Kovacs D. P., Simm G., Ortner C., and Csányi G., “MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields,” Advances in Neural Information Processing Systems 35 (2022): 11423–11436.
17. Batzner S., Musaelian A., Sun L., et al., “E(3)‐Equivariant Graph Neural Networks for Data‐Efficient and Accurate Interatomic Potentials,” Nature Communications 13, no. 1 (2022): 2453.
18. Zubatyuk R., Smith J. S., Nebgen B. T., Tretiak S., and Isayev O., “Teaching a Neural Network to Attach and Detach Electrons From Molecules,” Nature Communications 12, no. 1 (2021): 4870.
19. Schymanski E. L., Zhang J., Thiessen P. A., Chirsir P., Kondic T., and Bolton E. E., “Per‐ and Polyfluoroalkyl Substances (PFAS) in PubChem: 7 Million and Growing,” Environmental Science & Technology 57, no. 44 (2023): 16918–16928.
20. Elapavalore A., Ross D., Groues V., et al., “PubChemLite Plus Collision Cross Section (CCS) Values for Enhanced Interpretation of Non‐Target Environmental Data,” Environmental Science & Technology Letters 12 (2024): 166–174.
21. Jobst K. J., Penney C., and Burgers P. C., “Why Are nH‐Perfluoroalkanoate Ions More Mobile Than Expected? Implications for Identifying an Emerging Environmental Pollutant,” Chemical Communications 60, no. 61 (2024): 7894–7897.
22. Wang S., Witek J., Landrum G. A., and Riniker S., “Improving Conformer Generation for Small Rings and Macrocycles Based on Distance Geometry and Experimental Torsional‐Angle Preferences,” Journal of Chemical Information and Modeling 60, no. 4 (2020): 2044–2058.
23. Pracht P., Grimme S., Bannwarth C., et al., “CREST–A Program for the Exploration of Low‐Energy Molecular Chemical Space,” Journal of Chemical Physics 160, no. 11 (2024): 114110.
24. Nielson F. F., Colby S. M., Thomas D. G., Renslow R. S., and Metz T. O., “Exploring the Impacts of Conformer Selection Methods on Ion Mobility Collision Cross Section Predictions,” Analytical Chemistry 93, no. 8 (2021): 3830–3838.
25. Ieritano C. and Hopkins W. S., “Assessing Collision Cross Section Calculations Using MobCal‐MPI With a Variety of Commonly Used Computational Methods,” Materials Today Communications 27 (2021): 102226.
26. Ieritano C., Crouse J., Campbell J. L., and Hopkins W. S., “A Parallelized Molecular Collision Cross Section Package With Optimized Accuracy and Efficiency,” Analyst 144, no. 5 (2019): 1660–1670.
27. Bitzek E., Koskinen P., Gähler F., Moseler M., and Gumbsch P., “Structural Relaxation Made Simple,” Physical Review Letters 97, no. 17 (2006): 170201.
28. Liu Z., Zubatiuk T., Roitberg A., and Isayev O., “Auto3D: Automatic Generation of the Low‐Energy 3D Structures With ANI Neural Network Potentials,” Journal of Chemical Information and Modeling 62, no. 22 (2022): 5373–5382.
29. Smith J. S., Isayev O., and Roitberg A. E., “ANI‐1: An Extensible Neural Network Potential With DFT Accuracy at Force Field Computational Cost,” Chemical Science 8, no. 4 (2017): 3192–3203.
30. Devereux C., Smith J. S., Huddleston K. K., et al., “Extending the Applicability of the ANI Deep Learning Molecular Potential to Sulfur and Halogens,” Journal of Chemical Theory and Computation 16, no. 7 (2020): 4192–4202.
31. Colby S. M., Thomas D. G., Nuñez J. R., et al., “ISiCLE: A Quantum Chemistry Pipeline for Establishing In Silico Collision Cross Section Libraries,” Analytical Chemistry 91, no. 7 (2019): 4346–4356.
32. Grimme S., Hansen A., Ehlert S., and Mewes J.‐M., “r2SCAN‐3c: A Swiss Army Knife Composite Electronic‐Structure Method,” Journal of Chemical Physics 154, no. 6 (2021): 064103.
33. Chan L., Morris G. M., and Hutchison G. R., “Understanding Conformational Entropy in Small Molecules,” Journal of Chemical Theory and Computation 17, no. 4 (2021): 2099–2106.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.
