Journal of Cheminformatics. 2025 Dec 29;17:181. doi: 10.1186/s13321-025-01120-2

CalVSP: a program for analyzing the molecular surface areas, volumes, and polar surface areas

Yuzhu Li 1,2, Daiju Yang 2, Qingyi Shi 2, Weidong Zhang 2,3, Qingyan Sun 2
PMCID: PMC12752003  PMID: 41462364

Abstract

The molecular volume, surface area, and polar molecular surface area are important descriptors for characterizing and predicting the molecular properties of lead compounds. Existing computational tools for calculating these parameters often have complex workflows and are not well suited to high-throughput conditions. CalVSP is an open-source program for computing molecular volume, molecular surface area, and polar surface area. The software implements a grid-based algorithm that dynamically optimizes grid spacing via quantum chemical reference data to ensure precise parameter calculations. CalVSP was tested on 9489 3D molecular structures, and the results revealed a mean absolute percentage error of 1.25% (95% CI: 1.23–1.27%) for the molecular volume and 1.33% (95% CI: 1.31–1.35%) for the molecular surface area compared with the quantum chemical data. For the molecular polar surface area calculations, the mean absolute percentage error was 4.59% (95% CI: 4.16–5.04%) across the 388 tested molecular structures. Written in the C programming language, CalVSP is lightweight and easy to use. It can be integrated with other molecular property prediction tools to increase their computational accuracy and to support large-scale molecular calculations.

Supplementary Information

The online version contains supplementary material available at 10.1186/s13321-025-01120-2.

Keywords: Computer programs, Molecular surface area, Molecular volume, PSA

Scientific contribution

We provide a user-friendly, open-source command-line tool and library for calculating the molecular volume, surface area, and polar surface area. Compared with other software programs based on quantum chemical calculations, CalVSP results in fewer calculation errors and faster processing speeds. Additionally, CalVSP can be integrated as a library function and combined with other programs for advanced data mining and calculations.

Introduction

Molecular descriptors play an important role in predicting the physicochemical properties of molecules, drug properties, and noncovalent interactions between molecules. For example, the calculated molecular volume can be used to accurately describe molecular size and space occupancy. Molecular volume, as an important alternative parameter for molecular shape analysis, can be used to calculate LogP and the chromatographic capacity factor[1, 2]. The molecular surface is an approximation of the actual molecular boundary and is an important geometric parameter for describing molecules. The strength of the intermolecular dispersion energy is related to the size of the contact surface area between the ligand and receptor, which also dictates their steric complementarity. This relationship between contact area and dispersion energy has been demonstrated experimentally[3]. The molecular surface also enables the quantitative assessment of polar surface area (PSA) exposure ratios, which are used to calculate ClogP and brain permeability in drug design[4]. The molecular volume and surface area are also used to predict the density, viscosity, heat capacity, and other physicochemical properties of ionic liquids[5]. PSA refers to the surface area occupied by oxygen or nitrogen atoms, or by hydrogen atoms attached to oxygen (OH) or nitrogen (NH) atoms, in a molecule (some scholars argue that sulfur atoms and hydrogen atoms attached to sulfur (SH) should also be included). Topological PSA (TPSA) is a method that calculates molecular PSA as a sum of polar fragment contributions[6]. PSA and TPSA play a critical role in the early stages of drug development, particularly in predicting drug permeability. The advantage of TPSA is that it is straightforward to calculate and does not require computationally demanding steps. However, both PSA and the solvent-accessible polar surface area (SAPSA) are more effective measures of polarity than hydrogen bond donor/acceptor counts or the TPSA[7–9].
Studies on antibacterial or antiviral drugs beyond Lipinski’s rule of 5 (bRo5) have demonstrated a strong correlation between PSA and cellular permeability/solubility[7]. The permeability across monolayers of the efflux-inhibited colorectal adenocarcinoma cell line Caco-2 is correlated with the minimum SAPSA of each drug (r² = 0.90) but only weakly correlated with the drug TPSA (r² = 0.36)[7, 8]. Some drugs, especially cyclic peptides, macrocyclic drugs, and proteolysis-targeting chimeras (PROTACs), undergo conformational changes in environments with different polarities (e.g., when transitioning from aqueous phases to cellular phospholipid bilayers). Approximately 50% of drug-like bioactive compounds exhibit this ability, termed “molecular chameleon” properties [10, 11]. In addition to changing conformation, molecules also change their protonation state in different environments. For example, protonated local anesthetics penetrate the cell membrane in a deprotonated form[12, 13]. Molecules with molecular chameleon properties can change their PSA in environments with different polarities, which improves the absorption characteristics and efficacy of drugs [7, 11]. PSA has become a vital descriptor for designing and researching bRo5 drugs in early drug development. Calculating the molecular volume, surface area, and PSA is therefore critically important for guiding drug design, physicochemical property prediction, and material performance optimization.

Molecular volume is not directly observable; it refers to the space enclosed by the molecular surface, so its definition varies across calculation methods, and differing surface models lead to discrepancies in computed volumes. The most commonly used molecular volume is the van der Waals volume. The van der Waals volumes and surface areas of molecules can be calculated by summing the atomic sphere volumes and surfaces, or through Monte Carlo simulation algorithms, using predefined atomic van der Waals radii[14]. The van der Waals volumes of molecules change because of electron transfer and electron cloud distortion during chemical bond formation. Calculating the molecular van der Waals volume by summing the atomic sphere volumes may therefore underestimate the impact of these electronic effects. Nevertheless, this computationally efficient method is particularly suitable for large molecules such as polymers, proteins, and nucleotides, for which it offers significant advantages in calculating volume and surface area. Different research methodologies yield different atomic van der Waals radii[15–17]. Consequently, the calculated molecular van der Waals volume and surface area vary with the radii used.

Definitions of the molecular isosurface differ between studies. Bader proposed an isosurface with an electron density of 0.001 a.u. for van der Waals surfaces in the gas phase and 0.002 a.u. for the solid phase[18–20]. This molecular surface definition, which accounts for more than 98% of the electron density distribution and addresses the limitations of atomic sphere superposition methods (which neglect electronic effects), has been widely adopted in computational chemistry. Calculating van der Waals volumes through Bader's definition involves two steps: (1) performing density functional theory (DFT) calculations on the molecule's 3D structure to obtain its wavefunction and (2) applying Monte Carlo integration or marching tetrahedra algorithms to calculate the volume or surface[21].

However, generating wavefunction files via quantum chemistry (QC) calculations is computationally demanding and requires substantial resources and specialized software. As the number of atoms in a molecule increases, the required computation time and memory demand increase substantially, rendering this method inefficient for large-scale molecular systems or high-throughput computational studies.

Many software programs are available for the calculation of molecular volume, surface area, and PSA. For example, the excellent open-source MoloVol can calculate molecular cavities, volumes, and surfaces, demonstrating high efficiency for large molecules such as protein structures[22]. Additionally, Zeo++ and PoreBlazer are widely used for molecular volume/surface area calculations[23, 24]. As visualization software, PyMOL provides functionalities for van der Waals or solvent-accessible surface area computations[25]. PLATON supports molecular volume calculations[26]. Polarsa2 calculates the PSA by processing van der Waals surface data generated via MOLVOL[27]; however, its workflow remains cumbersome, and it currently supports only PSA calculations. The QC analysis software Multiwfn can analyze QC data to extract molecular volume, surface area, and PSA data[28]. However, these programs are computationally inefficient, procedurally complex, or unable to calculate all three parameters (molecular volume, surface area, and PSA) within a single tool.

To overcome the above shortcomings, we introduce CalVSP, a free program for calculating molecular volume, surface area, and PSA. The tool uses a grid algorithm to approximate the Bader van der Waals volume and surface area and, on this basis, calculates the van der Waals volume, surface area, and PSA of molecules. CalVSP offers high accuracy and improved computational efficiency relative to QC workflows. It is therefore particularly suitable for high-throughput calculations and for molecules with many atoms. The surface area, volume, and PSA of certain molecules may be influenced by their molecular configurations and vary among conformational states, so large-scale conformational sampling and analysis are required. In such scenarios, existing software often suffers from cumbersome or time-consuming workflows, whereas CalVSP handles these computations efficiently. For large-scale conformational sampling studies, CalVSP supports direct processing of molecular conformational trajectory files in the xyz format, enabling rapid batch calculations.

CalVSP is an open-source application written in C. The entire source code is hosted on GitHub at https://github.com/CalVSP/CalVSP.git and is free to use and modify under the MIT license. The application is operated via a command-line interface. CalVSP has been tested on Windows 11 and Ubuntu 20.04 LTS. Current and past releases are available as precompiled binaries or as source code from the same repository, and future releases will be made available there as well.

Implementation

The overall calculation process of CalVSP is illustrated in Fig. 1. Users may submit 3D molecular structure files in mol2, pdb, sdf, or xyz format to calculate the molecular van der Waals or solvent-accessible surface area, PSA, and volume with CalVSP. CalVSP identifies the file type from the input file's extension and then extracts the atomic information and the corresponding Cartesian coordinates by applying format-specific parsing rules:

  • MOL2: the program reads the data between the @<TRIPOS>ATOM and @<TRIPOS>BOND field identifiers.

  • PDB: the program extracts atomic information from all standard record lines starting with the keyword "ATOM" or "HETATM".

  • SDF: the program first obtains the total number of atoms from the counts line (the fourth line of the file) and then uses this value as the upper limit when reading the atomic coordinate lines that follow.

  • XYZ: the program parses the total number of atoms declared on the first line of the file and reads the atomic coordinates from the corresponding subsequent lines. For trajectory files containing multiple conformations, CalVSP parses the file frame by frame and performs the calculation for each frame.

This method only parses and extracts data explicitly declared in the files. It does not determine the chemical bonding relationships within molecules or the number of molecules in the file.
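The frame-by-frame handling of xyz trajectories described above can be illustrated with a minimal Python sketch. It is not CalVSP's own C code. It assumes the conventional XYZ layout (atom count on the first line of each frame, a comment line, then one atom per line), and the function name `parse_xyz_frames` is hypothetical.

```python
def parse_xyz_frames(text):
    """Split multi-frame XYZ text into a list of frames.

    Each frame is a list of (element, x, y, z) tuples.
    Assumed layout per frame: atom count, comment line, atom lines.
    """
    lines = text.splitlines()
    frames = []
    pos = 0
    while pos < len(lines):
        if not lines[pos].strip():
            pos += 1  # tolerate blank lines between frames
            continue
        n_atoms = int(lines[pos].split()[0])
        # lines[pos + 1] is the comment line and is skipped
        frame = []
        for line in lines[pos + 2 : pos + 2 + n_atoms]:
            symbol, x, y, z = line.split()[:4]
            frame.append((symbol, float(x), float(y), float(z)))
        frames.append(frame)
        pos += 2 + n_atoms
    return frames


example = """2
frame 1
O 0.0 0.0 0.0
H 0.0 0.0 0.96
2
frame 2
O 0.0 0.0 0.1
H 0.0 0.0 1.06
"""
frames = parse_xyz_frames(example)
```

Each returned frame can then be passed independently to the volume, surface area, and PSA calculation.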

Fig. 1. Flow chart presenting the CalVSP calculation algorithm

Once the data are acquired, the system calculates the van der Waals or solvent-accessible volume, surface area, and PSA of the molecule.

Calculation details

Selection of functional/basis set and grid resolution

The selection of different functional/basis set combinations can significantly affect the computation time, memory requirements, and accuracy of the results. For testing purposes, we conducted calculations on 23 molecular compounds via 19 distinct functional/basis set combinations (Table 1) and generated corresponding wavefunction files. Using Multiwfn software with an electron density isosurface threshold of 0.001 a.u. and a spacing of grid points of 0.15 Bohr, we calculated 3D molecule volumes [28].

Table 1. Selected functional/basis set combinations

| Group | Functional/basis set |
|---|---|
| a | B97-3c [31–34] |
| b | r2SCAN-3c [35–37] |
| c | BLYP D3 def2-TZVP [31, 32, 38, 39] |
| d | B3LYP D3 def2-TZVP(-f) [31, 32, 38–40] |
| e | B3LYP D3 def2-TZVP [31, 32, 38, 39] |
| f | wB97M-V def2-TZVP [38, 39, 41–43] |
| g | PWPB95 D3 def2-TZVPP [31, 32, 38, 39, 44] |
| h | wB97X-2 D3 def2-TZVPP [31–33, 38, 39, 45] |
| i | D4 def2-TZVPP [37–39] |
| j | PWPB95 D3 def2-QZVPP [31, 32, 38, 39] |
| k | wB97X-2 D3 def2-QZVPP [31–33, 38, 39] |
| l | D4 def2-QZVPP [37–39] |
| m | DLPNO-CCSD(T) normalPNO [46] |
| n | DLPNO-CCSD(T) tightPNO [46] |
| o | CCSD(T) [46] |
| p | B3LYP/6-31G [47–51] |
| q | B3LYP/6-31G** [47–52] |
| r | B3LYP/6-311G [47, 48, 51, 53] |
| s | B3LYP/6-311G** [47, 48, 51, 53] |

To evaluate the accuracy of the volumes obtained from the different functional/basis set combinations, a theoretical benchmark was needed. We employed the highly accurate DLPNO-CCSD(T)/cc-pVTZ method, which provides results close to the CCSD(T) complete basis set limit, as the reference[29, 30]. The absolute errors in the molecular volume calculations were then determined by comparing the values from the 18 functional/basis set combinations against this DLPNO-CCSD(T) benchmark. As shown in Fig. 2, the r2SCAN-3c functional (group b) shows the lowest absolute error in molecular volume and runs faster than the other functional/basis set combinations. Consequently, the r2SCAN-3c-derived results were adopted as the reference standard for optimizing the grid resolution parameters to minimize errors.

Fig. 2. Absolute errors in molecular volume compared with the group m functional/basis set

We selected 3D molecules with different values of Vrec (the volume of the rectangular bounding box) and compared the molecular volumes calculated at different grid point spacings against the QC-derived benchmarks, in order to identify, for each Vrec size, the grid point spacing that minimizes the volume error.

The Vrec is defined by Eq. (1) for molecular volume calculations, via enumerating the Cartesian coordinate values of all the atoms in a molecule and determining the maximum and minimum values along the X-, Y-, and Z-axes. Here, xmax, xmin, ymax, ymin, zmax, and zmin represent the maximum and minimum coordinate values of all the atoms Ai(xi, yi, zi) of the molecule.

$$V_{rec} = \prod_{k \in \{x, y, z\}} (k_{max} - k_{min}) \tag{1}$$

The relationship between Vrec and the grid point spacing with minimal error was established; the specific values are tabulated in Supplementary Information, Table S1.
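As a small worked illustration of Eq. (1), the bounding-box volume follows directly from the coordinate extrema. The Python sketch below is illustrative only; the function name `bounding_box_volume` is hypothetical.

```python
def bounding_box_volume(coords):
    """Vrec from Eq. (1): product of the coordinate extents along x, y, and z.

    coords: iterable of (x, y, z) atomic coordinates.
    """
    xs, ys, zs = zip(*coords)
    return (max(xs) - min(xs)) * (max(ys) - min(ys)) * (max(zs) - min(zs))


# Two atoms spanning 1 x 2 x 3 length units along the axes
vrec = bounding_box_volume([(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)])
```

Under this literal definition a perfectly planar molecule gives Vrec = 0; whether CalVSP pads the box in that case is not stated here.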

Generating grid points and calculating data values

Then, molecules are encapsulated in rectangular boxes of suitable dimensions, and three-dimensional grid point data P(x,y,z) are generated in Cartesian coordinates using the predefined grid resolution. Each grid point Pj (xj, yj, zj) is scanned across 26 directions (6 orthogonal, 12 face-diagonal, and 8 body-diagonal) to compute the minimum Euclidean distance Rdis to the nearest atom Ai (xi, yi, zi). Rdis is compared with the van der Waals radius Rvdw of the nearest atom according to Eq. (2)[54]:

$$R_{dis} = \min_{A_i} \sqrt{(x_P - x_i)^2 + (y_P - y_i)^2 + (z_P - z_i)^2} \tag{2}$$

The grid points are classified as follows:

  • Internal if Rdis < Rvdw,

  • Surface if Rdis = Rvdw,

  • External if Rdis > Rvdw.

This classification is mathematically defined as Eq. (3):

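$$Mark(P_j) = \mathrm{sign}(R_{vdw} - R_{dis}) = \begin{cases} 1 & \text{if } R_{dis} < R_{vdw} \\ 0 & \text{if } R_{dis} = R_{vdw} \\ -1 & \text{if } R_{dis} > R_{vdw} \end{cases} \tag{3}$$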
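The classification in Eqs. (2) and (3) can be sketched in Python as follows. This is an illustrative sketch, not the CalVSP implementation. On a discrete grid the exact equality Rdis = Rvdw is almost never met, so the sketch treats points within a tolerance `tol` of the radius as surface points; that tolerance, like the function name `classify_point`, is an assumption made here.

```python
import math


def classify_point(point, atoms, tol):
    """Return 1 (internal), 0 (surface), or -1 (external) for one grid point.

    point: (x, y, z) of the grid point.
    atoms: list of (x, y, z, r_vdw) tuples.
    tol:   assumed half-width of the band counted as "Rdis = Rvdw".
    """
    # Eq. (2): distance to the nearest atom, paired with that atom's radius
    r_dis, r_vdw = min((math.dist(point, a[:3]), a[3]) for a in atoms)
    diff = r_vdw - r_dis
    if abs(diff) <= tol:
        return 0                      # surface
    return 1 if diff > 0 else -1      # Eq. (3): sign(Rvdw - Rdis)


atoms = [(0.0, 0.0, 0.0, 1.5)]
inside = classify_point((0.0, 0.0, 0.0), atoms, tol=0.1)
on_surface = classify_point((1.5, 0.0, 0.0), atoms, tol=0.1)
outside = classify_point((3.0, 0.0, 0.0), atoms, tol=0.1)
```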

Isolated misassigned grid points are then corrected. Each grid point Pj is treated as the center of a 3 × 3 × 3 cubic lattice, and its 26 neighboring grid points (the face-, edge-, and corner-sharing grid points in 3D space) are scanned. These neighbors define 13 directions, each with two opposing grid points Pj1 and Pj2. If, along one or more of these directions, Pj1 and Pj2 share the same attribute and it differs from that of Pj, then Pj is reclassified so that its attribute matches that of its neighbors (Pj1 and Pj2). This correction process resembles the flood-fill algorithm and is iterated until no inconsistencies remain, so that all grid points are consistently classified.
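A minimal Python sketch of this opposing-neighbor correction is given below. It is one possible reading of the rule, not the CalVSP source: the marks are stored in a dictionary keyed by integer grid indices, all updates in one pass are applied together, and an iteration cap `max_iter` guards against oscillation. These three choices and the function name `correct_marks` are assumptions.

```python
from itertools import product

# One representative of each opposing pair among the 26 neighbors: 13 directions
DIRECTIONS = [d for d in product((-1, 0, 1), repeat=3) if d > (0, 0, 0)]


def correct_marks(marks, max_iter=100):
    """Reclassify grid points whose two opposing neighbors agree with each
    other but differ from the point itself; repeat until nothing changes.

    marks: dict mapping (i, j, k) grid indices to 1, 0, or -1.
    """
    marks = dict(marks)  # work on a copy
    for _ in range(max_iter):
        changes = {}
        for (i, j, k), mark in marks.items():
            for di, dj, dk in DIRECTIONS:
                a = marks.get((i + di, j + dj, k + dk))
                b = marks.get((i - di, j - dj, k - dk))
                if a is not None and a == b and a != mark:
                    changes[(i, j, k)] = a
                    break
        if not changes:
            break
        marks.update(changes)
    return marks


# A 3 x 3 x 3 block of internal points with one misassigned center point
block = {idx: 1 for idx in product(range(3), repeat=3)}
block[(1, 1, 1)] = -1
fixed = correct_marks(block)
```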

Molecular volume and surface area calculations

The volume weight factor is defined by summing the contributions of the internal and surface grid points via Eq. (4); then, the molecular volume V can be calculated via multiplication with the unit grid volume via Eq. (5):

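$$G_V = \sum_{j=1}^{N} \begin{cases} 1 & \text{if } Mark(P_j) \neq -1 \\ 0 & \text{otherwise} \end{cases} \tag{4}$$

$$V = G_V \times \mathrm{grid\_size}^3 \tag{5}$$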

Similarly, the surface area weight factor is defined by Eq. (6), and the molecular surface area S can be empirically corrected via QC benchmarks according to Eq. (7):

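$$G_S = \sum_{j=1}^{N} \begin{cases} 1 & \text{if } Mark(P_j) = 0 \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

$$S = G_S \times \mathrm{grid\_size}^2 \times 1.261 + 13.506 \tag{7}$$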
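Eqs. (4) to (7) amount to counting grid points and scaling the counts. The Python sketch below illustrates this under the assumption that the marks of all grid points are available as a flat list; the function name `volume_and_area` is hypothetical, while the constants 1.261 and 13.506 are the empirical correction from Eq. (7).

```python
def volume_and_area(marks, grid_size):
    """Molecular volume (Eqs. 4-5) and corrected surface area (Eqs. 6-7).

    marks: iterable of 1 (internal), 0 (surface), -1 (external).
    grid_size: grid point spacing.
    """
    marks = list(marks)
    g_v = sum(1 for m in marks if m != -1)   # internal + surface points, Eq. (4)
    g_s = sum(1 for m in marks if m == 0)    # surface points only, Eq. (6)
    volume = g_v * grid_size ** 3                     # Eq. (5)
    area = g_s * grid_size ** 2 * 1.261 + 13.506      # Eq. (7)
    return volume, area


volume, area = volume_and_area([1, 1, 0, -1], grid_size=0.5)
```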

PSA calculation

PSA is defined as the surface area contributed by polar atoms (N, O, NH, OH). For each surface grid point Psuf, the nearest atom is identified. If it belongs to the polar group, Psuf is labeled PPSA according to Eq. (8):

$$Mark(P_{PSA,j}) = \begin{cases} 1 & \text{if } R_{dis} = R_{vdw} \land A_i \in \{N, O, NH, OH\} \\ 0 & \text{otherwise} \end{cases} \tag{8}$$

The PSA SPSA is then calculated according to Eqs. (9) and (10):

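$$G_P = \sum_{j=1}^{N} \begin{cases} 1 & \text{if } Mark(P_{PSA,j}) = 1 \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

$$S_{PSA} = \frac{G_P}{G_S} \times S \tag{10}$$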
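Eqs. (8) to (10) scale the total surface area by the fraction of surface grid points whose nearest atom is polar. The Python sketch below illustrates that ratio. It assumes that each surface grid point already carries a label for its nearest atom ("N", "O", "NH", "OH", or anything else for nonpolar atoms); how CalVSP assigns the NH and OH labels to hydrogens is not described in this section, and the function name `polar_surface_area` is hypothetical.

```python
POLAR_LABELS = {"N", "O", "NH", "OH"}


def polar_surface_area(surface_labels, total_area):
    """PSA from Eqs. (8)-(10).

    surface_labels: nearest-atom label for every surface grid point
                    (so len(surface_labels) equals G_S).
    total_area:     molecular surface area S from Eq. (7).
    """
    g_s = len(surface_labels)
    if g_s == 0:
        return 0.0
    g_p = sum(1 for label in surface_labels if label in POLAR_LABELS)  # Eq. (9)
    return g_p / g_s * total_area                                      # Eq. (10)


psa = polar_surface_area(["O", "C", "C", "NH"], total_area=100.0)
```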

Output molecular surface information

The CalVSP program outputs results according to computational requirements and exports surface grid data in the xyz format for downstream applications.

Results and discussion

Determination of Optimal Grid Spacing

Our initial investigation on a parameterization set of 57 molecules revealed that the grid spacing which minimizes systematic error for an individual molecule is influenced by its specific physical characteristics (e.g., shape, compactness), leading to a distribution of molecule-specific optimal values (see Table S1, Supplementary Information). The arithmetic mean of these 57 optimal values is 0.43 Å, and they were found to cluster around this common value. This suggested that a single, universal grid spacing of 0.43 Å might serve as a robust approximation for the molecule-specific optima.

To rigorously test this hypothesis and assess the generalizability of both approaches, we performed a large-scale validation on an independent set of 9,489 molecules. We evaluated the performance of the universal spacing (0.43 Å) against our size-dependent parameterization. The results are summarized in Table 2.

Table 2. Comprehensive performance comparison on independent test set (n = 9489)

| Metric | Size-dependent (Table S1) | Universal 0.43 Å spacing |
|---|---|---|
| Volume RMSE (Å³) | 6.44 ± 0.05 | 6.06 ± 0.05 |
| Surface area RMSE (Å²) | 6.22 ± 0.05 | 6.31 ± 0.05 |

The uncertainty estimates (±) for the test set were obtained via bootstrapping.

As shown in Table 2, the validation reveals:

The universal spacing (0.43 Å) provides superior accuracy for molecular volume calculation on the independent test set. Conversely, the size-dependent parameterization delivers the highest accuracy for molecular surface area calculation.

Given that the precise computation of molecular surface area is the primary requirement of our analysis and is central to the conclusions of this work, we have elected to retain and recommend the size-dependent scheme. This approach is justified by its optimal performance for our key metric of interest. The universal spacing of 0.43 Å remains a simplified alternative for applications where volumetric accuracy is the priority.

Comparison of the CalVSP and QC methods for molecular surface area and volume calculations

Since existing molecular volume and surface area calculation tools (nonquantum calculation methods) fail to consider changes in molecular volume and surface area caused by electron cloud redistribution during interatomic bonding, we compared CalVSP computational accuracy against reference values obtained from quantum chemical calculations. Our calculations show that these changes are significant: for example, in molecules like CO₂, the volume decreases by approximately 35.8% compared to the sum of atomic volumes, while the surface area decreases by 45.9% (see Table S3 for details).

Test dataset preparation: To ensure a random selection, 11,000 Compound ID (CID) numbers were randomly sampled from the PubChem database, and a total of 10,000 corresponding 3D compound structures were then downloaded in SDF format [55]. All molecular structures that successfully completed the quantum chemical calculation workflow (n = 9489) were programmatically validated to confirm non-zero Z-coordinates for all atoms, ensuring the use of authentic 3D geometries in subsequent analysis. Wavefunction files were calculated with the ORCA 5.0.3 program (r2SCAN-3c functional, no structure optimization) and processed with Multiwfn 3.8 (isosurface thresholds: 0.001, 0.0016, and 0.002 a.u.; grid point spacing: 0.15 in each case) to obtain QC-derived molecular volumes and surface areas[20, 56–60].

In response to reviewer comments, we have enhanced the CalVSP software to support multiple isosurface thresholds (0.001, 0.0016, and 0.002 a.u.), thereby improving its versatility. To validate our methodology against the benchmark established by Amin Alibakhshi[60] (which employs DSD-PBEP86 0.0016 a.u.), we confirmed the agreement between r2SCAN-3c and DSD-PBEP86 functionals under identical conditions. Detailed comparisons are provided in Supplementary Material (Fig. S1).

After filtering out invalid or incomplete entries, the volume and surface area values of 9489 compounds were obtained through the above calculation process. The corresponding CalVSP calculations were performed on the same dataset, and the CalVSP and QC results were then compared.

In this work, the statistical parameters mean squared error (MSE, Eq. (11)), root mean squared error (RMSE, Eq. (12)), mean absolute error (MAE, Eq. (13)), absolute percentage error (APE, Eqs. (14,15)), and mean absolute percentage error (MAPE, Eq. (16)) were employed to evaluate the calculation results.

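$$MSE = \frac{1}{n} \sum_{k=1}^{n} (CalVSP_k - QM_k)^2 \tag{11}$$

$$RMSE = \sqrt{MSE} \tag{12}$$

$$MAE = \frac{1}{n} \sum_{k=1}^{n} \left| CalVSP_k - QM_k \right| \tag{13}$$

$$\text{Relative Error}(k) = \frac{CalVSP_k - QM_k}{QM_k} \tag{14}$$

$$APE_k = \left| \text{Relative Error}(k) \right| \times 100\% \tag{15}$$

$$MAPE = \frac{1}{n} \sum_{k=1}^{n} APE_k \tag{16}$$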
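The metrics in Eqs. (11) to (16) can be computed in a few lines. The Python sketch below is illustrative and uses only the standard library; the function name `error_metrics` and the sample values are hypothetical.

```python
import math


def error_metrics(calc, ref):
    """MSE, RMSE, MAE, and MAPE (in percent) following Eqs. (11)-(16).

    calc: values from the method under test (e.g. CalVSP).
    ref:  reference values (e.g. QC results); must be non-zero.
    """
    n = len(calc)
    errors = [c - r for c, r in zip(calc, ref)]
    mse = sum(e ** 2 for e in errors) / n                      # Eq. (11)
    rmse = math.sqrt(mse)                                      # Eq. (12)
    mae = sum(abs(e) for e in errors) / n                      # Eq. (13)
    ape = [abs(e / r) * 100.0 for e, r in zip(errors, ref)]    # Eqs. (14)-(15)
    mape = sum(ape) / n                                        # Eq. (16)
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape}


metrics = error_metrics([102.0, 98.0], [100.0, 100.0])
```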

The frequency distributions of QC-calculated van der Waals surface areas and volumes are presented in Fig. 3. Both the surface area (Fig. 3A) and volume (Fig. 3B) distributions exhibit distinct bimodal characteristics, reflecting the structural diversity of the molecular dataset. The surface area distribution spans from 78.33 to 737.27 Å² with a mean of 380.19 Å², while the volume distribution ranges from 58.30 to 906.83 Å³ with a mean of 426.62 Å³.

Fig. 3. (A) Frequency distribution of QC-calculated vdW surface areas; (B) frequency distribution of QC-calculated vdW volumes

For surface area computations, regression analysis revealed a strong correlation between the QC- and CalVSP-derived surface areas (R² = 0.99503, Pearson's r = 0.9976; N = 9489), as shown in Table 3. The relative error distribution (Fig. 4B) showed that 95% of the values fell within [−2.94%, +3.91%] (full range: −6.14% to +7.81%), with relative errors tightly clustered around zero (median = 0.046%) and a standard deviation (SD) of 1.71%. As shown in Fig. 4C and Table 3, the Bland–Altman analysis demonstrated minimal systematic bias (mean residual = 0.09 Å², MAE = 4.91 Å²), and 94.8% of the differences fell within the 95% limits of agreement (LoA: ± 1.96σ, σ = 6.28 Å²).
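The Bland–Altman quantities quoted above (mean residual, limits of agreement at ± 1.96σ, and the share of differences inside those limits) can be reproduced with a short Python sketch. It assumes that the residual is defined as the CalVSP value minus the QC value and that σ is the sample standard deviation of the residuals; the function name `bland_altman` and the sample data are hypothetical.

```python
import statistics


def bland_altman(calc, ref, z=1.96):
    """Return (bias, (lower, upper), fraction_inside) for paired measurements.

    bias:  mean of the differences calc - ref.
    lower, upper: limits of agreement, bias -/+ z * SD of the differences.
    fraction_inside: share of differences lying within the limits.
    """
    diffs = [c - r for c, r in zip(calc, ref)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    lower, upper = bias - z * sd, bias + z * sd
    inside = sum(lower <= d <= upper for d in diffs) / len(diffs)
    return bias, (lower, upper), inside


bias, (lower, upper), inside = bland_altman(
    [11.0, 9.0, 11.0, 9.0], [10.0, 10.0, 10.0, 10.0]
)
```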

Table 3. Comparison between CalVSP and QC calculation methods in calculating molecular surface area and volume

| Metric | Surface | Volume |
|---|---|---|
| MSE | 38.7 ± 0.6 (Å²)² | 41.5 ± 0.6 (Å³)² |
| RMSE | 6.22 ± 0.05 Å² | 6.44 ± 0.05 Å³ |
| MAE | 4.91 ± 0.04 Å² | 5.18 ± 0.04 Å³ |
| MAPE | 1.33 ± 0.01% | 1.251 ± 0.009% |
| Pearson’s r | 0.99767 ± 0.00005 | 0.9986 ± 0.00003 |
| R² | 0.99503 ± 0.0001 | 0.99669 ± 0.00007 |

The uncertainty estimates (±) were obtained via bootstrapping.
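The bootstrapping behind the ± values can be sketched generically: resample the molecule pairs with replacement, recompute the statistic, and report the spread of the resampled estimates. The Python sketch below shows this for the RMSE. The number of resamples, the seed, the use of the standard deviation as the reported uncertainty, and the function names are assumptions, since the resampling details are not given in this section.

```python
import math
import random
import statistics


def rmse(calc, ref):
    """Root mean squared error between paired values."""
    return math.sqrt(sum((c - r) ** 2 for c, r in zip(calc, ref)) / len(calc))


def bootstrap_rmse(calc, ref, n_boot=1000, seed=0):
    """Return (RMSE on the full data, SD of RMSE over bootstrap resamples)."""
    rng = random.Random(seed)
    n = len(calc)
    estimates = []
    for _ in range(n_boot):
        # draw n paired samples with replacement
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(rmse([calc[i] for i in idx], [ref[i] for i in idx]))
    return rmse(calc, ref), statistics.stdev(estimates)


value, uncertainty = bootstrap_rmse([1.0, 2.0, 3.0, 10.0], [0.0, 0.0, 0.0, 0.0])
```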

Fig. 4. Computational validation of van der Waals surface metrics: (A) correlation scatter plot between QC-derived and CalVSP-calculated vdW surfaces; (B) relative error frequency distribution of surface area computations; (C) Bland–Altman analysis of methodological agreement

Regression analysis revealed a strong correlation between the QC- and CalVSP-calculated molecular volumes (R² = 0.99669, Pearson's r = 0.9986; N = 9489), as shown in Table 3. The relative error distribution (Fig. 5B) showed that 95% of the values fell within [−2.95%, +3.09%] (full range: −6.51% to +7.52%), with a median of −1.21% and an SD of 1.54%, indicating slight systematic underestimation by the CalVSP method compared with the QC method while maintaining high overall reliability. As shown in Fig. 5C and Table 3, the Bland–Altman analysis demonstrated minimal systematic bias (mean residual = −1.21 Å³, MAE = 5.18 Å³), and 95.0% of the differences fell within the 95% limits of agreement (LoA: ± 1.96σ, σ = 6.28 Å³).

Fig. 5. Computational validation of van der Waals volume metrics: (A) scatter plot comparing molecular van der Waals volumes derived from QC calculations and CalVSP; (B) relative error distribution of CalVSP-calculated van der Waals volumes relative to that of QC; (C) Bland–Altman plot assessing the agreement between the van der Waals volumes calculated via QC and CalVSP

To evaluate CalVSP across different isosurface thresholds, we extended our validation to thresholds of 0.0016 and 0.002 a.u. The results (Supplementary Material: Tables S5–S6 and Figs. S2–S5) demonstrate agreement with QC calculations, with R² values > 0.996 and MAPE < 1.1% for both surface area and volume at all thresholds. This confirms that CalVSP supports multiple isosurface thresholds for molecular surface area and volume computations, enhancing its versatility and practical applicability in diverse computational scenarios.

The surface of atoms and molecules is defined as an iso-density surface based on electron density data. Different studies have proposed varying threshold values, each yielding significantly different iso-density surfaces[18, 19]. For instance, Bader et al. suggested a threshold of 0.001 a.u. for gas-phase systems (e.g., methane and inert gases), where it aligns with experimentally measured equilibrium diameters [20]. In contrast, for condensed phases or van der Waals interactions (such as crystal packing or solvent-accessible surfaces), a higher threshold of 0.002 a.u. may be more appropriate[20]. Recently, Amin et al. recommended a value of 0.0016 a.u. based on thermodynamically effective (TE) surfaces, which are derived from experimental phase-change data (e.g., vaporization enthalpy and surface tension) [60]. This suggests that the optimal iso-density threshold depends on the molecular state: gas-phase systems may require a lower value (0.001 a.u.), while condensed phases need a higher value (0.002 a.u.). The 0.0016 a.u. threshold, validated against TE surfaces, is particularly suitable for liquid states, as TE surfaces incorporate liquid-phase experimental data. Consequently, to accommodate these diverse scenarios, CalVSP supports multiple iso-density thresholds (0.001, 0.0016, and 0.002 a.u.) for calculating molecular surface area and volume, enhancing its versatility across computational applications.

In summary, these quantitative assessments validate CalVSP's reliability for van der Waals surface and volume computations compared with the reference QC methods.

Comparison of the molecular surface area, volume, and PSA across computational tools

MoloVol is an excellent tool for calculating molecular volume and surface area. PyMOL provides the molecular surface area and PSA calculations. This study evaluated the accuracy of CalVSP, MoloVol, and PyMOL by comparing their calculated molecular volumes, surface areas, and PSAs against reference values obtained from QC calculations.

Methodological Constraints:

  • Molecular Volume: Due to the lack of a volumetric calculation function in PyMOL, only CalVSP and MoloVol were compared for this parameter.

  • Molecular Surface Area: All three tools (CalVSP, MoloVol, PyMOL) were assessed via shared capabilities.

  • PSA: MoloVol does not support direct PSA computation; therefore, comparisons were restricted to CalVSP and PyMOL.

A randomly selected subset of 390 3D molecular structures was extracted from the original 9,489 compounds with validated QC calculations. The selection protocol was based on Eqs. (17) and (18).

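$$n = \frac{z^2 \times \hat{p}(1 - \hat{p})}{\varepsilon^2} \tag{17}$$

$$n' = \frac{n}{1 + \dfrac{z^2 \times \hat{p}(1 - \hat{p})}{\varepsilon^2 N}} \tag{18}$$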
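Eqs. (17) and (18) have the form of the standard sample-size formula with a finite-population correction. The Python sketch below evaluates them. The values of z, the estimated proportion, and the margin of error used by the authors are not stated in this section, so the numbers in the example (z = 1.96, proportion 0.5, margin 0.05) are placeholders and are not expected to reproduce the subset size used in the study; the function name `sample_size` is hypothetical.

```python
def sample_size(z, p_hat, eps, population):
    """Sample size before (Eq. 17) and after (Eq. 18) the finite-population
    correction.

    z:          z-score of the chosen confidence level.
    p_hat:      estimated proportion.
    eps:        margin of error.
    population: population size N.
    """
    n = z ** 2 * p_hat * (1.0 - p_hat) / eps ** 2      # Eq. (17)
    # In Eq. (18) the term z^2 p(1-p) / (eps^2 N) is simply n / N
    n_adjusted = n / (1.0 + n / population)            # Eq. (18)
    return n, n_adjusted


# Placeholder parameters with the population of 9489 validated structures
n, n_adjusted = sample_size(z=1.96, p_hat=0.5, eps=0.05, population=9489)
```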

As shown in Table 4 and Fig. 6A, CalVSP-calculated molecular surface areas exhibit a MAPE of 1.27% relative to the QC reference values. Compared with MoloVol and PyMOL, CalVSP achieves better precision in surface area calculations, with both lower APEs and reduced SDs of APEs.

Table 4. Error analysis of CalVSP, MoloVol, and PyMOL compared with the QC reference methods

| Tool | Surface area RMSE (Å²) | Surface area MAPE (%) | Surface area SDAPE | Volume RMSE (Å³) | Volume MAPE (%) | Volume SDAPE | PSA RMSE (Å²) | PSA MAPE (%) | PSA SDAPE |
|---|---|---|---|---|---|---|---|---|---|
| CalVSP | 6.2 ± 0.2 | 1.27 ± 0.05 | 0.016 | 6.4 ± 0.2 | 1.22 ± 0.05 | 0.015 | 3.4 ± 0.1 | 4.6 ± 0.2 | 0.062 |
| MoloVol | 12.8 ± 0.4 | 3.0 ± 0.1 | 0.032 | 103 ± 1 | 23.11 ± 0.005 | 0.009 | - | - | - |
| PyMOL | 16.0 ± 0.7 | 3.1 ± 0.1 | 0.039 | - | - | - | 7.8 ± 0.3 | 11.0 ± 0.8 | 0.196 |

The uncertainty estimates (±) were obtained via bootstrapping.

Fig. 6. (A) Absolute percentage error (APE) comparison of molecular surface areas calculated by CalVSP, MoloVol, and PyMOL against QC-derived reference values. (B) APE comparison of molecular volumes calculated by CalVSP and MoloVol relative to the QC benchmarks. (C) APE comparison of PSA between the CalVSP and PyMOL methods versus the QC methods

For molecular volumes (Fig. 6B), CalVSP achieves a MAPE of 1.22%, whereas MoloVol shows a substantially higher MAPE of 23.11%, which is likely attributable to systematic algorithmic biases. Although the SD of the APEs for CalVSP (0.015%) is marginally greater than MoloVol’s 0.009%, this difference is negligible given CalVSP’s order-of-magnitude improvement in the mean accuracy.

In the PSA comparisons (Fig. 6C), CalVSP outperforms PyMOL, with a MAPE of 4.6% versus 11.0% for PyMOL. CalVSP also has a narrower error distribution (SD: 0.062%) than PyMOL, which shows a broader spread (SD: 0.196%).

CalVSP demonstrates statistically significant accuracy advantages over both MoloVol and PyMOL across the evaluated metrics of molecular volume, surface area, and PSA for the tested dataset.

Testing for high-molecular-weight compounds

In prior analyses, the random selection of 3D molecular structures from the PubChem database produced a dataset with molecular atom counts primarily spanning 4–108 atoms and molecular weights within 5–780 g/mol, and the majority of compounds presented atom counts of 35–68 atoms (equivalent to molecular weights of 200–500 g/mol). To evaluate CalVSP’s applicability to larger molecules, we extended comparative calculations to high-molecular-weight compounds. However, QC methods are limited in their ability to handle large-molecule systems because of their high computational costs and memory demands. Therefore, only molecules successfully computed by QC were retained for analysis, and molecules were selected as shown in Fig. 7.

Fig. 7.

Structures of selected high-molecular-weight compounds

As shown in Fig. 8, CalVSP agrees more closely with the QC-derived molecular volume, surface area, and PSA for high-molecular-weight compounds than PyMOL and MoloVol do. The tested compounds include beyond-rule-of-5 (bRo5) drugs such as macrocyclic antibiotics, cyclic peptides, PROTACs, cardiotonic agents, and antitumor compounds, as well as biological macromolecules such as small proteins. By eliminating the complex input file preparation required for QC workflows, CalVSP simplifies the molecular property calculation process and offers a more user-friendly and efficient alternative to costly QC methods.

Fig. 8.

Comparison (volume, area, and PSA) between CalVSP and QC for high-molecular-weight compounds

Calculation time test

Testing was conducted on 27 molecules with three-dimensional structures and varying atom counts. The computational times required by CalVSP and MoloVol to calculate the molecular volume and surface area, as well as that of the QC software ORCA 5.0.3, were recorded separately. As shown in Table 5, CalVSP is significantly faster than MoloVol for molecules containing fewer than 438 atoms. However, its computational efficiency decreases markedly with increasing atom count, whereas the MoloVol processing speed is less dependent on molecular size. Both CalVSP and MoloVol substantially reduce the computation time compared with the QC calculations. The performance degradation of CalVSP for larger systems is due primarily to the greater number of grid points and iterative computational steps required as molecular size increases. Nevertheless, CalVSP maintains acceptable computation times of under 1 min for molecular systems that contain fewer than 1,000 atoms. Because CalVSP is much faster than QC calculations, it is more suitable for high-throughput scenarios (such as molecular dynamics simulation trajectory analysis or high-throughput screening).

Table 5.

Computational time analysis of molecules with different atom counts

| ID | Number of heavy atoms | CalVSP time (s) | MoloVol time (s) | QC time (s) |
|---|---|---|---|---|
| Camptothecin | 26 | 0.05 | 0.18 | 39.89 |
| Lorlatinib | 30 | 0.07 | 0.33 | 63.76 |
| Pacritinib | 35 | 0.10 | 0.29 | 68.78 |
| Retapamulin | 36 | 0.11 | 0.28 | 105.19 |
| Digoxin | 41 | 0.14 | 0.30 | 113.72 |
| Plerixafor | 36 | 0.13 | 0.50 | 88.58 |
| JW48 | 60 | 0.39 | 0.60 | 159.76 |
| Rifampin | 59 | 0.27 | 0.68 | 222.82 |
| Azithromycin | 26 | 0.23 | 0.51 | 218.46 |
| Lecithin | 53 | 0.55 | 0.70 | 182.88 |
| Amphotericin B | 139 | 0.43 | 0.62 | - |
| PDB:1B9E(A) | 163 | 0.79 | 2.14 | - |
| Vancomycin | 101 | 0.53 | 0.98 | 513.7 |
| PDB:1APH(A) | 180 | 0.73 | 1.78 | - |
| PDB:3C59(B) | 209 | 1.07 | 2.50 | - |
| PDB:1B9E(B) | 245 | 2.24 | 3.37 | - |
| PDB:1APH(B) | 254 | 1.67 | 2.83 | - |
| PDB:5OTT(B) | 274 | 2.26 | 3.35 | - |
| PDB:2M7D | 152 | 0.93 | 0.89 | - |
| PDB:1APH | 438 | 3.02 | 3.32 | - |
| PDB:5NIQ(1) | 643 | 8.83 | 4.90 | - |
| PDB:6JWE | 455 | 6.12 | 2.39 | - |
| PDB:1B9E | 816 | 15.62 | 8.19 | - |
| PDB:3C59(A) | 850 | 17.49 | 8.97 | - |
| PDB:6S8Y | 894 | 30.46 | 6.03 | - |
| PDB:1C2H | 1650 | 119.5 | 9.08 | - |

"-": QC calculations failed. Benchmark tests were performed on Windows 11 (23H2) with an 11th Gen Intel(R) Core(TM) i5-1155G7 CPU (8 cores @ 2.50 GHz) and 16 GB (8 × 2 GB) LPDDR4X-4267 SDRAM @ 0.6 V. CalVSP was compiled with Dev-C++ version 5.11 (27 April 2015).
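One way to quantify how differently the two programs scale with molecular size is to fit a power law, time ≈ a · N^b, to the timings on a log-log scale. The sketch below does this for a subset of rows copied from Table 5; the choice of rows, the power-law model, and the function name `loglog_slope` are assumptions made for illustration and are not part of the original analysis.

```python
import math

# (atom count, CalVSP seconds, MoloVol seconds), copied from selected rows of Table 5.
ROWS = [
    (26, 0.05, 0.18),     # Camptothecin
    (60, 0.39, 0.60),     # JW48
    (101, 0.53, 0.98),    # Vancomycin
    (209, 1.07, 2.50),    # PDB:3C59(B)
    (438, 3.02, 3.32),    # PDB:1APH
    (643, 8.83, 4.90),    # PDB:5NIQ(1)
    (894, 30.46, 6.03),   # PDB:6S8Y
    (1650, 119.5, 9.08),  # PDB:1C2H
]


def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x), i.e. b in y ~ a * x**b."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    num = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    den = sum((a - mean_x) ** 2 for a in lx)
    return num / den


atoms = [row[0] for row in ROWS]
calvsp_slope = loglog_slope(atoms, [row[1] for row in ROWS])
molovol_slope = loglog_slope(atoms, [row[2] for row in ROWS])

# Ratio > 1 means CalVSP is faster than MoloVol for that molecule.
speedup_vs_molovol = {n: t_molovol / t_calvsp for n, t_calvsp, t_molovol in ROWS}
```

A larger fitted exponent for CalVSP than for MoloVol would be consistent with the crossover seen in the table, where CalVSP is faster for the smaller systems and slower for the largest ones.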

Validation of the PSA calculation results

To evaluate the accuracy of CalVSP in calculating the PSA, we selected reference datasets from the literature and compared the average PSA values obtained from molecular dynamics (MD) simulation trajectories with the Boltzmann-weighted average dynamic PSA (PSAd) values (Table 6) [61]. The fit of the average PSA values to human fraction absorbed data (%FA) via a Boltzmann sigmoidal curve is shown in Fig. 9 (R² = 0.95, RMSE = 8.71%).

Table 6.

Comparison of PSA calculated via CalVSP with literature data (PSAd)

| Compound | PSAd [a] (Å²) | PSA (Å²) | Det [b] (Å²) |
|---|---|---|---|
| Metoprolol | 53.1 | 48.5 | 4.6 |
| Nordiazepam | 45.1 | 47.1 | −2.0 |
| Diazepam | 33 | 30.2 | 2.8 |
| Oxprenolol | 46.8 | 47.7 | −0.9 |
| Phenazone | 27.1 | 27.1 | 0.0 |
| Oxazepam | 66.9 | 70.0 | −3.1 |
| Alprenolol | 37.1 | 37.8 | −0.7 |
| Practolol | 73.4 | 75.4 | −2.0 |
| Pindolol | 56.5 | 61.6 | −5.1 |
| Ciprofloxacin | 78.7 | 79.5 | −0.8 |
| Metolazone | 94.5 | 94.7 | −0.2 |
| Tranexamic acid | 69.2 | 71.7 | −2.5 |
| Atenolol | 90.9 | 90.5 | 0.4 |
| Sulpiride | 100.2 | 102.4 | −2.2 |
| Mannitol | 116.6 | 123.9 | −7.3 |
| Foscarnet | 115.3 | 125.9 | −10.6 |
| Sulfasalazine | 141.9 | 148.7 | −6.8 |
| Olsalazine | 141.0 | 148.8 | −7.8 |
| Lactulose | 177.2 | 185.7 | −8.5 |
| Raffinose | 242.1 | 234.2 | 7.9 |

[a] PSA values obtained from molecular dynamics simulation trajectories with the Boltzmann-weighted average dynamic PSA

[b] Det represents the difference between PSAd and PSA, calculated as PSAd − PSA

Fig. 9.

Fitting curves of the PSAs calculated via CalVSP vs. the human FA data
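For readers who wish to reproduce a fit of this kind, a Boltzmann sigmoidal curve and the two goodness-of-fit measures quoted for Fig. 9 can be written as follows. This is a sketch under stated assumptions: the four-parameter form and the parameter names (`top`, `bottom`, `psa50`, `slope`) are a common parameterization chosen here for illustration, and the fitted parameter values are not given in the text.

```python
import math


def boltzmann_sigmoid(psa, top, bottom, psa50, slope):
    """Boltzmann sigmoid: falls from `top` to `bottom` as PSA rises.

    `psa50` is the PSA at the midpoint and `slope` (> 0) sets the steepness.
    """
    return bottom + (top - bottom) / (1.0 + math.exp((psa - psa50) / slope))


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot
```

Given measured %FA values and the corresponding PSA values, the four parameters would be obtained with any nonlinear least-squares routine, after which `r_squared` and `rmse` summarize the fit.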

As shown in Table 6, there is a slight discrepancy between the calculated results and the literature data. This discrepancy arises from differences in the conformational sampling environment and methodology: the literature method employs vacuum-based conformational sampling with Boltzmann distribution weighting, whereas our approach applies aqueous MD simulations to derive ensemble averages directly. Nonetheless, the results demonstrate a strong correlation between the PSA values derived from these trajectories and experimental drug absorption data (Fig. 9), confirming that CalVSP is a reliable tool for computing the PSAs of chemical compounds. This offers strong support for predicting the absorption and permeation properties of drug candidates [62, 63].
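The two averaging schemes contrasted above differ only in how conformers are weighted. The sketch below illustrates this with hypothetical inputs; the function names, the energy units (kcal/mol), and the default temperature are assumptions for illustration and are not taken from the cited study or from CalVSP.

```python
import math

# Gas constant in kcal/(mol*K), used here as the Boltzmann factor per mole.
R_KCAL = 0.0019872041


def boltzmann_weighted_mean(values, energies_kcal, temperature=298.15):
    """Average `values` over conformers weighted by exp(-E / RT).

    Energies are shifted by their minimum so the exponentials stay well scaled.
    """
    e_min = min(energies_kcal)
    weights = [
        math.exp(-(e - e_min) / (R_KCAL * temperature)) for e in energies_kcal
    ]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


def trajectory_mean(values):
    """Plain ensemble average over trajectory frames (all frames weighted equally)."""
    return sum(values) / len(values)
```

In an MD trajectory the frames are already sampled according to their thermal populations, so an unweighted mean over per-frame PSA values plays the role that explicit Boltzmann weights play for a set of discrete conformers.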

The protonation state of a molecule changes with its environment, and this in turn changes its PSA value. For example, local anesthetics penetrate the cell membrane in a deprotonated form [12, 13]. We selected four drug molecules with different acidic and basic properties (Nortriptyline, Pindolol, Probenecid, and Warfarin) and tested how their PSA values vary with pH.

From Fig. 10, it can be observed that the PSA of the four molecules undergoes significant changes with variations in pH. The magnitude of PSA change for the basic molecules Nortriptyline and Pindolol is greater than that for the acidic molecules Probenecid and Warfarin. For Probenecid, the deprotonation of its carboxylic acid group leads to an increase in PSA, which is presumably related to the greater exposure of the polar surface area of the O atoms after the COOH group loses a proton.

Fig. 10.

Changes in PSA of Four Drug Molecules with pH: A Nortriptyline; B Pindolol; C Probenecid; D Warfarin

These calculations show that CalVSP effectively reflects the changes in the PSA values of molecules upon protonation or deprotonation. Apart from atomic charges, PSA is a crucial factor influencing molecular polarity.
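The text does not state how the protonation states at each pH were assigned. As one possible scheme for a molecule with a single ionizable site, the populations of the two forms can be estimated with the Henderson-Hasselbalch equation and used to weight the PSA of each form. Everything in this sketch (function names, the single-site assumption, and any numerical inputs) is hypothetical.

```python
def fraction_deprotonated(ph, pka):
    """Henderson-Hasselbalch fraction of the deprotonated form at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (pka - ph))


def ph_averaged_psa(ph, pka, psa_protonated, psa_deprotonated):
    """Population-weighted PSA of a single-site acid or base at a given pH.

    `psa_protonated` and `psa_deprotonated` would be computed separately
    (for example with CalVSP) for structures of the two forms.
    """
    f = fraction_deprotonated(ph, pka)
    return f * psa_deprotonated + (1.0 - f) * psa_protonated
```

For an acid such as a carboxylic acid, the deprotonated fraction approaches 1 well above the pKa, so the averaged PSA tends toward that of the carboxylate form.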

Conclusion

Currently, numerous methods are available for calculating the molecular volume, surface area, and PSA. However, these methods may involve complex calculation processes, demand substantial computational resources, or even require several software packages to obtain the corresponding parameters. This paper introduces CalVSP, a tool that provides results closer to those from QC methods than commonly used tools such as PyMOL and MoloVol for molecular surface area and volume calculations. CalVSP offers a simpler workflow than the QC methods do, achieving similar results at a lower computational cost. For complex compounds with high molecular weights that are computationally prohibitive for QC methods, CalVSP can rapidly compute the molecular volume, surface area, and PSA. Beyond standalone use, CalVSP can be integrated as a library function and combined with other methods for advanced data mining and calculations. Additionally, this study evaluated CalVSP's accuracy in PSA calculations against PyMOL and verified that its PSA values correlate well with experimental drug absorption data. The results show that CalVSP is a reliable and efficient tool for predicting drug properties such as absorption, distribution, metabolism, excretion, and toxicity (ADMET), offering significant support for drug discovery and development.

However, CalVSP also has certain limitations. For instance, the computational speed decreases as the number of atoms increases. In particular, surface and volume calculations involving isosurfaces at 0.0016 a.u. and 0.002 a.u. require smaller grid spacings and are therefore relatively slow, so the algorithms need further optimization.
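The cost of smaller grid spacings can be seen in a generic grid-counting volume estimate. The sketch below is not the CalVSP algorithm: it treats the molecule as a union of hard spheres and counts the grid cells whose centers fall inside any sphere, which is an assumption made purely to illustrate how the number of cells grows as the spacing shrinks. The function name and input layout are likewise hypothetical.

```python
import math


def grid_volume(spheres, spacing):
    """Estimate the volume of a union of spheres on a regular grid.

    spheres: list of (x, y, z, radius) tuples in consistent length units.
    Returns (estimated volume, number of grid cells tested).
    """
    # Bounding box that encloses every sphere.
    lo = [min(s[i] - s[3] for s in spheres) for i in range(3)]
    hi = [max(s[i] + s[3] for s in spheres) for i in range(3)]
    counts = [max(1, math.ceil((hi[i] - lo[i]) / spacing)) for i in range(3)]

    inside = 0
    for ix in range(counts[0]):
        x = lo[0] + (ix + 0.5) * spacing  # cell-center coordinate
        for iy in range(counts[1]):
            y = lo[1] + (iy + 0.5) * spacing
            for iz in range(counts[2]):
                z = lo[2] + (iz + 0.5) * spacing
                if any(
                    (x - sx) ** 2 + (y - sy) ** 2 + (z - sz) ** 2 <= r * r
                    for sx, sy, sz, r in spheres
                ):
                    inside += 1
    return inside * spacing ** 3, counts[0] * counts[1] * counts[2]
```

Halving the spacing multiplies the number of cells by roughly eight, and each cell is tested against every sphere in this naive version, so both finer grids and larger molecules raise the cost.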

Availability and requirements

Project name: CalVSP.

Project home page: https://github.com/CalVSP/CalVSP.git

Operating system(s): Tested on Linux (Ubuntu 20.04 LTS) and Windows 11.

Programming language: C.

Other requirements: dependencies are described in the README file on the project home page.

License: MIT.

Any restrictions to use by nonacademics: None.

Supporting information available

Test dataset. Randomly selected subset of 390 3D molecular structures. Testing for high-molecular-weight compounds. PSA data.

Supplementary Information

Additional file 1. (1.5MB, docx)

Acknowledgements

This research was funded by the National Key Research and Development Program of China (2022YFC3502000), National Natural Science Foundation of China (82430119), Shanghai Municipal Science and Technology Major Project (ZD2021CY001), the ability establishment of sustainable use for valuable Chinese medicine resources (2060302), Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences (2023-I2M-3-009), National Key Laboratory of Lead Druggability Research (Grant No. NKLYT2023010).

Author contributions

YL, WZ and QS designed the research study. YL developed the method and wrote the code. DY and QS performed the analysis. YL, WZ and QS wrote the paper. All the authors read and approved the manuscript.

Funding

National Key Research and Development Program of China (2022YFC3502000), National Natural Science Foundation of China (82430119), Shanghai Municipal Science and Technology Major Project (ZD2021CY001), the ability establishment of sustainable use for valuable Chinese medicine resources (2060302), Chinese Academy of Medical Sciences (CAMS) Innovation Fund for Medical Sciences (2023-I2M-3-009), National Key Laboratory of Lead Druggability Research (Grant No. NKLYT2023010).

Availability of data and materials

The CalVSP application is publicly available on GitHub at https://github.com/CalVSP/CalVSP.git under the MIT License. The README file in the GitHub repository provides information about how to set up and use the application. The tutorials on CalVSP are available on GitHub at https://github.com/CalVSP/CalVSP.git.

Declarations

Competing Interests

The authors declare no competing interests.

Footnotes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Contributor Information

Weidong Zhang, Email: wdzhangy@hotmail.com.

Qingyan Sun, Email: sqy_2000@163.com.

References

1. Zissimos AM, Abraham MH, Barker MC et al (2002) Calculation of Abraham descriptors from solvent–water partition coefficients in four different systems; evaluation of different methods of calculation. J Chem Soc Perkin Trans 2:470–477. 10.1039/B110143A
2. Abraham MH, McGowan JC (1987) The use of characteristic volumes to measure cavity terms in reversed phase liquid chromatography. Chromatographia 23:243–246. 10.1007/BF02311772
3. Rühe J, Rajeevan M, Shoyama K et al (2024) A terrylene bisimide based universal host for aromatic guests to derive contact surface-dependent dispersion energies. Angew Chem Int Ed 63:e202318451. 10.1002/anie.202318451
4. Muehlbacher M, Kerdawy AE, Kramer C et al (2011) Conformation-dependent QSPR models: logPOW. J Chem Inf Model 51:2408–2416. 10.1021/ci200276v
5. Preiss UPRM, Slattery JM, Krossing I (2009) In silico prediction of molecular volumes, heat capacities, and temperature-dependent densities of ionic liquids. Ind Eng Chem Res 48:2290–2296. 10.1021/ie801268a
6. Ertl P, Rohde B, Selzer P (2000) Fast calculation of molecular polar surface area as a sum of fragment-based contributions and its application to the prediction of drug transport properties. J Med Chem 43:3714–3717. 10.1021/jm000942e
7. Rossi Sebastiano M, Doak BC, Backlund M et al (2018) Impact of dynamically exposed polarity on permeability and solubility of chameleonic drugs beyond the rule of 5. J Med Chem 61:4189–4202. 10.1021/acs.jmedchem.8b00347
8. Guimarães CRW, Mathiowetz AM, Shalaeva M et al (2012) Use of 3D properties to characterize beyond rule-of-5 property space for passive permeation. J Chem Inf Model 52:882–890. 10.1021/ci300010y
9. Begnini F, Poongavanam V, Atilaw Y et al (2021) Cell permeability of isomeric macrocycles: predictions and NMR studies. ACS Med Chem Lett 12:983–990. 10.1021/acsmedchemlett.1c00126
10. Caron G, Ermondi G (2017) Updating molecular properties during early drug discovery. Drug Discov Today 22:835–840. 10.1016/j.drudis.2016.11.017
11. Whitty A, Zhong M, Viarengo L et al (2016) Quantifying the chameleonic properties of macrocycles and other high-molecular-weight drugs. Drug Discov Today 21:712–717. 10.1016/j.drudis.2016.02.005
12. Schreier S, Frezzatti WA, Araujo PS et al (1984) Effect of lipid membranes on the apparent pK of the local anesthetic tetracaine spin label and titration studies. Biochim Biophys Acta BBA - Biomembr 769:231–237. 10.1016/0005-2736(84)90027-0
13. Collura V, Letellier L (1990) Mechanism of penetration and of action of local anesthetics in Escherichia coli cells. Biochim Biophys Acta BBA - Biomembr 1027:238–244. 10.1016/0005-2736(90)90313-D
14. Dodd LR, Theodorou DN (1991) Analytical treatment of the volume and surface area of molecules formed by an arbitrary collection of unequal spheres intersected by planes. Mol Phys 72:1313–1345. 10.1080/00268979100100941
15. Bondi A (1964) Van der Waals volumes and radii. J Phys Chem 68:441–451. 10.1021/j100785a001
16. Rowland RS, Taylor R (1996) Intermolecular nonbonded contact distances in organic crystal structures: comparison with distances expected from van der Waals radii. J Phys Chem 100:7384–7391. 10.1021/jp953141+
17. Alvarez S (2013) A cartography of the van der Waals territories. Dalton Trans 42:8617. 10.1039/c3dt50599e
18. Bader RFW, Henneker WH, Cade PE (1967) Molecular charge distributions and chemical binding. J Chem Phys 46:3341–3363. 10.1063/1.1841222
19. Boyd RJ (1977) The relative sizes of atoms. J Phys B At Mol Phys 10:2283. 10.1088/0022-3700/10/12/007
20. Bader RFW, Carroll MT, Cheeseman JR, Chang C (1987) Properties of atoms in molecules: atomic volumes. J Am Chem Soc 109:7968–7979. 10.1021/ja00260a006
21. Lu T, Chen F (2012) Quantitative analysis of molecular surface based on improved marching tetrahedra algorithm. J Mol Graph Model 38:314–323. 10.1016/j.jmgm.2012.07.004
22. Maglic JB, Lavendomme R (2022) Molovol: an easy-to-use program for analyzing cavities, volumes and surface areas of chemical structures. J Appl Crystallogr 55:1033–1044. 10.1107/S1600576722004988
23. Willems TF, Rycroft CH, Kazi M et al (2012) Algorithms and tools for high-throughput geometry-based analysis of crystalline porous materials. Microporous Mesoporous Mater 149:134–141. 10.1016/j.micromeso.2011.08.020
24. Sarkisov L, Harrison A (2011) Computational structure characterisation tools in application to ordered and disordered porous materials. Mol Simul 37:1248–1257. 10.1080/08927022.2011.592832
25. Schrödinger LLC, Warren DL. The PyMOL molecular graphics system
26. Spek AL (2009) Structure validation in chemical crystallography. Acta Crystallogr D Biol Crystallogr 65:148–155. 10.1107/S090744490804362X
27. Clark DE (1999) Rapid calculation of polar molecular surface area and its application to the prediction of transport phenomena. 1. Prediction of intestinal absorption. J Pharm Sci 88:807–814. 10.1021/js9804011
28. Lu T (2024) A comprehensive electron wavefunction analysis toolbox for chemists, Multiwfn. J Chem Phys 161:082503. 10.1063/5.0216272
29. Riplinger C, Neese F (2013) An efficient and near linear scaling pair natural orbital based local coupled cluster method. J Chem Phys 138:034106. 10.1063/1.4773581
30. Riplinger C, Sandhoefer B, Hansen A, Neese F (2013) Natural triple excitations in local coupled cluster calculations with pair natural orbitals. J Chem Phys 139:134101. 10.1063/1.4821834
31. Grimme S, Antony J, Ehrlich S, Krieg H (2010) A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu. J Chem Phys 132:154104. 10.1063/1.3382344
32. Grimme S. Effect of the damping function in dispersion corrected density functional theory. In: Stephan Ehrlich. 10.1002/jcc.21759. Accessed 1 Apr 2025
33. Ekström U, Visscher L, Bast R et al (2010) Arbitrary-order density functional response theory from automatic differentiation. J Chem Theory Comput 6:1971–1980. 10.1021/ct100117s
34. Brandenburg JG, Bannwarth C, Hansen A, Grimme S (2018) B97–3c: a revised low-cost variant of the B97-D density functional method. J Chem Phys 148:064104. 10.1063/1.5012601
35. Grimme S, Hansen A, Ehlert S, Mewes J-M (2021) r2SCAN-3c: a “Swiss army knife” composite electronic-structure method. J Chem Phys 154:064103. 10.1063/5.0040021
36. Kruse H, Grimme S (2012) A geometrical correction for the inter- and intra-molecular basis set superposition error in Hartree-Fock and density functional theory calculations for large systems. J Chem Phys 136:154101. 10.1063/1.3700154
37. Caldeweyher E, Bannwarth C, Grimme S (2017) Extension of the D3 dispersion coefficient model. J Chem Phys 147:034112. 10.1063/1.4993215
38. Weigend F, Ahlrichs R (2005) Balanced basis sets of split valence, triple zeta valence and quadruple zeta valence quality for H to Rn: design and assessment of accuracy. Phys Chem Chem Phys 7:3297–3305. 10.1039/B508541A
39. Weigend F (2006) Accurate Coulomb-fitting basis sets for H to Rn. Phys Chem Chem Phys 8:1057–1065. 10.1039/B515623H
40. Stephens PJ, Devlin FJ, Chabalowski CF, Frisch MJ (1994) Ab initio calculation of vibrational absorption and circular dichroism spectra using density functional force fields. J Phys Chem 98:11623–11627. 10.1021/j100096a001
41. Mardirossian N, Head-Gordon M (2016) ωB97M-V: A combinatorially optimized, range-separated hybrid, meta-GGA density functional with VV10 nonlocal correlation. J Chem Phys 144:214110. 10.1063/1.4952647
42. Vydrov OA, Van Voorhis T (2010) Nonlocal van der Waals density functional: the simpler the better. J Chem Phys 133:244103. 10.1063/1.3521275
43. Hujo W, Grimme S (2011) Performance of the van der Waals density functional VV10 and (hybrid)GGA variants for thermochemistry and noncovalent interactions. J Chem Theory Comput 7:3866–3871. 10.1021/ct200644w
44. Goerigk L, Grimme S (2011) Efficient and accurate double-hybrid-meta-GGA density functionals - evaluation with the extended GMTKN30 database for general main group thermochemistry, kinetics, and noncovalent interactions. J Chem Theory Comput 7:291–309. 10.1021/ct100466k
45. Chai J-D, Head-Gordon M (2009) Long-range corrected double-hybrid density functionals. J Chem Phys 131:174105. 10.1063/1.3244209
46. Dunning TH Jr (1989) Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen. J Chem Phys 90:1007–1023. 10.1063/1.456153
47. Pritchard BP, Altarawy D, Didier B et al (2019) A new basis set exchange: an open, up-to-date resource for the molecular sciences community. J Chem Inf Model 59:4814–4820. 10.1021/acs.jcim.9b00725
48. Schuchardt KL, Didier BT, Elsethagen T et al (2007) Basis set exchange: a community database for computational sciences. J Chem Inf Model 47:1045–1052. 10.1021/ci600510j
49. Hehre WJ, Ditchfield R, Pople JA (1972) Self-consistent molecular orbital methods. XII. Further extensions of Gaussian-type basis sets for use in molecular orbital studies of organic molecules. J Chem Phys 56:2257–2261. 10.1063/1.1677527
50. Ditchfield R, Hehre WJ, Pople JA (1971) Self-consistent molecular-orbital methods. IX. An extended Gaussian-type basis for molecular-orbital studies of organic molecules. J Chem Phys 54:724–728. 10.1063/1.1674902
51. Feller D (1996) The role of databases in support of computational chemistry calculations. J Comput Chem 17:1571–1586. 10.1002/(SICI)1096-987X(199610)17:13<1571::AID-JCC9>3.0.CO;2-P
52. Hariharan PC, Pople JA (1973) The influence of polarization functions on molecular orbital hydrogenation energies. Theor Chim Acta 28:213–222. 10.1007/bf00533485
53. Krishnan R, Binkley JS, Seeger R, Pople JA (1980) Self-consistent molecular orbital methods. XX. A basis set for correlated wave functions. J Chem Phys 72:650–654. 10.1063/1.438955
54. A cartography of the van der Waals territories - Dalton Transactions (RSC Publishing)
55. Bolton EE, Chen J, Kim S et al (2011) PubChem3D: a new resource for scientists. J Cheminform 3:32. 10.1186/1758-2946-3-32
56. Software update: the ORCA program system, version 4.0 - Neese - 2018 - WIREs computational molecular science. Wiley Online Library. 10.1002/wcms.1327. Accessed 1 Apr 2025
57. Lu T, Chen F (2012) Multiwfn: a multifunctional wavefunction analyzer. J Comput Chem 33:580–592. 10.1002/jcc.22885
58. Edward F. Valeev Libint: a library for the evaluation of molecular integrals of many-body operators over Gaussian functions
59. Lehtola S, Steigemann C, Oliveira MJT, Marques MAL. Libxc - a comprehensive library of functionals for density functional theory. 2018
60. Alibakhshi A, Schäfer LV (2024) Electron iso-density surfaces provide a thermodynamically consistent representation of atomic and molecular surfaces. Nat Commun 15:6086. 10.1038/s41467-024-50408-8
61. Palm K, Stenberg P, Luthman K, Artursson P (1997) Polar molecular surface properties predict the intestinal absorption of drugs in humans. Pharm Res 14:568–571. 10.1023/A:1012188625088
62. Waterbeemd H, Kansy M (1992) Hydrogen-bonding capacity and brain penetration. Chimia 46:299–299. 10.2533/chimia.1992.299
63. Veber DF, Johnson SR, Cheng H-Y et al (2002) Molecular properties that influence the oral bioavailability of drug candidates. J Med Chem 45:2615–2623. 10.1021/jm020017n

Articles from Journal of Cheminformatics are provided here courtesy of BMC