Abstract
The COSMO-SAC modeling approach has found wide application in science as well as in a range of industries due to its good predictive capabilities. While other models for liquid phases, such as UNIFAC, are in general more accurate than COSMO-SAC, these models typically contain many adjustable parameters and can be limited in their applicability. In contrast, the COSMO-SAC model contains only a few universal parameters and subdivides the molecular surface area into charged segments that interact with each other. In recent years, additional improvements to the construction of the sigma profiles and the evaluation of activity coefficients have been made. In this work, we present a comprehensive description of how to postprocess the results of a COSMO calculation through to the evaluation of thermodynamic properties. We also assembled a large database of COSMO files, consisting of 2261 compounds, that is freely available to academic and noncommercial users.
We especially focus on the documentation of the implementation and provide the optimized source code in C++, wrappers in Python, sample sigma profiles calculated from each approach, as well as tests and validation results. The misunderstandings in the literature relating to COSMO-SAC are described and corrected. The computational efficiency of the implementation is demonstrated.
Keywords: sigma profile, vapor-liquid-equilibria, COSMO-SAC, open-source
1. Introduction
The calculation of thermodynamic properties of multi-component mixtures is of great importance for the chemical industry, because experimental measurements of mixtures are time-consuming, costly, or risky when extreme conditions or toxic fluids are involved. In the past, many successful models have been developed. The accuracy of predictive models such as the Group Contribution Method (GCM) rests on numerous adjustable parameters that have to be fitted to experimental data. If no adjusted parameters for specific group interactions exist, the model cannot be used. A more predictive alternative to GCMs such as UNIFAC1–6 are models based on quantum mechanical conductor-like screening model (COSMO) calculations, originally proposed by Klamt et al. (conductor-like screening model for real solvents, COSMO-RS).7–9 Based on the COSMO-RS model, Lin and Sandler10 developed the COSMO segment activity coefficient model (COSMO-SAC). In these models, the interactions of molecules in a mixture are not modeled as pairwise molecular group interactions, but rather as pairwise interactions of charged surface segments of the molecule that can be obtained from quantum mechanical calculations when the molecule is placed in a perfect conductor. COSMO-based models are models for liquid mixtures which typically depend on temperature and composition only, i.e., the pressure dependency is usually neglected. However, pressure dependency can be introduced by either combining COSMO-based models with equations of state (see, e.g., refs 11–20) or by modifying the model itself (see, e.g., refs 21,22). While COSMO-based models are useful tools for predicting properties of mixtures, their correct implementation can be tricky and time-consuming.
The purpose of this work is therefore to provide a reference implementation of three COSMO-SAC models: the original COSMO-SAC model by Lin and Sandler10 and the two modifications of this model by Hsieh et al.23,24 These models are discussed in more detail in section 3.1 (the original model COSMO-SAC-200210), section 3.2 (COSMO-SAC-201023), and section 3.3 (COSMO-SAC-dsp24). The COSMO-SAC model can in principle be applied to all types of liquid mixtures. Fingerhut et al. thoroughly examined its performance for over ten thousand binary mixtures based on 2295 compounds, including water.25 The method can also be used for polymers26 and, when combined with the Pitzer-Debye-Hückel model, for electrolytes27,28 and ionic liquids.29,30 The article is organized thematically by addressing the following aspects of the COSMO-SAC model:
Preprocessing:
Generate sigma profile(s) from the results of the COSMO quantum mechanical calculation
Split the profile into hydrogen bonding parts if desired
Calculate dispersive contributions
Use:
Calculate activity coefficients and the excess Gibbs energy
Calculate phase equilibria by combining COSMO-SAC with the ideal gas law
2. Part I: COSMO file processing
The COSMO file obtained as the output of a quantum mechanical density-functional-theory calculation is a text file of non-standardized format containing the results of the calculation and information about the molecule. The information needed for creating the sigma profile (see section 2.2) in order to conduct a COSMO-SAC calculation is essentially:
Volume and surface area of the molecule
Positions of all nuclei
Location of each segment patch of the molecule, together with its area and charge
The units of the parameters in the COSMO files are frequently not specified. This unfortunate historical design decision has led to many mistakes in publications and implementations (see for instance Section 2.2). Therefore, users must be exceptionally careful to ensure that a consistent set of units is used. The most frequent source of confusion is the length unit, which is sometimes given in Bohr radii (atomic units), and sometimes in Ångstroms (Å). The conversion factor from Bohr radius to Å is not large in magnitude (1 Bohr radius a0 ≈ 0.52918 Å31), further muddying the waters.
2.1. Atoms, Bonds, and Dispersion
The structural information of the molecules is not required for the original model COSMO-SAC-2002 (see section 3.1). However, the structure of the molecule is important for the more advanced models COSMO-SAC-2010 (see section 3.2) and COSMO-SAC-dsp (see section 3.3).
Position data of all nuclei are read from the COSMO file and converted to the Ångstrom scale by multiplying with the conversion factor 0.52917721067 Å/(Bohr radius) as given by CODATA.31 These nuclei position data are used to determine: a) which atoms are bonded to each other, and b) what type of hybridization of the electron orbitals is present in the atom, used in the analysis of dispersion.
The pairwise distance between each pair of nuclei m and n, in Å, is calculated from
$$d_{mn} = \sqrt{(x_m - x_n)^2 + (y_m - y_n)^2 + (z_m - z_n)^2} \tag{1}$$
Whether atoms m and n are covalently bonded is determined by comparing their distance with the sum of the covalent radii of the two atoms. If the distance between the atoms is less than the sum of the covalent radii, the atoms are assumed to be bonded together. Covalent radii were obtained from ref 32, and these values are also used in the OpenBabel (v2.3.1) cheminformatics library. The covalent radius for carbon was taken to be 0.76 Å (equal to that of the sp3 hybridization).
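The bond criterion described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the reference implementation: the function names are hypothetical, only the carbon radius (0.76 Å) is stated in the text, and the other radii as well as the optional `tolerance` argument are assumptions introduced for the example.

```python
import math

# Covalent radii in Å. Only the carbon value is stated in the text;
# the remaining entries are assumed values used for illustration.
COVALENT_RADIUS = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "F": 0.57, "Cl": 1.02}
BOHR_TO_ANGSTROM = 0.52917721067  # conversion factor quoted in the text


def distance(p, q):
    """Euclidean distance between two (x, y, z) tuples, as in Eq. (1)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def find_bonds(symbols, positions_angstrom, tolerance=0.0):
    """Return the set of index pairs (m, n), m < n, judged to be bonded.

    tolerance (in Å) is a hypothetical padding of the radius sum;
    the default of zero reproduces the criterion as stated in the text.
    """
    bonds = set()
    for m in range(len(symbols)):
        for n in range(m + 1, len(symbols)):
            d_mn = distance(positions_angstrom[m], positions_angstrom[n])
            limit = COVALENT_RADIUS[symbols[m]] + COVALENT_RADIUS[symbols[n]]
            if d_mn < limit + tolerance:
                bonds.add((m, n))
    return bonds


# Water-like geometry with made-up coordinates in Å
symbols = ["O", "H", "H"]
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
bonds = find_bonds(symbols, positions)
```

Positions read in Bohr radii would first be multiplied by `BOHR_TO_ANGSTROM` so that distances and radii share the same unit.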
2.1.1. Hydrogen bonding
COSMO-SAC-2010 and COSMO-SAC-dsp take different types of hydrogen bonding into account. The molecular surface is separated into segments that are non-hydrogen-bonding (nhb) and segments that form hydrogen bonds. The hydrogen-bonding segments are further divided into hydrogen bonds of a hydroxyl group (OH) and other hydrogen bonds (OT). Other hydrogen bonds (OT) consider surface segments of the atoms nitrogen (N), oxygen (O), and fluorine (F) as well as hydrogen (H) atoms bonded to N or F. Therefore, information about bonding needs to be obtained from the COSMO file. The source code for determining nhb, OT, and OH segments is organized as follows: once it has been determined which atoms are bonded to each other, the hydrogen bonding class of each atom is determined. If the atom is not in the set of (O, H, N, F), the atom is not considered to be a candidate for hydrogen bonding in the COSMO-SAC framework, and is given the hydrogen bonding flag "NHB" (non-hydrogen-bonding). If the atom is an N or an F, the atom is considered to hydrogen bond, but is not in the OH family, and therefore it is given the "OT" designation (hydrogen bonding, but not OH). If the atom is O or H, the hydrogen bonding class of the atom is:
OH: if the atom is O and is bonded to an H, or vice versa
OT: if the atom is O and is bonded to an atom other than H, or if the atom is H and is bonded to N or F
NHB: otherwise
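The classification rules above can be expressed as a small function. The following is a sketch under stated assumptions: the function name and the data layout (a list of element symbols plus a dictionary of bonded neighbors) are hypothetical, and the OH rule is assumed to take precedence over the OT rule when an oxygen is bonded to both H and another atom.

```python
def hb_class(index, symbols, neighbors):
    """Return the hydrogen-bonding class ("NHB", "OH" or "OT") of one atom.

    symbols:   list of element symbols, one per atom
    neighbors: dict mapping an atom index to the list of bonded atom indices
    """
    element = symbols[index]
    bonded = [symbols[j] for j in neighbors.get(index, [])]
    if element not in ("O", "H", "N", "F"):
        return "NHB"  # not a hydrogen-bonding candidate
    if element in ("N", "F"):
        return "OT"   # hydrogen bonding, but not in the OH family
    if element == "O":
        if "H" in bonded:
            return "OH"  # assumed to take precedence over the OT rule
        if bonded:
            return "OT"  # oxygen bonded only to atoms other than H
        return "NHB"
    # Remaining case: the atom is a hydrogen
    if "O" in bonded:
        return "OH"
    if "N" in bonded or "F" in bonded:
        return "OT"
    return "NHB"


# Fragment C-O-H with one extra H on the carbon (hypothetical connectivity)
symbols = ["C", "O", "H", "H"]
neighbors = {0: [1, 3], 1: [0, 2], 2: [1], 3: [0]}
classes = [hb_class(i, symbols, neighbors) for i in range(len(symbols))]
```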
2.1.2. Dispersion
The models COSMO-SAC-dsp24 and COSMO-SAC 201333 take the dispersion contribution to the activity coefficient into account. Hsieh et al.24 treated the dispersive interactions by assuming equally sized atoms with a Lennard-Jones size parameter σ = 3 Å and by assigning a dispersion parameter ϵAtom to each atom of a molecule. They proposed to compute the dispersion parameter of the molecule ϵMolecule from the relation
$$\epsilon_\mathrm{Molecule} = \frac{1}{N_\mathrm{Atom}} \sum_{i=1}^{n} \epsilon_{\mathrm{Atom},i} \tag{2}$$
where ϵAtom,i/kB is the dispersion parameter of atom i, n is the number of atoms in the molecule, and NAtom is the total number of atoms for which $\epsilon_{\mathrm{Atom},i} \neq 0$. The molecular dispersion parameter ϵMolecule/kB depends on the atomic structure of the molecule and on the dispersion parameters of each atom ϵAtom,i/kB. A dispersion parameter is attached to each atom, depending on its orbital hybridization. The orbital hybridization of an atom in a molecule is determined by the number of atoms that are bonded to it. In the case of carbon, sp3 hybridization corresponds to four, sp2 to three, and sp to two bonded neighbor atoms. In the case of nitrogen, sp3 and sp2 hybridization correspond to three and two bonded neighbor atoms, respectively, and sp to one bonded neighbor atom.
In addition, the w parameter of the COSMO-SAC-dsp model depends on additional molecule-specific information (see section 3.3). To calculate w, the molecules are classified into the following categories according to their dispersive nature:
DSP_WATER indicates water
DSP_COOH indicates a molecule with a carboxyl group
DSP_HB_ONLY_ACCEPTOR indicates that the molecule is only a hydrogen bonding acceptor
DSP_HB_DONOR_ACCEPTOR indicates that the molecule is a hydrogen bonding acceptor and donor
DSP_NHB indicates that the molecule is non-hydrogen-bonding
If the molecule is a water molecule or if the molecule contains a COOH-group, the molecule is tagged as DSP_WATER or DSP_COOH, respectively. Following the implementation of COSMO-SAC-dsp,24 a molecule is treated as DSP_HB_ONLY_ACCEPTOR if the molecule contains any of the atoms O, N, or F but no H-atoms bonded to any of these O, N, or F. Molecules with NH, OH, or FH (but not OH of COOH or water) functional groups are treated as DSP_HB_DONOR_ACCEPTOR. If the molecule meets neither of the hydrogen-bonding criteria and is not water and does not contain a COOH group, it is handled as a non-hydrogen bonding molecule and tagged as DSP_NHB. For the COSMO-SAC-dsp model, if an atom other than C, H, O, N, F, Cl is included, the associated value of ϵAtom,i/kB is set to an undefined value and the calculation is aborted.
2.1.3. Example
The block of the COSMO file with atom locations looks something like the example in Fig. 2, taken from the database of Mullins et al.34 In the case of a .cosmo file from DMol3, the locations of the atoms (x, y, z) are given in Å. For instance, the x, y, z position of the first hydrogen nucleus (H1) is (0.888162953 Å, −1.326789759 Å, −0.880602803 Å).
2.2. Sigma Profile Construction
When doing a quantum mechanical COSMO calculation, the output file supplies a charge density on each surface element with area $a_n$ as well as its charge. These numerical values for the surface charge density are truncated to a few significant digits in the COSMO file. When using the charge density values as given in the COSMO file instead of recalculating the charge density by dividing the charge by the surface area of the segment, the differences can be on the order of a few percent. Therefore, for full replicability of the numerical values in this work, the charge density of each segment must be calculated from the charge given in the COSMO file divided by the area given in the COSMO file.
The following averaging equation was originally used by Klamt et al.8 for COSMO-RS
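$$\sigma_m = \frac{\sum_n \sigma_n^* \dfrac{r_n^2 r_\mathrm{av}^2}{r_n^2 + r_\mathrm{av}^2} \exp\left(-\dfrac{d_{mn}^2}{r_n^2 + r_\mathrm{av}^2}\right)}{\sum_n \dfrac{r_n^2 r_\mathrm{av}^2}{r_n^2 + r_\mathrm{av}^2} \exp\left(-\dfrac{d_{mn}^2}{r_n^2 + r_\mathrm{av}^2}\right)} \tag{3}$$
where $\sigma_n^*$ is the original, non-averaged surface charge density of the n-th segment, obtained from the charge given in units of the elementary charge e in the COSMO file, $r_n = (a_n/\pi)^{0.5}$, $r_\mathrm{av} = 0.5$ Å, and $d_{mn}$ is the distance (in Å) between the centers of the surface segments n and m.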
Lin and Sandler10 used an effective radius for averaging of reff = (aeff /π)0.5 with aeff = 7.5 Å2 , but otherwise applied a similar methodology as Klamt. Due to a confusion of units in the COSMO file generated by DMol3 (Lin and Sandler10 thought the coordinates of the centers of the segments used to calculate the distances were in Å but they were in Bohr radii), in the erratum35 to their original article Lin and Sandler10 had to provide a unit conversion parameter fdecay to correct the distance dmn from Å to Bohr radius (1 Bohr radius a0 ≈ 0.52918 Å,31 thus fdecay = 0.52918−2 ≈ 3.57). The corrected equation is given as
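$$\sigma_m = \frac{\sum_n \sigma_n^* \dfrac{r_n^2 r_\mathrm{eff}^2}{r_n^2 + r_\mathrm{eff}^2} \exp\left(-f_\mathrm{decay}\dfrac{d_{mn}^2}{r_n^2 + r_\mathrm{eff}^2}\right)}{\sum_n \dfrac{r_n^2 r_\mathrm{eff}^2}{r_n^2 + r_\mathrm{eff}^2} \exp\left(-f_\mathrm{decay}\dfrac{d_{mn}^2}{r_n^2 + r_\mathrm{eff}^2}\right)} \tag{4}$$
where $d_{mn}$ is the distance (in Å) between the centers of the surface segments n and m, and $r_\mathrm{eff} = (a_\mathrm{eff}/\pi)^{0.5}$ is in Å. Nonetheless, this equation remains dimensionally inconsistent (the argument of the exponential function has units of bohr²/Å²); the parameter fdecay should therefore be thought of as a dimensionless scaling quantity only. Hence, the practical interpretation of fdecay is that, first, the coordinates of the segments given in bohr are converted to Å and, second, fdecay is used to scale the values given in Å back to the numerical values in bohr (keeping Å as the unit).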
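A minimal, unoptimized sketch of the averaging step is given below. It assumes that each segment n enters with the weight $r_n^2 r_\mathrm{eff}^2/(r_n^2 + r_\mathrm{eff}^2)\,\exp[-f_\mathrm{decay}\, d_{mn}^2/(r_n^2 + r_\mathrm{eff}^2)]$ and that the averaged value is the weighted mean of the raw charge densities. The function name, argument layout, and test geometry are hypothetical; setting `f_decay=1.0` and passing another area recovers the Klamt-type form.

```python
import math


def average_sigma(sigma_raw, areas, positions, a_eff=7.5, f_decay=3.57):
    """Weighted average of raw segment charge densities.

    sigma_raw: raw charge densities (charge / area) in e/Å²
    areas:     segment areas in Å²
    positions: segment centers as (x, y, z) tuples in Å
    """
    r_eff2 = a_eff / math.pi  # squared effective radius in Å²
    averaged = []
    for m in range(len(sigma_raw)):
        numerator = 0.0
        denominator = 0.0
        for n in range(len(sigma_raw)):
            r_n2 = areas[n] / math.pi
            d2 = sum((a - b) ** 2 for a, b in zip(positions[m], positions[n]))
            weight = (r_n2 * r_eff2) / (r_n2 + r_eff2) * math.exp(
                -f_decay * d2 / (r_n2 + r_eff2)
            )
            numerator += sigma_raw[n] * weight
            denominator += weight
        averaged.append(numerator / denominator)
    return averaged


# Two equal segments with opposite charge densities (made-up numbers)
pair = average_sigma([0.01, -0.01], [0.3, 0.3], [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)])
```

The double loop scales quadratically with the number of segments, which is why a vectorized formulation is attractive for large molecules.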
The model of Lin and Sandler10,35 was made available as a Fortran source code together with a comprehensive sigma-profile database in the very useful work of Mullins et al.34 They computed the sigma profiles with density-functional-theory calculations using the software DMol3. Note that the parametrization and the results of COSMO models in general depend on the underlying method and software with which the sigma profiles are calculated.36 Hence, it is very important for the comparability and evaluation of these models to use exactly the same set of sigma profiles.
It is furthermore important to note that in the Fortran code for the computation of the sigma profiles, Mullins et al.34 used the same averaging equation as Klamt (Eq. (3)), but with rav = 0.81764 Å. The use of an averaging radius of rav = 0.81764 Å and fdecay = 1 is equivalent (when assuming ) to the assumption of dividing numerator and denominator by fdecay, such that the fdecay correction is applied to the averaging radius.
Once the averaged value of σm has been obtained for each segment m, the p(σ)Ai values (probability p(σ) of finding a given segment with specified value of σ multiplied with the entire surface area Ai of molecule i, which gives the surface areas Ai(σm) of molecule i with charge densities σm) need to be obtained on gridded values. The values of σ for which the p(σ)A values are to be obtained generally range from −0.025 e/Å² to 0.025 e/Å² in increments of 0.001 e/Å², forming a set of 51 points.
Subsequently for each value of σ (in e/Å2), the (0-based) index corresponding to the left of the value is obtained from
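$$i_\mathrm{left} = \left\lfloor \frac{\sigma - (-0.025)}{0.001} \right\rfloor \tag{5}$$
The fractional distance of the value between the left and right edges of the cell is given by
$$w = \frac{\sigma[i_\mathrm{left}+1] - \sigma}{0.001} \tag{6}$$
which is by definition between 0 and 1. The area $a_n$ of the segment is then distributed between the gridded sigma values above and below the value, according to the weighting parameter w:
$$\left(p(\sigma)A\right)[i_\mathrm{left}] \leftarrow \left(p(\sigma)A\right)[i_\mathrm{left}] + w\, a_n \tag{7}$$
$$\left(p(\sigma)A\right)[i_\mathrm{left}+1] \leftarrow \left(p(\sigma)A\right)[i_\mathrm{left}+1] + (1 - w)\, a_n \tag{8}$$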
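The distribution of segment areas onto the grid can be sketched as follows. This is an illustration under assumptions: w is taken as the normalized distance to the right-hand grid node, so that the left node receives the fraction w and the right node the fraction 1 − w; the function name is hypothetical, and the clamping of the index at the grid boundaries is an addition for robustness that is not described in the text.

```python
import math

SIGMA_MIN = -0.025  # e/Å², first grid node
BIN_WIDTH = 0.001   # e/Å², grid spacing
N_POINTS = 51       # number of grid nodes


def build_profile(sigmas, areas):
    """Distribute segment areas (Å²) linearly onto the sigma grid."""
    profile = [0.0] * N_POINTS
    for sigma, area in zip(sigmas, areas):
        left = int(math.floor((sigma - SIGMA_MIN) / BIN_WIDTH))
        # Assumed safeguard: keep left and left + 1 inside the grid
        left = max(0, min(left, N_POINTS - 2))
        sigma_right = SIGMA_MIN + (left + 1) * BIN_WIDTH
        w = (sigma_right - sigma) / BIN_WIDTH  # weight of the left node
        profile[left] += area * w
        profile[left + 1] += area * (1.0 - w)
    return profile


# Two segments with made-up charge densities and areas
profile = build_profile([0.0005, -0.0102], [2.0, 1.0])
```

Because the two weights sum to one, the total area is conserved, which is a convenient sanity check for any implementation.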
Figure 3 illustrates the construction of the sigma profile for ethanol. For each patch, the value of sigma is obtained, and from that, the value of sigma is then distributed amongst the gridded values. As can be seen in this plot, the histogram is constructed from a relatively small number of patches.
2.2.1. Example
Figure 4 gives an example of a COSMO file from DMol3. It is important to note that the units are not specified for the areas or the charge. It is especially problematic that the length units differ within the same line of the file (the area in units of Å2, and segment positions are in Bohr radius (from the atomic units (a.u.) system of measurement)), a source of confusion for many authors, ourselves included. Nonetheless, this is a standard file format.
2.3. Splitting of Profiles
Lin et al.37 proposed to split the sigma profile into hydrogen-bonding (hb) and non-hydrogen-bonding (nhb) segments with $p(\sigma) = p^\mathrm{nhb}(\sigma) + p^\mathrm{hb}(\sigma)$. Hydrogen-bonding atoms were defined to be oxygen, nitrogen, and fluorine atoms as well as hydrogen atoms bound to one of oxygen, nitrogen, or fluorine. Hence, all surfaces belonging to the aforementioned atoms contribute to the sigma profile $p^\mathrm{hb}(\sigma)$ and the other atoms forming molecule i contribute to the sigma profile $p^\mathrm{nhb}(\sigma)$. Hsieh et al.23 suggested to further split the hydrogen-bonding sigma profile into interactions of surfaces belonging to groups of oxygen and hydrogen (OH) and surfaces belonging to other groups (OT).
Each atom of the molecule is assigned to be in one of the hydrogen-bonding classes:
NHB: the atom is not a candidate to hydrogen bond
OH: the atom is either the oxygen or the hydrogen in a OH hydrogen-bonding pair
OT: the atom is N, F, or an oxygen that is not part of an OH bonding group
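Though these classes are consistent with the work of Hsieh et al.,23 they do not consider the fact that the H of a COOH group (likewise for other similar groups) is delocalized between the two oxygens of the group. The three resulting profiles add up to the total sigma profile:
$$p(\sigma) = p^\mathrm{nhb}(\sigma) + p^\mathrm{OH}(\sigma) + p^\mathrm{OT}(\sigma) \tag{9}$$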
A currently undocumented feature of the profile splitting is that the contribution for a given segment is deposited into the NHB, OH, or OT sigma profiles depending on its averaged charge density in the following manner:
If the segment belongs to an O atom, and the hydrogen-bonding class of the atom is OH, and the averaged charge density value of the segment is greater than zero, the segment goes into the OH profile.
If the segment belongs to an H atom, and the hydrogen-bonding class of the atom is OH, and the averaged charge density value of the segment is less than zero, the segment goes into the OH profile.
If the segment belongs to an O, N, or F atom, and the hydrogen-bonding class of the atom is OT, and the averaged charge density value of the segment is greater than zero, the segment goes into the OT profile.
If the segment belongs to an H atom, and the hydrogen-bonding class of the atom is OT, and the averaged charge density value of the segment is less than zero, the segment goes into the OT profile.
Otherwise, the segment goes into the NHB profile.
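The five rules above map directly onto a small decision function. The sketch below uses hypothetical names; it takes the element of the atom owning the segment, the hydrogen-bonding class of that atom, and the averaged charge density of the segment.

```python
def target_profile(element, atom_hb_class, sigma_averaged):
    """Return the profile ("NHB", "OH" or "OT") that receives a segment.

    element:        element symbol of the atom the segment belongs to
    atom_hb_class:  hydrogen-bonding class of that atom
    sigma_averaged: averaged charge density of the segment in e/Å²
    """
    if atom_hb_class == "OH":
        if element == "O" and sigma_averaged > 0.0:
            return "OH"
        if element == "H" and sigma_averaged < 0.0:
            return "OH"
    elif atom_hb_class == "OT":
        if element in ("O", "N", "F") and sigma_averaged > 0.0:
            return "OT"
        if element == "H" and sigma_averaged < 0.0:
            return "OT"
    # Every other combination is treated as non-hydrogen-bonding
    return "NHB"


examples = [
    target_profile("O", "OH", 0.012),   # hydroxyl oxygen, positive sigma
    target_profile("O", "OH", -0.003),  # hydroxyl oxygen, negative sigma
    target_profile("H", "OT", -0.010),  # H bonded to N or F, negative sigma
    target_profile("C", "NHB", 0.002),  # carbon is never hydrogen-bonding
]
```

Note that a segment with an averaged charge density of exactly zero always ends up in the NHB profile under these rules.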
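By definition, the superposed profiles (NHB + OH + OT) must equal the original profile (see Eq. (9)). Wang et al.38 proposed the use of a Gaussian-type function for the probability P of a hydrogen-bonding segment to indeed form a hydrogen bond
$$P^\mathrm{hb}(\sigma) = 1 - \exp\left(-\frac{\sigma^2}{2\sigma_0^2}\right) \tag{10}$$
with σ0 = 0.007 e/Å². This probability function considers that not all surfaces belonging to potentially hydrogen-bond forming atoms in fact form hydrogen bonds. With the probability function for hydrogen bonding according to Eq. (10), it follows
$$p_i^\mathrm{nhb}(\sigma) = \frac{A_i^\mathrm{nhb}(\sigma) + A_i^\mathrm{hb}(\sigma)\left[1 - P^\mathrm{hb}(\sigma)\right]}{A_i} \tag{11}$$
$$p_i^\mathrm{OH}(\sigma) = \frac{A_i^\mathrm{OH}(\sigma)\, P^\mathrm{hb}(\sigma)}{A_i} \tag{12}$$
$$p_i^\mathrm{OT}(\sigma) = \frac{A_i^\mathrm{OT}(\sigma)\, P^\mathrm{hb}(\sigma)}{A_i} \tag{13}$$
where $A_i^\mathrm{nhb}(\sigma)$ is the surface area of non-hydrogen-bonding atoms of molecule i, $A_i^\mathrm{hb}(\sigma) = A_i^\mathrm{OH}(\sigma) + A_i^\mathrm{OT}(\sigma)$ is the surface area of hydrogen-bonding segments of molecule i, $A_i^\mathrm{OH}(\sigma)$ is the surface area of hydrogen-bonding segments of OH-groups of molecule i, and $A_i^\mathrm{OT}(\sigma)$ is the surface area of hydrogen-bonding segments other than OH.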
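The probability weighting can be illustrated with a short sketch. It assumes the Gaussian-type form $P^\mathrm{hb}(\sigma) = 1 - \exp[-\sigma^2/(2\sigma_0^2)]$, that the fraction $1 - P^\mathrm{hb}$ of every hydrogen-bonding area is moved to the nhb profile, and that $A_i$ is the sum of all gridded areas. Function names and the numerical inputs are hypothetical.

```python
import math

SIGMA_0 = 0.007  # e/Å²


def p_hb(sigma):
    """Probability that a hydrogen-bonding segment forms a hydrogen bond."""
    return 1.0 - math.exp(-sigma ** 2 / (2.0 * SIGMA_0 ** 2))


def split_profiles(sigmas, area_nhb, area_oh, area_ot):
    """Return (p_nhb, p_oh, p_ot) on the sigma grid from gridded areas in Å²."""
    total_area = sum(area_nhb) + sum(area_oh) + sum(area_ot)
    p_nhb, p_oh, p_ot = [], [], []
    for s, a_n, a_oh, a_ot in zip(sigmas, area_nhb, area_oh, area_ot):
        prob = p_hb(s)
        # Hydrogen-bonding area that does not bond is counted as nhb
        p_nhb.append((a_n + (a_oh + a_ot) * (1.0 - prob)) / total_area)
        p_oh.append(a_oh * prob / total_area)
        p_ot.append(a_ot * prob / total_area)
    return p_nhb, p_oh, p_ot


# Three-point toy grid with made-up areas
sigmas = [-0.01, 0.0, 0.01]
p_nhb, p_oh, p_ot = split_profiles(
    sigmas, [1.0, 5.0, 1.0], [0.5, 0.2, 0.5], [0.0, 0.1, 0.3]
)
```

Since $P^\mathrm{hb}(0) = 0$, segments with vanishing charge density contribute only to the nhb profile, regardless of the atom they belong to.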
2.4. Implementation, Validation and Verification
A Python script profiles/to_sigma.py was written that fully automates the process of reading in a COSMO file from a DMol3 calculation and generates the split profiles as well as calculates the dispersion parameters. The output file with sigma profiles is a space-delimited text file with additional metadata stored in JSON (JavaScript Object Notation) format in the header of the sigma profile. In case of discrepancies between the description above and the Python code, the latter should be used as the reference. This Python script makes heavy use of vectorized matrix operations of the numpy matrix library, especially in case of the sigma averaging, the computationally most expensive part of sigma profile generation. Furthermore, the sigma profile generation from COSMO files is automated with the Python script profiles/generate_all_profiles.py, which generates sigma profiles in parallel. All of these scripts are available in the provided code.
The sigma profiles, dispersion flags, and dispersion parameters are available in the supplemental material for 2261 molecular species. Parameters to carry out the sigma profile averaging (rav, fdecay) are documented in the header of each sigma profile for reproducibility. All parameters are specified along with their units (where appropriate).
Before carrying out any COSMO-SAC calculations, users should first verify that their implementation yields exactly the same sigma profiles compiled in the supplemental material when processing the provided COSMO files. Values of p(σ)A should agree to within at least 10⁻¹⁵.
Finally, a segment charge visualization tool was written with the three.js javascript library. This tool, driven by a Python-based script, reads the COSMO file and generates an HTML file with the data of the locations and orientations of the segments (and the atoms). The visualization scene is constructed with three.js and behind the scenes, WebGL powers the 3D visualization, allowing for a seamless visualization in three dimensions, even for rather large molecules. This approach is cross-platform, fully open-source, and while intended to be rudimentary, could easily be extended by users for their own application. An example of the visualization tool is provided in Fig. 1, and other examples are available.
3. Part II: Activity Coefficient Calculation
3.1. Model of Lin and Sandler – COSMO-SAC 2002
To generate the sigma profile, in accordance with the work of Mullins et al.,39 we have used the equation from Klamt (Eq. (3)) with the value of effective radius defined by rav = 0.81764 Å, and all radii in Å. The model parameters are summarized in Table 1.
Table 1:
Parameter | Value | Value (SI) |
---|---|---|
q0 | 79.53 Å2 | 79.53 × 10−20 m2 |
r0 | 66.69 Å3 | 66.69 × 10−30 m3 |
z | 10 | 10 |
rav | 0.81764 Å | 8.1764 × 10−11 m |
aeff | 7.5 Å2 | 7.5 × 10−20 m2 |
chb | 85580 kcal Å4 mol−1 e−2 | 1.3949003091892562 × 106 J m4 mol−1 C−2 |
σhb | 0.0084 e Å−2 | 0.134582837256 C m−2 |
α′† | 16466.72 kcal Å4 mol−1 e−2 | 2.6839720518033312 × 105 J m4 mol−1 C−2 |
R | 0.001987 kcal mol−1 K−1 | 8.313608 J mol−1 K−1 |
†: Mullins et al. used an erroneous value in their Table 1. The correct value for the misfit energy parameter10 is obtained from $\alpha' = f_\mathrm{pol}\, 0.3\, a_\mathrm{eff}^{3/2}/\varepsilon_0$, with ε0 = 2.395 × 10⁻⁴ (e² mol)/(kcal Å), ϵ = 3.667, fpol = (ϵ – 1)/(ϵ + 0.5), and aeff = 7.5 Å², according to the FORTRAN code of Mullins et al.
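The tabulated numbers can be cross-checked with a few lines of arithmetic. The sketch below assumes the relation $\alpha' = f_\mathrm{pol}\, 0.3\, a_\mathrm{eff}^{3/2}/\varepsilon_0$, which reproduces the tabulated value of α′, and an elementary charge of 1.602176634 × 10⁻¹⁹ C for the conversion to SI units; the helper name `to_si` is hypothetical.

```python
E_CHARGE = 1.602176634e-19  # C, assumed value of the elementary charge
KCAL = 4184.0               # J per kcal, as stated in the text
ANGSTROM = 1.0e-10          # m per Å

# Misfit energy parameter from the quantities listed in the footnote
eps0 = 2.395e-4   # (e² mol)/(kcal Å)
eps = 3.667       # dimensionless
a_eff = 7.5       # Å²
f_pol = (eps - 1.0) / (eps + 0.5)
alpha_prime = f_pol * 0.3 * a_eff ** 1.5 / eps0  # kcal Å⁴ mol⁻¹ e⁻²


def to_si(value):
    """Convert kcal Å⁴ mol⁻¹ e⁻² to J m⁴ mol⁻¹ C⁻²."""
    return value * KCAL * ANGSTROM ** 4 / E_CHARGE ** 2


c_hb_si = to_si(85580.0)
alpha_prime_si = to_si(16466.72)
sigma_hb_si = 0.0084 * E_CHARGE / ANGSTROM ** 2  # e Å⁻² to C m⁻²
```

The positive exponent of the length unit in the SI column (m⁴ rather than m⁻⁴) follows directly from the Å⁴ in the original unit.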
According to Lin and Sandler,10 the activity coefficient γi,S of component i in the liquid mixture S can be obtained from the equation
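$$\ln\gamma_{i,S} = \ln\gamma_{i,S}^\mathrm{comb} + \ln\gamma_{i,S}^\mathrm{res} \tag{14}$$
where the combinatorial part $\ln\gamma_{i,S}^\mathrm{comb}$ accounts for the size and shape differences of the molecules. This quantity is usually described by the Staverman-Guggenheim combinatorial term
$$\ln\gamma_{i,S}^\mathrm{comb} = \ln\frac{\phi_i}{x_i} + \frac{z}{2} q_i \ln\frac{\theta_i}{\phi_i} + l_i - \frac{\phi_i}{x_i}\sum_j x_j l_j \tag{15}$$
with
$$\theta_i = \frac{x_i q_i}{\sum_j x_j q_j} \tag{16}$$
$$\phi_i = \frac{x_i r_i}{\sum_j x_j r_j} \tag{17}$$
$$l_i = \frac{z}{2}\left(r_i - q_i\right) - \left(r_i - 1\right) \tag{18}$$
and
$$r_i = \frac{V_i}{r_0} \tag{19}$$
Here r0 = 66.69 Å³ denotes the normalized volume parameter and
$$q_i = \frac{A_i}{q_0} \tag{20}$$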
where q0 = 79.53 Å2 denotes the normalized surface area parameter and z is the coordination number, which was chosen to be 10. Note that r0 is not needed to calculate the combinatorial term as it cancels out internally in Eq. (15), see, e.g., Ref. 33. Vi is the molecular volume of component i and Ai is the molecular surface area of component i coming from the COSMO calculations. Note that in the article by Lin and Sandler,10 there are misplaced parentheses in the equation for li, which was corrected in Eq. (18) (compare, e.g., Ref.1).
In the case of infinite dilution, when one of the mole fractions goes to zero, Eqs. (16) and (17) are ill-defined because division by zero occurs. The terms in Eq. (15) leading to division by zero can be rewritten as
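$$\frac{\phi_i}{x_i} = \frac{r_i}{\sum_j x_j r_j} \tag{21}$$
$$\frac{\theta_i}{x_i} = \frac{q_i}{\sum_j x_j q_j} \tag{22}$$
$$\frac{\theta_i}{\phi_i} = \frac{q_i}{r_i}\,\frac{\sum_j x_j r_j}{\sum_j x_j q_j} \tag{23}$$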
and were used throughout.
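A compact sketch of the combinatorial term is shown below. It assumes the Staverman-Guggenheim form $\ln\gamma_i^\mathrm{comb} = \ln(\phi_i/x_i) + (z/2)\, q_i \ln(\theta_i/\phi_i) + l_i - (\phi_i/x_i)\sum_j x_j l_j$ with $l_i = (z/2)(r_i - q_i) - (r_i - 1)$, and evaluates the ratios $\phi_i/x_i$ and $\theta_i/\phi_i$ directly so that a mole fraction of zero causes no division by zero. The function name and the input values are hypothetical.

```python
import math

Q0 = 79.53  # Å², normalized surface area parameter
R0 = 66.69  # Å³, normalized volume parameter
Z = 10.0    # coordination number


def ln_gamma_comb(x, areas, volumes):
    """Combinatorial ln(gamma) for each component.

    x:       mole fractions (entries may be zero)
    areas:   molecular surface areas in Å²
    volumes: molecular volumes in Å³
    """
    q = [a / Q0 for a in areas]
    r = [v / R0 for v in volumes]
    l = [Z / 2.0 * (ri - qi) - (ri - 1.0) for ri, qi in zip(r, q)]
    sum_xq = sum(xi * qi for xi, qi in zip(x, q))
    sum_xr = sum(xi * ri for xi, ri in zip(x, r))
    sum_xl = sum(xi * li for xi, li in zip(x, l))
    result = []
    for qi, ri, li in zip(q, r, l):
        phi_over_x = ri / sum_xr                   # finite even if x_i = 0
        theta_over_phi = (qi / sum_xq) / phi_over_x
        result.append(
            math.log(phi_over_x)
            + Z / 2.0 * qi * math.log(theta_over_phi)
            + li
            - phi_over_x * sum_xl
        )
    return result


# Infinite dilution of component 1 in component 2 (made-up sizes)
dilute = ln_gamma_comb([0.0, 1.0], [100.0, 200.0], [80.0, 190.0])
```

For a pure component, and for a mixture of identical molecules, the term vanishes, which provides a quick consistency check.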
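The residual part $\ln\gamma_{i,S}^\mathrm{res}$, which is also called the restoring free energy part, mainly accounts for electrostatic interactions between the molecules in the mixture. According to a statistical mechanical derivation by Lin and Sandler,10 the residual part of the activity coefficient can be obtained as follows
$$\ln\gamma_{i,S}^\mathrm{res} = n_i \sum_{\sigma_m} p_i(\sigma_m)\left[\ln\Gamma_S(\sigma_m) - \ln\Gamma_i(\sigma_m)\right] \tag{24}$$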
where ni denotes the number of surface segments of molecule i with a standard segment surface area aeff and can be calculated according to:
$$n_i = \frac{A_i}{a_\mathrm{eff}} \tag{25}$$
σm is the screening charge density of segment m, which is the average screening charge of the surface segment divided by aeff , pi (σm ) denotes the probability of finding a segment with screening charge density σm on the surface of component i, ΓS (σm) is the activity coefficient of segment m in the mixture, and Γi (σm) is the activity coefficient of segment m in the mixture of segments of only pure component i. The quantity pi (σm) is called the sigma profile of pure component i and is defined as
$$p_i(\sigma_m) = \frac{A_i(\sigma_m)}{A_i} \tag{26}$$
where Ai(σm) is the surface area with screening charge density σm of a molecule of species i and Ai is again the entire surface area of the molecule of species i. The sigma profile of the mixture S can then be obtained by
$$p_S(\sigma_m) = \frac{\sum_i x_i A_i\, p_i(\sigma_m)}{\sum_i x_i A_i} \tag{27}$$
where xi is the mole fraction of component i. The segment activity coefficient in the mixture can be calculated from
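$$\ln\Gamma_S(\sigma_m) = -\ln\left\{\sum_{\sigma_n} p_S(\sigma_n)\,\Gamma_S(\sigma_n)\exp\left[-\frac{\Delta W(\sigma_m,\sigma_n)}{RT}\right]\right\} \tag{28}$$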
where the sum on the right hand side goes over all charge densities σn in the mixture. Note that Eq. (28) needs to be solved numerically for the segment activity coefficients ΓS (σm). The quantity ∆W (σm, σn) is called the exchange energy and can be calculated from
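$$\Delta W(\sigma_m,\sigma_n) = \frac{\alpha'}{2}\left(\sigma_m + \sigma_n\right)^2 + c_\mathrm{hb}\max\left[0, \sigma_\mathrm{acc} - \sigma_\mathrm{hb}\right]\min\left[0, \sigma_\mathrm{don} + \sigma_\mathrm{hb}\right] \tag{29}$$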
where the first term on the right hand side is the misfit energy, accounting for the electrostatic interactions, and the second term on the right hand side accounts for hydrogen-bonding interactions. The values of the generalized parameters are: α′ = 16466.72 kcal Å4 mol−1 e−2, chb = 85580 kcal Å4 mol−1 e−2, and σhb = 0.0084 e Å−2 (with 1 kcal = 4184 J). Note that, in accordance with the FORTRAN code supplied by Mullins et al.,34 the value for α′ given in the article of Lin and Sandler10 was used rather than the different value provided in the article by Mullins et al.34 Furthermore, σacc = max(σm, σn) and σdon = min(σm, σn) hold. Γi (σm) and ΓS (σm) have a similar form
$$\ln\Gamma_i(\sigma_m) = -\ln\left\{\sum_{\sigma_n} p_i(\sigma_n)\,\Gamma_i(\sigma_n)\exp\left[-\frac{\Delta W(\sigma_m,\sigma_n)}{RT}\right]\right\} \tag{30}$$
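The exchange energy of the original model is simple to evaluate. The sketch below assumes the form $\Delta W = (\alpha'/2)(\sigma_m + \sigma_n)^2 + c_\mathrm{hb}\max[0, \sigma_\mathrm{acc} - \sigma_\mathrm{hb}]\min[0, \sigma_\mathrm{don} + \sigma_\mathrm{hb}]$ with the parameter values quoted above; the function name is hypothetical.

```python
ALPHA_PRIME = 16466.72  # kcal Å⁴ mol⁻¹ e⁻²
C_HB = 85580.0          # kcal Å⁴ mol⁻¹ e⁻²
SIGMA_HB = 0.0084       # e Å⁻²


def delta_w_2002(sigma_m, sigma_n):
    """Exchange energy in kcal/mol for two charge densities in e/Å²."""
    sigma_acc = max(sigma_m, sigma_n)  # acceptor: larger charge density
    sigma_don = min(sigma_m, sigma_n)  # donor: smaller charge density
    misfit = 0.5 * ALPHA_PRIME * (sigma_m + sigma_n) ** 2
    hydrogen_bond = (
        C_HB * max(0.0, sigma_acc - SIGMA_HB) * min(0.0, sigma_don + SIGMA_HB)
    )
    return misfit + hydrogen_bond


# Opposite charge densities beyond the cutoff: pure hydrogen-bond term
strong_pair = delta_w_2002(0.01, -0.01)
# Opposite charge densities inside the cutoff: no contribution at all
weak_pair = delta_w_2002(0.005, -0.005)
```

The hydrogen-bonding term is non-zero only when one charge density exceeds σhb and the other lies below −σhb, and it is then always negative (attractive).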
3.2. Model of Hsieh et al. – COSMO-SAC 2010
In 2010, Hsieh et al.23 suggested an improvement of COSMO-SAC for phase equilibrium calculations based on the modifications published by Lin et al.37 and Wang et al.38 Hsieh et al. proposed two modifications for Eq. (29), which in their model reads
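$$\Delta W(\sigma_m^t,\sigma_n^s) = c_\mathrm{ES}\left(\sigma_m^t + \sigma_n^s\right)^2 - c_\mathrm{hb}(\sigma_m^t,\sigma_n^s)\left(\sigma_m^t - \sigma_n^s\right)^2 \tag{31}$$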
where the superscripts t and s denote different types of sigma profiles. The first modification concerns the electrostatic interaction parameter cES, which was made temperature dependent
$$c_\mathrm{ES} = A_\mathrm{ES} + \frac{B_\mathrm{ES}}{T^2} \tag{32}$$
with AES = 6525.69 kcal Å4 mol−1 e−2 and BES = 1.4859 × 108 kcal Å4 K2 mol−1 e−2. The second modification concerns the hydrogen-bonding term given in Eq. (31). With this distinction, the parameter chb is defined as follows
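$$c_\mathrm{hb}(\sigma_m^t,\sigma_n^s) = \begin{cases} c_\mathrm{OH-OH} & \text{if } s = t = \mathrm{OH} \text{ and } \sigma_m^t \sigma_n^s < 0 \\ c_\mathrm{OT-OT} & \text{if } s = t = \mathrm{OT} \text{ and } \sigma_m^t \sigma_n^s < 0 \\ c_\mathrm{OH-OT} & \text{if } s = \mathrm{OH},\ t = \mathrm{OT} \text{ and } \sigma_m^t \sigma_n^s < 0 \\ 0 & \text{otherwise} \end{cases} \tag{33}$$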
where cOH−OH = 4013.78 kcal Å4 mol−1 e−2, cOT−OT = 932.31 kcal Å4 mol−1 e−2, and cOH−OT = 3016.43 kcal Å4 mol−1 e−2. Due to the separation of the sigma profiles into nhb, OT, and OH contributions, an additional sum over the different sigma-profiles needs to be introduced in Eqs. (24), (28), and (30). Equation (24) becomes
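$$\ln\gamma_{i,S}^\mathrm{res} = n_i \sum_{t}^{\mathrm{nhb,OH,OT}} \sum_{\sigma_m} p_i^t(\sigma_m^t)\left[\ln\Gamma_S^t(\sigma_m^t) - \ln\Gamma_i^t(\sigma_m^t)\right] \tag{34}$$
and Eq. (28) becomes
$$\ln\Gamma_S^t(\sigma_m^t) = -\ln\left\{\sum_{s}^{\mathrm{nhb,OH,OT}} \sum_{\sigma_n} p_S^s(\sigma_n^s)\,\Gamma_S^s(\sigma_n^s)\exp\left[-\frac{\Delta W(\sigma_m^t,\sigma_n^s)}{RT}\right]\right\} \tag{35}$$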
Equation (30) can again be obtained by changing the index S to i in Eq. (35). For COSMO-SAC 2010, Hsieh et al.23 used the sigma profile database of Mullins et al.34
Furthermore the value for aeff was changed to 7.25 Å2 and the sigma profile averaging equation according to Eq. (4) was used. A summary of the model parameters is given in Table 2.
Table 2:
Parameter | Value | Value (SI) |
---|---|---|
q0 | 79.53 Å2 | 79.53 ×10−20 m2 |
r0 | 66.69 Å3 | 66.69 × 10−30 m3 |
z | 10 | 10 |
aeff | 7.25 Å2 | 7.25 × 10−20 m2 |
reff | (aeff/π)0.5 | |
fdecay | 3.57 | 3.57 |
cOH−OH | 4013.78 kcal Å4 mol−1 e−2 | 6.542209585204081 × 104 J m4 mol−1 C−2 |
cOT-OT | 932.31 kcal Å4 mol−1 e−2 | 1.5196068091379239 × 104 J m4 mol−1 C−2 |
cOH-OT | 3016.43 kcal Å4 mol−1 e−2 | 4.916591656517583 × 104 J m4 mol−1 C−2 |
σ0 | 0.007 e Å−2 | 0.11215236437999998 C m−2 |
AES | 6525.69 kcal Å4 mol−1 e−2 | 1.06364652940795 × 105 J m4 mol−1 C−2 |
BES | 1.4859 × 108 kcal Å4 K2 mol−1 e−2 | 2.421923778247623 × 109 J m4 K2 mol−1 C−2 |
NA | 6.022140758 × 1023 mol−1 | 6.022140758 × 1023 mol−1 |
kB | 1.38064903 × 10−23 J K−1 | 1.38064903 × 10−23 J K−1 |
R | kBNA/4184 kcal mol−1 K−1 | kBNA J mol−1 K−1 |
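With the parameters of Table 2, the exchange energy of COSMO-SAC 2010 can be sketched as follows. The sketch assumes the form $\Delta W = c_\mathrm{ES}(\sigma_m^t + \sigma_n^s)^2 - c_\mathrm{hb}(\sigma_m^t - \sigma_n^s)^2$ with $c_\mathrm{ES} = A_\mathrm{ES} + B_\mathrm{ES}/T^2$, and it treats the OH-OT coefficient as symmetric in the two profile types; function names and the string labels for the profile types are hypothetical.

```python
A_ES = 6525.69   # kcal Å⁴ mol⁻¹ e⁻²
B_ES = 1.4859e8  # kcal Å⁴ K² mol⁻¹ e⁻²
C_HB_TABLE = {
    ("OH", "OH"): 4013.78,
    ("OT", "OT"): 932.31,
    ("OH", "OT"): 3016.43,
    ("OT", "OH"): 3016.43,  # assumed symmetric in the two profile types
}


def c_es(temperature):
    """Temperature-dependent electrostatic coefficient (T in K)."""
    return A_ES + B_ES / temperature ** 2


def c_hb(type_t, type_s, sigma_m, sigma_n):
    """Hydrogen-bonding coefficient; zero unless the signs are opposite."""
    if sigma_m * sigma_n >= 0.0:
        return 0.0
    return C_HB_TABLE.get((type_t, type_s), 0.0)


def delta_w_2010(temperature, type_t, sigma_m, type_s, sigma_n):
    """Exchange energy in kcal/mol between segments of profile types t and s."""
    return (
        c_es(temperature) * (sigma_m + sigma_n) ** 2
        - c_hb(type_t, type_s, sigma_m, sigma_n) * (sigma_m - sigma_n) ** 2
    )


# Hydroxyl donor and acceptor segments of equal magnitude at 298.15 K
oh_pair = delta_w_2010(298.15, "OH", 0.01, "OH", -0.01)
```

Any pairing that involves an nhb segment carries no hydrogen-bonding contribution, so only the electrostatic term remains in that case.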
3.3. Model of Hsieh et al. – COSMO-SAC-dsp
On the basis of their model modification summarized in section 3.2, Hsieh et al.24 proposed to also take dispersive interactions between the molecules into account, which were entirely neglected before. The activity coefficient of component i in the mixture S then becomes
$$\ln\gamma_{i,S} = \ln\gamma_{i,S}^\mathrm{comb} + \ln\gamma_{i,S}^\mathrm{res} + \ln\gamma_{i,S}^\mathrm{dsp} \tag{36}$$
with $\ln\gamma_{i,S}^\mathrm{dsp}$ being the contribution to the activity coefficient due to dispersion. The combinatorial part $\ln\gamma_{i,S}^\mathrm{comb}$ and the residual part $\ln\gamma_{i,S}^\mathrm{res}$ are calculated in the same way and with the same parameters as given in sections 3.1 and 3.2, respectively. Hsieh et al.24 suggest the use of the one-constant Margules equation for the calculation of the dispersive interaction and give the following equations for a binary mixture of components 1 and 2
$$\ln\gamma_{1,S}^\mathrm{dsp} = A x_2^2, \qquad \ln\gamma_{2,S}^\mathrm{dsp} = A x_1^2 \tag{37}$$
As given in the article by Hsieh et al.,24 the parameter A can be calculated according to
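$$A = w\left[\frac{1}{2}\left(\epsilon_{\mathrm{Molecule},1} + \epsilon_{\mathrm{Molecule},2}\right) - \sqrt{\epsilon_{\mathrm{Molecule},1}\,\epsilon_{\mathrm{Molecule},2}}\right] \tag{38}$$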
with the definition of w given in a corrigendum by Hsieh et al.40
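$$w = \begin{cases} -0.27027 & \text{if (a) water + hb-only-acceptor, (b) COOH + nhb or hb-donor-acceptor, or (c) water + COOH} \\ 0.27027 & \text{otherwise} \end{cases} \tag{39}$$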
Hsieh et al.24 define substances as hb-only-acceptors if they are able to form a hydrogen bond by accepting a proton from a neighbor, and as hb-donor-acceptors if they are able to form hydrogen bonds by either providing a proton to or accepting a proton from their neighbors. All substances containing a carboxyl group are denoted with COOH. Note that there is a typographical error in the corrigendum, where the value w = −0.27027 is proposed for the combination of “COOH + nhb or hb-only-acceptor”, whereas in the article, this value for w is given for the combination of “COOH + nhb or hb-donor-acceptor”. Furthermore, note that there is also a confusion of units in the original article, as the constant A should be dimensionless and the dispersion parameters are usually divided by Boltzmann’s constant kB = 1.380649 × 10⁻²³ J K⁻¹ (value taken from ref 41). Therefore, we implemented Eq. (38) and Eq. (39) as follows
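$$A = w\left[\frac{1}{2}\left(\frac{\epsilon_{\mathrm{Molecule},1}}{k_\mathrm{B}} + \frac{\epsilon_{\mathrm{Molecule},2}}{k_\mathrm{B}}\right) - \sqrt{\frac{\epsilon_{\mathrm{Molecule},1}}{k_\mathrm{B}}\,\frac{\epsilon_{\mathrm{Molecule},2}}{k_\mathrm{B}}}\right] \tag{40}$$
and
$$w = \begin{cases} -0.27027\ \mathrm{K}^{-1} & \text{if (a) water + hb-only-acceptor, (b) COOH + nhb or hb-donor-acceptor, or (c) water + COOH} \\ 0.27027\ \mathrm{K}^{-1} & \text{otherwise} \end{cases} \tag{41}$$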
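The dispersion contribution for a binary mixture can be sketched as below. The sketch assumes the one-constant Margules form $\ln\gamma_1^\mathrm{dsp} = A x_2^2$, $\ln\gamma_2^\mathrm{dsp} = A x_1^2$ with A given by w times the difference between the arithmetic and the geometric mean of the two molecular dispersion parameters ϵMolecule/kB (in K), and w in K⁻¹. The function names are hypothetical, and the choice of the sign of w from the molecule categories is left to the caller.

```python
import math


def margules_constant(eps1_over_kb, eps2_over_kb, w):
    """Dimensionless Margules constant A.

    eps1_over_kb, eps2_over_kb: molecular dispersion parameters in K
    w: +0.27027 or -0.27027 in 1/K, depending on the pair of molecules
    """
    arithmetic_mean = 0.5 * (eps1_over_kb + eps2_over_kb)
    geometric_mean = math.sqrt(eps1_over_kb * eps2_over_kb)
    return w * (arithmetic_mean - geometric_mean)


def ln_gamma_dsp(x1, eps1_over_kb, eps2_over_kb, w=0.27027):
    """Return (ln gamma_1, ln gamma_2) of the dispersion term."""
    a_const = margules_constant(eps1_over_kb, eps2_over_kb, w)
    x2 = 1.0 - x1
    return a_const * x2 ** 2, a_const * x1 ** 2


# Equimolar mixture with made-up dispersion parameters of 100 K and 144 K
ln_g1, ln_g2 = ln_gamma_dsp(0.5, 100.0, 144.0)
```

Because the arithmetic mean is never smaller than the geometric mean, the sign of the dispersion contribution is determined entirely by the sign of w.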
The dispersion parameters of the atoms have been fitted to experimental data by Hsieh et al.24 and are listed in Table 3 (already considering the corrigendum40). The other model parameters stayed the same as already given in Table 2.
Table 3:
Atom type i | (ϵAtom,i/kB) / K | Note |
---|---|---|
C (sp3) | 115.7023 | bonded to four others |
C (sp2) | 117.4650 | bonded to three others |
C (sp) | 66.0691 | bonded to two others |
N (sp3) | 15.4901 | bonded to three others |
N (sp2) | 84.6268 | bonded to two others |
N (sp) | 109.6621 | bonded to one other |
–O– | 95.6184 | bonded to two others |
=O | −11.0549 | double-bonded O |
F | 52.9318 | bonded to one other |
Cl | 104.2534 | bonded to one other |
H (water) | 58.3301 | H in water |
H (OH) | 19.3477 | H-O bond (not water) |
H (NH) | 141.1709 | H bonded to N |
H (other) | 0 | |
other | invalid | |
The QM/COSMO calculation results from the University of Delaware database33 were used for the development of the COSMO-SAC-dsp model. This database can be considered a revised and extended version of the VT-database34,39 and was developed in cooperation with Stanley Sandler’s research group at the University of Delaware and Dr. S. Lustig, formerly at DuPont. The calculations in DMol3 used the GGA/VWN-BP functional with a double numerical basis set with polarization functions (DNP). The detailed procedure for obtaining the equilibrium geometry and the screening charges can be found in Ref. 34. The effects of using different combinations of DFT methods and basis sets were studied by Chen et al.42 More details are available in Xiong et al.33 A fully open-source set of sigma profiles generated from an open-source quantum chemical tool is available43 at https://github.com/lvpp/sigma for the release 18.07,44 and their use in a COSMO approach has been investigated previously.45,46
4. Implementation Details for the COSMO-SAC Models
The numerical method used to solve the non-linear system of Eqs. (28) and (35) is the successive substitution method. This method is the solver used in the code of Mullins et al.,34 which forms the basis of this implementation. Successive substitution is reliable but converges slowly. An analysis of the successive substitution method and a comparison with the Newton-Raphson method are provided in Possani and de P. Soares.47
To solve Eqs. (28) and (35) for the segment activity coefficients, initial values have to be specified for ΓS(σn) in Eq. (28) and for its counterpart in Eq. (35). Therefore, all values of Γ are set to unity for all intervals of the considered pure molecule or mixture before the calculation is started. After each iteration, the newly calculated Γ is averaged with the previous values, and the difference ∆Γ between the averaged and the previous values serves as the convergence criterion. The successive substitution is terminated only when ∆Γ of every interval satisfies the convergence criterion (here, that the maximum absolute difference between values of Γ is less than 10^−8); otherwise, the next iteration starts with the averaged Γ replacing the previous values.
Furthermore, the equations can be rewritten to remove the evaluation of the logarithm, as the evaluation of log(exp(x)) is computationally much more expensive than division. Therefore, a slightly more efficient implementation (here, demonstrated for the case of the mixture segment activity coefficients with one sigma profile) is
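$$\Gamma_{S,\mathrm{new}}(\sigma_m) = \left[\sum_{\sigma_n} p_S(\sigma_n)\,\Gamma_{S,\mathrm{old}}(\sigma_n)\,\exp\!\left(-\frac{\Delta W(\sigma_m,\sigma_n)}{RT}\right)\right]^{-1} \tag{42}$$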
In the present C++ code, the sum on the right hand side is carried out in a vectorized form with matrix-vector operations from the Eigen library. Furthermore, the matrix
$$\exp\!\left(-\frac{\Delta W(\sigma_m,\sigma_n)}{RT}\right) p_S(\sigma_n) \tag{43}$$
can be precalculated, as it does not depend on the current value of ΓS,old. This operation can be carried out by multiplying each row of the matrix exp(−∆W/(RT )) by pS (σn) in a coefficient-wise sense.
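The iteration described in this section can be sketched in a few lines of Python with NumPy. This is a minimal illustration and not the code from the repository: the function name `solve_segment_gammas`, its arguments, and the iteration limit are assumptions made here, and the units of `delta_W`, `R` and `T` must be chosen consistently by the caller. The sketch assumes a single sigma profile, the precalculated matrix of Eq. (43), unit starting values, averaging of old and new values, and the 10^−8 convergence criterion.

```python
import numpy as np

def solve_segment_gammas(p_sigma, delta_W, T, R=8.314462618, tol=1e-8, max_iter=1000):
    """Damped successive substitution for segment activity coefficients.

    p_sigma : 1-D array, sigma profile p_S(sigma_n) (hypothetical input name)
    delta_W : 2-D array, exchange energies Delta W(sigma_m, sigma_n)
    R, T    : gas constant and temperature, in units consistent with delta_W
    """
    p_sigma = np.asarray(p_sigma, dtype=float)
    delta_W = np.asarray(delta_W, dtype=float)
    # Precalculated matrix: every row of exp(-dW/(RT)) is multiplied
    # coefficient-wise by p_S, so element (m, n) carries the factor p_S(sigma_n).
    A = np.exp(-delta_W / (R * T)) * p_sigma[np.newaxis, :]
    gamma = np.ones_like(p_sigma)              # all Gamma start at unity
    for _ in range(max_iter):
        gamma_new = 1.0 / (A @ gamma)          # logarithm-free update
        gamma_avg = 0.5 * (gamma + gamma_new)  # average with previous values
        if np.max(np.abs(gamma_avg - gamma)) < tol:
            return gamma_avg
        gamma = gamma_avg
    raise RuntimeError("successive substitution did not converge")
```

Because the matrix `A` is built once outside the loop, each iteration costs only one matrix-vector product and one element-wise division.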
In the code by Mullins et al.,34 the sum in Eq. (42) contains all elements of the sigma profile, but this is not necessary, as charge densities that do not exist in a molecule do not contribute to the sum. Especially in the case of relatively nonpolar molecules (e.g., the alkanes), only a small fraction of the range of σ is populated. Therefore, the range of σ that is populated in any of the molecules must be determined when the model is loaded. The range of σ is obtained by considering the non-hydrogen-bonding (NHB) profile of an equimolar mixture of the components. It is not necessary to consider the OT or OH profiles because the NHB profile will at least have a small contribution from each of the other profiles, according to Eq. (10). The minimum and maximum values of σ with a contribution p(σ)A greater than zero are retained, and only these elements are evaluated.
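The range restriction can be illustrated with a short Python sketch. The function name `populated_sigma_range` and the input layout (one array of p(σ)A values per component, all on the same σ grid) are assumptions made for this example and are not taken from the repository.

```python
import numpy as np

def populated_sigma_range(nhb_profiles):
    """Return the first and last grid index with p(sigma)*A > 0 in any component.

    nhb_profiles : sequence of 1-D arrays with the NHB values p(sigma)*A of each
                   component on a common sigma grid (hypothetical input layout).
    """
    # An equimolar mixture weights every component equally, so the populated
    # intervals are those where the mean of the component profiles is nonzero.
    mixture = np.mean(np.vstack(nhb_profiles), axis=0)
    populated = np.flatnonzero(mixture > 0.0)
    if populated.size == 0:
        raise ValueError("sigma profiles are empty")
    return int(populated[0]), int(populated[-1])
```

The vectors and matrices used in the iteration can then be restricted to `slice(lo, hi + 1)`, which shortens every matrix-vector product for weakly polar mixtures.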
5. Vapor-Liquid Equilibrium Calculations
Pressure-composition diagrams are created by calculating boiling and dew pressures for a selection of compositions to generate the complete boiling and dew point curves. In order to calculate phase equilibria, the fugacity of each component i in the vapor phase has to be equal to the fugacity of the same component in the liquid phase of a mixture with given mole fractions xi
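$$f_i^{\mathrm{V}} = f_i^{\mathrm{L}} \tag{44}$$
The fugacity is defined as $f_i = \varphi_i x_i p$ and the activity coefficient for the liquid phase as $\gamma_i = \varphi_i^{\mathrm{L}} / \varphi_{0i}^{\mathrm{L}}$, so that Eq. (44) becomes
$$\varphi_i^{\mathrm{V}}\, y_i\, p = \gamma_i\, \varphi_{0i}^{\mathrm{L}}\, x_i\, p \tag{45}$$
with $\varphi_{0i}^{\mathrm{L}}$ being the fugacity coefficient of the liquid phase of pure component i, $\varphi_i^{\mathrm{V}}$ the fugacity coefficient of the vapor phase of component i in the mixture, $\gamma_i$ the activity coefficient of the liquid phase of component i, and $y_i$ the mole fraction of component i in the vapor phase. The first assumption is $\varphi_i^{\mathrm{V}} = 1$ for the vapor phase, as it is assumed to be an ideal gas. The second assumption is $\gamma_i$ being independent of pressure p. Equating the fugacity of pure component i in the liquid phase to that of the pure fluid at saturation, and then also to that of the vapor phase,
$$\varphi_{0i}^{\mathrm{L}}\, p = f_{0i}^{\mathrm{L}}(T,p) \approx f_{0i}^{\mathrm{L}}(T,p_{\mathrm{sat},i}) = f_{0i}^{\mathrm{V}}(T,p_{\mathrm{sat},i}) \approx p_{\mathrm{sat},i} \tag{46}$$
the product $\varphi_{0i}^{\mathrm{L}}\, p$ becomes the vapor pressure $p_{\mathrm{sat},i}$ of the pure component i. Equation (45) is now given by
$$x_i\, \gamma_i\, p_{\mathrm{sat},i} = y_i\, p \tag{47}$$
For a binary mixture this leads to
$$x_1\, \gamma_1\, p_{\mathrm{sat},1} = y_1\, p \tag{48}$$
$$x_2\, \gamma_2\, p_{\mathrm{sat},2} = y_2\, p \tag{49}$$
Because $y_1 + y_2 = 1$, their sum is equal to
$$p = x_1\, \gamma_1\, p_{\mathrm{sat},1} + x_2\, \gamma_2\, p_{\mathrm{sat},2} \tag{50}$$
so that the equilibrium pressure p can be directly calculated once the activity coefficients $\gamma_1$ and $\gamma_2$ are known. Then, the mole fraction of component i in the vapor phase is given by
$$y_i = \frac{x_i\, \gamma_i\, p_{\mathrm{sat},i}}{p} \tag{51}$$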
The same process is repeated for a range of mixture compositions to create the boiling and dew point curves. An example of an isothermal phase-equilibrium calculation is presented in Fig. 5. Two isotherms are plotted for the mixture ethanol + water, overlaid with experimental data. The code used to generate the figure is in the Jupyter notebook COSMO-SAC.ipynb in the repository (and in the archive).
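The direct pressure calculation for a binary mixture can be sketched as follows, under the two assumptions stated in this section (ideal-gas vapor phase, pressure-independent activity coefficients). The function names are hypothetical, and the one-parameter Margules expression is only a stand-in so that the example runs on its own; in practice the activity coefficients would come from the COSMO-SAC model and the vapor pressures from a pure-fluid correlation.

```python
import numpy as np

def bubble_point(x1, gamma1, gamma2, psat1, psat2):
    """Bubble pressure and vapor composition of a binary mixture."""
    x2 = 1.0 - x1
    p = x1 * gamma1 * psat1 + x2 * gamma2 * psat2  # total pressure, Eq. (50)
    y1 = x1 * gamma1 * psat1 / p                   # vapor mole fraction, Eq. (51)
    return p, y1

def margules_gammas(x1, A):
    """Placeholder activity coefficient model (NOT COSMO-SAC), for illustration."""
    x2 = 1.0 - x1
    return np.exp(A * x2**2), np.exp(A * x1**2)

# Sweep the liquid composition to trace an isothermal p-x-y diagram.
# The parameter A and the vapor pressures are arbitrary illustrative numbers.
x1_grid = np.linspace(0.0, 1.0, 11)
g1, g2 = margules_gammas(x1_grid, A=1.0)
p_bubble, y1_grid = bubble_point(x1_grid, g1, g2, psat1=100.0, psat2=50.0)
```

Plotting `p_bubble` against `x1_grid` gives the boiling curve and against `y1_grid` the dew curve, since both compositions are evaluated at the same equilibrium pressure.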
6. Code and Validation
The development code, including the sigma profiles and the COSMO-SAC post-processing, is contained in a git repository at https://github.com/usnistgov/COSMOSAC. The archival version of the code used in this paper is furthermore stored at https://doi.org/10.5281/zenodo.3669311. Permission from BioVia was obtained to make the .cosmo files available for academic and non-commercial use. Additional information about the .cosmo file database is given in the README file profiles/UD/README.txt relative to the root of the code.
The code workflow mirrors the analysis described in this paper. As a pre-processing step, the sigma profiles are generated from each of the .cosmo files with the Python script profiles/to_sigma.py. This script has a command-line interface that allows for selection of the charge averaging scheme and of the number of contributions into which the sigma profiles should be subdivided (1 or 3); from this script a single .sigma profile is generated.
Once the sigma profile has been generated, the COSMO-SAC analysis is applied. This code allows for the calculation of the activity coefficients, among other outputs. Either the C++ or Python interfaces may be used, according to the user’s preference. A wide range of other numerical analysis programming environments now support calling Python in a nearly-native fashion, so between C++ and Python most users should be able to find a way to call the COSMO code.
Examples of the use of the COSMO-SAC implementation, including phase equilibrium calculations, are provided in Jupyter notebooks distributed with the code. For the phase equilibrium calculations, the saturation pressure curves of the pure fluids provided by CoolProp49 are used. A limitation of this method is that vapor pressure curves must be available for the given fluid. Alternatively, vapor pressure curves could be obtained with the consistent alpha function parameters for the Peng-Robinson equation of state of Bell et al.50
Further verification is provided by a large set of calculated values from our model. Users should first ensure that they can precisely regenerate these values prior to making use of the library. The script that generates the verification data is in the file profiles/generate_validation_data.py relative to the root of the code.
7. Conclusions
Since Lin and Sandler10 proposed the original COSMO-SAC model, many modifications and improvements for this model have been proposed. The reproduction of COSMO-SAC models from the literature is often challenging. On the one hand, the model results strongly depend on the sigma profiles, which themselves depend on the program used to calculate them. Therefore, it is crucial to use the same sigma profiles as the authors of the COSMO-SAC model in order to reproduce their model. On the other hand, some misunderstandings regarding the description of COSMO-SAC models exist in the literature, which further complicate the reimplementation of COSMO-SAC models. In this work, we provide an open-source C++ and Python implementation of three different COSMO-SAC models10,23,24, together with detailed documentation of the implemented models. Furthermore, we provide a consistent set of sigma profiles calculated with the software DMol3 based on the database provided by Mullins et al.34. The corresponding COSMO output files and the computer code to calculate the sigma profiles from the COSMO output files are also provided. Thus, this work intends to provide an open-source reference implementation of state-of-the-art COSMO-SAC models.
Acknowledgement
IB would like to thank the German Excellence Initiative for funding the research stay at TU Dresden.
References
- (1) Fredenslund A; Jones RL; Prausnitz JM Group-Contribution Estimation of Activity Coefficients in Nonideal Liquid Mixtures. AIChE J 1975, 21, 1086–1099.
- (2) Fredenslund A; Gmehling J; Michelsen ML; Rasmussen P; Prausnitz JM Computerized Design of Multicomponent Distillation Columns Using the UNIFAC Group Contribution Method for Calculation of Activity Coefficients. Ind. Eng. Chem. Process Des. Dev 1977, 16, 450–462.
- (3) Gmehling J; Li J; Schiller M A modified UNIFAC model. 2. Present parameter matrix and results for different thermodynamic properties. Ind. Eng. Chem. Res 1993, 32, 178–193.
- (4) Gmehling J; Wittig R; Lohmann J; Joh R A Modified UNIFAC (Dortmund) Model. 4. Revision and Extension. Ind. Eng. Chem. Res 2002, 41, 1678–1688.
- (5) Hansen HK; Rasmussen P; Fredenslund A; Schiller M; Gmehling J Vapor-Liquid Equilibria by UNIFAC Group Contribution. 5. Revision and Extension. Ind. Eng. Chem. Res 1991, 30, 2355–2358.
- (6) Weidlich U; Gmehling J A modified UNIFAC Model. 1. Prediction of VLE, hE, and y∞. Ind. Eng. Chem. Res 1987, 26, 1372–1381.
- (7) Klamt A Conductor-like screening model for real solvents: a new approach to the quantitative calculation of solvation phenomena. J. Phys. Chem 1995, 99, 2224–2235.
- (8) Klamt A; Jonas V; Bürger T; Lohrenz JC Refinement and parametrization of COSMO-RS. J. Phys. Chem. A 1998, 102, 5074–5085.
- (9) Klamt A; Eckert F COSMO-RS: a novel and efficient method for the a priori prediction of thermophysical data of liquids. Fluid Phase Equilib 2000, 172, 43–72.
- (10) Lin S-T; Sandler SI A Priori Phase Equilibrium Prediction from a Segment Contribution Solvation Model. Ind. Eng. Chem. Res 2002, 41, 899–913.
- (11) Lee M-T; Lin S-T Prediction of mixture vapor–liquid equilibrium from the combined use of Peng–Robinson equation of state and COSMO-SAC activity coefficient model through the Wong–Sandler mixing rule. Fluid Phase Equilib 2007, 254, 28–34.
- (12) Hsieh C-M; Lin S-T Determination of cubic equation of state parameters for pure fluids from first principle solvation calculations. AIChE J 2008, 54, 2174–2181.
- (13) Hsieh C-M; Lin S-T First-Principles Predictions of Vapor-Liquid Equilibria for Pure and Mixture Fluids from the Combined Use of Cubic Equations of State and Solvation Calculations. Ind. Eng. Chem. Res 2009, 48, 3197–3205.
- (14) Wang L-H; Hsieh C-M; Lin S-T Improved Prediction of Vapor Pressure for Pure Liquids and Solids from the PR+COSMOSAC Equation of State. Ind. Eng. Chem. Res 2015, 54, 10115–10125.
- (15) Wang L-H; Hsieh C-M; Lin S-T Prediction of Gas and Liquid Solubility in Organic Polymers Based on the PR+COSMOSAC Equation of State. Ind. Eng. Chem. Res 2018, 57, 10628–10639.
- (16) Hsieh C-M; Lin S-T First-Principles Prediction of Vapor-Liquid-Liquid Equilibrium from the PR+COSMOSAC Equation of State. Ind. Eng. Chem. Res 2011, 50, 1496–1503.
- (17) Silveira CL; Sandler SI Extending the range of COSMO-SAC to high temperatures and high pressures. AIChE Journal 2017, 64, 1806–1813.
- (18) Jäger A; Bell IH; Breitkopf C A theoretically based departure function for multi-fluid mixture models. Fluid Phase Equilib 2018, 469, 56–69.
- (19) Jäger A; Mickoleit E; Breitkopf C A Combination of Multi-Fluid Mixture Models with COSMO-SAC. Fluid Phase Equilib 2018, 476, 147–156.
- (20) Jäger A; Mickoleit E; Breitkopf C Accurate and predictive mixture models applied to mixtures with CO2. Proceedings 3rd European supercritical CO2 conference, Paris, France 2019.
- (21) Shimoyama Y; Iwai Y Development of activity coefficient model based on COSMO method for prediction of solubilities of solid solutes in supercritical carbon dioxide. J. Supercritic. Fluid 2009, 50, 210–217.
- (22) de P Soares R; Baladão LF; Staudt PB A pairwise surface contact equation of state: COSMO-SAC-Phi. Fluid Phase Equilib 2019, 488, 13–26.
- (23) Hsieh C-M; Sandler SI; Lin S-T Improvements of COSMO-SAC for vapor–liquid and liquid–liquid equilibrium predictions. Fluid Phase Equilib 2010, 297, 90–97.
- (24) Hsieh C-M; Lin S-T; Vrabec J Considering the dispersive interactions in the COSMO-SAC model for more accurate predictions of fluid phase behavior. Fluid Phase Equilib 2014, 367, 109–116.
- (25) Fingerhut R; Chen W-L; Schedemann A; Cordes W; Rarey J; Hsieh C-M; Vrabec J; Lin S-T Comprehensive Assessment of COSMO-SAC Models for Predictions of Fluid-Phase Equilibria. Ind. Eng. Chem. Res 2017, 56, 9868–9884.
- (26) Kuo Y-C; Hsu C-C; Lin S-T Prediction of Phase Behaviors of Polymer–Solvent Mixtures from the COSMO-SAC Activity Coefficient Model. Ind. Eng. Chem. Res 2013, 52, 13505–13515.
- (27) Hsieh M-T; Lin S-T A predictive model for the excess gibbs free energy of fully dissociated electrolyte solutions. AIChE Journal 2010, 57, 1061–1074.
- (28) Wang S; Song Y; Chen C-C Extension of COSMO-SAC Solvation Model for Electrolytes. Ind. Eng. Chem. Res 2011, 50, 176–187.
- (29) Lee B-S; Lin S-T A Priori Prediction of Dissociation Phenomena and Phase Behaviors of Ionic Liquids. Ind. Eng. Chem. Res 2015, 54, 9005–9012.
- (30) Lee B-S; Lin S-T Prediction of phase behaviors of ionic liquids over a wide range of conditions. Fluid Phase Equilib 2013, 356, 309–320.
- (31) Mohr PJ; Newell DB; Taylor BN CODATA Recommended Values of the Fundamental Physical Constants: 2014. J. Phys. Chem. Ref. Data 2016, 45, 043102.
- (32) Cordero B; Gómez V; Platero-Prats AE; Revés M; Echeverría J; Cremades E; Barragán F; Alvarez S Covalent radii revisited. Dalton Trans 2008, 2832–2838.
- (33) Xiong R; Sandler SI; Burnett RI An Improvement to COSMO-SAC for Predicting Thermodynamic Properties. Ind. Eng. Chem. Res 2014, 53, 8265–8278.
- (34) Mullins E; Oldland R; Liu YA; Wang S; Sandler SI; Chen C-C; Zwolak M; Seavey KC Sigma-Profile Database for Using COSMO-Based Thermodynamic Methods. Ind. Eng. Chem. Res 2006, 45, 4389–4415.
- (35) Lin S-T; Sandler SI A Priori Phase Equilibrium Prediction from a Segment Contribution Solvation Model. - Additions and Corrections. Ind. Eng. Chem. Res 2004, 43, 1322–1322.
- (36) Mu T; Rarey J; Gmehling J Performance of COSMO-RS with Sigma Profiles from Different Model Chemistries. Ind. Eng. Chem. Res 2007, 46, 6612–6629.
- (37) Lin S-T; Chang J; Wang S; Goddard WA; Sandler SI Prediction of Vapor Pressures and Enthalpies of Vaporization Using a COSMO Solvation Model. J. Phys. Chem. A 2004, 108, 7429–7439.
- (38) Wang S; Sandler SI; Chen C-C Refinement of COSMO–SAC and the Applications. Ind. Eng. Chem. Res 2007, 46, 7275–7288.
- (39) Mullins E; Liu YA; Ghaderi A; Fast SD Sigma Profile Database for Predicting Solid Solubility in Pure and Mixed Solvent Mixtures for Organic Pharmacological Compounds with COSMO-Based Thermodynamic Methods. Ind. Eng. Chem. Res 2008, 47, 1707–1725.
- (40) Hsieh C-M; Lin S-T; Vrabec J Corrigendum to: Considering the dispersive interactions in the COSMO-SAC model for more accurate predictions of fluid phase behavior [Fluid Phase Equilib. 367 (2014) 109–116]. Fluid Phase Equilib 2014, 384, 14–15.
- (41) Mohr PJ; Newell DB; Taylor BN; Tiesinga E Data and analysis for the CODATA 2017 special fundamental constants adjustment. Metrologia 2018, 55, 125–146.
- (42) Chen W-L; Hsieh C-M; Yang L; Hsu C-C; Lin S-T A Critical Evaluation on the Performance of COSMO-SAC Models for Vapor–Liquid and Liquid–Liquid Equilibrium Predictions Based on Different Quantum Chemical Calculations. Ind. Eng. Chem. Res 2016, 55, 9312–9322.
- (43) Ferrarini F; Flôres GB; Muniz AR; de Soares RP An open and extensible sigma-profile database for COSMO-based models. AIChE Journal 2018, 64, 3443–3455.
- (44) Soares RDP; Flôres GB; Xavier VB; Pelisser EN; Ferrarini F; Staudt PB lvpp/sigma: LVPP sigma-profile database (18.07); doi: 10.5281/ZENODO.3613786 2017.
- (45) Gerber RP; Soares RP Assessing the reliability of predictive activity coefficient models for molecules consisting of several functional groups. Braz. J. Chem. Eng 2013, 30, 1–11.
- (46) Wang S; Lin S-T; Watanasiri S; Chen C-C Use of GAMESS/COSMO program in support of COSMO-SAC model applications in phase equilibrium prediction calculations. Fluid Phase Equilib 2009, 276, 37–45.
- (47) Possani LFK; de P Soares R Numerical and Computational Aspects of COSMO-based Activity Coefficient Models. Braz. J. Chem. Eng 2019, 36, 587–598.
- (48) Barr-David F; Dodge BF Vapor-Liquid Equilibrium at High Pressures. The Systems Ethanol-Water and 2-Propanol-Water. J. Chem. Eng. Data 1959, 4, 107–121.
- (49) Bell IH; Wronski J; Quoilin S; Lemort V Pure and Pseudo-pure Fluid Thermophysical Property Evaluation and the Open-Source Thermophysical Property Library CoolProp. Ind. Eng. Chem. Res 2014, 53, 2498–2508.
- (50) Bell IH; Satyro M; Lemmon EW Consistent Twu Parameters for More than 2500 Pure Fluids from Critically Evaluated Experimental Data. J. Chem. Eng. Data 2018, 63, 2402–2409.