Abstract
Crystallization plays a crucial role as a separation and purification technique, particularly in the chemical and pharmaceutical industries. By adjusting process parameters, the productivity, product quality, and efficiency of downstream processes can be improved. In complex processes, model-based design becomes invaluable. Population Balance Models (PBMs) have successfully aided the chemical industry in achieving more effective production processes for decades. These models can utilize various input data sources to identify the dominant mechanisms and calibrate model parameters. While inline particle monitoring tools serve as excellent qualitative descriptors, challenges arise from experimentation and data interpretation, hindering their direct application in the kinetic parameter estimation of PBMs. In this study, we present a novel approach that utilizes information from inline particle monitoring tools for the kinetic parameter estimation of PBMs, bypassing the associated obstacles. Our pioneering approach relies on offline product size data and the correlation-based utilization of inline particle monitoring information. The paper compares this novel strategy with two parameter estimation techniques: the classical method using solute concentration and product size data and a somewhat naïve approach, which assumes that the inline particle monitoring data can directly be compared with the simulations. The prediction capabilities are evaluated through two in-silico case studies. The results indicate that the precision and predictive capability of the correlation-based technique are comparable to the classical approach, both for noisy data and for a system undergoing significant agglomeration and deagglomeration. The use of Pearson's correlation coefficient yields the best results in the novel cases. 
These findings in in-silico datasets provide a foundation and motivation for the practical application of this idea, unleashing the so-far hidden model development potential of such measurements.
Keywords: Crystallization, Population balance modeling, Parameter estimation, Correlation, Inline particle monitoring tools
Highlights
•	Kinetic parameter estimation of population balance models in crystallization.
•	Correlation-based incorporation of in-process particle measurement data.
•	Good parameter identification achieved without solute concentration data.
1. Introduction
Crystallization is a widely used separation and purification technique. It is paramount, especially in drug substance manufacturing, since well-adjusted process parameters can significantly improve the quality of the powder products and the effectiveness of downstream processes simultaneously [1]. This is underscored by its widespread use in the pharmaceutical industry: about 90 % of all solid active pharmaceutical ingredients (APIs) are crystals of small organic molecules [1]. Various advanced strategies can enhance crystallization and maximize its benefits: one can employ process analytical tools [2] (PAT; e.g., particle monitoring, IR/Raman probes, etc.) to reveal valuable process understanding, or leverage simulation methods [3] (e.g., model-based design, optimization, and control) to identify optimal working conditions.
Population Balance Models [4] (PBMs) are mathematical tools commonly employed to simulate, optimize, and control particulate processes; their application has been fruitfully presented in many fields of chemical engineering science [5]. When building a model for simulating systems consisting of particles, there are generally four major development stages: (i) gathering the input (experimental) data, (ii) constructing the mathematical model that incorporates the experimentally observed phenomena, (iii) estimating the (kinetic) parameters of the model by systematically minimizing the deviation between the model's simulation and the input data, and (iv) deploying the calibrated model to solve a practical, well-defined problem. These stages are the steps of solving an inverse problem, a task that is well-known in other disciplines as well, such as in image processing [6], system identification [7], and geophysics [8]. An incomplete list of diverse applications includes the digital design of an agrochemical crystallization [9], the design space determination of a pharmaceutical crystallization [10], the simulation of droplets swelling and escaping in double emulsions [11], the prediction of droplet drying history [12], simulation of nanoparticle formation [13], and the modeling of a wet granulation process [14]. The model identification and accuracy determination are the key to a successful application.
The input data for a PBM identification can be gathered from various measurement sources during a (usually, but not necessarily) small-scale batch crystallization process: the solute concentration profile, intermediate and product particle size distribution (PSD), and possibly inline particle size and shape data as well. The potential experimental data include (1) the concentration collected with continuous inline tools or discrete sampling, (2) the crystal size distribution (CSD) of the product and in-process samples, and (3) the information provided by inline particle monitoring tools (i.e., chord-length distribution (CLD) or total counts). Product PSD data is routinely measured as it is often a critical quality attribute of the product. Acquiring in-process concentration requires careful tool calibration; hence, it is only deployed if absolutely necessary. Inline particle monitoring tools, in contrast, are widely and routinely used for calibration-free monitoring (for example, laser back-scattering-based FBRM or imaging-based Particle Track®, Blaze Metrics®, etc.). They have found versatile applications: the CLD and relative particle number density can be tracked [15], various crystallization events can be monitored from nucleation to agglomeration [16], processes can be controlled based on their signal [17], and they can be used to quickly determine solubility [18,19]. The subset of tools relying on imaging can also provide invaluable particle shape information. Despite this, apart from a few rare examples, they are seldom incorporated in PBM identification [[20], [21], [22]]. One reason is that the measurement is impacted by the process conditions (probe position, mixing rate, solid concentration, viscosities, optical properties of the solids, etc.) [23]. Furthermore, the total counts and measured mean particle sizes are not linearly correlated with their actual values [24,25].
Given these, two identical CLDs do not necessarily translate to identical particle properties, and identical particle properties may result in different CLDs. Even though there are propositions for mitigating these issues (fixing the probe position, applying squared weighting [23,24]), those provide solutions only within limited particle size ranges.
Data interpretation is also challenging, mainly because a CLD is not a PSD, and the PSD is often the critical quality attribute of APIs. Therefore, a challenge lies in the inherent need to convert the CLD into a PSD, a transformation that, mathematically speaking, is an ill-posed inverse problem. Many solutions have been propounded for this conversion, which can be grouped into the following categories: (a) the particle shape is known, and the transformation is carried out accordingly via geometrical modeling [[26], [27], [28], [29], [30], [31], [32]], (b) the essential optical properties are incorporated into the transformation model [33,34], and (c) specific mathematical procedures and tailor-made experiments are executed to acquire the PSD empirically from the CLD [35,36]. These models require background information on the particles and a multiple-step model evaluation, which makes them rather meticulous and problem-specific. Due to these experimental and data-interpretation-related challenges, inline particle monitoring tools remain a qualitative data source, generally not used in PBM identification. This is unfortunate since the equipment is calibration-free, easy to use, widespread in the chemical and pharmaceutical industries, and large amounts of measurement data have already been accumulated.
This article presents a novel approach to the parameter estimation (PE) of PBMs, applying the data of inline particle monitoring tools as a primary information source. We propose a transformation-free utilization of total counts and average chord length (CL) by introducing a new concept: instead of directly comparing the total counts and average crystal size to the simulation (which is alluring but erroneous), the correlation between the simulated and measured data is incorporated into the objective function. With this, we incorporate the practical knowledge that, e.g., a significant count increase indicates an ongoing particle formation (nucleation, fragmentation, etc.), and a count decrease may result from dissolution, aggregation, etc. The measured CL can simultaneously be correlated with the simulated particle sizes. This way, many of the issues that prohibited the direct utilization of inline particle monitoring data in model identification are bypassed in a transformation-free manner. The calibration curve (i.e., the relationship between the measured and simulated data) can also be made part of the inverse problem by optimizing parameters that are not model parameters but influence the inverse problem (i.e., hyperparameters) [37]. Routinely applied product PSD measurements remain necessary, as correlation provides relative information, leaving space for the emergence of multiple, nearly equivalent solutions, which may differ enormously in terms of absolute results and underlying kinetic parameters. However, product PSD samples are routinely collected for product quality verification reasons. We present this novel approach through two simulation case studies. The datasets in the case studies are simulated using PB models, divided into calibration and validation cases, and applied as data sources for subsequent PE runs with different approaches.
The results of the PEs are then tested against the true kinetic parameters of the PBMs, and the validation cases are pure numerical tests.
Our article primarily covers the following key contributions.
•	We propose a novel approach for calibrating Population Balance models.
•	The method is based on quantifying the correlation of simulated and measured data to solve the inverse problem.
•	The new approach solely depends on inline and offline particle size measurements, eliminating the need for concentration measurements.
•	The method is presented through simulated case studies.
2. Methods
2.1. Population balance models
For the simulation of the crystal population, a univariate size density function is introduced (noted as n(L, t) for clarity), which provides the number of crystals in a unit volume of suspension within the size interval [L, L + dL]. With the assumptions of perfect mixing and negligible breakage, the population balance equation (PBE) takes the form of Eq. (1) for the supersaturated condition and the form of Eq. (2) for the undersaturated condition.
| ∂n(L, t)/∂t + G·∂n(L, t)/∂L = (B_p + B_s)·δ(L − L_n) + Agg{n} | (1) |
| ∂n(L, t)/∂t + D·∂n(L, t)/∂L = Br{n} | (2) |
| n(L, t = 0) = n_seed(L) | (3) |
| n(L → ∞, t) = 0 | (4) |
In Equations (1), (2), L is the crystal size (m), t is the process time (s), L_n is the nuclei size (m), and σ is the relative supersaturation (−). In the supersaturated state, where σ is positive, the following mechanisms are considered: size-independent, standard crystal growth (rate G, m/s), primary and secondary nucleation (rates B_p and B_s, #/(m³·s)), and size-independent, constant agglomeration (operator Agg{n} with kernel β); whereas, if the solution is undersaturated, size-independent, standard dissolution (rate D, m/s) and size-independent, constant deagglomeration (operator Br{n} with selection function S) are considered. The exact equations of the applied mechanisms are detailed in Table 3. The corresponding boundary and initial conditions (Eqs. (3), (4)) define the seed distribution and express that all the crystals have a finite size. The driving force of crystal formation, the supersaturation, is given by Eq. (5), where c is the actual concentration and c_s is the solubility.
| σ = (c − c_s)/c_s | (5) |
Table 3.
The applied kinetic equations in the simulations.
| Case study | Primary nucleation | Secondary nucleation | Crystal growth | Dissolution | Agglomeration | Deagglomeration |
|---|---|---|---|---|---|---|
| 1 | ||||||
| 2 |
The agglomeration and deagglomeration can be calculated with their operators denoted in Equations (6), (7), which are taken without modification from the literature [20]. In Eq. (6), the function β is called the agglomeration kernel. The deagglomeration can be modeled as a breakage process; therefore, an extended fragmentation model simulates it [38].
In Eq. (7), S is the breakage selection function, b is the daughter distribution function, and r_a is the agglomeration rate of particles. S is calculated using Eq. (8), where N_b is the number of agglomeration bridges in a unit volume of suspension, given by Eq. (9) and the corresponding initial condition N_b(t = 0) = 0.
| Agg{n}(L, t) = (L²/2)·∫₀^L β((L³ − λ³)^(1/3), λ)·n((L³ − λ³)^(1/3), t)·n(λ, t)/(L³ − λ³)^(2/3) dλ − n(L, t)·∫₀^∞ β(L, λ)·n(λ, t) dλ | (6) |
| (7) |
| (8) |
| (9) |
The material balances are presented in Eqs. (10), (11). Here, c is the concentration in mass fraction, k_v is the volume shape factor (−), and ρ_c is the density of the crystals (kg/m³). These equations are solved to calculate the momentary solute concentration in the system during the process. F is a factor converting the units of particle size and density to the units of concentration.
| dc/dt = −3·F·k_v·ρ_c·G·∫₀^∞ L²·n(L, t) dL | (10) |
| dc/dt = −3·F·k_v·ρ_c·D·∫₀^∞ L²·n(L, t) dL | (11) |
To fit a PB model to a specific experimental system, one should typically solve an inverse problem: finding the combination of kinetic parameters that results in minimal deviation between the simulated and measured values. The kinetic parameters are the parameters of the applied empirical, mechanistic equations. Selecting the most appropriate mechanistic equations to model a crystallization system is one of the most crucial steps in fitting PB models to particulate systems; however, we do not deal with that question in this work. To solve the inverse problem, some form of (global) optimization technique is generally used. An objective function OF(θ) is constructed that depends on the model's parameters (denoted with a vector θ) and quantifies the deviation between the measured values and our model's corresponding predictions. By minimizing the objective function OF(θ), one can eventually find the best-fitting parameter combination.
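As an illustration of this inverse-problem loop, the sketch below fits a two-parameter toy forward model to synthetic data with a global optimizer; the exponential model, the data, and the bounds are placeholders of our own, not the PBM of this work.

```python
import numpy as np
from scipy.optimize import differential_evolution

t = np.linspace(0.0, 1.0, 20)

def forward_model(theta, t):
    # placeholder forward model: exponential decay, theta = (amplitude, rate)
    return theta[0] * np.exp(-theta[1] * t)

# synthetic "measurements" generated with known true parameters (2, 3)
y_meas = forward_model((2.0, 3.0), t)

def objective(theta):
    # OF(theta): aggregated squared deviation between simulation and measurement
    return np.sum((forward_model(theta, t) - y_meas) ** 2)

result = differential_evolution(objective, bounds=[(0.1, 10.0), (0.1, 10.0)], seed=1)
print(result.x)  # should approach the true parameters (2, 3)
```

For a real PBM, `forward_model` would be the numerical solution of the PBE and material balances, and each objective evaluation would require a full process simulation.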
2.2. Objective function construction strategies
One can substitute various expressions for B_p, B_s, G, D, β, and S, thus resulting in different kinetic model structures. This step is of great importance and is related to the problem of kinetic model discrimination and parameter estimability [39,40]. Nevertheless, regardless of the kinetic model structure, the model will have a few unknown constants, which must be estimated based on the experimental data as an optimization problem. Given the kinetic parameters, the objective function quantifies the aggregated deviations between the simulations and the measurements. Ideally, the calibrated model has a low objective function value and a small kinetic parameter uncertainty region. Different objective functions can be formulated depending on the available data. Table 1 summarizes the different input data source types with advantages and disadvantages from a practical and a modeler's perspective.
Table 1.
A high-level summary of the input data sources generally used for PBMs' kinetic parameter estimation.
| Data type | Advantages | Disadvantages |
|---|---|---|
| Offline PSD (mainly for product, but may be available for intermediate samples, too) | | |
| Solute concentration | | |
| Inline particle monitoring data | | |
| Particle shape data | | |
The traditional way (referred to hereafter as Classic) to construct the kinetic PE's objective function is to incorporate the offline particle size distribution of the product (and, if available, intermediate samples) and the concentration profile throughout the process, which takes the form of Eq. (12). Here, the first term accounts for the sum of squared differences of the product size quantiles (PSQs) of the PSD, and the second term accounts for the sum of squared differences of the measured and simulated concentrations.
| OF_Classic(θ) = Σ_e Σ_s Σ_q (D_q,sim − D_q,meas)² + w·Σ_e Σ_s (c_sim − c_meas)² | (12) |
In Eq. (12), w is the weight factor that sets the balance between the two terms, e runs over the available experiments of the specific type, s denotes the available samples (i.e., time stamps during the experiment) of the specific measurement type in the experiment, and q runs over the product size quantiles. Generally, three quantiles, namely the D₁₀, D₅₀, and D₉₀, are utilized [41]. The subscripts sim and meas designate the simulated and measured quantities. D_q is the offline product size measurement-related data.
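For concreteness, a minimal sketch of the Classic objective described above might look as follows; the list-per-experiment data layout and the function name are our own assumptions.

```python
import numpy as np

def classic_objective(psq_sim, psq_meas, c_sim, c_meas, w=1.0):
    """Eq. (12)-style objective: squared product-size-quantile deviations plus
    w times the squared concentration deviations, summed over experiments."""
    psq_term = sum(np.sum((np.asarray(s) - np.asarray(m)) ** 2)
                   for s, m in zip(psq_sim, psq_meas))
    conc_term = sum(np.sum((np.asarray(s) - np.asarray(m)) ** 2)
                    for s, m in zip(c_sim, c_meas))
    return psq_term + w * conc_term
```

Here `psq_sim[e]` would hold the simulated quantiles (e.g., D10/D50/D90) of experiment `e` and `c_sim[e]` its concentration samples; `w` balances the two terms as in Eq. (12).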
Since inline particle monitoring data is widely available, it is alluring to incorporate it directly and, perhaps, use it to replace the concentration data. However, this is difficult due to the sensitivity to process conditions [18] and the complexity and uncertainty of CLD-to-PSD (and vice versa) conversions [25], as described earlier in this paper. We propose a novel approach that relies on inline particle monitoring data and product PSD and does not involve the concentration profile. Two objective functions are presented: in Eq. (13), the inline particle monitoring tools' data is used somewhat naïvely (an approach that will be referred to as Naïve), i.e., the absolute deviation of the simulated and measured in-process particle number and mean crystal size is considered. In Eq. (14), the data is used in a correlation-based (Novel) manner, maximizing the correlation between the measured and simulated trends. While the Naïve approach struggles with all known weaknesses of inline particle monitoring tools (nonlinear characteristics, sensitivity to probe placement and process conditions, the different nature of PSD and CLD), the Novel technique shortcuts these by focusing only on the similarity between the shapes of the curves. Correlation is a qualitative metric, which matches the purpose for which these tools have been used with massive success in process monitoring and model-free control.
| OF_Naïve(θ) = Σ_e Σ_s Σ_q (D_q,sim − D_q,meas)² + w·Σ_e Σ_s [(N_sim − N_meas)²/N_meas,0² + (L̄_sim − L̄_meas)²] | (13) |
| OF_Novel(θ) = −Σ_e [ρ(N_sim, N_meas) + ρ(L̄_sim, L̄_meas)] + Σ_e Σ_s Σ_q (D_q,sim − D_q,meas)² | (14) |
In Eq. (13), the second term naïvely considers the in-process particle monitoring data by assuming direct comparability to the simulated mean crystal size and crystal number density. The latter is, of course, normalized to the initial value of the measured particle count to bring them to the same range. In Eq. (14), the first term of the objective function is the correlation coefficient (ρ) between the measured and simulated values, which appears with a negative sign as the correlation is being maximized. This way, we seek kinetic parameter combinations that yield similar qualitative trends between the in-process count and mean CL and the simulated number density and mean crystal size. In our purely numerical case studies, the otherwise measured CL and total counts are approximated with the simulated average crystal size and total crystal number values.
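A correlation-based objective in the spirit of Eq. (14) can be sketched as below; the names and the single-experiment layout are illustrative assumptions, and `rho` can be swapped for any of the three correlation measures discussed next.

```python
import numpy as np
from scipy.stats import pearsonr

def novel_objective(psq_sim, psq_meas, n_sim, counts_meas, size_sim, cl_meas,
                    rho=lambda a, b: pearsonr(a, b)[0]):
    """Eq. (14)-style objective for one experiment: product-quantile misfit
    minus the (maximized) correlations of measured counts vs. simulated number
    density and measured mean chord length vs. simulated mean size."""
    psq_term = np.sum((np.asarray(psq_sim) - np.asarray(psq_meas)) ** 2)
    # correlations enter with a negative sign because they are maximized
    corr_term = rho(counts_meas, n_sim) + rho(cl_meas, size_sim)
    return psq_term - corr_term
```

Because only the shape of the trends matters to the correlation term, the absolute scaling of the measured counts and chord lengths does not need to match the simulation.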
2.3. Correlation coefficients
There are numerous types of correlation coefficients (ρ), out of which three are used in this work (see Table 2). In Eq. (15), X and Y are the matrices of observations whose correlation is calculated; cov denotes the covariance, and σ is the standard deviation of the corresponding matrices. In Eq. (16), rg is the rank of the matrices [42]; thus, the Spearman rank correlation coefficient can be regarded as the Pearson correlation coefficient of the ranks. In Eq. (17), n_p, n_c, and n_d are the number of pairs, concordant pairs, and discordant pairs [43], respectively, in the dataset. All measures range in the interval [−1, 1] and quantify the magnitude of similarity between X and Y in their own way.
Table 2.
The correlation coefficients tested in this work.
| Correlation coefficient | Definition | Eq. |
|---|---|---|
| Pearson (ρ_P) | ρ_P = cov(X, Y)/(σ_X·σ_Y) | (15) |
| Spearman (ρ_S) | ρ_S = cov(rg_X, rg_Y)/(σ_rg_X·σ_rg_Y) | (16) |
| Kendall (ρ_K) | ρ_K = (n_c − n_d)/n_p | (17) |
The profound explanations will not be covered here, but it is worth noting that the three measures do not quantify correlation equally. The Pearson coefficient is the strictest in penalizing nonlinearities, as its value decreases steeply with increasing nonlinearity and variance in the data; the Spearman rank correlation coefficient recognizes slighter correlations, resulting in a high value even for less obvious correlations between the observed values, and it equals 1 for any monotonically increasing relationship. The Kendall rank correlation coefficient lies between the other two measures. These behaviors are illustrated on four representative datasets in Fig. 1.
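The differing strictness of the three measures can be reproduced with a short numerical check using `scipy.stats`: for a monotonic but strongly nonlinear trend, the rank-based measures report perfect correlation while Pearson is penalized.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

x = np.linspace(0.0, 4.0, 50)
y = np.exp(x)  # monotonically increasing but strongly nonlinear in x

rho_p = pearsonr(x, y)[0]    # penalized by the nonlinearity (< 1)
rho_s = spearmanr(x, y)[0]   # 1 for any strictly monotonic relation
rho_k = kendalltau(x, y)[0]  # 1: every pair of points is concordant
print(rho_p, rho_s, rho_k)
```

This is exactly the property exploited in the Novel objective: a kinetic parameter set producing the right qualitative trend scores well even before the absolute scales match.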
Fig. 1.
The three employed correlation measures (ρ_P, ρ_S, ρ_K) are calculated on datasets specifically generated for representation.
In the parameter estimation, these measures will be incorporated into the objective function alongside the deviation of the product PSDs. The parameter estimations are executed with each correlation measure separately as part of investigating the novel formulation.
3. Results and discussion
3.1. Case study #1: N-G system with experimental noise
The first case study presents a population balance model with arbitrarily but realistically set kinetic parameters, which is used to generate, by simulation, solute concentration, crystal number (∼total counts), and average crystal length (∼mean chord length) profiles for five experiments in a 2² full factorial design of experiments structure with one center point (the factors are the cooling rate and the initial temperature). The fictitious in-process counts and mean chord lengths were generated from the simulated crystal number densities and mean crystal sizes, defined through the moments in Eqs. (18), (19), via the following arbitrary nonlinear functions, Eqs. (20), (21):
| N_sim = μ₀ | (18) |
| L̄_sim = μ₁/μ₀ | (19) |
| (20) |
| (21) |
where μ₀ and μ₁ are the zeroth and first moments of the distribution. According to Eqs. (18), (19), the simulated data directly relates to the population balance simulation through the moments. The inline particle data is emulated from the exact number density and mean size trends through the arbitrary nonlinear functions of Eqs. (20), (21). Note that these functions were chosen to bring the count and chord length into the typically measurable ranges while making the mapping nonlinear, as reported in the literature [25].
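The emulation step can be sketched as follows; the saturating count response and the power-law size response below are illustrative stand-ins for the arbitrary nonlinear transforms of Eqs. (20), (21), chosen only to show the idea.

```python
import numpy as np

def emulate_inline(mu0, mu1):
    """Emulate inline-probe signals from moment trends. The saturating count
    response and the power-law size response are assumed forms, not the
    functions actually used in the case study."""
    n_sim = np.asarray(mu0, dtype=float)          # number density trend, Eq. (18)
    l_sim = np.asarray(mu1, dtype=float) / n_sim  # number-based mean size, Eq. (19)
    counts = 1e4 * n_sim / (n_sim + n_sim.max())  # nonlinear, saturating (assumed)
    mean_cl = 0.8 * l_sim ** 0.9                  # nonlinear size mapping (assumed)
    return counts, mean_cl
```

Any monotonic nonlinear mapping serves the purpose: it distorts the absolute values, as real probes do, while preserving the qualitative trend that the correlation-based objective relies on.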
Fig. 2 shows the scheme of the employed model that contains growth and secondary nucleation as the governing mechanisms of crystallization. This simulated set of 5 experiments is used as input data for kinetic PE with different objective functions. In this case study, the population and material balance equations were solved using the standard method of moments (SMOM).
Fig. 2.
Scheme of crystallization system of Case study 1.
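For a sense of how the SMOM reduction works for the N-G system sketched in Fig. 2, a minimal moment model follows; all kinetic constants, the constant solubility, and the lumped unit conversion in the mass balance are illustrative assumptions, not the case study's actual values.

```python
# Minimal SMOM sketch for a seeded nucleation-growth batch: the PBE is reduced
# to ODEs for the moments mu_0..mu_3 plus the solute mass balance.
import numpy as np
from scipy.integrate import solve_ivp

k_g, g_exp = 5e-7, 1.5    # growth constant [m/s] and exponent (assumed)
k_b, b_exp = 1e8, 2.0     # secondary nucleation constant and exponent (assumed)
k_v, rho_c = 0.5, 1300.0  # volume shape factor [-], crystal density [kg/m^3] (assumed)
c_s = 0.10                # solubility, treated as constant (assumed)

def rhs(t, y):
    mu0, mu1, mu2, mu3, c = y
    sigma = max((c - c_s) / c_s, 0.0)  # relative supersaturation, Eq. (5)
    G = k_g * sigma ** g_exp           # size-independent growth rate
    B = k_b * sigma ** b_exp * mu3     # secondary nucleation on crystal volume
    return [B,                         # d(mu0)/dt: new crystals appear
            G * mu0,                   # d(mu_j)/dt = j*G*mu_{j-1}, nuclei size ~ 0
            2.0 * G * mu1,
            3.0 * G * mu2,
            -3.0 * k_v * rho_c * G * mu2]  # mass balance (unit conversion lumped)

L0 = 50e-6  # mean seed size [m] (assumed)
y0 = [1e8, 1e8 * L0, 1e8 * L0**2, 1e8 * L0**3, 0.12]  # monodisperse seed + initial c
sol = solve_ivp(rhs, (0.0, 3600.0), y0, method="LSODA", rtol=1e-8)
```

The desupersaturation of c toward c_s and the simultaneous moment growth reproduce, qualitatively, the concentration, count, and mean-size trends used as quasi-experimental data.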
Measurement errors were artificially added to the quasi-experimental data. Due to the lack of precise estimates, the added noise and shift listed below are somewhat arbitrary. Nevertheless, these make the data more realistic and, just like in a real case, exclude the possibility of a perfect fit of simulations on the data.
a.	Normally distributed random noise of 1 % and 2 % was added to the simulated concentration and to the in-process count and mean size data, respectively.
b.	The D₅₀ and D₉₀ diameters were increased by 3 % and 5 %, respectively, to capture the artificial CSD broadening effect of laser diffraction-based CSD determination.
c.	Each experiment's noisy count data was scaled by a random factor between 0.5 and 1.5 to capture the count's sensitivity to stirring and/or sensor placement. This factor is not changed within an experiment; it represents an experiment-to-experiment variation only.
Table 3 comprises the empirical equations used to simulate the incorporated crystallization mechanisms in both case studies. According to the first row, the first case study involves simple size-independent secondary nucleation and growth rate equations [4].
3.2. Case study #2: a complex system with N-G-D-A-B mechanisms
The input data of the second case study is generated using the calibrated population balance model and the estimated kinetic parameters of Szilágyi et al. [20]. This model incorporates crystal nucleation, growth, dissolution, agglomeration, and deagglomeration (Fig. 3).
Fig. 3.
Summary of applied mechanisms in case study #2 [20].
This case study stress-tests the Novel approach: in this system, the agglomeration and deagglomeration manifest directly in the inline particle monitoring tools' signal but only indirectly, through the varying crystal surface area, in the concentration profile. The kinetic equations of this case study are portrayed in the second row of Table 3. These equations are selected based on the particular needs of the corresponding experimental system of Szilágyi et al.
In case study #2, the population and material balance equations were solved using the high-resolution finite volume method (HR-FVM) due to the presence of an undersaturated (dissolution) state, which is necessary to trigger the deagglomeration of agglomerates.
3.3. Parameter estimation results and discussions
3.3.1. Comparison of the PE approaches
The comparison of the PE approaches is explained in Fig. 4. The workflow is the same for case studies #1 and #2. First, the kinetic models with the arbitrary kinetic parameters (θ_true) are used to simulate the quasi-experimental data: (I) the D₅₀ and D₉₀ of the product's PSD, which are adequate descriptors of the whole crystal size distribution, (II) the average crystal size, which is the number-based mean crystal size transformed using Eq. (20), (III) the crystal number, and (IV) the concentration profiles are saved for each simulated experiment. These experiments are divided into calibration (1st–4th experiments) and validation (5th experiment) groups. The calibration experiments are then applied as inputs for the parameter estimations (Eqs. (12), (13), (14)). Each approach is repeated with different intrinsic settings to make sure that the best possible solution is found: in the (1) Classic and (2) Naïve cases, three concurrent PEs are executed with different weight factors, and in the (3) Novel case, three optimizations are performed with the different correlation coefficients. Altogether, this results in nine PEs and nine corresponding kinetic parameter combinations (θ_est), which are used to simulate the validation experiments' product D₅₀ and D₉₀, which are subsequently compared to the true values as testing.
Fig. 4.
Comparing the introduced PE approaches.
3.3.2. Results of kinetic PE runs with three different approaches
Table 4 presents the results of the kinetic PEs. The table reports the kinetic parameters in either logarithmic or non-logarithmic form, just as they were applied in the parameter estimation. The logarithmic transformation was applied to improve the optimization convergence. The table contains three cases each for the Classic and Naïve approaches, designating different PE runs with different weight factors (w). After finding sufficient w values that help the optimization algorithm balance between the two components of the objective function and grasp the information content of the whole dataset, the PE runs were repeated with two different w values around the original one to make the comparison of the approaches more robust. This way, it is ensured that the different approaches are compared not on their somewhat arbitrary settings but on the information content they convey to the optimization algorithm. The average relative deviations of Table 4 are calculated using Eq. (22), where N_p is the number of kinetic parameters in a case study, θ_i is the true and θ̂_i is the calibrated value of the i-th parameter.
| ē = (100 %/N_p)·Σ_i abs(θ̂_i − θ_i)/θ_i | (22) |
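A direct transcription of Eq. (22), returning a percentage (the function name and the example inputs are illustrative):

```python
import numpy as np

def avg_rel_dev(theta_true, theta_est):
    """Average relative deviation of the calibrated kinetic parameters from
    the true values, as a percentage (Eq. (22))."""
    theta_true = np.asarray(theta_true, dtype=float)
    theta_est = np.asarray(theta_est, dtype=float)
    return 100.0 * np.mean(np.abs(theta_est - theta_true) / np.abs(theta_true))
```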
Table 4.
The result of kinetic PEs with the considered approaches.
| Case study | Parameter | Unit of measureᵃ | True value | Novel | Classic | Naïve |
|---|---|---|---|---|---|---|
| #1 | | [−] | | | | |
| | Average relative deviation [%] | | | | | |
| #2 | | [−] | | | | |
| | Average relative deviation [%] | | | | | |
ᵃ The unit of measure applies to a parameter's non-logarithmic form.
The true values were best matched by the parameters coming from the Naïve approach in case study #1: the parameters of its best case show, on average, a 15.65 % error, whereas the worst performer in the first case study produced a 28.81 % error. In sharp contrast, in case study #2, the Novel approach's results are the closest to the true parameters; e.g., the average deviation is only 3.27 % for its best case, while the least precise case shows a 10.72 % difference. If we look at the average relative deviation values of Table 4, we can see that the same approach can perform poorly in case study #1 while remarkably well in case study #2. To explain this: the system of case study #2 exhibits more complex particle number and size dynamics (e.g., agglomeration and deagglomeration events that have a direct effect on the in-process particle monitoring signal), where incorporating the in-process data well helps the parameter estimation converge better; at the same time, it may be that the particle size and number density ranges covered in the experiments of case study #1 were narrow, attenuating the negative effects of the nonlinear character of in-process particle tracking tools – which may have made the Naïve case less naïve, actually.
Although the corresponding confidence intervals of the kinetic parameters are not presented in this work, as they were out of scope, a former work of the authors demonstrated that a similar formulation can significantly reduce the parameter uncertainty space [20].
3.3.3. Analysis of validation performance
The validation results, in terms of the product mean crystal size and the correlation coefficients, are presented in Fig. 5a and b. Characterizing the temporal evolution of particle properties is important, as there are cases when remarkably different kinetic parameter combinations can result in very similar mean product crystal sizes. This phenomenon is not necessarily caused by redundant parameters in the kinetic model; it is rooted in the nature of crystallization mechanisms: the effects of concurrent mechanisms can overlap during the processes, e.g., molecular crystal growth and agglomerative growth may have a similar global effect on the mean product size under certain circumstances. Fig. 5 is the graphical representation of the calculated results, also displayed numerically in the supplementary information (SI) Tables S1–S2. The effect of the measurement noise on the PE is seen in Fig. 5a, where the relative deviations are about two times higher than those in Fig. 5b. In the scenario when the FBRM signal is noisy, the Naïve approach fails regardless of the weight factor: although the relative deviations for the D₅₀ are comparable to the Novel and Classic approaches', the joint deviations in the D₅₀ and D₉₀ are the highest in these cases. The performances of the Novel and Classic approaches are comparable in Case #1. It is worth noting that the Novel approach practically elevated the performance of the Naïve to the level of the Classic approach in this case. Moreover, the precision obtained with the noisy inline particle signal is comparable to the one achieved with the Classic approach regardless of the experimental noise; e.g., the relative deviation values for the D₅₀ and D₉₀ are 1.27 % and 1.66 % (Novel) versus 0.83 % and 1.77 % (Classic). These results are understandable if we regard the nature of the objective functions in the three different cases.
The measurement noise misdirects the Naïve approach, which relies heavily on the precision of the online-particle-measurement-related data. The Classic approach can result in good precision despite the biased data. The concentration dynamics convey more reliable and important information about a crystallization that can guide the parameter estimation even with some experimental discrepancy. Lastly, the Novel approach's structure can help the parameter estimation process bypass the measurement errors, both the nonlinearity and the shift/drift, by not focusing on the exact match of the simulations with the measured data.
Fig. 5.
Model validation. The relative deviation from the product's true D₅₀ and D₉₀ values for each objective function construction approach in case studies #1 (Fig. 5a) and #2 (Fig. 5b); and the three employed correlation coefficient values of the simulated and measured average crystal size and crystal number for case studies #1 (Fig. 5c) and #2 (Fig. 5d).
Lower relative deviations are observed in case study #2 despite the significantly more complex model structure. The Naïve and Novel approaches perform best, according to Fig. 5b, which can be explained by the complex nature of the underlying crystallization mechanisms that explicitly affect the particle monitoring equipment's signal, supporting correct identification. The best performance comes from the Novel approach (0.06 % and 0.10). These results support the concept that making a direct connection to the particles' evolution through correlation maximization is beneficial. The Pearson correlation coefficient penalizes nonlinearities and high variance between the measured and simulated particle-related data; at the same time, it can yield a high value for a good correlation without requiring an exact match, while also decoupling the sensor-positioning-related issues. These are features that the Naïve approach lacks.
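The contrast between Pearson's coefficient and the rank-based alternatives can be made concrete. In this illustrative snippet (our own example, not from the paper), a strictly monotone but nonlinear sensor response leaves Spearman's and Kendall's coefficients at their maximum, whereas Pearson's drops below 1 and thereby penalizes the nonlinearity:

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr, kendalltau

x = np.linspace(1.0, 10.0, 50)   # true particle property
y = x ** 3                       # monotone but nonlinear sensor response

pr, _ = pearsonr(x, y)
sr, _ = spearmanr(x, y)
kt, _ = kendalltau(x, y)

# Rank-based measures are invariant to any monotone transform;
# Pearson is not, so it flags the nonlinear distortion.
print(round(pr, 3), round(sr, 3), round(kt, 3))
```

This is why the choice of coefficient matters: a rank-based measure forgives any monotone distortion of the signal, while Pearson rewards only an approximately linear relationship between simulation and measurement.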
3.3.4. Analysis of the correlation coefficient values
Fig. 5c and 5d display the correlation coefficients obtained on the test set. In case study #1, the lowest correlations are observed in the Naïve cases. Interestingly, the correlation coefficients are comparable for the Novel and Classic approaches, with the higher values belonging to the Classic cases: on average, the aggregated correlation coefficient is 0.787 for the Classic and 0.759 for the Novel approach. This underscores that a well-fitted model also behaves well in the space of unmeasured states. In case study #2, the correlation coefficients are again comparable for the Classic and Novel cases: on average, the aggregated correlation coefficient is 0.965 for the Classic and 0.960 for the Novel approach, while the best Naïve result is 0.945. These results show that the correlation of the experimental and simulated particle properties is not a superior measure for quantifying the model's precision and reinforce that a good correlation can be achieved without incorporating the in-process particle data into the objective function. The goal of this work, however, was not to achieve a good correlation but to displace the concentration data.
3.3.5. Simulated dynamic profiles of the best candidates for case study #1
Fig. 6 presents the parity plots of the measured and simulated particle properties and concentration profiles for the validation experiments. Before delving into the further results, an apparent inconsistency in Fig. 6 and Fig. 7 must be addressed: examining the shown crystal numbers, a large deviation can be seen between the measured and simulated values. This is not an anomaly: the simulated experiments are deliberately transformed to a realistic magnitude using Eq. (20), with the goal of bringing the simulated PE runs closer to reality. For case study #1, Fig. 6a-6d present one setting and Fig. 6e-6h the other; these were the two best performers based on particle size prediction. The remaining cases are given in the SI, Figures S1 and S2. The measured and simulated particle sizes are in good correlation in both cases, characterized by relatively high correlation coefficients (0.96, 0.98, and 0.88 for the first setting; 0.96, 0.95, and 0.88 for the second). However, there is a large deviation among the correlation coefficients for the simulated and measured counts: 0.38, 0.97, and 0.28 versus 0.46, 0.89, and 0.33. The setting built on Pearson's coefficient results in the smallest deviation from the validation experiments, making it the best choice; its precision is comparable to the Classic approach in this case study.
Fig. 6.
The validation results of the two best-fitting approaches for case study #1. The results for the first setting are introduced in Fig. 6a-6d and for the second in Fig. 6e-6h: parity plots of the measured and simulated particle size and counts (6a-6b and 6e-6f), the measured and simulated particle size and counts through the process (6c and 6g), and the simulated and measured concentration during the process (6d and 6h).
Fig. 7.
The validation results of the two best-fitting approaches for case study #2. The results for the first setting are introduced in Fig. 7a-7d and for the second in Fig. 7e-7h. The following plots are shown: parity plots of the measured and simulated particle size and counts (7a-7b and 7e-7f), the measured and simulated particle size and counts through the process (7c and 7g), and the simulated and measured concentration during the process (7d and 7h).
3.3.6. Simulated dynamic profiles of the best candidates for case study #2
Fig. 7 portrays the two best settings of case study #2. In Fig. 7a-7b and 7e-7f, the correlation values in the parity plots are all higher. This is unsurprising: in each case, the particle size and count deviations are part of the objective function. According to Fig. 7d and 7h, the concentration fit of the Naïve setting is superior to the Novel's. The evolutions of particle number and average particle size are comparable, and the product size predicted by the Novel setting is even closer to the experimental size. This may happen because the Novel approach does not consider the exact match of the in-process particle properties but only maximizes the correlation. The superiority of the Novel approach is further underlined by the fact that it resulted in the smallest deviation from the validation mean size and number, which was the stronger comparison criterion.
The results indicate that when the crystallization mechanisms significantly affect the evolution of the particle number and size, the correlation-based Novel technique involving Pearson's correlation coefficient can outperform the Classic and Naïve approaches. However, one may expect that over wider domains, where phenomena such as crystal overlapping or double-counting of rod-like particles also become significant, a nonlinear correlation measure may perform better. The choice between them, at this stage, remains somewhat intuitive. The method may also bypass batch-to-batch probe placement variations, stirring conditions that vary from experiment to experiment, and, e.g., imaging settings (as long as they remain constant within an experiment). On the other hand, the PE optimization was observed to be more prone to getting stuck in local optima, which makes it necessary to set the global optimization parameters carefully, with aggressive population sizes. These simulation-based results foreshadow the possible successful practical application of the widely available in-process particle monitoring data, even without concentration measurements. This can enable model development and retrospective model-based optimization of existing processes.
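The remedy noted above, careful global optimization settings with aggressive population sizes, can be sketched concretely. The example below is only an illustration under our own assumptions (the paper does not specify its optimizer or settings): a multimodal test function stands in for the PE objective, and SciPy's differential evolution is run with an enlarged population and mutation dithering.

```python
import numpy as np
from scipy.optimize import differential_evolution

# Multimodal stand-in for a PE objective (Rastrigin function),
# chosen only to demonstrate the optimizer settings.
def objective(p):
    p = np.asarray(p)
    return float(np.sum(p**2 - 10.0 * np.cos(2.0 * np.pi * p) + 10.0))

bounds = [(-5.12, 5.12)] * 3   # placeholder kinetic-parameter bounds

# An aggressive population (popsize=40 vs. the default 15) and mutation
# dithering reduce the risk of premature convergence to a local optimum,
# at the cost of more objective evaluations.
result = differential_evolution(objective, bounds, popsize=40,
                                mutation=(0.5, 1.0), recombination=0.7,
                                seed=1, polish=True)
print(result.fun < 1e-3)
```

For an expensive PBM simulation the same settings apply unchanged; only the objective function is replaced by the simulate-and-compare routine.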
4. Conclusions
This work proposed a new approach for the parameter estimation (PE) of population balance models (PBMs) in solution crystallization processes, which is presented and compared with two other arrangements. The motivation of the research was to enable the incorporation of in-process particle measurements directly into kinetic PEs while bypassing the need for more experimentally demanding concentration profiles. All the examined approaches have two components in their objective function, and one of them is always the deviation between the measured and simulated mean crystal sizes, as this is often a critical product attribute and, consequently, routinely measured. The compared approaches differ in the formulation of the second objective function component. The reference (Classic) scenario is based on concentration data; the Naïve approach directly minimizes the difference between the measured in-process particle data and the normalized simulated crystal number density and mean size; and the novel proposal maximizes the experiment-wise correlation between the measured in-process particle monitoring data and the simulated crystal number density and size. The novel approach bypasses some of the known issues of in-process particle monitoring data, including the ill-posed CLD-CSD (or vice-versa) transformation, the nonlinear characteristic, and the sensitivity to sensor placement within the reactor. Two simulation case studies are presented: parameter estimation of a nucleation and growth model, and of a nucleation, growth, dissolution, agglomeration, and deagglomeration model. In the first case study, the reference approach gave the best validation performance, followed tightly by the Novel approach. In the second case study, where agglomeration and deagglomeration were also present, the novel approach gave the best performance. Although concentration deviation minimization was not incorporated, it also resulted in a good concentration fit.
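The three objective-function constructions summarized above can be written in schematic form. The sketch below is our own reading of the setup; the variable names, the weight `w`, and the exact deviation norm are illustrative assumptions, not the authors' notation.

```python
import numpy as np
from scipy.stats import pearsonr

def relative_dev(sim, meas):
    """Normalized deviation between simulated and measured trajectories."""
    return float(np.linalg.norm(sim - meas) / np.linalg.norm(meas))

# All three objectives share the product mean-size term; they differ
# only in the second component. `w` is a hypothetical weight factor.
def of_classic(sim, meas, w=1.0):
    # Classic: product size deviation + concentration profile deviation.
    return (relative_dev(sim["L_prod"], meas["L_prod"])
            + w * relative_dev(sim["conc"], meas["conc"]))

def of_naive(sim, meas, w=1.0):
    # Naive: product size deviation + direct match of the in-process
    # mean size and (normalized) count signals.
    return (relative_dev(sim["L_prod"], meas["L_prod"])
            + w * (relative_dev(sim["L_mean"], meas["L_mean"])
                   + relative_dev(sim["counts"], meas["counts"])))

def of_novel(sim, meas, w=1.0):
    # Novel: product size deviation + correlation maximization, expressed
    # as a penalty (2 - r_size - r_count) that vanishes at perfect correlation.
    r_size, _ = pearsonr(sim["L_mean"], meas["L_mean"])
    r_count, _ = pearsonr(sim["counts"], meas["counts"])
    return (relative_dev(sim["L_prod"], meas["L_prod"])
            + w * (2.0 - r_size - r_count))
```

Note that `of_novel` never compares the in-process signals point-by-point, which is exactly what makes it indifferent to the CLD-CSD transformation and to affine sensor distortions.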
These promising simulation results suggest that in-process particle monitoring data, available in large quantities in crystallization labs, can be used for transformation-free PBM identification if complemented with the product PSD, even in the absence of a concentration profile. These findings could and should be validated experimentally in subsequent studies.
CRediT authorship contribution statement
Álmos Orosz: Writing – review & editing, Writing – original draft, Software, Investigation, Formal analysis. Botond Szilágyi: Writing – review & editing, Writing – original draft, Supervision, Funding acquisition, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgment
This project, supported by the Doctoral Excellence Fellowship Program (DCEP), is funded by the National Research, Development and Innovation Fund of the Ministry of Culture and Innovation and the Budapest University of Technology and Economics under a grant agreement with the National Research, Development, and Innovation Office. This research was also supported by the Hungarian National Scientific Research Fund (OTKA) grant FK-138475.
Footnotes
Supplementary data to this article can be found online at https://doi.org/10.1016/j.heliyon.2024.e39851.
Appendix A. Supplementary data
References
- 1. Chen J., Sarma B., Evans J.M.B., Myerson A.S. Pharmaceutical crystallization. Cryst. Growth Des. 2011;11:887–895.
- 2. FDA. Guidance for industry: PAT - a framework for innovative pharmaceutical development, manufacturing, and quality assurance. 2004. http://www.fda.gov/cvm/guidance/published.html
- 3. Ahmed K.B.R., Pathmanathan P., Kabadi S.V., Drgon T., Morrison T.M. Editorial on the FDA report on "successes and opportunities in modeling & simulation for FDA". Ann. Biomed. Eng. 2023;51:6–9. doi: 10.1007/s10439-022-03112-x.
- 4. Randolph A.D., Larson M.A. Chapter 3 - The population balance. In: Randolph A.D., Larson M.A., editors. Theory of Particulate Processes. 2nd ed. Academic Press; 1988. pp. 50–79.
- 5. Ramkrishna D., Singh M.R. Population balance modeling: current status and future prospects. Annu. Rev. Chem. Biomol. Eng. 2014;5:123–146. doi: 10.1146/annurev-chembioeng-060713-040241.
- 6. Song Y., Shen L., Xing L., Ermon S. Solving inverse problems in medical imaging with score-based generative models. arXiv:2111.08005. 2021.
- 7. Liu J., Ouyang H., Han X., Liu G. Optimal sensor placement for uncertain inverse problem of structural parameter estimation. Mech. Syst. Signal Process. 2021;160.
- 8. Faucher F., Scherzer O., Barucq H. Eigenvector models for solving the seismic inverse problem for the Helmholtz equation. Geophys. J. Int. 2020;221:394–414.
- 9. Wu W.-L., et al. Digital design of an agrochemical crystallization process via two-dimensional population balance modeling. Org. Process Res. Dev. 2024;28:543–558.
- 10. Orosz Á., et al. Dynamic modeling and optimal design space determination of pharmaceutical crystallization processes: realizing the synergy between off-the-shelf laboratory and industrial scale data. Ind. Eng. Chem. Res. 2024. doi: 10.1021/acs.iecr.3c03954.
- 11. Khadem B., Sheibat-Othman N. Modeling droplets swelling and escape in double emulsions using population balance equations. Chem. Eng. J. 2020;382.
- 12. Abdullahi H., Burcham C.L., Vetter T. A mechanistic model to predict droplet drying history and particle shell formation in multicomponent systems. Chem. Eng. Sci. 2020;224.
- 13. Handwerk D.R., Shipman P.D., Whitehead C.B., Özkar S., Finke R.G. Mechanism-enabled population balance modeling of particle formation en route to particle average size and size distribution understanding and control. J. Am. Chem. Soc. 2019;141:15827–15839. doi: 10.1021/jacs.9b06364.
- 14. Ismail H.Y., Singh M., Albadarin A.B., Walker G.M. Complete two dimensional population balance modelling of wet granulation in twin screw. Int. J. Pharm. 2020;591. doi: 10.1016/j.ijpharm.2020.120018.
- 15. Barthe S., Rousseau R.W. Utilization of focused beam reflectance measurement in the control of crystal size distribution in a batch cooled crystallizer. Chem. Eng. Technol. 2006;29:206–211.
- 16. Blanco A., Fuente E., Negro C., Tijero J. Flocculation monitoring: focused beam reflectance measurement as a measurement tool. Can. J. Chem. Eng. 2002;80:1–7.
- 17. Doki N., et al. Process control of seeded batch cooling crystallization of the metastable α-form glycine using an in-situ ATR-FTIR spectrometer and an in-situ FBRM particle counter. Cryst. Growth Des. 2004;4:949–953.
- 18. Fang Y., Selomulya C., Chen X.D. Characterization of milk protein concentrate solubility using focused beam reflectance measurement. Dairy Sci. Technol. 2010;90:253–270.
- 19. Bosits M.H., et al. Population balance modeling of diastereomeric salt resolution. Cryst. Growth Des. 2023;23:2406–2416.
- 20. Szilagyi B., Eren A., Quon J.L., Papageorgiou C.D., Nagy Z.K. Application of model-free and model-based quality-by-control (QbC) for the efficient design of pharmaceutical crystallization processes. Cryst. Growth Des. 2020;20:3979–3996.
- 21. Szilágyi B., Eren A., Quon J.L., Papageorgiou C.D., Nagy Z.K. Digital design of the crystallization of an active pharmaceutical ingredient using a population balance model with a novel size dependent growth rate expression. From development of a digital twin to in silico optimization and experimental validation. Cryst. Growth Des. 2022;22:497–512.
- 22. Szilágyi B., Eren A., Quon J.L., Papageorgiou C.D., Nagy Z.K. Monitoring and digital design of the cooling crystallization of a high-aspect ratio anticancer drug using a two-dimensional population balance model. Chem. Eng. Sci. 2022;257.
- 23. Yu W., Erickson K. Chord length characterization using focused beam reflectance measurement probe - methodologies and pitfalls. Powder Technol. 2008;185:24–30.
- 24. Heath A.R., Fawell P.D., Bahri P.A., Swift J.D. Estimating average particle size by focused beam reflectance measurement (FBRM). Part. Part. Syst. Char. 2002;19:84–95.
- 25. Barrett P., Glennon B. In-line FBRM monitoring of particle size in dilute agitated suspensions. Part. Part. Syst. Char. 1999;16:207–211.
- 26. Szilágyi B., Nagy Z.K. Aspect ratio distribution and chord length distribution driven modeling of crystallization of two-dimensional crystals for real-time model-based applications. Cryst. Growth Des. 2018;18:5311–5321.
- 27. Hukkanen E.J., Braatz R.D. Measurement of particle size distribution in suspension polymerization using in situ laser backscattering. Sensor. Actuator. B Chem. 2003;96:451–459.
- 28. Worlitschek J., Mazzotti M. Model-based optimization of particle size distribution in batch-cooling crystallization of paracetamol. Cryst. Growth Des. 2004;4:891–903.
- 29. Agimelen O.S., et al. Estimation of particle size distribution and aspect ratio of non-spherical particles from chord length distribution. Chem. Eng. Sci. 2015;123:629–640.
- 30. Ruf A., Worlitschek J., Mazzotti M. Modeling and experimental analysis of PSD measurements through FBRM. Part. Part. Syst. Char. 2000;17:167–179.
- 31. Brivadis L., Sacchelli L. New inversion methods for the single/multi-shape CLD-to-PSD problem with spheroid particles. J. Process Control. 2022;109:1–12.
- 32. Honavar V.G., Pandit A.V., Singh M., Ranade V.V. Models for converting CLD to PSD for bimodal distributions of particles. Chem. Eng. Res. Des. 2023;200:576–591.
- 33. Czapla F., et al. Application of a recent FBRM-probe model to quantify preferential crystallization of dl-threonine. Chem. Eng. Res. Des. 2010;88:1494–1504.
- 34. Kail N., Briesen H., Marquardt W. Analysis of FBRM measurements by means of a 3D optical model. Powder Technol. 2008;185:211–222.
- 35. Li H., Kawajiri Y., Grover M.A., Rousseau R.W. Application of an empirical FBRM model to estimate crystal size distributions in batch crystallization. Cryst. Growth Des. 2014;14:607–616.
- 36. Pandit A.V., Ranade V.V. Chord length distribution to particle size distribution. AIChE J. 2016;62:4215–4228.
- 37. Agrawal S., Kim H., Sanz-Alonso D., Strang A. A variational inference approach to inverse problems with gamma hyperpriors. SIAM/ASA J. Uncertain. Quantification. 2022;10:1533–1559.
- 38. Gokhale Y.P., Kumar J., Hintz W., Warnecke G., Tomas J. Population balance modelling for agglomeration and disintegration of nanoparticles. In: Bertram A., Tomas J., editors. Micro-Macro-Interaction: In Structured Media and Particle Systems. Springer Berlin Heidelberg; Berlin, Heidelberg: 2008. pp. 299–309.
- 39. Orosz Á., et al. Diastereomer salt crystallization: comprehensive process modeling and DoE-driven comparison of custom-coded and user-friendly simulators. Chem. Eng. J. 2023;473.
- 40. Fysikopoulos D., Benyahia B., Borsos A., Nagy Z.K., Rielly C.D. A framework for model reliability and estimability analysis of crystallization processes with multi-impurity multi-dimensional population balance models. Comput. Chem. Eng. 2019;122:275–292.
- 41. Barhate Y., Kilari H., Wu W.-L., Nagy Z.K. Population balance model enabled digital design and uncertainty analysis framework for continuous crystallization of pharmaceuticals using an automated platform with full recycle and minimal material use. Chem. Eng. Sci. 2024;287.
- 42. Spearman C. The proof and measurement of association between two things. Am. J. Psychol. 1904;15:72–101.
- 43. Kendall M.G. A new measure of rank correlation. Biometrika. 1938;30:81–93.
- 44. Galton F. Typical laws of heredity. 1877.