Abstract

We have studied the primary nucleation of adipic acid from aqueous solutions in thousands of microdroplets generated in a fully automated microfluidic setup. By varying supersaturation in solution and residence time, we were able to estimate nucleation rates and growth times, while accounting for the stochastic nature of nucleation, the variability in microdroplet volumes (which is kept below 2%, thanks to a carefully designed experimental protocol), and the uncertainty in the automated image analysis procedure. Through a thorough statistical analysis we have obtained exact expressions for the expected values and the variances of all the random variables involved, all the way to the nucleation rate and the growth time associated with each supersaturation level explored and to the model parameters appearing in the corresponding constitutive equations. We have analyzed what controls the overall uncertainty in the estimation of the physical quantities above. We have shown that the distribution of droplet volumes at the level observed here is not limiting, whereas the detection technique and the image analysis algorithm play a critical role, together with the fact that the supersaturation levels and residence times that can be reasonably explored are limited. The tools and methods presented and made available to the scientific community will help in making microfluidics-based studies of nucleation more effective.
Short abstract
We continuously generated and analyzed thousands of monodisperse microdroplets via a crystal detection procedure based on an automated image analysis algorithm. In this way, the stochastic process of primary nucleation of adipic acid from aqueous solution was studied.
1. Introduction
Since nucleation is stochastic, even under exactly the same experimental conditions the time for nucleation from a supersaturated solution to occur and for new crystals to be observed can vary a great deal. In order to estimate the nucleation rate, i.e., a quantity of great theoretical and practical importance, proper statistics need to be constructed, which entail a number of experiments and measurements as large as possible. Microfluidic platforms, which can generate large numbers of nearly identical microdroplets that can serve as crystallization microenvironments, have been proposed and utilized to build such statistics.1−3 Although other protocols have been proposed, a typical experiment involves observing identical droplets that have had a predefined time to form and grow new crystals;4−6 the fraction of crystal-containing droplets estimates the nucleation probability (the larger the number of droplets, the more accurate the estimate), which can then be correlated to the nucleation rate. Repeating experiments under different conditions allows determining nucleation rates as a function of, for example, supersaturation.
Such an experimental protocol is in principle simple and appealing. However, it raises both experimental and theoretical issues, so that experimental accuracy and error propagation must be considered carefully in order to estimate not only nucleation rates and the related model parameters but also their uncertainty. There are several sources of uncertainty: the control of the experimental conditions themselves, the fact that droplet volumes are distributed, and the intrinsic errors associated with the observation of droplets and the detection of crystals therein. In addition, multiple dependences have to be explored experimentally in a statistically relevant manner, namely that of the nucleation probability on residence time and droplet volume and that of the nucleation rate on supersaturation and temperature.
The purpose of this work is a thorough analysis of these issues, based on a comprehensive set of experimental data on the nucleation of adipic acid from aqueous solutions in microdroplets of about 170 nL. The manuscript is divided into three parts, where the experimental setup and procedure are presented first (section 2), followed by the theoretical analysis of the statistics describing the experiments, accounting for all types of uncertainties mentioned above (section 3). Finally, in section 4 the theory is applied to our experiments, before conclusions are drawn.
2. Experimental Section
2.1. Experimental Setup
The main purpose of the experimental setup is to create a segmented flow, where microdroplets are continuously generated to act as individual (micro)crystallizers to study primary nucleation. A T-junction followed by 4 m long FEP tubing is used as the microfluidic platform. The stochastic behavior of nucleation is enhanced in small volumes; hence, the method applied in this work allows sampling of a large amount of data (droplets are mutually independent) to estimate kinetic parameters of nucleation. A similar setup was already presented by Rossi et al.6 but was operated under different fluid-dynamic conditions and at another droplet size scale. Figure 1 shows the schematic of the experimental setup, which consists of four zones; each of them serves a specific purpose and will be explained in detail in the following.
Figure 1.
Fully automated droplet-based microfluidic crystallization setup.
2.1.1. Generation Zone
First, in the generation zone (GZ), as the name suggests, droplets are generated by feeding the two immiscible fluids (dispersed and continuous) into a PEEK (polyether ether ketone) T-junction (Upchurch Scientific, 0.020 in. through hole) using two precision syringe pumps (Harvard Apparatus, Pico Elite Plus). Both phases, prior to meeting at the T-junction, are continuously filtered with hydrophilic filters (Sartorius, Minisart NML Plus glass fiber 0.2 μm surfactant free cellulose acetate) to ensure a constant and low concentration of foreign particles in solution. On the continuous fluid line, a miniature low-pressure flow-through sensor (26PC Series, range 0–5 psi, Honeywell) is installed to monitor the pressure drop along the tube. For a more accurate measurement, the device is calibrated against a sensor of higher accuracy (range 0–2 bar, accuracy ±4 mbar, Omega). After formation, droplets flow into translucent FEP (fluorinated ethylene propylene) tubing (Upchurch Scientific, 1/16 in. OD × 0.020 in. ID, 508 μm nominal inner diameter), which is inserted in an in-house-built near-infrared (NIR) sensor for droplet frequency measurements. The NIR sensor is based on the design presented by Ladosz et al.,7 modified to use only one diode-detector pair mounted perpendicular to the tube.
The NIR sensor consisted of a cube made of PEEK (nontransparent) with a through hole of 1.6 mm diameter, into which the FEP tubing was inserted. Perpendicular to this, another opening (diameter gradually reduced to 0.5 mm, coinciding with the tube ID) was bored to install, on one side, a diode emitting light in the near-infrared region with a peak wavelength at 940 nm (TSAL4400, Vishay Semiconductors) and, on the other side, the corresponding IR detector (TSL261-RLF, ams), which was further combined with an impedance amplifier.
The characteristic intensity of each medium passing through the sensor was detected through a combination of reflection and refraction effects,8 thus enabling the two phases to be distinguished. Figure 2A exemplifies the signal output for a two-phase segmented flow monitored over time. In the inset of the same figure, peaks correspond to the dispersed phase (droplets) and the baseline to the continuous phase; together they form a nearly periodic pattern.
Figure 2.
Droplet volume and frequency measurement. (A) NIR absorbance at a wavelength of 940 nm monitored over time. Inset: example of NIR response for silicone oil–aqueous solution flow. (B) Droplet volume monitored over time. (C) Pressure drop along the total length of the tube over time. (D) Histogram and corresponding gamma distribution (red line) of droplet volumes.
There are two pieces of quantitative information that can be extracted from such a signal: first, the time between the inception of successive droplets, i.e., Δtk = t(k+1) – tk in Figure 2A, which is a distributed quantity, and second, the total number of droplets in an experiment, n. Assuming a constant flow rate of the dispersed phase, qd, over the whole duration of the experiment, texp, and enforcing the condition that the amount of the dispersed phase injected is distributed among all the n droplets generated yields the following estimate of the mean droplet volume, μv:

$$\mu_v = \frac{q_d\, t_{\mathrm{exp}}}{n} \tag{1}$$
We attribute the variability of the measured quantity Δtk to an intrinsic variability in how droplets discharge from the T-junction; hence, we consider Δtk a random variable, with a mean and a variance. From the Δtk value, the droplet volume vk can be calculated by assuming that the dispersed phase flow rate, qd, and that of the continuous phase, qc, remain constant during the whole experiment (hence the volumetric proportion of the two phases is the same in all fluid plugs between tk and t(k+1), for all k values; this assumption is confirmed by the regularity of the pressure drop profile during the course of the whole experiment, as shown in Figure 2C):

$$v_k = q_d\, \Delta t_k \tag{2}$$
This procedure applied to the total number of droplets yields a droplet volume distribution, ϕ(v), here described by the gamma distribution,9,10 characterized by mean value μv, variance σv2, and coefficient of variation ψ = σv/μv. For the sake of clarity, in Figure 2B we plot droplet volumes recorded for 30 min. During this time, with a mean frequency of 0.5 Hz, we could produce 900 droplets with μv = 167 nL and ψ = 1.88%. In Figure 2D the same droplet volume distribution in the form of a histogram and its regressed gamma distribution are plotted. Note that the histogram of droplet volumes and the corresponding regressed gamma distribution are not the same quantity, and hence, they do not necessarily overlap; in fact, the distribution is normalized, the histogram is not. The comparison is made to show that the two span the same range of droplet volumes and have the same mode.
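The conversion from NIR droplet onset times to a volume distribution can be sketched as follows. This is a minimal illustration, not the processing code used in this work: the onset times are hypothetical values, the function name is an assumption, and the gamma parameters are obtained by the method of moments (α = 1/ψ², β = α/μv) rather than by the regression used for Figure 2D.

```python
import statistics

def droplet_volumes_nl(onset_times_s, q_d_ul_per_min):
    """Droplet volumes (nL) from successive droplet onset times (s): v_k = q_d * dt_k."""
    q_d_nl_per_s = q_d_ul_per_min * 1000.0 / 60.0  # uL/min -> nL/s
    return [q_d_nl_per_s * (t1 - t0)
            for t0, t1 in zip(onset_times_s, onset_times_s[1:])]

# Hypothetical onset times (s), as a peak detector on the NIR signal might report them
onset_times_s = [0.00, 2.01, 3.98, 6.02, 8.00, 9.97, 12.00]
volumes = droplet_volumes_nl(onset_times_s, q_d_ul_per_min=5.0)

mu_v = statistics.mean(volumes)       # mean droplet volume, nL
sigma_v = statistics.stdev(volumes)   # sample standard deviation, nL
psi = sigma_v / mu_v                  # coefficient of variation

# Method-of-moments gamma parameters (an assumption; the paper regresses the distribution)
alpha = 1.0 / psi**2
beta = alpha / mu_v

print(f"mu_v = {mu_v:.1f} nL, psi = {100 * psi:.2f} %, "
      f"alpha = {alpha:.0f}, beta = {beta:.2f} 1/nL")
```

With qd = 5 μL/min and a mean spacing of 2 s (0.5 Hz), the mean volume comes out close to the 167 nL reported in Table 1.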
As shown in Figure 1, syringe pumps, the T-junction, and the NIR sensor were placed inside a temperature-controlled incubator, always kept at 40 °C, in order to maintain solutions at the desired dissolved state and to stay below the maximum operating temperature of the syringe pumps (45 °C).
2.1.2. Crystallization Zone
After exiting the GZ, droplets flow to the crystallization zone (CZ), which consists of an FEP tubing of a specific length, placed in a water bath at 12.6 °C. The temperature of the water bath is internally controlled using a thermostat (Huber Pilot ONE, Offenburg) and measured with a Pt100 sensor to guarantee constant crystallization temperatures. Due to the supersaturation created by cooling, a number of droplets crystallize while traveling along the tube; the tube length can be varied in order to vary the residence time. We have verified that, under the experimental conditions adopted, droplets entering the CZ need to travel less than 10 mm to cool from 40 to 12.6 °C; at their average velocity, i.e., 2.06 mm/s, this corresponds to a cooling rate of about 7.5 °C/s. Since the tube lengths used in the experiments were on the order of 1 m or more, droplets can be assumed to attain the crystallization temperature instantaneously when they enter the CZ.
Note that temperature variations in the CZ are within the precision limits of the Pt100 sensor, i.e., ±0.2 °C, which represents a ±1% variation of the equilibrium concentration. Considering also the precision of the balance (0.001 g) yields an uncertainty in supersaturation of about 1.5%.
2.1.3. Quench Zone
When droplets exit the CZ, they enter the quench zone (QZ) (the tube length inside this zone is 19 cm, to be on the safe side), which is kept at a temperature 2 °C lower than that corresponding to a saturated solution at the feed concentration. Under these conditions, there is no further nucleation and negligible growth,6 but crystals that have already formed cannot dissolve. Moreover, we have verified that the quenching temperature is correctly chosen for each solution concentration, by setting the CZ temperature equal to that of the QZ and verifying that no nucleation occurred even for long residence times.
2.1.4. Observation Zone
Finally, droplets move into the observation zone (OZ), where the FEP tube is placed under a light microscope (Zeiss, AxioPlan), with which the nature of the segmented flow is observed. An AxioCam MRc5 is used to record the videos, containing the information on whether droplets carry crystals or not. The camera operates at a frame rate of roughly 12 fps; this together with the neofluar 10× objective, at a binning 8 × 8, yields images of the size 320 × 240 pixels and a resolution of 2.12 μm/pixel. For exemplary images of droplets containing crystals, selected from a recorded video, refer to Figure S1 in the Supporting Information.
2.2. Chemicals
Adipic acid (hexanedioic acid, (CH2)4(COOH)2, purity ≥99%, purchased from Sigma-Aldrich, U.K.) was used as received, and solutions were made in ultrapure deionized water (Millipore, 18.2 MΩ cm). The solubility of the only known form of crystals of adipic acid in water was determined by interpolating experimental data, covering the temperature range 0–40 °C, reported in the literature11 and given as a function of temperature by the relationship
| 3 |
where c* and the pre-exponential factor are in grams of solute per gram of solvent (g/g), while T and the factor inside the exponent are in K. Silicone oil (Huber, M40.165.10, linear polydimethylsiloxane (PDMS)) was selected as the continuous fluid to form the segmented flow, because it features high wettability of the channel walls, negligible partitioning of adipic acid, and moderate viscosity (so that it can be filtered while flowing).
2.3. Experimental Procedure
Nucleation experiments in droplets were conducted using silicone oil and an aqueous solution of adipic acid as the continuous and dispersed phases, respectively. Solution quantities were weighed on a balance (Mettler Toledo, precision 0.005 g). Around 40 g of particle-free, deionized water and the desired amount of adipic acid were placed in a beaker, covered with Parafilm to minimize evaporation losses, and stirred at 40 °C overnight, thus allowing enough time for dissolution. Various initial concentrations of the adipic acid solutions were prepared to enable investigating different supersaturations: i.e., nominal supersaturations of 2.08, 2.18, 2.28, and 2.37 were created at the chosen CZ temperature. Note that supersaturation has always been computed using the value of solubility at 12.6 °C: i.e., 0.0128 g/g.11 Before the experiment was started, the temperature of the CZ and QZ was set to 40 °C, and 10 mL of water was flushed to dissolve any adipic acid crystals that might have remained in the tube. The total tube length was fixed to 4 m (total volume equal to 0.81 mL), so that the pressure drop, ΔP, along the tube was constant for all experiments (see Figure 2C for an illustration). Then, the tube was flushed with silicone oil (qc = 200 μL/min), thus wetting the tube entirely to guarantee proper formation and downstream flow of liquid droplets. Subsequently, both flow rates were set to 100 μL/min for a few minutes to get rid of air bubbles and soon thereafter set to the experimental values reported in Table 1.
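The feed concentrations implied by the nominal supersaturations can be sketched as below. The sketch assumes that supersaturation is defined as the ratio S = c/c* (this definition is an assumption, not stated explicitly above) and uses the quoted solubility of 0.0128 g/g at 12.6 °C and the approximate water mass of 40 g; the function name is hypothetical.

```python
C_STAR_G_PER_G = 0.0128  # solubility at 12.6 degC quoted in the text (g solute / g solvent)
WATER_MASS_G = 40.0      # approximate solvent mass quoted in the text

def feed_concentration(supersaturation, c_star=C_STAR_G_PER_G):
    """Feed concentration (g/g), assuming S = c / c* (assumed definition)."""
    return supersaturation * c_star

for s in (2.08, 2.18, 2.28, 2.37):
    c = feed_concentration(s)
    print(f"S = {s:.2f}: c = {c:.4f} g/g, "
          f"about {c * WATER_MASS_G:.2f} g adipic acid in {WATER_MASS_G:.0f} g water")
```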
Table 1. Experimental Operating Conditions.
| fluid | Q (μL/min) | U (mm/s) | f (Hz) | μv (nL) |
|---|---|---|---|---|
| adipic acid (aq) | 5 | 2.06 | 0.5 | 167 |
| silicone oil | 20 | | | |
The experiments in this work were carried out at very low values of the capillary number, Ca, and of the Reynolds number, Re. It is known that in such regimes the lubricating film around the droplet is thin, so that the droplet velocity can be approximated by the superficial velocity, U = Q/A (where Q is the total flow rate and A is the tube cross section). Accordingly, the residence time (i.e., how long droplets travel at the crystallization temperature) is reliably calculated as the ratio of the tube length placed inside the CZ to the droplet velocity: i.e., τ = LTube/U. Thus, after a known length of the FEP tubing was placed inside the CZ, the temperatures of the CZ and QZ were set to the experimental values. The CZ takes 40 min to cool to 12.6 °C (temperature measured by the Pt100 sensor inside the water bath), whereas the system needs 1 h to equilibrate the flow rates until a uniform series of droplets is produced; this transient process is monitored by the real-time measurement of the NIR signal. Subsequently, we let the uniform droplets flow downstream for another 30 min, i.e., until they reach the OZ, before recording movies (i.e., carrying out the experiment).
Experiments at specific residence times were repeated in order to verify reproducibility and to inspect whether the probability of nucleation follows the expected trend. A repetition in this context implies refilling the syringe with adipic acid solution of virtually the same concentration, either from a fresh solution or from a redissolved one (no difference between the two conditions was observed), while other conditions, such as flow rates, tube length, and CZ and QZ temperatures, were kept the same. A summary of the operating conditions is provided in Table 1.
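As a worked illustration of U = Q/A and τ = LTube/U, the sketch below uses the flow rates of Table 1 and the nominal inner diameter of 508 μm. The 1 m tube length in the CZ is only an example value, and the function names are hypothetical.

```python
import math

def superficial_velocity_mm_s(q_total_ul_min, inner_diameter_mm):
    """Superficial velocity U = Q / A in mm/s (1 uL = 1 mm^3)."""
    area_mm2 = math.pi * (inner_diameter_mm / 2.0) ** 2
    q_mm3_s = q_total_ul_min / 60.0
    return q_mm3_s / area_mm2

def residence_time_s(tube_length_mm, velocity_mm_s):
    """Residence time tau = L_tube / U in seconds."""
    return tube_length_mm / velocity_mm_s

# Total flow rate: 5 uL/min (dispersed) + 20 uL/min (continuous), as in Table 1
u = superficial_velocity_mm_s(q_total_ul_min=5.0 + 20.0, inner_diameter_mm=0.508)

# Example only: 1 m of tubing placed inside the crystallization zone
tau = residence_time_s(tube_length_mm=1000.0, velocity_mm_s=u)

# Internal volume of the full 4 m tube, in mL (1 mL = 1000 mm^3)
tube_volume_ml = math.pi * (0.508 / 2.0) ** 2 * 4000.0 / 1000.0

print(f"U = {u:.2f} mm/s, tau = {tau / 60:.1f} min, tube volume = {tube_volume_ml:.2f} mL")
```

The computed velocity and tube volume agree with the 2.06 mm/s and 0.81 mL quoted in the text.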
2.4. Crystal Detection Procedure
The images of the droplets traveling through the observation zone recorded on video can be evaluated by both a human operator and an automated image analysis algorithm. The goal is to determine which droplets contain crystals and which do not. To this end, we processed the movies by applying foreground detection, using the Computer Vision Toolbox of Matlab, which highlights the silhouette of the traveling droplet. The human operators based their assessment on visual observation. The automated procedure exploits a machine learning routine in Matlab, on the basis of a decision tree algorithm, chosen due to its computational efficiency and its capability of performing multiclass classification on a data set.12
3. Theory
3.1. Statistics of Nucleation Experiments in Identical Microdroplets
A single nucleation experiment in microdroplets consists of the generation of n virtually identical droplets, having a volume v and containing a solution that at the selected temperature and initial concentration exhibits a supersaturation S. All droplets are observed at the same residence time τ, when it is determined whether they contain a crystal (i.e., nucleation has occurred and particles have grown enough to be observable) or not (i.e., either nucleation has not occurred, or it has occurred but nuclei are too small to be observed). Owing to the experimental protocol used, droplets are isolated from each other; hence, each droplet represents an independent crystallizer.
From a statistical point of view, the outcome of the observation of each droplet (success, i.e., yes, when crystals are present, and failure, i.e., no, when they are not) constitutes a Bernoulli trial. The success probability, p, depends on the three operating conditions above, i.e., v, S, and τ, and it is in principle the same for all droplets, since droplets are ideally identical. As such, the sequence of outcomes from the observation of the n droplets belonging to one experiment constitutes a Bernoulli process, whose outcome in a specific realization is the number of successes, thus the random variable h, with 0 ≤ h ≤ n. The random variable h follows the binomial distribution that governs Bernoulli processes, i.e., the probability that h is exactly k is
$$P(h = k) = \binom{n}{k}\, p^{k} (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n \tag{4}$$
Therefore, the expected value of h is E[h] = np and its variance is V[h] = np(1 – p).
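The binomial statistics above can be checked numerically with a short sketch. The values of n and p below are arbitrary illustration values, not experimental data.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)

n, p = 50, 0.3  # arbitrary illustration values
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

# Moments computed directly from the probability mass function
mean = sum(k * w for k, w in enumerate(pmf))
var = sum((k - mean) ** 2 * w for k, w in enumerate(pmf))

print(f"E[h] = {mean:.4f} (n*p = {n * p:.4f})")
print(f"V[h] = {var:.4f} (n*p*(1-p) = {n * p * (1 - p):.4f})")
```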
It is worth noting that the success probability under the operating conditions selected is indeed the quantity of physical interest. There is a one-to-one mapping between the success probability, p, estimated through a nucleation experiment in microdroplets, at specific values of v, S, and τ, and the nucleation rate, J, at the same level of supersaturation. Such correspondence is based on the observation that each microdroplet is a (micro)crystallizer, where stochastic nucleation is assumed to be a Poisson process. Therefore, the nucleation time in each individual droplet is distributed according to the relevant exponential distribution, whose process intensity is the product of crystallizer volume, v, and nucleation rate, J.13−15
As a consequence, when an ensemble of identical droplets is considered, the value of the success probability, p, represents an estimate also of the exponential cumulative probability corresponding to the specific value of residence time, τ, selected (this is the probability that nucleation occurs in a droplet at any time before τ). By carrying out experiments with the same v and S but at different τ, one can obtain estimates of the cumulative probability at different residence times and hence an estimate of the whole cumulative probability distribution. Such distribution for a Poisson process can be written as
$$F(\tau) = 1 - \exp\left(-v J \Delta\tau\right) \tag{5}$$
where Δτ = τ – τg is the nucleation time, τ is the detection time that corresponds to the experimental residence time, and τg is the growth time, which is necessary for nuclei to grow to a detectable size, depends on the experimental supersaturation as well, and needs to be estimated from the experiments.16 On the basis of the discussion above, the cumulative probability F and the success probability p represent the same physical quantity.
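The mapping between success probability and nucleation rate can be sketched as follows, assuming the exponential form F = 1 − exp(−vJΔτ) with Δτ = τ − τg. The numerical values of p, τ, and τg are hypothetical, the function names are illustrative, and returning zero probability for τ ≤ τg is an assumption made here for convenience.

```python
import math

def nucleation_probability(j, v, tau, tau_g):
    """Cumulative probability F = 1 - exp(-v * J * (tau - tau_g)).

    Returns 0 for tau <= tau_g (assumption: nothing is detectable before the growth time).
    """
    dtau = tau - tau_g
    if dtau <= 0.0:
        return 0.0
    return 1.0 - math.exp(-v * j * dtau)

def nucleation_rate(p, v, tau, tau_g):
    """Invert the expression above: J = -ln(1 - p) / (v * (tau - tau_g))."""
    dtau = tau - tau_g
    if not 0.0 <= p < 1.0 or dtau <= 0.0:
        raise ValueError("need 0 <= p < 1 and tau > tau_g")
    return -math.log1p(-p) / (v * dtau)

v = 167e-12               # droplet volume in m^3 (167 nL)
tau, tau_g = 600.0, 100.0 # hypothetical residence and growth times, s
p = 0.4                   # hypothetical fraction of crystal-containing droplets

j = nucleation_rate(p, v, tau, tau_g)
print(f"J = {j:.3e} m^-3 s^-1")
print(f"round trip: p = {nucleation_probability(j, v, tau, tau_g):.3f}")
```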
3.2. Droplet Volumes and Success Probabilities Are Distributed
In a real experiment, however, droplets are not identical and droplet volumes are distributed, as shown in Figure 2 and discussed in section 2.1.1. Since at constant supersaturation and residence time the success probability and the droplet volume are related one to one through eq 5, with F = p, success probabilities are also distributed. The normalized volume distribution is called ϕ(v) and is characterized in this work as a gamma distribution with parameters α and β:
$$\phi(v) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, v^{\alpha-1} \exp(-\beta v) \tag{6}$$
where Γ(·) is the gamma function. Therefore, the expected value of v is E[v] = μv = α/β, its variance is V[v] = σv2 = α/β2, and its coefficient of variation is 1/√α. Note also that our experimental protocol allows us to obtain droplets with a coefficient of variation as small as 2% (see Figure 2). Even though variations up to 15% have been reported in the literature5,17 and considered acceptable, we will question this consideration on the basis of the results reported below.
The corresponding normalized distribution of success probabilities, π(p), is calculated by enforcing the condition ϕ(v) dv = π(p) dp and using eq 5, with F = p. One obtains
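$$\pi(p) = \frac{1}{\Gamma(\alpha)} \left(\frac{\beta}{J\Delta\tau}\right)^{\alpha} \left[-\ln(1-p)\right]^{\alpha-1} (1-p)^{\beta/(J\Delta\tau)-1}, \qquad 0 \le p < 1 \tag{7}$$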
The expected value and the variance of π(p) are called μp and σp2, respectively, and are calculated as
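$$\mu_p = \int_0^1 p\, \pi(p)\, \mathrm{d}p = 1 - \left(1 + \frac{J\Delta\tau}{\beta}\right)^{-\alpha} \tag{8}$$

$$\sigma_p^2 = \int_0^1 (p-\mu_p)^2\, \pi(p)\, \mathrm{d}p = \left(1 + \frac{2J\Delta\tau}{\beta}\right)^{-\alpha} - \left(1 + \frac{J\Delta\tau}{\beta}\right)^{-2\alpha} \tag{9}$$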
Note that these two quantities converge correctly to (1 – exp(−μvJΔτ)) and to zero, respectively, when α → ∞ (i.e., when the coefficient of variation of the droplet volume distribution approaches 0 and the distribution approaches a Dirac delta centered at μv). It is also worth noting that when the coefficient of variation increases from 2% to 15%, while everything else remains constant, the value of μp changes by about 0.5%, whereas the variance σp2 becomes more than 50 times larger.
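The sensitivity of μp and σp2 to the coefficient of variation can be illustrated numerically. The sketch below assumes the closed forms that follow from the moment-generating function of the gamma distribution, i.e., μp = 1 − (1 + x/α)^(−α) and σp2 = (1 + 2x/α)^(−α) − (1 + x/α)^(−2α), with x = μvJΔτ and α = 1/ψ². The value x = 1 is an arbitrary illustration, and the function name is hypothetical.

```python
import math

def success_probability_moments(x, psi):
    """Mean and variance of p = 1 - exp(-v*J*dtau) for gamma-distributed v.

    x   : dimensionless product mu_v * J * dtau
    psi : coefficient of variation of the droplet volume distribution
    Uses alpha = 1/psi^2 and beta = alpha/mu_v, so that J*dtau/beta = x/alpha.
    """
    alpha = 1.0 / psi**2
    e1 = math.exp(-alpha * math.log1p(x / alpha))        # E[exp(-v*J*dtau)]
    e2 = math.exp(-alpha * math.log1p(2.0 * x / alpha))  # E[exp(-2*v*J*dtau)]
    return 1.0 - e1, e2 - e1**2

x = 1.0  # arbitrary illustration value of mu_v * J * dtau
mu_2, var_2 = success_probability_moments(x, psi=0.02)
mu_15, var_15 = success_probability_moments(x, psi=0.15)

rel_change_mu = abs(mu_15 - mu_2) / mu_2
var_ratio = var_15 / var_2

print(f"psi = 2%:  mu_p = {mu_2:.5f}, var_p = {var_2:.3e}")
print(f"psi = 15%: mu_p = {mu_15:.5f}, var_p = {var_15:.3e}")
print(f"relative change in mu_p = {100 * rel_change_mu:.2f} %, variance ratio = {var_ratio:.1f}")
```

For this choice of x the mean changes by well under 1%, while the variance grows by a factor above 50, in line with the figures quoted above.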
The sequence of outcomes from the observation of nucleation in n droplets with nucleation probabilities distributed as in eq 7 is a random variable called hp, which is the outcome of the sum of n independent nonidentical Bernoulli trials. It follows a "generalized" binomial distribution, i.e., the probability that hp = k is

$$P(h_p = k) = \int_0^1 \binom{n}{k}\, p^{k} (1-p)^{n-k}\, \pi(p)\, \mathrm{d}p \tag{10}$$
The expected value of hp is nμp, since combining eq 10 and eq 8 yields
$$E[h_p] = \sum_{k=0}^{n} k\, P(h_p = k) = \int_0^1 n p\, \pi(p)\, \mathrm{d}p = n \mu_p \tag{11}$$
Its variance is given by the expression
$$V[h_p] = n \mu_p (1-\mu_p) + n(n-1)\, \sigma_p^2 \tag{12}$$
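where we have used the identities V[hp] = E[hp^2] – (E[hp])^2 and V[p] = σp^2 = E[p^2] – μp^2, as well as the following relationship for E[hp^2] (obtained by combining again eqs 10 and 8 and the last two identities):

$$E[h_p^2] = \int_0^1 \left[ n p (1-p) + n^2 p^2 \right] \pi(p)\, \mathrm{d}p = n \mu_p + n(n-1) \left( \sigma_p^2 + \mu_p^2 \right) \tag{13}$$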
It is apparent from eq 12 that the variance of hp is the sum of the variance of the outcome of a Bernoulli process with success probability equal to μp and the variance of the distribution of the success probability multiplied by a factor approximately equal to the square of the number of Bernoulli trials (i.e., of droplets). Since we are going to estimate the mean of the distribution of the success probabilities as hp/n, we can use eq 12 to calculate the variance of such an estimate:
$$V\!\left[\frac{h_p}{n}\right] = \frac{\mu_p (1-\mu_p)}{n} + \frac{n-1}{n}\, \sigma_p^2 \tag{14}$$
It is worth noting that the Bernoulli process contribution is inversely proportional to the number of trials, i.e., of droplets (which in our experiments is, with two exceptions, between about 1000 and 3000), whereas the contribution due to the droplet volume distribution is virtually independent of the number of droplets and equal to the variance of the π(p) distribution of success probabilities.
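The relative weight of the two contributions can be sketched numerically. The sketch assumes the decomposition V[hp/n] = μp(1 − μp)/n + (n − 1)σp2/n described above, together with the gamma-based closed forms μp = 1 − (1 + x/α)^(−α) and σp2 = (1 + 2x/α)^(−α) − (1 + x/α)^(−2α), where x = μvJΔτ and α = 1/ψ². The values x = 1 and n = 1000 are illustration values, and the function name is hypothetical.

```python
import math

def variance_terms(x, psi, n):
    """Return (Bernoulli term, volume-distribution term) of V[h_p / n].

    x   : dimensionless product mu_v * J * dtau
    psi : coefficient of variation of the droplet volumes (gamma distributed)
    n   : number of droplets observed
    """
    alpha = 1.0 / psi**2
    e1 = math.exp(-alpha * math.log1p(x / alpha))
    e2 = math.exp(-alpha * math.log1p(2.0 * x / alpha))
    mu_p = 1.0 - e1
    var_p = e2 - e1**2
    bernoulli_term = mu_p * (1.0 - mu_p) / n
    volume_term = (n - 1) / n * var_p
    return bernoulli_term, volume_term

n = 1000
for psi in (0.02, 0.15):
    b, w = variance_terms(x=1.0, psi=psi, n=n)
    dominant = "Bernoulli process" if b > w else "volume distribution"
    print(f"psi = {100 * psi:.0f}%: Bernoulli = {b:.2e}, volume = {w:.2e}, dominated by {dominant}")
```

With these inputs the Bernoulli term dominates at ψ = 2%, whereas the volume-distribution term dominates at ψ = 15%.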
The dependence on μp of the left-hand side of the last equation (blue line) and of the two terms on the right-hand side (red and green lines for the first and second parts, respectively) is illustrated in Figure 3. The solid lines (in the main figure and in the inset) are characteristic of the data in this work: i.e., n = 1000 (red curve) and σp2 corresponding to a coefficient of variation in the distribution of volumes of 2%. The dashed lines in the main figure are obtained for a coefficient of variation in the distribution of volumes of 15% (which is reported in similar studies in the literature and considered to be acceptable5). It is striking how the variance of the mean of the distribution of the success probabilities is dominated by the intrinsic variance of the Bernoulli process (which applies also when all droplets are identical) in our case, where we painstakingly control the narrowness of the volume distribution, whereas it is dominated by the variance in the volume distribution for the data reported in the literature above. The maximum value of the variance in the latter case is about 10 times larger than in the former.
Figure 3.

Variance of the estimated success probability. The solid curves (in the main figure and in the inset) correspond to the data of this work. The red curve is the variance in success probability due to the Bernoulli process, i.e., the first term of the right-hand side of eq 14 for n = 1000. The green curve is the contribution to the variance due to the distribution of droplet volumes, i.e., σp2 in the right-hand side of eq 14 for ψ = 1.5%, which is an average value for all experiments. The dashed curves in the main figure are obtained for ψ = 15%. The blue curves, the left-hand side of eq 14, represent the overall variance as a function of μp.
In the context of microfluidic studies of nucleation, this analysis points to the importance of measuring and reporting the droplet volume distribution in detail and of controlling it tightly, so as to minimize its effect on the propagation of experimental uncertainties. It is worth noting that, in our previous study on droplet volume distributions,9 we had concluded that even a 15% value of the coefficient of variation of the volume distribution would be acceptable for an accurate estimate of the nucleation rate through experiments such as those carried out in this study. That consideration is in principle valid if one considers that the mean value of the success probability is not strongly affected by the coefficient of variation of the volume distribution. However, the variance of, and hence the uncertainty on, the mean value of the success probability increases quickly when such a coefficient of variation increases (see eqs 8 and 9 and the exemplary values given in the discussion thereof).
3.3. Solid Detection in Droplets Is Error Prone
In the previous sections, we considered the outcome of the observation of each droplet in an experiment consisting of n droplets as a success when crystals are present and as a failure otherwise. In practice, the online camera (see section 2.4) captures images of all n droplets; out of these n images, a subset of n_o images (about 10% of the n droplets in an experiment) is evaluated both by a human operator and by the automatic image analysis software (which ultimately analyzes all images for all n droplets in each experiment). Neither is perfect; hence, the number of successes determined in an experiment, i.e., hd, where the subscript refers to detection, is itself a random variable. Its realization in a specific experiment depends on the statistics of the Bernoulli process and of the specific droplet volume distribution, as well as on the statistics of the image analysis technique. We consider solid detection a critical step in the whole experimental procedure, which deserves special attention; the next three subsections are dedicated to it.
3.3.1. Automated Image Analysis
Since we are fully aware that human observation is also fallible, we have assessed its quality by comparing the values of ho determined by different human operators for the same set of images. We have observed a deviation among the different operators of a few percent, i.e., of about 3%, and thus we have decided to consider the human observation as a trustworthy reference to train and validate the image analysis software.
The training uses two-thirds of the droplets analyzed by the human operator, ntrain; the remaining droplets (one-third, i.e., ntest, with no = ntrain + ntest) are used as a validation set, thus allowing for the evaluation of the so-called confusion matrix12 corresponding to that experiment. Such a matrix consists of four numbers: namely, the fraction of true positives, fTP, i.e., the number of droplets where both the operator and the algorithm detect crystals divided by ntest, the fraction of true negatives, fTN, i.e., the fraction of droplets tested where neither the operator nor the software sees crystals, and the fraction of false positives, fFP, and that of false negatives, fFN, where the image analysis software assessment of the existence of crystals differs from that of the operator (which is assumed to be accurate).
The elements of the confusion matrix are used to calculate six additional quantities that will be necessary in the following: (1) xTP = fTP/(fTP + fFN), i.e., the ratio of crystals correctly detected by the algorithm to those detected by the operator; (2) xFP = fFP/(fFP + fTN), i.e., the ratio of crystals wrongly detected by the algorithm to the number of droplets that the operator has judged to contain no crystal; (3) yd = fTP + fFP, which is the fraction of droplets judged by the algorithm to be containing crystals relative to the droplets used as a validation set, ntest; (4) yo = fTP + fFN, which is the fraction of droplets containing crystals, as determined by the operator, relative to the droplets used as a validation set; (5) z1 = 1/(xTP – xFP), which will be needed in section 3.3.3; (6) z2 = xFP/(xTP – xFP), which will also be needed in section 3.3.3. These six quantities are random variables, whose statistics have been estimated for all experiments through a bootstrap procedure. Out of the no available droplets, Nbs random partitions between the training set and validation set are generated (Nbs = 250 in all experiments), and for each of them the elements of the confusion matrix and the six quantities xTP, xFP, yd, yo, z1, and z2, including their expected values (means) and variances, are calculated (see Table S1 in the Supporting Information for the former and latter numbers).
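The six quantities derived from the confusion matrix can be computed as in the following sketch. The counts are hypothetical, and the function name and dictionary keys are illustrative; in a bootstrap procedure such a function would be called once per random partition, with means and variances then taken over the Nbs results.

```python
def detection_metrics(n_tp, n_fp, n_fn, n_tn):
    """Six derived quantities from confusion-matrix counts on the validation set."""
    n_test = n_tp + n_fp + n_fn + n_tn
    f_tp, f_fp, f_fn, f_tn = (c / n_test for c in (n_tp, n_fp, n_fn, n_tn))
    if f_tp + f_fn == 0.0 or f_fp + f_tn == 0.0:
        raise ValueError("validation set must contain both positives and negatives")
    x_tp = f_tp / (f_tp + f_fn)  # true-positive ratio
    x_fp = f_fp / (f_fp + f_tn)  # false-positive ratio
    if x_tp == x_fp:
        raise ValueError("x_TP equals x_FP: z1 and z2 are undefined")
    return {
        "x_tp": x_tp,
        "x_fp": x_fp,
        "y_d": f_tp + f_fp,            # fraction flagged by the algorithm
        "y_o": f_tp + f_fn,            # fraction flagged by the operator
        "z1": 1.0 / (x_tp - x_fp),
        "z2": x_fp / (x_tp - x_fp),
    }

# Hypothetical validation set of 100 droplets
m = detection_metrics(n_tp=38, n_fp=3, n_fn=2, n_tn=57)
for name, value in m.items():
    print(f"{name} = {value:.4f}")
```

By construction, yd = xTP·yo + xFP·(1 − yo), so the definitions imply yo = z1·yd − z2; this identity follows algebraically from the definitions above and offers a quick consistency check.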
Because of the importance of understanding the role played by the automatic image analysis method, we have decided to apply two different approaches to determine the statistical properties of the random variable hd, which is the number of droplets containing crystals as determined by the image analysis algorithm. The first approach is empirical and utilizes the quantities yd and yo. The second approach has an analytical character and exploits the properties of the four quantities xTP, xFP, z1, and z2.
3.3.2. Empirical Evaluation of the Statistics of the Image Analysis Algorithm
The empirical approach is based on the following simple assumption: the number of droplets containing crystals that are detected by the image analysis algorithm, hd, equals the number of droplets containing crystals, hp, plus an error, hε:
$$h_d = h_p + h_\varepsilon \tag{15}$$
All three quantities are uncorrelated random variables. Accordingly, the following relationships hold:
$$E[h_d] = E[h_p] + E[h_\varepsilon] \tag{16}$$
$$V[h_d] = V[h_p] + V[h_\varepsilon] \tag{17}$$
The three equations above constitute the model of the physical process comprising both physical experiment and automated image analysis. In order to use them, a model of the image analysis itself is needed, through which one estimates the statistical properties of the error hϵ appearing in eqs 16 and 17. This is obtained by using the results of the validation test of the image analysis algorithm (using ntest droplets and corresponding images) and its Nbs random combinations obtained through the bootstrap procedure. If Δhtest is the difference between estimated (by the algorithm) and observed (by the human operator) droplets containing crystals out of the ntest droplets, then Δhtest/ntest = yd – yo (see section 3.3.1). Through the expected value and variance of the right-hand side of this identity, one can estimate the statistical properties of the error hϵ appearing in eqs 16 and 17 as
$$E[h_\varepsilon] = n\,\mu_\varepsilon \tag{18}$$
$$V[h_\varepsilon] = n^2\,\sigma_\varepsilon^2 \tag{19}$$
where μϵ and σϵ² are the expected value and variance, respectively, of the difference yd – yo and are known from the bootstrap procedure. One can then rewrite eqs 16 and 17 as
| 20 |
| 21 |
One can now estimate the expected value of hp, called ĥp. This is related to the estimated value of μp, called p̂, because p̂ = ĥp/n. In fact the expected value of hd can be estimated through its measured value, called ĥd and obtained via the automated image analysis algorithm. Equations 20 and 21 yield
$$\hat{h}_p = \hat{h}_d - n\,\mu_\varepsilon \tag{22}$$
$$\hat{p} = \hat{s} - \mu_\varepsilon \tag{23}$$
where ŝ = ĥd/n.
The variance of ĥp follows immediately using eqs 12 and 22 and using the property that the variance of the expected value of a random variable is the variance of that random variable divided by the number of data points, on which the expected value is calculated, i.e., V[E[hε]] = V[hε]/Nbs in this case:
| 24 |
where μp in eq 21 has been substituted by its estimate p̂ and σp² is calculated for a distribution π(p) whose mean is given by p̂. For the variance of p̂, one obtains from eq 24
| 25 |
Equation 23 is plotted in Figure 4A for a value of με = 0.018, which is calculated as the average of the mean errors over the 42 experiments that have been carried out and evaluated. The average variance among all 42 experiments is σε² = 3.5 × 10⁻³, which has to be compared with the values plotted in Figure 3 (solid lines). Since Nbs is quite large, the variance of the estimate of the success probability p̂ can be approximated very well as V[p̂] ≈ p̂(1 – p̂)/n + σε².
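As a rough numerical illustration of the empirical correction, the sketch below assumes that eq 23 has the form p̂ = ŝ − με (consistent with the straight line of Figure 4A and with its error-free limit) and uses the large-Nbs approximation of the variance quoted above, with the image-analysis term read as the variance σε². The function name and the example inputs are hypothetical.

```python
def empirical_estimate(s_hat, n, mu_eps, var_eps):
    """Empirical approach: shift s-hat by the mean detection error.

    Assumes p_hat = s_hat - mu_eps and the large-Nbs approximation
    V[p_hat] ~ p_hat * (1 - p_hat) / n + var_eps.
    Note that the binomial term is only meaningful for 0 <= p_hat <= 1.
    """
    p_hat = s_hat - mu_eps
    var_p_hat = p_hat * (1.0 - p_hat) / n + var_eps
    return p_hat, var_p_hat


# Hypothetical experiment: 3600 droplets, 33% flagged by the algorithm,
# with the averages over the 42 experiments used for mu_eps and var_eps.
p_hat, var_p_hat = empirical_estimate(s_hat=0.33, n=3600, mu_eps=0.018, var_eps=3.5e-3)
```

With thousands of droplets the binomial term is of order 10⁻⁵, so under these assumptions the image-analysis variance dominates the uncertainty of p̂.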
Figure 4.

Parity plot of ŝ and p̂. Markers correspond to values of the 42 experiments; these are not necessarily on top of the dashed lines. Each experiment has its own variance, represented by the x,y error bars plotted for selected experiments. (A) Empirical evaluation of the statistics of the image analysis algorithm. The straight line is defined by eq 23 for a value of με = 0.018. (B) Analytical evaluation of the statistics of the image analysis algorithm. The straight line is defined by eq 33 for values of E[w1] = 1.91 and E[w2] = 0.41.
In Figure 4A, the points corresponding to the 42 experiments are also plotted; these are close to the straight line, but not necessarily on it, because each experiment has its own value of με. It is worth noting that eq 23 does not map the interval (0, 1) of ŝ (note that 0 ≤ ŝ = ĥd/n ≤ 1 by definition) onto the interval (0, 1) of p̂, because of the systematic detection error με. Error bars in both ŝ and p̂ are also shown for some exemplary points, so as to highlight that the error is different in different experiments because of different values of p̂ and of the number of droplets n.
Additionally, the empirical approach described here includes the special case where the automated image analysis is 100% accurate. In this particular case, which is idealized and unrealistic but relevant from a theoretical point of view as we see in the following, με = σε² = 0; hence, the straight line in Figure 4A coincides with the diagonal on which all experimental points fall.
3.3.3. Analytical Evaluation of the Statistics of the Image Analysis Algorithm
The analytical approach tries to capture the specific features of the image analysis algorithm: namely, the fact that it does not detect crystals in all droplets containing at least one and that it sometimes detects crystals in droplets that have none. Such features are represented by two quantities: wTP, the fraction of true positives, and wFP, the fraction of false positives, respectively.
As a consequence of the definitions above, the number of crystal-containing droplets detected is
$$h_d = w_{TP}\,h_p + w_{FP}\,(n - h_p) \tag{26}$$
All quantities here are random variables. The corresponding relationship in terms of expected values is obviously
| 27 |
The variance of hd is calculated, by assuming that the random variables in the two multiplications on the r.h.s. of eq 27 are independent, as
| 28 |
where E[hp] and V[hp] are given by eqs 11 and 12 obeying the physical process of nucleation in microdroplets of distributed volumes.
There is a dual representation of the affine relationship between hd and hp, in eq 26, namely
$$h_p = w_1\,h_d - w_2\,n \tag{29}$$
where w1 = 1/(wTP – wFP) and w2 = wFP/(wTP – wFP) are also random variables. This can also be written in terms of expected values:
| 30 |
Equations 26–30 constitute the model of the physical process comprising both physical experiment and automated image analysis. Their use is possible when a model of the image analysis algorithm is available. This allows estimating the statistical properties of the random variables wTP, wFP, w1, and w2 appearing in those equations. This is obtained by using the results of the validation test of the image analysis algorithm (using ntest droplets and corresponding images) and its Nbs random combinations obtained through the bootstrap procedure. The estimated values and variances of the four coefficients above, i.e., wTP, wFP, w1, and w2, can be computed through the estimated values and variances of xTP, xFP, z1, and z2 obtained from the bootstrap procedure, because the latter and the former have the same physical meaning, though the former refer to the whole set of n droplets, whereas the latter refer to the subset ntest.
As in the case of the empirical approach, one estimates the expected value of hd using the actual value of ĥd obtained by the image analysis algorithm. Then, one uses eq 31 to estimate the value of hp
$$\hat{h}_p = E[w_1]\,\hat{h}_d - E[w_2]\,n \tag{31}$$
from which one can calculate the variance of ĥp as follows (after noting that V[E[wi]] = V[wi]/Nbs, i = 1, 2):
| 32 |
The quantity of physical interest in this analysis is the mean of the distribution of success probabilities, μp, whose estimate is defined as p̂ = ĥp/n. One can calculate it by dividing both terms of eq 31 by n, thus obtaining the following expression, where—as in the empirical approach—the estimated quantity ŝ = ĥd/n is used:
$$\hat{p} = E[w_1]\,\hat{s} - E[w_2] \tag{33}$$
Finally, one calculates the variance of the estimate p̂ as
| 34 |
The variance of ĥd is evaluated using eq 30 with E[hp] = np̂ and V[hp] from eq 12 with μp substituted by p̂ and σp² calculated for a distribution π(p) whose mean is p̂.
The straight line defined by eq 33 is plotted in Figure 4B for values of E[w1] and E[w2] of 1.91 and 0.41, respectively, which are representative of the average of these quantities among the 42 experiments carried out and evaluated. The points corresponding to the 42 experiments are also plotted; they are close to the straight line, but not on it, because the two expected values above are different in each experiment. It is noteworthy that according to eq 33 the interval (0, 1) of ŝ maps onto a much broader interval of p̂, as clearly illustrated in Figure 4B. This is due to the relatively large values of the two expected values above. On the one hand, even when there are no droplets containing crystals, i.e., when p̂ = 0, the false positives make the expected value of ŝ equal to E[w2]/E[w1] = 0.22. On the other hand, when all droplets contain crystals, i.e., p̂ = 1, the missed true positives make the expected value of ŝ equal to (1 + E[w2])/E[w1] ≈ 0.74, i.e., smaller than 1.
Error bars in both ŝ and p̂ are also shown for some exemplary points. It is remarkable that the error bars are very large for experimental points where ŝ is close to its lower and upper bound and p̂ attains unphysical values, i.e., below 0 and above 1. This is a consequence of the functional form of eq 34 and of the fact that the statistics of the random variables w1 and w2 estimated through the bootstrap procedure show that these variables exhibit larger uncertainty in the limit cases where either only a few droplets contain crystals (p̂ close to 0) or most of the droplets contain crystals (p̂ close to 1).
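The affine mapping between ŝ and p̂ discussed in this section can be explored with a short sketch. It assumes that eq 33 reads p̂ = E[w1]ŝ − E[w2], which follows from the definitions of w1 and w2 and from the value of ŝ at p̂ = 0 quoted in the text; the function names are hypothetical, and the clipping helper mirrors the plotting convention of assigning 0 or 1 to unphysical estimates.

```python
def analytical_p_hat(s_hat, e_w1, e_w2):
    """Assumed form of eq 33: p_hat = E[w1] * s_hat - E[w2]."""
    return e_w1 * s_hat - e_w2


def s_hat_for(p_hat, e_w1, e_w2):
    """Inverse mapping: the detected fraction expected for a given p_hat."""
    return (p_hat + e_w2) / e_w1


def clip_probability(p_hat):
    """Assign 0 or 1 to estimates that fall outside the physical range."""
    return min(1.0, max(0.0, p_hat))


# Representative averages quoted for the 42 experiments.
E_W1, E_W2 = 1.91, 0.41
p_range = (analytical_p_hat(0.0, E_W1, E_W2), analytical_p_hat(1.0, E_W1, E_W2))
s_range = (s_hat_for(0.0, E_W1, E_W2), s_hat_for(1.0, E_W1, E_W2))
```

Here `p_range` is the interval of p̂ reached when ŝ spans (0, 1), and `s_range` is the interval of ŝ that corresponds to physically meaningful values of p̂.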
3.3.4. Comparison of the Two Approaches
The empirical and analytical approaches to the evaluation of the statistics of the automated image analysis algorithm can be compared by plotting, for the 42 experiments carried out and evaluated, the values of the estimated success probability p̂ and of the variance associated with each estimate. This is shown in Figure 5, where the blue and red lines and symbols refer to the analytical and empirical evaluations, respectively. The symbols represent values from the experiments. It is worth noting that, for each experiment, the two methods yield different values not only of the variance but also of p̂, and that values of p̂ that are negative or larger than 1 have been plotted as 0 and 1, respectively. The variance obtained applying the analytical method is always larger than that obtained with the empirical method. This is confirmed by the two curves that are calculated using eq 25 (red curve, for the empirical approach) and eq 34 (blue curve, for the analytical approach). The curves are calculated using average values of the expected values and variances of the image analysis features. In addition to the obvious observation that the analytical approach yields variances that are about 10 times larger than those of the empirical approach, it is remarkable that the two curves have different shapes. This is due to the fact that, while in the empirical case the image analysis error is uniform in p̂, in the analytical case the error is larger for p̂ approaching 0 and 1, as discussed above. Figure 5 also highlights the key role played by the image analysis algorithm in determining the accuracy of the estimate of the success probability.
Figure 5.

Variances of p̂. Blue and red solid curves and markers (connected by dashed lines for visual clarity) refer to the analytical and empirical evaluations of the image analysis statistics. Curves are calculated with average values of the expected values and variances of both image analysis features; these are found in Table S1 in the Supporting Information.
4. Evaluation of Nucleation Experiments in Microdroplets
In this section we report the experimental results and analyze first the internal consistency of each experiment, as explained in section 4.1. Then, the experiments that are indeed internally consistent are used to estimate nucleation rates and growth times, as discussed in detail in section 4.2.
4.1. Consistency and Reproducibility
The nucleation experiments in microdroplets presented and analyzed in this work involve thousands of droplets and have a duration of thousands of seconds. In an analysis of the experimental evidence it is necessary to assume that the nucleation probability (i.e., the success probability) is the same in all droplets (apart from the volume effects discussed in section 3.2), which implies that the experimental conditions when droplets are formed are indeed the same for the whole duration of a single experiment. It is a legitimate question whether all droplets in a single experiment can indeed be considered similar in terms of nucleation probability, i.e., whether the droplets generated and reaching the end of the microfluidic device first are indeed similar to the droplets generated and observed last. The answers to these questions determine whether two repeated experiments, i.e., experiments carried out at the same nominal conditions of supersaturation, temperature, and droplet volume that nevertheless exhibit different values of ŝ, can be considered comparable from a statistical viewpoint, i.e., as belonging to the same stochastic process.
In order to answer these questions, we have devised a protocol to analyze the outcome of each individual experiment, as illustrated in Figure 6, where the horizontal coordinate is the number of droplets considered (on a logarithmic scale) and the vertical coordinate is the value of ŝ obtained for that (sub)set of droplets (each of the eight subfigures corresponds to a different experiment).
Figure 6.

Evaluation of internal consistency and reproducibility of single experiments. The rightmost symbol in each of the subplots is the ŝ value for the original population of droplets. This is further sliced into smaller subsets (going from right to left) down to 100 droplets. The shaded conelike regions are calculated by applying the confidence intervals of eq 35 to the estimated values of ŝ for the entire population. Each set of plots in the three subfigures shows repetitions of experiments carried out under the same operating conditions: (A) S = 2.28 and τ = 972.9 s; (B) S = 2.08 and τ = 1557 s; (C) S = 2.37 and τ = 681.1 s.
The first step involves slicing the set of all the droplets obtained in the specific experiment into subsets of smaller and smaller sizes and determining the value of ŝ for each of the subsets. In the case of the experiment illustrated in Figure 6A (upper left plot), the whole group consisted of 3600 droplets and exhibited a value ŝ = 0.33 (rightmost symbol in the figure). The original set was divided into two subsets of 1800 droplets each, i.e., the first 1800 droplets and the last 1800 droplets, each yielding its own ŝ value: they are represented by the two symbols vertically aligned at nslice = 1800 in the figure. Then the original is further divided—going from right to left—into three subsets of 1200 droplets each, and so on and so forth, all the way to 36 subsets containing 100 droplets each. Each subset represents an independent Bernoulli process that should be characterized by the same underlying success probability if the whole experiment consisting of 3600 droplets were indeed internally consistent. As the original set of 3600 droplets is sliced more and more, the estimate of the success probability is based on a smaller and smaller number of droplets and the associated variance (and the size of the confidence interval) becomes larger and larger. As a consequence, the 36 symbols representing the subsets with nslice = 100 occupy a vertical segment, which is much longer than the confidence interval associated with the ŝ value at n = 3600.
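The slicing step can be sketched as follows. This is a minimal illustration with a hypothetical function name; it assumes the per-droplet detections are stored in order of generation as 0/1 values, and it drops any remainder when the number of droplets is not divisible by the number of subsets, which is an assumption since the text only describes evenly divisible cases.

```python
def slice_estimates(detections, n_subsets):
    """Split an ordered list of 0/1 detections into consecutive subsets of equal
    size and return the fraction of crystal-containing droplets in each one."""
    size = len(detections) // n_subsets
    if size == 0:
        raise ValueError("more subsets than droplets")
    return [
        sum(detections[i * size:(i + 1) * size]) / size
        for i in range(n_subsets)
    ]


# Hypothetical sequence of eight droplets (1 = crystal detected).
example = [1, 0, 0, 1, 1, 1, 0, 0]
halves = slice_estimates(example, 2)    # two subsets of four droplets
quarters = slice_estimates(example, 4)  # four subsets of two droplets
```

For an experiment with 3600 droplets, calling the function with 1, 2, 3, ..., 36 subsets would give the vertical groups of symbols at nslice = 3600, 1800, 1200, ..., 100.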
We can now define a criterion to establish if an experiment is internally consistent (the second step of our protocol). An experiment is internally consistent if the segments obtained as described above at all different values of nslice, from 100 to the total number of droplets in an experiment, are contained in the shaded conelike region, also depicted in Figure 6A. Such a region is calculated by applying the following confidence intervals, ±ϵ, to the ŝ value estimated for the entire droplet population of that experiment:
$$\epsilon = \sqrt{\frac{\ln(2/\alpha)}{2\,n_{\mathrm{slice}}}} \tag{35}$$
where α = 1 – C and C is the confidence level (we have used C = 0.95 and α = 0.05 in all cases). Note that the confidence interval depends only on the number of droplets in the subset of droplets and not on the estimated success probability (contrary to that obtained for a Bernoulli process). The shaded conelike region in Figure 6A accounts for the statistical features of any stochastic process, particularly of a Bernoulli process or of a sum of a number of independent nonidentical Bernoulli trials; it shows that there is an increase of uncertainty associated with a decrease in the number of trials and it quantifies it.
The confidence intervals of eq 35 are calculated using the Hoeffding inequality,18,19 which gives a lower bound to the probability that the estimated value of a stochastic variable, ŝ in our case, differs in absolute value from its expected value by a quantity smaller than ϵ:
$$\Pr\{|\Delta\hat{s}| < \epsilon\} \ge 1 - 2\exp\left(-2 n \epsilon^2\right) \tag{36}$$
Equation 35 is obtained from eq 36 by setting n = nslice and Pr{|Δŝ| < ϵ} = C and solving for ϵ.
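The half-width of the cone and the internal consistency check can be sketched as below. The sketch assumes the standard two-sided Hoeffding bound for variables in [0, 1], Pr{|Δŝ| < ϵ} ≥ 1 − 2 exp(−2nϵ²), solved for ϵ at confidence C = 1 − α; the function names and the dictionary layout of the slice estimates are hypothetical.

```python
import math


def hoeffding_half_width(n_slice, confidence=0.95):
    """Half-width of the confidence interval for a subset of n_slice droplets,
    obtained by solving 1 - 2*exp(-2*n*eps**2) = confidence for eps."""
    alpha = 1.0 - confidence
    return math.sqrt(math.log(2.0 / alpha) / (2.0 * n_slice))


def is_internally_consistent(s_full, slices, confidence=0.95):
    """Check that every subset estimate lies inside the cone around s_full.

    slices maps a subset size n_slice to the list of s-hat values
    obtained for the subsets of that size.
    """
    for n_slice, values in slices.items():
        eps = hoeffding_half_width(n_slice, confidence)
        if any(abs(value - s_full) > eps for value in values):
            return False
    return True
```

Note that the half-width depends only on the subset size, so the same cone shape applies to every experiment and is simply centered on that experiment's ŝ.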
Therefore, the fact that the ŝ estimates spread vertically as nslice decreases, e.g., from 3600 to 100 in Figure 6A, while remaining within the conelike region defined by the Hoeffding inequality, indicates that all Bernoulli trials are (at the 0.95 confidence level) statistically representative of the same underlying physical phenomenon, as required by internal consistency. In this study, we have carried out 45 experiments under different operating conditions, and all of them have fulfilled the criterion for internal consistency.
Let us now consider repetitions of the same experiment: e.g., the three experiments in Figure 6A (at S = 2.28 and τ = 972.9 s), the two in Figure 6B (at S = 2.08 and τ = 1557 s), and the three in Figure 6C (at S = 2.37 and τ = 681.1 s). We use again the ŝ estimates obtained using all droplets in each experiment and the confidence intervals defined by eq 35 to assess whether the different experiments carried out under the same experimental conditions can be considered as representing the same physical phenomena, and hence, whether they can be used to estimate comparable values of the nucleation rate.
In the three cases illustrated in Figure 6 the different repetitions yield slightly different values of the ŝ estimate, but the Hoeffding conelike regions for the different repetitions exhibit a major overlap, interpreted as the fulfilment of a reproducibility criterion. When two repeated experiments at the same nominal conditions do not overlap, we consider one of the two an outlier, which has to be excluded from further analysis. Out of 45 experiments, three of them (not shown here for brevity) have not fulfilled such a criterion and have been excluded, thus leaving 42 experiments (i.e., about 93% of the total) that have been analyzed further as discussed in the following. We were unfortunately not able to determine the root cause of the anomalous behavior of the three outliers.
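A simple version of the reproducibility check is sketched below. The text requires a "major overlap" of the Hoeffding regions without quantifying it, so the sketch tests only whether the full-population intervals of two repetitions overlap at all; this threshold, the function names, and the example numbers are assumptions made for illustration.

```python
import math


def hoeffding_interval(s_hat, n, confidence=0.95):
    """Interval s_hat +/- eps, with eps from the two-sided Hoeffding bound."""
    eps = math.sqrt(math.log(2.0 / (1.0 - confidence)) / (2.0 * n))
    return s_hat - eps, s_hat + eps


def intervals_overlap(exp_a, exp_b, confidence=0.95):
    """exp_a and exp_b are (s_hat, n) pairs for two repeated experiments."""
    low_a, high_a = hoeffding_interval(exp_a[0], exp_a[1], confidence)
    low_b, high_b = hoeffding_interval(exp_b[0], exp_b[1], confidence)
    return low_a <= high_b and low_b <= high_a


# Hypothetical repetitions with 3600 droplets each.
compatible = intervals_overlap((0.33, 3600), (0.35, 3600))
outlier = not intervals_overlap((0.33, 3600), (0.40, 3600))
```

A stricter criterion, e.g., requiring each estimate to fall inside the other's interval, could be substituted without changing the structure of the check.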
4.2. Estimation of the Parameters in the Constitutive Equations Relevant to Nucleation in Microdroplets
In order to accomplish the overall objective of this study, namely the evaluation of nucleation rates and related quantities from the microfluidic experiments analyzed as discussed above, one needs several ingredients: (1) the set of estimated values of probabilities, p̂, and associated variances, V[p̂], obtained from experiments through the procedures discussed above; (2) the physical model of the process to be characterized, which in this case links the observed success probability to the underlying nucleation rate; (3) the constitutive equations, i.e., those giving the nucleation rate, J, and the growth time, τg, in terms of supersaturation, temperature, and droplet volume; (4) the optimization procedure to estimate the parameters in the constitutive equations and the associated methodology to evaluate the variance and the confidence intervals of the estimated parameters. In this section, three sets of data will be considered, i.e., those obtained from the 42 experiments when each of the three approaches to the statistical evaluation of the image analysis algorithm is used: the empirical approach (section 3.3.2), the analytical approach (section 3.3.3), and the approach where we assume that the automated image analysis is error-free (briefly discussed in section 3.3.2 and called “error-free” in the following). To each of these, two parameter regression methods will be applied, consisting of one or two steps, in both cases based on the weighted nonlinear least-squares method. Thus, six different sets of model parameters (and related uncertainties) will be obtained and comparatively assessed.
4.2.1. Data
The m experiments in this work, with m = 42, have been carried out at four different supersaturation levels, Sj, j = 1, ..., 4 (which is defined as the ratio of actual concentration to solubility at the experimental temperature, assuming that the activity coefficients at the two concentration levels are equal); at each Sj level, mj experiments have been carried out (with m1 + m2 + m3 + m4 = m). The values of p̂ obtained by applying the three different approaches to image analysis are presented in Table S2 in the Supporting Information and plotted in Figure 7 as a function of the residence time; the subfigures in the same column are obtained using the same approach, whereas the subfigures in each row correspond to the same supersaturation level, Sj. The error bar shown for every experimental value of p̂ is calculated as the standard deviation from the corresponding variance V[p̂]: i.e., as its square root. Whenever automated image analysis delivers a value of p̂ below 0 or above 1, p̂ is assigned the value 0 or 1, respectively. The three approaches to assess the quality of the automated image analysis algorithm yield different, though similar, values of p̂ but very different values of the standard deviation: i.e., very small (almost not visible) for the error-free approach, small for the empirical approach, and larger in the case of the analytical approach.
Figure 7.

Estimation of success (nucleation) probability. (A–D) p̂ estimated considering that the image analysis algorithm is error-free, presented in filled squares. (E–H) p̂ estimated applying the empirical approach to evaluate the statistics of the image analysis algorithm and presented in filled diamonds. (I–L) p̂ estimated with the analytical image analysis statistics and presented in filled circles. Probabilities result from experiments performed at different supersaturations, Sj, each of which is plotted in a different subfigure as a function of residence time, τ. Error bars correspond to the standard deviation, i.e., √V[p̂], calculated with the corresponding methods. Dashed lines are computed with eq 37 using values of the estimated parameters Ĵ and τ̂g found in Table 2. Solid and dashed-dotted lines are computed with eq 45 using the parameters ã, b̃, γ̃, and δ̃ found in Table 2, estimated via the 2-step method (see section 4.2.5) and via the 1-step method (see section 4.2.6), respectively.
4.2.2. Physical Model
The mean of the success probability μp is related to the nucleation rate J through the relationship of eq 8, that can be recast as
| 37 |
where μv and α are the mean volume and squared reciprocal of the coefficient of variation of the distribution of droplet volumes, respectively. In this equation J and τg depend on the supersaturation, S; the mean probability μp depends also on the residence time τ and on the mean droplet volume μv.
4.2.3. Constitutive Equations
The nucleation rate is expressed through classical nucleation theory, i.e., using the following standard two-parameter relationship:
| 38 |
where a and b are the dimensionless parameters to be estimated, A0 = 1 × 10⁸ m⁻³ s⁻¹ is a constant reference value of the pre-exponential coefficient A, and a = ln(A/A0). It is worth noting that by choosing this nucleation rate expression we are assuming that the phenomenon we are observing is primary nucleation. Ascertaining whether this is homogeneous or heterogeneous, e.g., at the interface between continuous and dispersed phase, is beyond the scope of this work; eq 38 can describe both.
For the growth time the following two-parameter semiempirical relationship is used:
| 39 |
where γ and δ are positive dimensionless parameters to be estimated; C0 = 1 × 10³ s is a constant reference value of the numerator. The last equation expresses the growth time as the ratio between a characteristic length and an empirical growth rate, proportional to the driving force for growth, namely S – 1, to the power of δ; the positive model parameters guarantee that τg decreases with increasing S.
4.2.4. Variances and Confidence Intervals of the Estimated Parameters
Approximate values of the variance and of the confidence intervals of the parameters estimated through the two methods presented below, called here for the sake of simplicity βk, can be determined as follows. One calculates the sensitivity matrix: i.e., the matrix of the derivatives of the model predictions with respect to the model parameters, which are based on a linearization of the model in the vicinity of the estimated parameter values. Then one uses the sensitivity matrix to estimate the covariance matrix of the parameter estimates, by exploiting the weighted sum of the residuals. The variance of the kth model parameter, V[βk] is given by the kth diagonal element of such covariance matrix, whereas the standard deviation is obviously its square root. The confidence interval of the parameter βk with a confidence of, e.g., 95% is then given by its standard deviation multiplied by the value of the t distribution for a 95% confidence and a number of degrees of freedom equal to the number of data points minus the number of parameters.
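The procedure just described can be sketched in plain Python. This is not the authors' implementation: it assumes the common linearized estimate Cov[β] ≈ s²(JᵀWJ)⁻¹, where J is the sensitivity matrix, W holds the reciprocal variances of the data points, and s² is the weighted sum of squared residuals divided by the degrees of freedom. All function names are hypothetical, and the t value is passed in by the caller rather than computed.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    k = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if aug[pivot][col] == 0.0:
            raise ValueError("singular matrix")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def parameter_covariance(sensitivity, residuals, variances):
    """Approximate covariance matrix of the parameter estimates.

    sensitivity[i][k] is the derivative of the i-th model prediction with
    respect to the k-th parameter, evaluated at the estimated parameters.
    """
    n_data, n_par = len(residuals), len(sensitivity[0])
    weights = [1.0 / v for v in variances]
    jtwj = [[sum(w * row[a] * row[b] for w, row in zip(weights, sensitivity))
             for b in range(n_par)] for a in range(n_par)]
    weighted_ssr = sum(w * r * r for w, r in zip(weights, residuals))
    s2 = weighted_ssr / (n_data - n_par)  # degrees of freedom: data minus parameters
    return [[s2 * v for v in row] for row in invert(jtwj)]


def confidence_half_widths(covariance, t_value):
    """Half-width of each confidence interval: t value times standard deviation."""
    return [t_value * math.sqrt(covariance[k][k]) for k in range(len(covariance))]
```

For four data points and two parameters, for instance, the 95% two-sided t value with two degrees of freedom (about 4.303) would be supplied as `t_value`.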
4.2.5. Parameter Estimation via a Two-Step Method
The first method consists of two optimization steps. First, the mj experiments carried out at each of the four supersaturation levels, Sj, are used to estimate the corresponding nucleation rate Jj and growth time τg,j, and the associated variances. Then the four values of these quantities thus obtained are exploited to estimate the two plus two parameters in the corresponding constitutive equations. In all cases we use the fminsearch function as implemented in the optimization toolbox in Matlab.
Therefore, in the first step the optimization problem involving the minimization of the following weighted sum of the squared residuals is solved:
| 40 |
where p̂j,i is the value of the mean success probability in the ith experiment of the series at Sj (with V[p̂j,i] being its variance) and p̃j,i is the corresponding predicted value, defined as
| 41 |
It is worth noting that the values of μv, α, and τ are known, though specific to the ith experiment of the series at Sj, as indicated by the subscripts.
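The objective function of the first step can be illustrated with a small sketch. It assumes that the weights in the sum of squared residuals are the reciprocals of the variances V[p̂], which is the usual choice in weighted least squares; the function name and the example numbers are hypothetical, and the predicted values p̃ are taken as given rather than computed from eq 41.

```python
def weighted_ssr(p_hat, var_p_hat, p_model):
    """Weighted sum of squared residuals between estimated and predicted
    success probabilities, each residual weighted by 1 / V[p_hat]."""
    return sum(
        (estimate - prediction) ** 2 / var
        for estimate, var, prediction in zip(p_hat, var_p_hat, p_model)
    )


# Two hypothetical experiments at the same supersaturation level.
objective = weighted_ssr(
    p_hat=[0.3, 0.5],
    var_p_hat=[0.01, 0.04],
    p_model=[0.2, 0.7],
)
```

In Python, a derivative-free minimization comparable to MATLAB's fminsearch could be run on such an objective with `scipy.optimize.minimize(..., method="Nelder-Mead")`. With this weighting, data points with large variances, such as those from the analytical approach, contribute less to the fit.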
This optimization is performed four times for the four supersaturation levels, thus yielding the estimated values of nucleation rate and growth time, Ĵj and τ̂g,j (with j = 1, ..., 4), and the associated variances V[Ĵj] and V[τ̂g,j], calculated through the procedure summarized in section 4.2.4. This procedure is repeated for each approach in evaluating the experimental data: i.e., the empirical, analytical, and error-free evaluation of the automated image analysis algorithm. The results are reported in detail in Table 2 and plotted in Figure 8 (where the columns refer to the three different approaches to assess image analysis, whereas the first and the second row are for nucleation rate and growth time, respectively). There is an important remark to be made here. While the error bars on the Jj values increase with S, those on the τg values exhibit the opposite trend. We attribute this effect to the fact that at low (high) supersaturation the experimental points span a larger (smaller) interval of residence time values, thus making the estimate of the nucleation rate more (less) accurate and that of the growth time less (more) accurate. Substituting the estimated values of nucleation rate and growth time in eq 37 yields the curves plotted as dashed lines in Figure 7. In all cases the experimental points are described rather well by such regressions.
Table 2. Nucleation Rates, Ĵ, and Growth Times, τ̂g, Estimated with Eq 40 via the 2-Step Method with p̂ and V[p̂] Stemming from the Error-Free, Empirical, and Analytical Treatments of the Image Analysis Statistics, and Parameters of the Nucleation Rate, a and b, and of the Growth Time, γ and δ, Estimated either Using Ĵ and τ̂g Values, Obtained Using Nominal Supersaturations, Sj, and Mean Volumes, via the 2-Step Method, or Obtained Applying Eq 45 Directly for All Supersaturations, Si, via the 1-Step Regression (a)

| Approach | Method | Sj | Ĵ (10⁻⁷ m⁻³ s⁻¹) | τ̂g (s) | ã | b̃ | γ̃ | δ̃ |
|---|---|---|---|---|---|---|---|---|
| Error-free | 2-step | 2.08 | 0.92 ± 0.18 | 1153.9 ± 37.3 | 0.36 ± 1.18 | 1.86 ± 0.80 | 1.40 ± 0.31 | 2.36 ± 1.19 |
| | | 2.18 | 1.59 ± 0.41 | 988.7 ± 58.1 | | | | |
| | | 2.28 | 1.94 ± 0.42 | 836.4 ± 36.4 | | | | |
| | | 2.37 | 3.44 ± 1.49 | 615.0 ± 40.5 | | | | |
| | 1-step (b) | | | | –1.89 ± 3.19 | 0.79 ± 1.92 | 1.43 ± 0.27 | 3.15 ± 1.39 |
| Empirical | 2-step | 2.08 | 0.82 ± 0.26 | 1105.7 ± 73.3 | 0.48 ± 1.23 | 1.96 ± 0.85 | 1.36 ± 0.35 | 2.16 ± 1.389 |
| | | 2.18 | 1.62 ± 0.46 | 973.8 ± 66.7 | | | | |
| | | 2.28 | 1.98 ± 0.34 | 827.8 ± 30.3 | | | | |
| | | 2.37 | 2.92 ± 1.53 | 581.3 ± 67.1 | | | | |
| | 1-step (b) | | | | –1.47 ± 3.68 | 1.05 ± 2.26 | 1.30 ± 0.40 | 2.87 ± 1.78 |
| Analytical | 2-step | 2.08 | 0.77 ± 0.39 | 1143.3 ± 104.6 | 2.11 ± 2.76 | 2.87 ± 1.94 | 1.60 ± 0.42 | 2.79 ± 1.47 |
| | | 2.18 | 2.14 ± 0.85 | 1057.3 ± 58.1 | | | | |
| | | 2.28 | 2.44 ± 0.69 | 858.3 ± 38.1 | | | | |
| | | 2.37 | 4.99 ± 2.19 | 623.5 ± 35.1 | | | | |
| | 1-step (b) | | | | –0.76 ± 4.80 | 1.44 ± 2.99 | 1.28 ± 0.52 | 2.67 ± 2.05 |

(a) The errors reported are the 95% confidence intervals.
(b) ã, b̃, γ̃, and δ̃ values are for Si for the 1-step method.
Figure 8.

Nucleation rate and growth time estimates. Filled and open markers indicate Ĵ and τ̂g values, respectively, together with associated confidence levels, estimated with eq 40 at each nominal supersaturation. Curves correspond to the constitutive relationships, eqs 38 and 39, employing values of ã, b̃, γ̃, and δ̃, obtained via either the 2-step method (solid lines) or the 1-step method (dashed-dotted lines) and found in Table 2. Columns refer to the three different approaches to assess image analysis, (A.1 and A.2) error-free, (B.1 and B.2) empirical, and (C.1 and C.2) analytical, whereas the first and second rows are for nucleation rate and growth time, respectively.
Then, in the second step the following two optimization problems are solved:
| 42 |
| 43 |
where the values of Ĵj and τ̂g,j (together with their variances) are the outcome of step 1 and the predicted values J̃j and τ̃g,j are calculated using eqs 38 and 39, with S = Sj.
These two optimizations yield the estimated values of the model parameters ã, b̃, γ̃, and δ̃, with the associated variances and confidence intervals (see Table 2 for all of the values obtained). The results are illustrated in two ways. On the one hand, the constitutive relationships eqs 38 and 39 are plotted in Figure 8 as solid lines, using the estimated values of the model parameters. On the other hand, such values are used to draw the solid lines in Figure 7; they differ from the corresponding dashed lines, because of the deviation in predicting the experimental values of Jj and τgj using the constitutive equations and the estimated model parameters ã, b̃, γ̃, and δ̃.
4.2.6. Parameter Estimation via a One-Step Method
The second parameter estimation method consists of a single step where the four model parameters in the constitutive equations are estimated altogether, by solving the following optimization problem:
| 44 |
where p̂i is the value of the mean success probability in the ith experiment (with V[p̂i] being its variance), and p̃i is the corresponding predicted value, defined as
| 45 |
where the known values μv,i and αi, Si, and τi are those characterizing the droplet volume distribution, supersaturation and the residence time, respectively, in the ith experiment. These values are reported in Table S2 in the Supporting Information.
The solution of the optimization problem above (obtained once for each of the three approaches to the characterization of the image analysis algorithm) yields the estimated values of the model parameters ã, b̃, γ̃, and δ̃, as well as their variances and confidence intervals (see Table 2). We have also compared the values of the objective function of eq 44, calculated using these values of the model parameters, with those of the same objective function calculated using the corresponding model parameters obtained through the 2-step method (which solve the minimization problems of eqs 40, 42, and 43). The former are always smaller than the latter, which provides an indication that the four parameters obtained with the 1-step method indeed correspond to a global optimum. The outcome of the parameter estimation carried out using the 1-step method is illustrated in two figures. First, in Figure 8 the constitutive relationships, eqs 38 and 39, are plotted vs supersaturation as dashed-dotted lines, using the estimated values of the model parameters. Then, in Figure 7 the same values are used to draw, as dashed-dotted lines, the exponential distributions of nucleation times for the four experimental supersaturation levels.
4.2.7. Comparative Assessment
Let us first analyze more in depth the results obtained by applying the 2-step method. On the one hand the uncertainties on the experimental p̂ values shown in Figure 7 increase significantly on going from the error-free approach to the empirical and to the analytical approach. On the other hand, the uncertainties on the estimated nucleation rates are similar for the error-free and the empirical approaches, whereas they are significantly larger in the case of the analytical approach. Such uncertainties are primarily determined by the limited number of residence time values explored at each supersaturation level and only secondarily by the errors due to the automated image analysis method. Nevertheless, it is worth noting that even at the highest supersaturation level, where the estimated nucleation rates are quite different following the three different approaches and the error bars on the estimated nucleation rates are the largest (see Figure 8, top row), the confidence intervals calculated using the three different approaches overlap at least partially. In contrast, the values of growth time are remarkably similar when the different approaches are applied, with comparatively much smaller uncertainties than for nucleation rates.
As far as the values of the estimated model parameters reported in Table 2 are concerned, it is worth noting that neither γ nor δ changes sign in their entire 95% confidence intervals (whatever the approach applied to evaluate the automated image analysis is): i.e., they fulfill the physical constraints associated with the chosen functional form of τg given by eq 39. The nucleation rate parameter b does not change sign either within the 95% confidence interval, which is also needed to fulfill the requirement that the nucleation rate increases for increasing supersaturation. In contrast, the nucleation rate parameter a changes sign when it is varied within the whole 95% confidence interval, which is physically possible, since this does not affect the sign of the pre-exponential rate constant in the nucleation rate expression of eq 38. It can be readily observed that the estimated value ã is larger when it is estimated applying the analytical approach, which is compensated by a larger value of b̃.
Finally, let us consider the absolute values of the growth time and of the nucleation rate obtained using the three different approaches to evaluating the automated image analysis algorithm (see Figure 8). On the one hand, the growth time values at the individual supersaturation levels, as well as the overall growth time correlation, are remarkably similar in the three cases. On the other hand, the individual nucleation rates are similar when they are obtained from the error-free and the empirical approach but rather different when they are obtained from the analytical approach (with differences between 25% and 50%). The nucleation rate correlation describes the estimated values well in all three cases, as observed in the top row of Figure 8. The differences among the three cases underline the importance of image analysis in the quantitative determination of nucleation rates using nucleation experiments in microdroplets. Image analysis has to be as accurate as possible, and its associated uncertainties have to be evaluated carefully. However, it is remarkable that even with very different assumptions about the quality of the automated image analysis algorithm the estimated values of the nucleation rate and of its correlation differ by less than a factor of 2 within the supersaturation interval explored.
Let us now consider the 1-step method, which is appealing at least from a theoretical viewpoint. With reference to both Figures 7 and 8, it is clear that the curves obtained using the parameters estimated via the 1-step method do not describe the experimental data as accurately as those drawn based on the 2-step method. This is certainly due to the fact that in the 1-step case only 4 parameters are used to describe all 42 experiments, whereas in the 2-step case the data at each supersaturation level are fitted using 2 parameters, namely J and τg, and then the J values and the τg values obtained in this way are regressed independently, each of the two sets using two parameters. Such an observation also explains why the confidence intervals on ã, b̃, γ̃, and δ̃ obtained through the 1-step method are larger than those on the same parameters estimated via the 2-step method, in fact much larger in the case of the nucleation rate parameters ã and b̃. Moreover, in the case of b̃ such large uncertainty leads to confidence intervals within which the parameter b may change sign and become negative, which is physically inconsistent: J would then decrease with increasing S.
5. Concluding Remarks
In this work we report a study of primary nucleation of adipic acid from microdroplets consisting of a supersaturated aqueous solution. The study has been carried out using a fully automated microfluidic setup (including a crystal detection procedure based on an automated image analysis algorithm), with which we were able to continuously generate and analyze thousands of monodispersed microdroplets. A careful statistical analysis of the experimental results has allowed us to assess how uncertainties propagate from the stochastic occurrence (or not) of nucleation in an individual droplet, to the statistical characterization of the ensemble of hundreds or thousands of virtually identical droplets in an individual experiment, and on to the estimates of the nucleation rate and the growth time and of the related model parameters. Such a comprehensive study has allowed us to obtain a number of novel results, which not only apply to the specific system studied but also bear a much more general validity.
From an experimental perspective we have devised ways to generate microdroplets exhibiting a very narrow droplet size distribution and to measure it, and we have developed a new protocol to verify the internal consistency of an individual experiment: i.e., a protocol that allows us to identify and eliminate possible outliers (only about 7% of our experiments).
Also from an experimental viewpoint, we have developed an automated image analysis procedure that is first trained by the operator (based on about 10% of all the droplets belonging to an individual experiment) and is thereafter used to analyze all the droplets of the same experiment, thus enabling the slicing procedure described in section 4.1 and used to assess the internal consistency of that experiment.
From a theoretical point of view, we have been able to characterize the statistics of the estimated value of the success (or nucleation) probability in each experiment in terms of expected value and variance. We have done this by accounting explicitly for the nature of the stochastic process (which is the sum of independent nonidentical Bernoulli trials), for the distribution of microdroplet volumes, and for the error of the automated image analysis procedure.
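To make the role of independent nonidentical Bernoulli trials concrete, the following minimal sketch computes the expected value and variance of the fraction of crystal-containing droplets when each droplet k has its own, given success probability p_k. It covers only this Bernoulli contribution, conditional on the p_k values; it does not reproduce the paper's treatment of the volume distribution or of the image analysis error, and the function name is hypothetical.

```python
def success_fraction_stats(probabilities):
    """Mean and variance of h/n, where h counts successes among independent
    Bernoulli trials with (possibly different) probabilities p_k.

    E[h/n] = (1/n) * sum(p_k)
    V[h/n] = (1/n^2) * sum(p_k * (1 - p_k))
    """
    n = len(probabilities)
    if n == 0:
        raise ValueError("at least one droplet is required")
    mean = sum(probabilities) / n
    variance = sum(p * (1.0 - p) for p in probabilities) / n**2
    return mean, variance


# Illustrative case: 1000 droplets with slightly different probabilities.
demo_probabilities = [0.30 + 0.0001 * (k % 100 - 50) for k in range(1000)]
print(success_fraction_stats(demo_probabilities))
```

When all p_k are equal to a common value p, the variance reduces to the familiar binomial result p(1 − p)/n, which is inversely proportional to the number of droplets.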
Whereas the variance associated with the Bernoulli process can be reduced by increasing the number of droplets (the variance of the estimated success probability is inversely proportional to the number of droplets on which the estimate is based), that due to the volume distribution cannot. The latter is effectively negligible for droplet volume distributions as narrow as that in our experiments (coefficient of variation of about 2%) but becomes dominant if the coefficient of variation of the volume distribution reaches values of, e.g., 15% (not uncommon in the microfluidic literature on microdroplet generation).
The accuracy of image analysis is critical, not only because it tends to dominate uncertainty in the estimation of nucleation rates and related model parameters but also because such accuracy is very difficult to assess. In this context, we have considered three alternative approaches on how to evaluate it: namely, an analytical approach based on a description of the underlying physics and statistics of the image analysis algorithm, an empirical approach, and an idealized approach that assumes that image analysis is error-free (which is used for reference and represents a best-case scenario). These different approaches treat the outcome of the image analysis procedure as a random variable, whose expected value and variance follow from those of the physical process (nucleation in microdroplets) and of the image analysis algorithm; the latter statistics are obtained by applying the bootstrap procedure described in section 3.3.1. The variance of the estimated success probability decreases significantly on going from the analytical to the empirical and to the error-free approach.
We have shown how estimated success (nucleation) probabilities obtained at the same supersaturation but for different residence times can be used to estimate the nucleation rate and the growth time associated with that supersaturation. It is striking that the uncertainty on such physical quantities is less sensitive to the assumed accuracy of the image analysis procedure, because it is affected a great deal by the width of the range of residence times explored (which is by definition limited, as one can readily understand by inspecting Figure 7) and by the steepness of the exponential distribution of success probabilities as a function of nucleation time (given by eq 37 and illustrated in the same Figure 7), which is a physical feature of the system.
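The idea of extracting a nucleation rate and a growth time from success probabilities measured at several residence times can be sketched as follows. The sketch assumes monodisperse droplets of volume V and a success probability of the form p = 1 − exp(−J V (τ − τg)), so that −ln(1 − p) is linear in τ with slope J V and zero crossing at τ = τg. It uses an unweighted least-squares line for simplicity; it is not the estimator used in the paper, which accounts for the variances of the estimated probabilities. The function name and the synthetic data are hypothetical.

```python
import math


def fit_rate_and_growth_time(taus, p_hats, volume):
    """Estimate J and tau_g from success probabilities at several residence times.

    Assumes p = 1 - exp(-J * volume * (tau - tau_g)), i.e.
    y = -ln(1 - p) = (J * volume) * tau - (J * volume) * tau_g.
    Only points with 0 < p < 1 (tau > tau_g) should be passed in.
    """
    ys = [-math.log(1.0 - p) for p in p_hats]
    n = len(taus)
    t_mean = sum(taus) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in taus)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(taus, ys))
    slope = sxy / sxx                    # equals J * volume
    intercept = y_mean - slope * t_mean  # equals -J * volume * tau_g
    return slope / volume, -intercept / slope


# Noise-free synthetic data with hypothetical values (J in m^-3 s^-1, V in m^3).
J_true, tau_g_true, V_demo = 1e7, 100.0, 1e-10
taus_demo = [200.0, 400.0, 600.0, 800.0]
p_demo = [1.0 - math.exp(-J_true * V_demo * (t - tau_g_true)) for t in taus_demo]
print(fit_rate_and_growth_time(taus_demo, p_demo, V_demo))
```

Because the transformation −ln(1 − p) amplifies errors as p approaches 1, a narrow range of residence times or probabilities close to saturation makes the fitted slope and zero crossing poorly constrained, which is consistent with the sensitivity to the explored range described above.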
Finally, we have estimated the two plus two model parameters appearing in the constitutive equations for the nucleation rate and the growth time. To this aim, we have tested two procedures and shown that the one consisting of two steps, i.e., estimating nucleation rates and growth times at each supersaturation level first, and then estimating the model parameters by fitting the two constitutive equations independently to those values, outperforms that based on a single step. This is true despite the fact that the estimated values of the parameters depend on which of the image analysis evaluation approaches is applied (though the obtained values differ one from the other by no more than 50%) and that the associated confidence intervals are rather large (though physically consistent).
The key conclusion of our analysis is that microfluidics can indeed be used to estimate nucleation rate values and nucleation rate relationships because it enables performing a huge number of nucleation experiments in microdroplets under the same controlled conditions. However, the accuracy of these estimates very much depends not only on the number of droplets but also on the droplet volume distribution, on the image analysis algorithm, and on the number and range of experimental conditions (residence times and supersaturation levels) explored. We have developed and reported exact equations that allow calculations of the statistics of how uncertainty propagates by accounting for all the effects above. We hope that researchers in the field, who are motivated to exploit the valuable features of microfluidics to characterize nucleation, will apply such relationships and will carefully report the uncertainty of their measurements and of their estimates.
Acknowledgments
The authors are thankful to Agnieszka Ladosz for fruitful discussions, to Bernhard Roth for conducting proof of concept experiments of the experimental setup, to Daniel Trottmann for the crystallization and quench zone construction, to Jovo Vidic for setting up the IR sensors and adjusting their signal reception, to Ashwin Rajagopalan and Stefan Bötschi for their help with the image analyzer, and to Pietro Binel and Tuvshee Otgonbayar for classifying a large training set of droplets. Furthermore, the authors gratefully acknowledge the financial support of the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation program (grant agreement number 2-73959-18).
Glossary
Notation
- ã
estimated parameter of the nucleation rate J expression
- A
tube cross section (m²)
- A0
reference value of the pre-exponential coefficient of the nucleation rate J expression (m⁻³ s⁻¹)
- b̃
estimated parameter of the nucleation rate J expression
- c*
solubility of the solute in solvent (g/g)
- C
confidence level
- C0
reference value of numerator of the growth time τg semiempirical expression
- E
operator of expected value
- f
frequency of droplet formation and flow (Hz)
- F
cumulative exponential probability distribution
- fTP
fraction of true positives
- fFP
fraction of false positives
- fTN
fraction of true negatives
- fFN
fraction of false negatives
- hp
number of droplets containing crystals/number of successes for distributed droplets
- ho
number of droplets containing crystals observed by the human expert
- hd
number of droplets containing crystals detected by the image analyzer
- hε
error of the droplets containing crystals detected by the image analyzer
- ĥd
estimated expected value of hd
- ĥp
estimated expected value of hp
- J
nucleation rate (m⁻³ s⁻¹)
- LTube
length of the crystallization zone (m)
- m
number of experiments
- n
number of droplets
- no
number of droplets classified by the human expert
- ntest
number of classified droplets used for validation of the image analyzer
- ntrain
number of classified droplets used as a training set for the image analyzer
- nslice
subdivision of the number of droplets for assessing internal consistency
- Nbs
number of bootstrap resamples per experiment
- p
success probability
- p̂
estimated value of the success probability
- Pr
operator of probability
- qd
flow rate of the dispersed phase (μL/min)
- qc
flow rate of the continuous phase (μL/min)
- Q
total flow rate (μL/min)
- ŝ
estimated value of the measured success probability
- S
supersaturation
- texp
duration of the experiment (s)
- T
temperature (°C)
- U
superficial velocity (mm/s)
- vk
volume of each droplet (nL)
- V
operator of variance
- wTP
ratio of n droplets containing crystals correctly predicted
- wFP
ratio of n droplets wrongly predicted to contain crystals
- w1
rearranged relation of xTP and xFP using ntest droplets
- w2
rearranged relation of xTP and xFP using ntest droplets
- xTP
ratio of the fraction of ntest droplets containing crystals correctly predicted
- xFP
ratio of the fraction of ntest droplets wrongly predicted to contain crystals
- yo
fraction of droplets containing crystals, as determined by the operator
- yd
fraction of droplets containing crystals, as detected by the image analyzer
- z1
rearranged relation of xTP and xFP using n droplets
- z2
rearranged relation of xTP and xFP using n droplets
Greek Letters
- α
shape parameter of the gamma volume distribution
- β
rate parameter of the gamma volume distribution (m⁻³)
- γ̃
estimated parameter of the growth time τg semiempirical expression
- Γ
gamma function
- δ̃
estimated parameter of the growth time τg semiempirical expression
- ϵ
confidence intervals calculated using Hoeffding’s inequality
- μp
mean of the success probability distribution
- μv
mean volume of a population of droplets (nL)
- με
expected value of the difference yd – yo
- π(p)
probability density function of success probability distribution
- σv
standard deviation of droplet volumes
- σp²
variance of the success probability distribution
- σε²
variance of the difference yd – yo
- τ
residence time (s)
- τg
growth time (s)
- ϕ(v)
probability density function of droplet volume distribution
- ψ
coefficient of variation
Latin Letters
- ΔP
pressure drop (mbar)
- Δtk
time interval between two successive droplets (s)
Subscripts and Superscripts
- i
index of number of experiments
- j
index of number of supersaturations employed
- k
running variable of number of droplets
Supporting Information Available
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acs.cgd.9b00562.
Exemplary figures including the scale of droplets containing crystals of the images recorded at the observation zone, values for estimating the uncertainty added to nucleation probability due to processing images with the classification decision software, and all experimental conditions (i.e., supersaturations, residence times, mean droplet volumes, shape factors of the volume distribution, and total number of droplets) (PDF)
Author Present Address
§ School of Chemical and Biomolecular Engineering, Georgia Institute of Technology, Atlanta, United States.
The authors declare no competing financial interest.
References
- Laval P.; Crombez A.; Salmon J.-B. Microfluidic droplet method for nucleation kinetics measurements. Langmuir 2009, 25, 1836–1841. 10.1021/la802695r.
- Teychene S.; Biscans B. Microfluidic device for the crystallization of organic molecules in organic solvents. Cryst. Growth Des. 2011, 11, 4810–4818. 10.1021/cg2004535.
- Toldy A. I.; Badruddoza A. Z. M.; Zheng L.; Hatton T. A.; Gunawan R.; Rajagopalan R.; Khan S. A. Spherical crystallization of glycine from monodisperse microfluidic emulsions. Cryst. Growth Des. 2012, 12, 3977–3982. 10.1021/cg300413s.
- Dombrowski R. D.; Litster J. D.; Wagner N. J.; He Y. Crystallization of alpha-lactose monohydrate in a drop-based microfluidic crystallizer. Chem. Eng. Sci. 2007, 62, 4802–4810. 10.1016/j.ces.2007.05.033.
- Lu J.; Litster D.; Nagy Z. K. Nucleation studies of active pharmaceutical ingredients in an air-segmented microfluidic drop-based crystallizer. Cryst. Growth Des. 2015, 15, 3645–3651. 10.1021/acs.cgd.5b00150.
- Rossi D.; Gavriilidis A.; Kuhn S.; Candel M. A.; Jones A. G.; Price C.; Mazzei L. Adipic acid primary nucleation kinetics from probability distributions in droplet-based systems under stagnant and flow conditions. Cryst. Growth Des. 2015, 15, 1784–1791. 10.1021/cg501836e.
- Ladosz A.; Rigger E.; von Rohr P. R. Pressure drop of three-phase liquid-liquid–gas slug flow in round microchannels. Microfluid. Nanofluid. 2016, 20, 49. 10.1007/s10404-016-1712-7.
- Wolffenbuttel B. M. A.; Nijhuis T. A.; Stankiewicz A.; Moulijn J. A. Novel method for non-intrusive measurement of velocity and slug length in two- and three-phase slug flow in capillaries. Meas. Sci. Technol. 2002, 13, 1540–1544. 10.1088/0957-0233/13/10/305.
- dos Santos E. C.; Ladosz A.; Maggioni G. M.; von Rohr P. R.; Mazzotti M. Characterization of shapes and volumes of droplets generated in PDMS T-junctions to study nucleation. Chem. Eng. Res. Des. 2018, 17, 2852–2863. 10.1016/j.cherd.2018.09.001.
- Prileszky T. A.; Ogunnaike B. A.; Furst E. M. Statistics of droplet sizes generated by a microfluidic device. AIChE J. 2016, 62, 2923–2928. 10.1002/aic.15246.
- Mullin J. W. Crystallization; Oxford University Press: 2001.
- Rajagopalan A. K.; Schneeberger J.; Salvatori F.; Bötschi S.; Ochsenbein D. R.; Oswald M. R.; Pollefeys M.; Mazzotti M. A comprehensive shape analysis pipeline for stereoscopic measurements of particulate populations in suspension. Powder Technol. 2017, 321, 479–493. 10.1016/j.powtec.2017.08.044.
- Chen K.; Goh L.; He G.; Kenis P. J. A.; Zukoski C. F.; Braatz R. D. Identification of nucleation rates in droplet-based microfluidic systems. Chem. Eng. Sci. 2012, 77, 235–241. 10.1016/j.ces.2012.03.026.
- Brandel C.; ter Horst J. H. Measuring induction times and crystal nucleation rates. Faraday Discuss. 2015, 179, 199–214. 10.1039/C4FD00230J.
- Maggioni G. M.; Mazzotti M. Modelling the stochastic behaviour of primary nucleation. Faraday Discuss. 2015, 179, 359–382. 10.1039/C4FD00255E.
- Maggioni G. M.; Bosetti L.; dos Santos E. C.; Mazzotti M. Statistical analysis of series of detection time measurements for the estimation of nucleation rates. Cryst. Growth Des. 2017, 17, 5488–5498. 10.1021/acs.cgd.7b01014.
- Zhang S.; Curien C. G.; Veesler S.; Candoni N. Prediction of sizes and frequencies of nanoliter-sized droplets in cylindrical T-junction microfluidics. Chem. Eng. Sci. 2015, 138, 128–139. 10.1016/j.ces.2015.07.046.
- Wasserman L. All of Nonparametric Statistics; Springer: 2006.
- Massart P. The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. Ann. Probab. 1990, 18, 1269–1283. 10.1214/aop/1176990746.