Author manuscript; available in PMC: 2011 Jan 1.
Published in final edited form as: Anal Chem. 2010 Jan 1;82(1):307–315. doi: 10.1021/ac901982u

Estimation of Migration-time and Mobility Distributions in Organelle Capillary Electrophoresis with Statistical-Overlap Theory

Joe M Davis 1,*, Edgar A Arriaga 2
PMCID: PMC2803746  NIHMSID: NIHMS162752  PMID: 20041721

Abstract

The separation of organelles by capillary electrophoresis (CE) produces large numbers of narrow peaks, which commonly are assumed to originate from single particles. In this paper, we show this is not always true. Here, we use established methods to partition simulated and real organelle CEs into regions of constant peak density and then use statistical-overlap theory to calculate the number of peaks (single particles) in each region. The only required measurements are the number of observed peaks (maxima) and peak standard deviation in the regions, and the durations of the regions. Theory is developed for the precision of the estimated peak number and the threshold saturation above which the calculation is not advisable due to fluctuation of peak numbers. Theory shows that the relative precision is good, when the saturation lies between 0.2 and 1.0, and is optimal when the saturation is slightly greater than 0.5. It also shows the threshold saturation depends on the peak standard deviation, divided by the region’s duration. The accuracy and precision of peak numbers estimated in different regions of organelle CEs are verified by computer simulations having both constant and non-uniform peak densities. The estimates are accurate to 6%. The estimated peak numbers in different regions are used to calculate migration-time and electrophoretic-mobility distributions. These distributions are less biased by peak overlap than ones determined by counting maxima and provide more correct measures of the organelle properties. The procedure is applied to a mitochondrial CE, in which over 20% of peaks are hidden by peak overlap.

Introduction

Biological particles including microorganisms, viruses, intact cells, and organelles have negative electrophoretic mobilities that are complex functions of the particles’ surface zeta potentials, morphologies, and electrical membrane potentials, as well as the analytical conditions and medium (e.g., electric field and ionic strength)1–6. The mobility of individual particles has been characterized using capillary electrophoresis (CE) and is highly heterogeneous even when particles appear to be identical7. When fluorescently labeled particles are analyzed by CE coupled to a laser-induced-fluorescence detector (LIF), the electropherogram consists of a collection of narrow peaks that migrate within a migration-time window that depends on the separation conditions and, most importantly, on the mobility range of the particles in the biological sample. The resultant window can be interpreted as a migration-time distribution, which can be transformed into a mobility distribution.

Mobility distributions are important, because they show quantitative differences among different organelle types8 and the influence of organellar content on mobility6. For example, they reveal the differences among mitochondria from different muscle types of animals with different ages9. The distributions also are indicators of the gentleness or harshness of procedures for sample preparation10. For distributions to have diagnostic potential, they must be determined without bias. A source of bias addressed by this paper is the overlap of peaks in the CEs, from which the distributions are determined.

A “peak” is defined as a fluorescence intensity profile of a single organelle as it travels through the LIF detector, whereas an “observed peak” is a detected maximum that is comprised of one or more organelle peaks. The numbers of peaks and observed peaks differ because of peak overlap. If we could separate most peaks, we could approximate the migration-time distribution by partitioning a CE into bins and counting the peaks in each. In a recent paper, we used statistical-overlap theory (SOT) and an analogy to the Type-II error of hypothesis testing to predict conditions, for which peak overlap is small enough that the numbers of peaks and observed peaks in such bins are statistically indistinguishable11. Under these conditions, migration-time and mobility distributions can be estimated simply by counting observed peaks in the CE.

For any analysis the organelle concentration has a value, beyond which these conditions are not met because too many peaks overlap. In principle, the analysis can be improved or the sample can be diluted or injected in smaller amounts. However, it may be difficult to improve the analysis. Furthermore, sample dilution and subsequent analysis are not always possible, because organelles have a finite lifetime. Samples also may be limited in size or to a single injection, as are individual cells12,13. In such cases, we may be forced to interpret organelle CEs with many overlapping peaks. In this paper, we develop procedures for using SOT to estimate the actual numbers of peaks in the different bins from the numbers of observed peaks. With these procedures, the total number of peaks in the CE, and the migration-time and mobility distributions, can be estimated.

Because organelle CE entails the injection of only hundreds to thousands of discrete particles, statistical fluctuations among otherwise replicate injections affect the numbers of peaks and observed peaks. Because these numbers differ, the histograms determined from different CEs vary, causing an uncertainty in the migration-time and mobility distributions determined from any one CE. We must know the uncertainty to evaluate the distributions’ reliability.

Theory

Assumptions

The objectives of this paper are to use SOT to estimate accurate and precise numbers of peaks, migration-time distributions, and mobility distributions from organelle CEs. Several approaches exist in SOT14–18, but in all cases a separation is a member of a large ensemble of separations, in which the number, migration times, and heights of peaks are governed by probability distributions. Here, the CE bins are chosen such that the peak density in each is nearly constant. However, the density can vary among different bins, and each bin has its own ensemble. The variation of peak numbers models the variation of the injected number of organelles; the variation of migration times models the variation of mobilities among different organelles. The number of observed peaks also varies among ensemble members.

Review of basic equations

Let m and p equal the numbers of peaks and observed peaks in an ensemble member (i.e., bin), with m̄ and p̄ equaling the means of these numbers in the ensemble. The means are related by the saturation14,17

$$\alpha = 4\bar{m}\sigma R_s / X \tag{1a}$$

which is a metric of peak crowding. Peak numbers and migration times usually are modeled by Poisson statistics for empirical15,19,20 and theoretical14,16,21,22 reasons. For a Poisson distribution of peaks having constant density and width14

$$\bar{p} = \bar{m}\exp(-\alpha) \tag{1b}$$

In eq 1a, σ is the peak standard deviation, X is the bin duration, and Rs is the average minimum resolution that separates adjacent peaks. The value of Rs is not arbitrary but depends on the ensemble’s distribution of peak heights23 and amount of peak overlap24 in a manner that is non-intuitive to most practitioners. The peak-height distribution usually is modeled by an exponential function, which is consistent with the empirically measured or estimated distribution of peak heights in complex mixtures19,25–28 and also the prediction of theory28,29. Recently, this Rs function was absorbed into the saturation by defining a new variable, the effective saturation αe30

$$\alpha_e = \alpha / R_s = 4\bar{m}\sigma / X \tag{2a}$$

which allowed eq 1b to be fit by the empirical expression

$$\bar{p} = \bar{m} / \left(1 + \beta_1\alpha_e + \beta_2\alpha_e^2\right) \tag{2b}$$

over the range, 0 ≤ αe ≤ 25, with β1 = 0.775 ± 0.002 and β2 = 0.0750 ± 0.0009.

Eqs 1 and 2 express the same result. In this paper, we express all theory relative to eq 1, because SOT fundamentally depends on the saturation α. Thus, the effective saturation αe is not explicitly used, although some findings are reported relative to it. Practitioners may prefer eq 2 because it is independent of Rs and avoids the numerical analysis needed with eq 1 (see Procedures section). Conversions between α and αe are reported in Part One of Supplementary Material to this paper.
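As a minimal sketch of eq 2, the following Python functions evaluate the effective saturation and the expected number of observed peaks from m̄ and the σ/X ratio. The function names and the example bin (m̄ = 1000, σ/X = 1 × 10⁻⁴) are illustrative assumptions; only the coefficients β1 and β2 and the fitted range 0 ≤ αe ≤ 25 are taken from the text.

```python
BETA1 = 0.775    # fit coefficient beta_1 of eq 2b
BETA2 = 0.0750   # fit coefficient beta_2 of eq 2b


def effective_saturation(m_bar, sigma_over_x):
    """Eq 2a: alpha_e = 4 * m_bar * sigma / X."""
    return 4.0 * m_bar * sigma_over_x


def expected_observed_peaks(m_bar, sigma_over_x):
    """Eq 2b: mean number of observed peaks (maxima) in a bin."""
    a_e = effective_saturation(m_bar, sigma_over_x)
    if not 0.0 <= a_e <= 25.0:
        raise ValueError("eq 2b was fit only for 0 <= alpha_e <= 25")
    return m_bar / (1.0 + BETA1 * a_e + BETA2 * a_e ** 2)


if __name__ == "__main__":
    m_bar, ratio = 1000.0, 1.0e-4   # hypothetical bin, not from the paper
    a_e = effective_saturation(m_bar, ratio)
    p_bar = expected_observed_peaks(m_bar, ratio)
    print(f"alpha_e = {a_e:.3f}, p_bar = {p_bar:.1f}")
```

Because eq 2b does not involve Rs, no iteration is needed in this direction.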

For Poisson statistics, the variance of the peak number m, σm², among ensemble members is m̄. The variance of the observed peak number p, σp², is31

$$\sigma_p^2 = \bar{m}\exp(-\alpha)\left[1 - 2\alpha\exp(-\alpha)\right] \tag{3}$$

Protocol

We partition CEs into bins of known duration X, as shown in Figure 1a, using a standard statistics formula and count the number of observed peaks (maxima) p in each bin. For each bin, we set p equal to p̄, eq 1b, and calculate m̄ from the known peak standard deviation σ and predicted Rs value. Here, σ is assumed constant to model peak widths in CEs having a large post-column sheath flow32. Unlike previous calculations of m̄ determined by least-squares fittings to eq 1b of multiple p values from different separations19,27,33, the calculation of m̄ is exact, requiring new theory that restricts p to values less than the maximum of eq 1b, avoids the “double-value” problem14, and evaluates the calculation’s precision. This theory is discussed here but details are deferred to Supplementary Material for brevity’s sake. These calculations produce an estimate m̄ and its standard deviation for each bin. The sum of the m̄’s is the total number of peaks in the CE, whereas the m̄’s of different bins determine the discrete SOT-estimated migration-time distribution. The dashed curve, dashed-line histogram, and solid-line histogram in Figure 1b are the actual migration-time distribution in Figure 1a, the SOT-estimated distribution, and the approximate distribution obtained by simply counting the numbers of observed peaks. The graph ordinate is the density, or the number of peaks (or observed peaks) per unit time. The first two distributions agree closely, but the last one is biased because of peak overlap, illustrating the need for SOT calculations. The error bars are the standard deviations of the m̄’s, which gauge the precision of the SOT-estimated distribution. The inset to Figure 1b is the SOT-estimated mobility distribution, calculated from the SOT-estimated migration-time distribution by mapping the temporal bin boundaries into mobilities. Because the relation between time and mobility is non-linear and because mobilities are signed, the orientation, bin height, and bin width differ from the SOT-estimated migration-time distribution.

Figure 1.

a) Simulated organelle CE. Dashed lines represent bin boundaries spanned by interval X. b) Histograms of SOT-estimated (dashed line) and observed-peak (solid line) migration-time distributions of CE in a). Dashed curve is actual migration-time distribution. Inset is SOT-estimated mobility distribution. c) Graphs of four normalized model migration-time distributions f (ζ) vs reduced time ζ used to simulate organelle CEs.

Figure 1c is a graph of four model migration-time distributions used to simulate CEs and test the protocol. The distributions are graphed against reduced time ζ (which lies between 0 and 1) and scaled relative to ordinate f (ζ), such that the area under each is unity. The distributions differ markedly and are assigned names for easy reference (Gaussian, bimodal, asymmetric, and constant). On interpreting the simulations by SOT, we assume the distributions are unknown and subsequently assess their accuracy.

Interpretation of calculated m̄

The implications of calculating m̄ from a single p value and eq 1b are discussed. Figure 2a is a graph of p̄ vs m̄ based on eq 1, when peak heights are exponentially random. This curve describes a relation between means. The p value of any bin rarely equals the mean p̄, however, because of statistical fluctuation. Figure 2b shows the consequences of equating them. The circle on the p̄ vs m̄ curve represents the means (m̄, p̄) of a particular ensemble. The bold curve in the figure’s lower center is the probability distribution of the number of peaks (i.e., the m distribution). The bold curve to the right is the probability distribution of the number of observed peaks (i.e., the p distribution). Their means coincide with the circle. Both distributions are discrete but are shown as continuous functions for simplicity.

Figure 2.

Distribution and precision of me. a) Graph of p̄ vs m̄ for exponentially random peak heights. b) Mapping of p into m̄ by the p̄ vs m̄ curve. Circle represents means of a particular ensemble; bold curves are p and m distributions of the ensemble; dashed curves are m distributions of other ensembles; squares are members of the ensemble’s m distribution. c) Graph of p̄ vs m̄, showing broadening of the me distribution from reduction of ∂p̄/∂m̄. Error bars represent standard deviation σp of p distributions. d) Relation among upper limit to p̄, curve maximum, and width of the p distribution.

Consider the p value identified by the horizontal arrow, a, in Figure 2b. On its identification with eq 1b, it is mapped by the p̄ vs m̄ curve into the m̄ value associated with the vertical arrow, a′. However, this is not the ensemble mean. Rather, it is the mean of another m distribution, represented by the finely dashed curve. A similar argument can be made for the p and m̄ values connected by arrows b and b′; this is the mean of a third m distribution, represented by the coarsely dashed curve. However, the m̄’s so determined almost coincide with m values actually belonging to the ensemble, as represented by the squares in Figure 2b. This coincidence is the basis of our analysis: m̄’s determined by identifying eq 1b with single random p values form a distribution related to the m distribution. These determined m̄’s henceforth are called me to distinguish them from the ensemble mean, and they comprise the me distribution.

Standard deviation of me distribution

The me distribution is broadened by our analysis. This is shown in Figure 2c, in which circles represent two different (m̄, p̄) coordinates. The arrows have the same meaning as in Figure 2b and show that the me distribution broadens as ∂p̄/∂m̄ decreases (the derivative is written as a partial to show that σ/X is constant). This broadening decreases the precision of me, reducing its reliability. The standard deviation σme of me is derived in Part Two of the Supplementary Material and can be expressed as

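$$\sigma_{m_e}\left(\frac{\sigma}{X}\right)^{1/2} = \frac{1}{2}\left(\frac{\alpha}{R_s}\left[1 - 2\alpha\exp(-\alpha)\right]\right)^{1/2}\exp(\alpha/2)\,\frac{\alpha^{-1} - d\ln R_s/d\alpha}{\alpha^{-1} - 1 - d\ln R_s/d\alpha} \tag{4a}$$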

or as the coefficient of variation (CV), equal to 100 σme/m̄

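$$\frac{CV}{(\sigma/X)^{1/2}} = 200\left(\frac{R_s}{\alpha}\left[1 - 2\alpha\exp(-\alpha)\right]\right)^{1/2}\exp(\alpha/2)\,\frac{\alpha^{-1} - d\ln R_s/d\alpha}{\alpha^{-1} - 1 - d\ln R_s/d\alpha} \tag{4b}$$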

These equations can be evaluated at any α if Rs is known. For exponential peak heights, Rs can be predicted from theory24; alternatively, the numerical solution so predicted is well fit by the empirical equation

$$R_s = R_s(\alpha) = 0.725 + \beta_3\alpha + \beta_4\alpha^2 + \beta_5\alpha^3 \tag{4c}$$

over the range, 0 ≤ α ≤ 3.85, with β3 = −0.1910 ± 0.0001, β4 = 0.0039 ± 0.0001, and β5 = 0.00188 ± 0.00002. The coefficient 0.725 in eq 4c is the theoretical value of Rs at α = 0 (ref 23). The functions σme and CV in eqs 4a and 4b equal or exceed their counterparts for the Poisson distribution, which are √m̄ and 100/√m̄. Monte Carlo simulations of the me distribution are reported in Part Three of the Supplementary Material.
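To make eqs 4a to 4c concrete, the sketch below evaluates Rs(α), its logarithmic derivative, and the scaled standard deviation and CV. It assumes that the derivative d ln Rs/dα is obtained by differentiating the empirical fit of eq 4c (rather than the underlying theory), and all function names are hypothetical. The expressions are meaningful only on the low-m̄ side of the p̄ vs m̄ curve, where the denominator of the slope factor is positive.

```python
import math

B3, B4, B5 = -0.1910, 0.0039, 0.00188   # coefficients of eq 4c


def rs(alpha):
    """Eq 4c: average minimum resolution, fit for 0 <= alpha <= 3.85."""
    return 0.725 + B3 * alpha + B4 * alpha ** 2 + B5 * alpha ** 3


def dln_rs(alpha):
    """d ln(Rs)/d alpha, from differentiating the fit of eq 4c."""
    return (B3 + 2.0 * B4 * alpha + 3.0 * B5 * alpha ** 2) / rs(alpha)


def slope_factor(alpha):
    """[1/alpha - dlnRs/dalpha] / [1/alpha - 1 - dlnRs/dalpha]."""
    a = 1.0 / alpha - dln_rs(alpha)
    return a / (a - 1.0)


def sigma_me_scaled(alpha):
    """Right-hand side of eq 4a, i.e. sigma_me * (sigma/X)**0.5."""
    w = 1.0 - 2.0 * alpha * math.exp(-alpha)
    return (0.5 * math.sqrt(alpha / rs(alpha) * w)
            * math.exp(alpha / 2.0) * slope_factor(alpha))


def cv_scaled(alpha):
    """Right-hand side of eq 4b, i.e. CV / (sigma/X)**0.5."""
    w = 1.0 - 2.0 * alpha * math.exp(-alpha)
    return (200.0 * math.sqrt(rs(alpha) / alpha * w)
            * math.exp(alpha / 2.0) * slope_factor(alpha))


if __name__ == "__main__":
    alpha, ratio = 0.5, 1.0e-4          # hypothetical bin
    sigma_me = sigma_me_scaled(alpha) / math.sqrt(ratio)
    cv = cv_scaled(alpha) * math.sqrt(ratio)
    print(f"sigma_me = {sigma_me:.1f}, CV = {cv:.2f}")
```

Dividing `sigma_me_scaled` by (σ/X)^(1/2), or multiplying `cv_scaled` by it, recovers σme and CV for a specific bin.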

For good precision, me should be calculated to the left of the maximum of the p̄ vs m̄ curve in Figure 2a (i.e., the low-m̄ side), where ∂p̄/∂m̄ is large. This action also avoids the “double-value problem”14, i.e., the correspondence of p̄ to two m̄’s except at the maximum. However, m̄ should not be near the maximum for another reason.

Largest interpretable p value and threshold saturation

Figure 2d shows that m̄, p̄, and α have threshold values on the low-m̄ side of the p̄ vs m̄ curve, beyond which large outliers of the p distribution can exceed the curve maximum. If this happens, eq 1 has no solution. We can derive a relation among the threshold value of p̄, the curve maximum, and the width of the p distribution that assures us that this rarely happens.

The maximum p̄ value, p̄max, is found by calculating ∂p̄/∂m̄ from eq 1 and setting it equal to zero

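$$\frac{\partial\bar{p}}{\partial\bar{m}} = e^{-\alpha}\left[1 - \bar{m}\frac{\partial\alpha}{\partial\bar{m}}\right] = e^{-\alpha}\left[1 - \left(\alpha^{-1} - d\ln R_s/d\alpha\right)^{-1}\right] = 0 \tag{5}$$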

The expression ∂α/∂m̄ is given by the reciprocal of eq S-9 in Part Two of the Supplementary Material, whereas dlnRs/dα is evaluated from eq 4c. The solution to eq 5 is given in the Results and Discussion; for the moment, we observe that p̄max is a multiple of the reciprocal σ/X ratio, i.e., p̄max = δ X/σ, where δ is a constant.

For p̄ ≥ 10, the p distribution is described well by a Gaussian envelope having mean p̄ and standard deviation σp11,15. A Gaussian distribution has negligible density three to four standard deviations from its mean. Therefore, the threshold saturation αt, or the largest saturation at which statistical fluctuations of p cause very few (if any) problems with estimating me, is established by the equation

$$\bar{p}_t + \gamma\sigma_p = \bar{p}_{max} \tag{6a}$$

where p̄t is the threshold value of p̄ and γ is a parameter between 3 and 4. Figure 2d shows the relation among p̄t, the threshold peak number m̄t, γσp, and p̄max. Using eqs 1a, 1b, and 3, and the expression p̄max = δ X/σ, we can explicitly write eq 6a as

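$$\frac{\alpha_t}{4R_s(\alpha_t)}\frac{X}{\sigma}\exp(-\alpha_t) + \frac{\gamma}{2}\left(\frac{\alpha_t}{R_s(\alpha_t)}\frac{X}{\sigma}\exp(-\alpha_t)\left[1 - 2\alpha_t\exp(-\alpha_t)\right]\right)^{1/2} = \delta\frac{X}{\sigma} \tag{6b}$$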

with Rs(αt) equal to eq 4c, as expressed with α =αt. Eq 6b shows the threshold saturation αt is determined by parameter γ and the reciprocal σ/X ratio.

Procedures

Threshold values

Eq 5 was solved by bisection to determine the scalar δ. For various σ/X, eq 6b was solved for αt by bisection, with γ = 3. The threshold m̄t value then was calculated from αt, eq 1a, and eq 4c. Values of the threshold effective saturation were calculated from αt as described in Part One of the Supplementary Material. CEs were simulated to verify αt.
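The threshold calculation can be sketched as follows. This is an illustrative reading of eqs 5 and 6b, not the authors’ program: the bracketing intervals (1 to 3 for eq 5, and from near zero to the curve maximum for eq 6b), the tolerance, and all function names are assumptions, and Rs is taken from the fit of eq 4c.

```python
import math

B3, B4, B5 = -0.1910, 0.0039, 0.00188   # coefficients of eq 4c


def rs(alpha):
    """Eq 4c: average minimum resolution."""
    return 0.725 + B3 * alpha + B4 * alpha ** 2 + B5 * alpha ** 3


def dln_rs(alpha):
    """d ln(Rs)/d alpha from the fit of eq 4c."""
    return (B3 + 2.0 * B4 * alpha + 3.0 * B5 * alpha ** 2) / rs(alpha)


def bisect(f, lo, hi, tol=1e-10):
    """Plain bisection; f(lo) and f(hi) must have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0.0:
        raise ValueError("root is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0.0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)


def curve_maximum():
    """Solve eq 5; return (alpha_max, delta), with p_max = delta * X / sigma."""
    # Eq 5 vanishes when 1/alpha - dlnRs/dalpha = 1.
    alpha_max = bisect(lambda a: 1.0 / a - dln_rs(a) - 1.0, 1.0, 3.0)
    delta = alpha_max / (4.0 * rs(alpha_max)) * math.exp(-alpha_max)
    return alpha_max, delta


def threshold_saturation(sigma_over_x, gamma=3.0):
    """Solve eq 6b for alpha_t at a given sigma/X ratio."""
    alpha_max, delta = curve_maximum()
    x_over_sigma = 1.0 / sigma_over_x

    def residual(a):
        p_bar = a / (4.0 * rs(a)) * x_over_sigma * math.exp(-a)   # eqs 1a, 1b
        var_p = p_bar * (1.0 - 2.0 * a * math.exp(-a))            # eq 3
        return p_bar + gamma * math.sqrt(var_p) - delta * x_over_sigma

    return bisect(residual, 1.0e-6, alpha_max)


def threshold_peak_number(sigma_over_x, gamma=3.0):
    """Threshold mean peak number from alpha_t, eq 1a, and eq 4c."""
    a_t = threshold_saturation(sigma_over_x, gamma)
    return a_t / (4.0 * sigma_over_x * rs(a_t))
```

A call such as `threshold_saturation(2.0e-4)` returns αt for one σ/X ratio; repeating it over a range of ratios traces a threshold curve of the kind described in the Results.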

CE simulations

CEs were simulated as previously described24 along a reduced time coordinate ζ = (t − to)/1D between 0 and 1, with t, to, and 1D equaling time, the time of the first peak’s elution, and the CE duration, respectively. The Box Muller transform34 was used to mimic Poisson distributed peak numbers. In any CE, peaks were Gaussians with constant standard deviations and exponentially random heights. The number of observed peaks (maxima) was determined.
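A simplified simulation of a single-bin CE of constant peak density is sketched below. It is not the authors’ code and departs from their procedure in one stated respect: instead of the Box Muller transform, the Poisson-distributed peak number is generated by drawing exponential gaps between successive migration times. The grid density (five points per σ), the 6σ padding, and the function name are further assumptions. An observed peak is counted wherever the summed signal has a local maximum.

```python
import math
import random


def simulate_ce(m_bar, sigma, duration=1.0, seed=None, points_per_sigma=5):
    """Simulate one CE of constant peak density.

    Returns (m, p): the number of peaks and the number of observed maxima.
    """
    rng = random.Random(seed)
    rate = m_bar / duration

    # Poisson process: exponential gaps between successive migration times,
    # so the peak count m is Poisson distributed with mean m_bar.
    times = []
    t = rng.expovariate(rate)
    while t < duration:
        times.append(t)
        t += rng.expovariate(rate)

    # Exponentially random peak heights with unit mean.
    heights = [rng.expovariate(1.0) for _ in times]

    # Sample the summed Gaussian profiles on a grid padded by 6 sigma.
    step = sigma / points_per_sigma
    start = -6.0 * sigma
    n = int((duration + 12.0 * sigma) / step) + 1
    signal = []
    for i in range(n):
        x = start + i * step
        y = 0.0
        for t_k, h_k in zip(times, heights):
            z = (x - t_k) / sigma
            y += h_k * math.exp(-0.5 * z * z)
        signal.append(y)

    # An observed peak is a local maximum of the summed signal.
    p = sum(
        1
        for i in range(1, n - 1)
        if signal[i - 1] < signal[i] >= signal[i + 1]
    )
    return len(times), p


if __name__ == "__main__":
    m, p = simulate_ce(m_bar=100.0, sigma=1.0e-3, seed=1)
    print(f"peaks m = {m}, observed peaks p = {p}")
```

The double loop costs m × n evaluations, which is adequate for a few hundred peaks; a production simulation would restrict each Gaussian to a window around its center.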

Calculation of me

The peak number me was calculated from the number of observed peaks p in a bin, the σ/X ratio, and eq 1. The calculation was a dual bisection, in which m̄ = me determined α via eqs 1a and 4c, and then in which m̄ was adjusted so that eq 1b equaled p. (Alternatively, practitioners may wish to determine me from eq 2b, whose solution is

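$$m_e = \frac{1}{8\beta_2(\sigma/X)}\left[\frac{1}{4p(\sigma/X)} - \beta_1 - \left(\beta_1^2 - 4\beta_2 - \frac{\beta_1}{2p(\sigma/X)} + \frac{1}{\{4p(\sigma/X)\}^2}\right)^{1/2}\right] \tag{7}$$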

Evaluating eq 7 is a straightforward computation and simpler than performing the dual bisection, but the me so determined is slightly less accurate since it is based on a fit.)
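Both routes to me can be sketched in a few lines. The version based on eq 1 below is an assumption-laden simplification: rather than the dual bisection described above, it bisects once on α, using the fact that eqs 1a and 1b together give p̄ as a function of α alone, and it takes the upper bracket α = 1.601 from the curve maximum reported in the Results. The function names are hypothetical.

```python
import math

BETA1, BETA2 = 0.775, 0.0750            # coefficients of eq 2b
B3, B4, B5 = -0.1910, 0.0039, 0.00188   # coefficients of eq 4c
ALPHA_MAX = 1.601                       # saturation at the curve maximum


def rs(alpha):
    """Eq 4c: average minimum resolution."""
    return 0.725 + B3 * alpha + B4 * alpha ** 2 + B5 * alpha ** 3


def observed_from_alpha(alpha, sigma_over_x):
    """Eqs 1a and 1b combined: p_bar as a function of alpha."""
    m_bar = alpha / (4.0 * sigma_over_x * rs(alpha))
    return m_bar * math.exp(-alpha)


def estimate_me_eq1(p, sigma_over_x, tol=1e-12):
    """Estimate m_e from eq 1 by bisection on alpha (low-m side only)."""
    lo, hi = 0.0, ALPHA_MAX
    if p >= observed_from_alpha(hi, sigma_over_x):
        raise ValueError("p exceeds the curve maximum; eq 1 has no solution")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if observed_from_alpha(mid, sigma_over_x) < p:
            lo = mid
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    return alpha / (4.0 * sigma_over_x * rs(alpha))


def estimate_me_eq7(p, sigma_over_x):
    """Estimate m_e from the closed-form solution of eq 2b (eq 7)."""
    u = 4.0 * p * sigma_over_x
    disc = BETA1 ** 2 - 4.0 * BETA2 - 2.0 * BETA1 / u + 1.0 / u ** 2
    if disc < 0.0:
        raise ValueError("p is too large for eq 2b to have a real solution")
    return (1.0 / u - BETA1 - math.sqrt(disc)) / (8.0 * BETA2 * sigma_over_x)


if __name__ == "__main__":
    p, ratio = 500, 1.0e-4               # hypothetical bin
    print(estimate_me_eq1(p, ratio), estimate_me_eq7(p, ratio))
```

For the hypothetical bin shown, the two estimates differ by roughly one percent, in line with the remark that eq 7 rests on a fit.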

Assessment of eqs 4a and 4b

Five hundred CEs of constant peak density (i.e., single-bin CEs) were simulated for different m̄ and for σ/X ratios ranging from 0.00005 to 0.0013. Peak numbers me were estimated as described above. To assess accuracy, the percentage error between the average me and m̄ was calculated. The standard deviation and CV of me were compared to eqs 4a and 4b. The minimum of eq 4b was determined with the golden search algorithm34.

Analysis of migration-time distributions

Five hundred CEs were simulated with the four model migration-time distributions in Figure 1c. The ratio of the peak standard deviation to the CE duration, σ/1D, was 8 × 10⁻⁵ or 8 × 10⁻⁶ (these are realistic values in organelle CE, e.g., σ = 0.008 s and 1D = 100 – 1000 s). Equations for the distributions and their migration times are reported in Part Four of the Supplementary Material.

For each distribution and σ/1D ratio, the mean number of peaks in the entire CE, m̄tot, was incremented (e.g., m̄tot = 100, 200, …) until the saturation of the bin of greatest peak density exceeded the threshold saturation αt determined as described below. The CE was partitioned into N bins of equal duration X

$$N = \mathrm{trunc}\left[(2p_{tot})^{1/3}\right] + 1 \tag{8}$$

where ptot is the total number of observed peaks in the entire CE and trunc means truncation. Eq 8 is one bin larger than the minimum bin number of a common statistics formula35 (with ptot equaling the number of data points for binning) and it determined the σ/X ratio as Nσ/1D (i.e., X = 1D/N). The peak number me in each bin was calculated as described above. The number of peaks in the entire CE, me,tot, was calculated by summing estimated peak numbers in all bins

$$m_{e,tot} = \sum_{i=1}^{N} m_{e,i} \tag{9}$$

where me,i is the me estimate for the ith bin. The average, standard deviation, and CV of me,tot values were calculated.
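The binning rule of eq 8 is simple enough to state in code. The sketch below uses hypothetical function names; the example values are the three regions reported for the mitochondrial CE in the Results and Discussion.

```python
def number_of_bins(p_tot):
    """Eq 8: N = trunc[(2 * p_tot)**(1/3)] + 1."""
    return int((2.0 * p_tot) ** (1.0 / 3.0)) + 1


def bin_layout(t_start, t_end, p_tot):
    """Return (N, X): the bin count and the common bin duration."""
    n_bins = number_of_bins(p_tot)
    return n_bins, (t_end - t_start) / n_bins


if __name__ == "__main__":
    # (start time in s, end time in s, observed peaks) for each region
    regions = [(0.0, 332.0, 99), (332.0, 770.0, 1355), (770.0, 1848.6, 253)]
    for t0, t1, p_tot in regions:
        n_bins, width = bin_layout(t0, t1, p_tot)
        print(f"{t0:7.1f} to {t1:7.1f} s: N = {n_bins}, X = {width:.1f} s")
```

With X fixed in this way, the σ/X ratio of a region follows as Nσ divided by the region’s duration.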

To assess accuracy, the percentage error between the average value of me,tot and m̄tot was calculated. The standard deviation of me,tot was compared to its theoretical counterpart calculated by error propagation of eq 9

$$\sigma_{m_{e,tot}} = \left[\sum_{i=1}^{N}\sigma_{m_{e,i}}^2\right]^{1/2} \tag{10}$$

with σme,i² equaling the variance of the ith bin, as computed from eqs 4a and 4c. The saturation α in these equations was calculated by estimating the mean numbers of peaks and observed peaks in each bin from SOT equations for non-uniform migration-time distributions36, identifying these estimates with p̄ and m̄, and substituting them in eq 1b. This calculation provides the best estimate of the saturation, with which to associate eq 1b. The theoretical CV was calculated as the product of eq 10 and 100/m̄tot, and was compared to the CV’s determined by simulation.
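Eqs 9 and 10, together with the theoretical CV, amount to a sum and a root-sum-of-squares. A minimal sketch follows; the function names and the per-bin numbers are hypothetical, and the CV is normalized here by the estimated total rather than by the known m̄tot of a simulation.

```python
import math


def total_peak_number(me_bins):
    """Eq 9: sum of the per-bin estimates m_e."""
    return sum(me_bins)


def total_std(sigma_bins):
    """Eq 10: propagate independent per-bin standard deviations."""
    return math.sqrt(sum(s * s for s in sigma_bins))


def total_cv(sigma_bins, m_tot):
    """CV of the total: 100 * sigma_total / m_tot."""
    return 100.0 * total_std(sigma_bins) / m_tot


if __name__ == "__main__":
    me_bins = [120.0, 340.0, 95.0]       # hypothetical per-bin estimates
    sigma_bins = [12.0, 21.0, 10.5]      # hypothetical per-bin std devs
    m_tot = total_peak_number(me_bins)
    print(m_tot, total_std(sigma_bins), total_cv(sigma_bins, m_tot))
```

Adding variances in this way presumes that the fluctuations in different bins are independent, which is the premise of the error propagation in eq 10.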

Average histograms of the four migration-time distributions were calculated for different m̄tot from me’s determined from 500 simulations. They were compared to the actual normalized distributions by scaling the histogram areas to equal the average of me,tot, divided by m̄tot. The scaling makes it easy to compare distributions for different m̄tot and to assess if the distributions are underestimated or overestimated.

Application to mitochondrial CE

The total number of peaks in, and the migration-time and mobility distributions of, a mitochondrial CE37 were estimated. The CE was acquired at 200 Hz. Maxima produced by baseline noise were removed by clipping as described elsewhere11. In brief, the standard deviation of the baseline noise was estimated from three signal-free regions spanning 50 to 98.6 s, six multiples of it were added to a linear interpolation of the lower noise bound estimated at 145 points, and all signal below the sum was clipped. This action is appropriate for Gaussian noise, which is negligible three or more standard deviations from the mean. The peak standard deviations σ of 10 isolated observed peaks spanning the CE were determined by moments analysis. The migration-time distribution was constructed as described above and transformed into a mobility distribution by mapping the times t of its bin boundaries into mobilities μ

$$\mu = L/(Et) - \mu_{eo} \tag{11}$$

where L is the capillary length (0.4 m), E is the electric field strength (40 kV/m), and μeo is the electroosmotic mobility (5.2 × 10⁻⁸ m²/(V·s)).
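The mapping of eq 11 can be illustrated with the stated values of L, E, and μeo. In the sketch below, the function names, the example bin edges, and the example counts are hypothetical. Because mobility decreases as migration time increases, the bin order reverses, and because the mapping is non-linear, equal-width time bins become unequal-width mobility bins.

```python
L_CAP = 0.4        # capillary length, m
E_FIELD = 40.0e3   # electric field strength, V/m
MU_EO = 5.2e-8     # electroosmotic mobility, m^2/(V s)


def mobility(t):
    """Eq 11: electrophoretic mobility for migration time t (s)."""
    return L_CAP / (E_FIELD * t) - MU_EO


def mobility_histogram(time_edges, counts):
    """Map a migration-time histogram into a mobility histogram.

    Returns (mobility_edges, densities) in ascending mobility order,
    where each density is the bin count divided by the mobility width.
    """
    mu_edges = [mobility(t) for t in time_edges]
    densities = [
        n / (mu_edges[i] - mu_edges[i + 1]) for i, n in enumerate(counts)
    ]
    return mu_edges[::-1], densities[::-1]


if __name__ == "__main__":
    edges = [332.0, 363.3, 394.6, 425.9]   # hypothetical bin edges, s
    counts = [150.0, 210.0, 180.0]         # hypothetical peak numbers
    mu_edges, dens = mobility_histogram(edges, counts)
    for lo, hi, d in zip(mu_edges[:-1], mu_edges[1:], dens):
        print(f"{lo:.3e} to {hi:.3e} m^2/(V s): density {d:.3e}")
```

Dividing each count by the mobility width keeps the area of every bin equal to its peak number after the transformation.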

Results and Discussion

Threshold values of p̄ vs m̄ curve

By solving eq 5, we find the p̄ vs m̄ curve maximizes at α ≈ 1.601. (If Rs were constant, it would maximize at α = 1; ref 14.) On substituting this α and its associated Rs value (see eq 4c) into eq 1, we discover that p̄max is 0.185 X/σ. Thus, the scalar δ in eq 6 is 0.185.

Figure 3a is a graph of the threshold saturation αt, threshold effective saturation αe,t, and logarithm of the threshold peak number m̄t vs the logarithm of the σ/X ratio, as determined from eq 6b. All thresholds increase with decreasing σ/X because σp/p̄ decreases, allowing p̄t to move closer to the curve maximum. For a Gaussian p distribution, the probability that p exceeds p̄max is [1 − erf(3/√2)]/2 = 0.00135. The circles in Figure 3a are αt values determined by extrapolating p in 50,000 simulations to the expected number of excesses, 67.5 = (0.00135)(50000). Least-squares fits to the curves in this figure are reported in Part Five of the Supplementary Material.
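The tail probability quoted above follows directly from the error function. The short check below uses a hypothetical function name and assumes only the Gaussian envelope and γ = 3.

```python
import math


def upper_tail(gamma):
    """Probability that a Gaussian variate exceeds its mean by gamma sigma."""
    return 0.5 * (1.0 - math.erf(gamma / math.sqrt(2.0)))


if __name__ == "__main__":
    prob = upper_tail(3.0)
    print(f"P(p > p_max) = {prob:.5f}")
    print(f"expected excesses in 50,000 simulations = {50000 * prob:.1f}")
```

Choosing γ = 4 instead would shrink the tail probability by more than an order of magnitude, at the cost of a lower threshold saturation.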

Figure 3.

Results for single-bin separations of constant peak density. Symbols in panels b) - d) are simulation results for σ/X equaling 0.00005 (Inline graphic), 0.0001 (○), 0.0005 (□), 0.0009 (◇), and 0.0013 (△). a) Graph of threshold saturation αt, threshold effective saturation αe,t, and logarithm of threshold peak number m̄t vs log (σ/X). Circles are αt values determined by simulation. b) Graph of percentage error between m̄ and average me vs saturation α. c) Graph of standard deviation of me, σme, multiplied by (σ/X)^(1/2) vs α. d) Graph of coefficient of variation, CV = 100σme/m̄, divided by (σ/X)^(1/2) vs α.

Analysis of simulations with constant peak density

Our procedure is based on applying SOT to different bins of a partitioned CE. The peak density in each bin is roughly constant, but it can vary among different bins. We first assess the accuracy and precision of me values determined from a single bin of constant peak density.

Accuracy of me

Figure 3b is a graph of the percentage error between m̄ and the average me of 500 simulations vs saturation α for five σ/X ratios spanning a 26-fold range. (The σ/X ratios in subsequent determinations of migration-time distributions lie within this range.) The results are somewhat scattered but the trend is clear. The accuracy decreases with α, because eq 1 slightly underestimates peak overlap at high saturation. For α ≤ 1, the accuracy is −6% or better.

Precision of me

Figure 3c is a graph of the standard deviation of me, σme, multiplied by (σ/X)^(1/2) vs α. Figure 3d is a graph of the coefficient of variation, CV = 100σme/m̄, divided by (σ/X)^(1/2) vs α. The symbols are results of 500 simulations for the same σ/X ratios in Figure 3b; the curves are graphs of eqs 4a and 4b. These equations are based on a propagation of errors (see Part Two of the Supplementary Material) and overestimate σme and CV at large σme but they agree with simulation for α ≤ 0.8 or so. The bold curves are graphs of

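$$\sigma_{m_e}\left(\sigma/X\right)^{1/2} = \frac{1}{2}\sqrt{\frac{\alpha}{R_s}}\,;\qquad \frac{CV}{(\sigma/X)^{1/2}} = 200\sqrt{\frac{R_s}{\alpha}} \tag{12}$$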

which are the limits of eqs 4a and 4b as α approaches zero. On substituting eq 1a into eq 12, we find the limits are σme = √m̄ and CV = 100/√m̄, that is, the Poisson limits stated earlier. With increasing α, σme and CV exceed these limits, because the me distribution is broadened by the decreasing slope, ∂p̄/∂m̄, of the p̄ vs m̄ curve.

The graph of CV/(σ/X)^(1/2) has a shallow minimum at α ≈ 0.524, with a value of 320.5. The minimum occurs, because Poisson statistics reduces the CV as m̄ (or α) increases, whereas the diminishing slope, ∂p̄/∂m̄, increases the CV as m̄ (or α) increases. Ultimately the latter effect dominates. It is fortuitous that the minimum is shallow, with roughly the same precision over the range, 0.2 ≤ α ≤ 1.0. Regardless of the threshold saturation, we suggest that α not exceed unity if good precision is sought.
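The location of this minimum can be reproduced approximately with a golden-section search applied to eq 4b. The sketch below is illustrative only: it assumes Rs and its derivative come from the fit of eq 4c, the search interval 0.1 to 1.5 is an arbitrary choice on the low-m̄ side of the curve maximum, and the function names are hypothetical. Because it relies on the fitted Rs, its result may differ slightly from the reported value of 320.5.

```python
import math

B3, B4, B5 = -0.1910, 0.0039, 0.00188   # coefficients of eq 4c


def rs(alpha):
    """Eq 4c: average minimum resolution."""
    return 0.725 + B3 * alpha + B4 * alpha ** 2 + B5 * alpha ** 3


def dln_rs(alpha):
    """d ln(Rs)/d alpha from the fit of eq 4c."""
    return (B3 + 2.0 * B4 * alpha + 3.0 * B5 * alpha ** 2) / rs(alpha)


def cv_scaled(alpha):
    """Right-hand side of eq 4b, i.e. CV / (sigma/X)**0.5."""
    a = 1.0 / alpha - dln_rs(alpha)
    w = 1.0 - 2.0 * alpha * math.exp(-alpha)
    return (200.0 * math.sqrt(rs(alpha) / alpha * w)
            * math.exp(alpha / 2.0) * a / (a - 1.0))


def golden_section_min(f, a, b, tol=1e-6):
    """Golden-section search for the minimum of a unimodal f on [a, b]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b = d           # minimum lies in [a, d]
        else:
            a = c           # minimum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


if __name__ == "__main__":
    alpha_min = golden_section_min(cv_scaled, 0.1, 1.5)
    print(f"alpha at minimum = {alpha_min:.3f}, "
          f"CV/(sigma/X)^0.5 = {cv_scaled(alpha_min):.1f}")
```

Evaluating `cv_scaled` at α = 0.2 and α = 1.0 shows how little the scaled CV rises across the recommended range.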

Figure 3d shows that the relative precision is best, when σ/X is small. For example, if m̄ is decreased by 10-fold and σ/X is increased by 10-fold, the saturation α doesn’t change. Therefore, the right-hand sides of eqs 4a and 4b don’t change, but the left-hand sides do. The estimate me is 10-fold smaller, but σme is smaller by only √10 and the CV is larger by √10.

Analysis of simulations with different migration-time distributions

Our estimates from CE simulations having the migration-time distributions in Figure 1c follow the same trends as do simulations of constant peak density. This is not surprising, since the estimates entail repetitive applications of eq 1 to contiguous bins of approximately constant peak density. However, the quantitative results vary with the distribution.

Accuracy of me,tot

Figure 4 is a graph of the percentage error between the mean total number of peaks m̄tot and the average me,tot of 500 simulations vs the logarithm of m̄tot. Each me,tot was calculated from eq 9 for the four model distributions and the two σ/1D ratios, 8 × 10⁻⁵ and 8 × 10⁻⁶. The m̄tot range is greater for the smaller σ/1D ratio, because the saturation is proportional to the product of m̄ and σ. The number N of bins ranged from 6 to 45, with σ/X ratios between 1.0 × 10⁻⁴ and 1.3 × 10⁻³. The symbols at the curve ends have m̄tot values, at which the saturation in the bin of maximum peak density exceeded the threshold saturation. As in Figure 3b, the results are somewhat scattered but the trend is the same, with the average of me,tot underestimating m̄tot at high saturation and an accuracy of roughly −6% or better. Some m̄tot values greatly exceed typical particle numbers in organelle CE, but they show the potential of theory.

Figure 4.

Accuracy of SOT estimates for migration-time distributions in Figure 1c. Graph of percentage error between m̄tot and average me,tot of simulations vs log (m̄tot). Line types are the same as in Figure 1c (e.g., solid for Gaussian migration-time distribution). Normal-weight curves are results for σ/1D = 8 × 10⁻⁵; bold curves, for σ/1D = 8 × 10⁻⁶. Ordinates of symbols are m̄tot values, at which the largest saturation exceeds αt.

Precision of me,tot

Figure 5a is a graph of σme,tot vs m̄tot for σ/1D = 8 × 10⁻⁶. The symbols are results of 500 simulations for the four model distributions. The curves were calculated from eq 10 and agree with simulation, indicating the total variance can be modeled by a discrete sum of variances for different bins. As before, σme,tot exceeds the Poisson limit, σme,tot = √m̄tot, represented by the bold curve. For a given m̄tot, σme,tot differs for the different distributions because eq 10 and α vary with changing peak density. Figure 5b is the corresponding graph of the coefficient of variation, CV, vs m̄tot. As before, minima are found, but the m̄tot at which minimization occurs and the minimum itself differ for different distributions.

Figure 5.

Precision of SOT estimates for migration-time distributions in Figure 1c and σ/1D = 8 × 10⁻⁶. a) Graph of standard deviation σme,tot vs m̄tot. Symbols are simulation results (Gaussian (○), bimodal (△), asymmetric (◇), constant (□)). b) As in a), but for the coefficient of variation CV. c) Graph of weighted coefficient of variation CVwt vs m̄tot. d) Graph of saturation α vs reduced time ζ, when m̄tot corresponds to the minimum CVwt. Values of m̄tot are 10750 (Gaussian), 19000 (bimodal), 16000 (asymmetric), and 23000 (constant).

The CV is a metric of precision for the entire CE. We may be more interested in good precision in bins of high peak density, since bins of low peak density have little impact on the distribution. By weighting the CV by the fraction of peaks in different bins, we can target m̄tot values for optimal precision where most peaks are found. Unlike most separations, in favorable cases the CV can be adjusted in organelle CE by diluting or concentrating the sample. Accordingly, we define a weighted CV

$$CV_{wt} = \sum_{i=1}^{N}\frac{\bar{m}_i\,CV_i}{\bar{m}_{tot}} \approx \sum_{i=1}^{N}\frac{m_{e,i}\,CV_i}{m_{e,tot}} \tag{13}$$

where CVi is the CV of the ith bin. Figure 5c is a graph of CVwt vs m̄tot for the four distributions. The small discontinuities in the curves occur as the bin number changes. As before, the m̄tot producing the minimum CVwt differs with the distribution, but now the minimum itself is the same (about 6). The unifying attribute is shown in Figure 5d, which is a graph of the saturation α in different bins vs the reduced time ζ. For nonuniform distributions, the α’s of bins with large peak density are slightly greater than 0.5. This α range produces a small CV for a single bin, as shown in Figure 3d, and it is not surprising that a sum of CV’s weighted by the fraction of peaks follows the same trend. Similar trends to those in Figure 5 are found for the σ/1D ratio, 8 × 10⁻⁵.
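The weighted CV of eq 13 is a peak-number-weighted average of the per-bin CVs. A minimal sketch, using the estimated form on the right-hand side of eq 13 and hypothetical names and inputs:

```python
def weighted_cv(me_bins, cv_bins):
    """Eq 13 (estimated form): sum of m_e,i * CV_i divided by m_e,tot."""
    if len(me_bins) != len(cv_bins):
        raise ValueError("one CV is required per bin")
    me_tot = sum(me_bins)
    return sum(m * cv for m, cv in zip(me_bins, cv_bins)) / me_tot


if __name__ == "__main__":
    me_bins = [40.0, 310.0, 520.0, 130.0]   # hypothetical per-bin estimates
    cv_bins = [17.0, 6.5, 5.4, 9.8]         # hypothetical per-bin CVs
    print(f"CV_wt = {weighted_cv(me_bins, cv_bins):.2f}")
```

Sparse bins with large CVs contribute little to the sum, which is the intended effect of the weighting.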

Estimation of migration-time distribution

Figure 6 reports graphs of the four normalized distributions f (ζ) vs reduced time ζ, first shown in Figure 1c, for the two σ/1D ratios. The symbols and error bars are the means and standard deviations of me’s determined from 500 simulations for three m̄tot values, scaled such that the distribution area is the average me,tot, divided by m̄tot. Each symbol corresponds to a different bin. For each distribution, one value of m̄tot corresponds to the minimum CVwt, whereas the other m̄tot values are smaller and larger. Although all distributions are estimated well, the precision is best at the minimum CVwt if we consider only the bins of large peak density. If possible, separation conditions should be adjusted such that these bins have saturations slightly greater than 0.5.

Figure 6.

Scaled SOT-estimated migration-time distributions for m̄tot values less than (circles), equal to (squares), and greater than (diamonds) m̄tot at the minimum CVwt. a) σ/1D = 8 × 10⁻⁶, with m̄tot equaling 750, 10750, and 21800 (Gaussian); 1450, 19000, and 34750 (bimodal); 1275, 16000, and 31300 (asymmetric); and 1575, 23000, and 51000 (constant). b) σ/1D = 8 × 10⁻⁵, with m̄tot equaling 160, 950, and 1675 (Gaussian); 270, 2000, and 2700 (bimodal); 250, 1650, and 2425 (asymmetric); and 325, 2500, and 4900 (constant).

Application to mitochondrial CE

Figure 7a is a noise-clipped mitochondrial CE containing 1707 observed peaks and first discussed in ref 37. The inset expands the CE and shows overlapping maxima, suggesting that SOT estimates could be useful. The CE first was partitioned among uniform bins. For such bins, however, detail was lost around the maximum peak density near 380 s and slowly varying densities were binned to excess. Accordingly, eq 8 was applied to three regions spanning 0 to 332 s (ptot = 99; N = 6; X = 55.3 s), 332 to 770 s (ptot = 1355; N = 14; X = 31.3 s), and 770 s to 1848.6 s (ptot = 253; N = 8; X = 134.8 s). The average standard deviation σ calculated from 10 observed peaks was 11.1 ms, with a coefficient of variation equaling 8.4. These values are almost constant due to a large post-column sheath flow. It also is likely they are peaks of single mitochondria, because the coefficient of variation is small. The σ/1D ratio is 6.0 × 10⁻⁶; the σ/X ratios range from 8.2 × 10⁻⁵ to 2.0 × 10⁻⁴, for which the threshold saturation αt is 1.1 or larger (see Figure 3a).
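The bin numbers and durations quoted for the three regions can be checked with a short Python sketch. Eq 8 is not reproduced in this section, so the code assumes it has the form N = ⌈(2 ptot)^(1/3)⌉, the oversmoothing rule associated with ref 35; this form is an assumption, although it reproduces the N values quoted above. The function name `bin_count` is hypothetical.

```python
import math


def bin_count(p_tot: int) -> int:
    """Assumed form of eq 8: smallest integer N with N >= (2 * p_tot) ** (1/3)."""
    if p_tot <= 0:
        raise ValueError("p_tot must be positive")
    return math.ceil((2 * p_tot) ** (1.0 / 3.0))


# (start time in s, end time in s, observed peaks) for the three regions in the text
regions = [
    (0.0, 332.0, 99),
    (332.0, 770.0, 1355),
    (770.0, 1848.6, 253),
]

bin_numbers = []
bin_durations = []
for start, end, p_tot in regions:
    n_bins = bin_count(p_tot)
    bin_numbers.append(n_bins)
    bin_durations.append((end - start) / n_bins)  # duration X of each bin, in s

for n_bins, x in zip(bin_numbers, bin_durations):
    print(n_bins, round(x, 1))
```

Under this assumed rule, the three observed-peak counts also sum to the 1707 peaks reported for the whole CE.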

Figure 7.

Experimental organelle CE data. a) Mitochondrial CE. Arrows near time axis are demarcations among three regions. Inset shows peak overlap in region of maximum peak density. b) Histograms of migration-time distributions estimated by SOT (dashed line) and obtained by counting observed peaks (solid line). Main panel shows distributions in second region; inset shows distributions in first and third regions, with break between them. Error bars are SOT-estimated standard deviations. c) Mobility distributions determined from SOT-estimated and observed-peak migration-time distributions.

The estimated peak number me in each bin was determined from eq 1 and the number of observed peaks p. The saturation α of each bin was approximated from eq 1a as (ref 19)

$$\alpha \approx -\ln(p/m_e) \qquad (14)$$

and used to evaluate equations dependent on α. Eq 14 was used to estimate α instead of the detailed procedures outlined earlier, because p and me are the only metrics we have (i.e., we do not know and m̄). The saturation in the bin of greatest peak density is 0.52, which is less than αt and the upper saturation, α ≈ 1, for precise determinations of me.
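As a minimal sketch, the saturation estimate of eq 14 can be written as below. The code assumes eq 14 reads α ≈ −ln(p/me); the minus sign is inferred from the requirement that α be positive when p is smaller than me. The function name and the bin counts are hypothetical and are not taken from the mitochondrial CE.

```python
import math


def saturation_from_counts(p: float, m_e: float) -> float:
    """Approximate saturation of a bin from observed peaks p and estimated peaks m_e."""
    if p <= 0 or m_e <= 0:
        raise ValueError("p and m_e must be positive")
    if p > m_e:
        raise ValueError("observed peaks cannot exceed the estimated peak number")
    return -math.log(p / m_e)


# Hypothetical bin in which about 40% of the peaks are hidden by overlap.
p_bin = 119.0
m_e_bin = 200.0
alpha = saturation_from_counts(p_bin, m_e_bin)
print(round(alpha, 3))
```

A ratio p/me near 0.6 therefore corresponds to a saturation slightly above 0.5, the range identified earlier as giving the best precision.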

Figure 7b reports histograms of migration-time distributions, as determined by estimating me (dashed line) and counting observed peaks (solid line). Because the bins have different widths, the ordinate was scaled such that the bin area equals the number of peaks or observed peaks. The error bars are the scaled standard deviations of me. The estimated total number of peaks me,tot is 2163̄ ± 56̄ (CV = 2.6). Over 20% of the peaks are hidden by peak overlap, mostly in the second region.
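The ordinate scaling for bins of unequal width, and the hidden-peak fraction, can be illustrated with the following sketch. The bin durations and the totals 1707 and 2163 come from the text; the per-bin peak numbers and the function name `scaled_heights` are hypothetical.

```python
from typing import List, Sequence


def scaled_heights(counts: Sequence[float], widths: Sequence[float]) -> List[float]:
    """Bar heights chosen so that height * width equals the peak number in each bin."""
    if len(counts) != len(widths):
        raise ValueError("counts and widths must have one entry per bin")
    return [c / w for c, w in zip(counts, widths)]


# One bin from each region; durations X (in s) are from the text, counts are made up.
widths = [55.3, 31.3, 134.8]
m_e_bins = [20.0, 180.0, 35.0]

heights = scaled_heights(m_e_bins, widths)
areas = [h * w for h, w in zip(heights, widths)]  # recovers the peak numbers

# Fraction of peaks hidden by overlap, from the totals reported in the text.
p_tot = 1707
m_e_tot = 2163
hidden_fraction = (m_e_tot - p_tot) / m_e_tot
print(f"{hidden_fraction:.3f}")
```

Dividing by the bin width keeps narrow, dense bins comparable with wide, sparse ones on the same axis.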

The numbers of peaks and observed peaks differ in the three bins of highest density, and they differ somewhat in the two bins to their right (Figure 7b). We evaluated the observed-peak numbers in these bins with our simple procedure based on the Type-II error analogy (ref 11). As noted earlier, this procedure allows us to decide if the numbers of peaks and observed peaks are statistically the same. For the three bins of highest density, the procedure is not satisfied, and for the two other bins it shows the numbers of observed peaks are questionable estimates. However, the observed-peak numbers in the other bins satisfy the procedure. Thus, our current work is consistent with our earlier work.

Figure 7c shows the mobility distributions calculated from the migration-time distributions and eq 11. As before, the bin areas equal the numbers of observed peaks or peaks. Because the relation between time and mobility is non-linear, and the mitochondrial and electroosmotic mobilities have opposite signs, the two distribution types appear different. Most peaks overlap in the mobility range, −2 × 10⁻⁸ to −3 × 10⁻⁸ m²/(V·s).

Conclusions

We have shown that SOT can be used to estimate both accurate and precise migration-time and mobility distributions in partitioned organelle CEs from a series of single observed-peak numbers, as long as the saturation does not exceed unity. Usually, organelle CEs are assumed to be free of peak overlap. Our work shows this is not always true. The mitochondrial CE in Figure 7a has significant peak overlap, requiring SOT calculations to reduce distribution bias. We recommend first screening organelle CEs for peak overlap by our simple procedure based on a Type-II error analogy (ref 11), and then making SOT calculations if necessary.

It may be tempting first to transform the migration times of observed peaks into mobilities and then to make the peak-overlap analysis in mobility space, with bins of equal width in mobility. This is unwise, because peak standard deviations that are constant in time (as in organelle CE with post-column sheath flow) are not constant in mobility. They vary not only from bin to bin but also within a bin, complicating the analysis.
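The point about non-constant standard deviations in mobility can be illustrated numerically. The sketch below assumes the usual CE relation μ = L/(E t) − μeo between migration time and mobility, which may differ in detail from eq 11, and propagates a constant time-domain σ to first order. The capillary length and field strength are hypothetical; only σ = 11.1 ms and the region boundaries 332 s and 770 s come from the text.

```python
def mobility_sigma(t: float, sigma_t: float, length: float, field: float) -> float:
    """First-order standard deviation in mobility, |d(mu)/dt| * sigma_t.

    Assumes mu = length / (field * t) - mu_eo, so |d(mu)/dt| = length / (field * t**2).
    The electroosmotic mobility mu_eo is a constant offset and drops out.
    """
    if t <= 0:
        raise ValueError("migration time must be positive")
    return length * sigma_t / (field * t ** 2)


L_CAP = 0.40        # capillary length to detector, m (hypothetical)
E_FIELD = 4.0e4     # field strength, V/m (hypothetical)
SIGMA_T = 0.0111    # peak standard deviation in time, s (from the text)

t_early, t_late = 332.0, 770.0   # boundaries of the second region, s
s_early = mobility_sigma(t_early, SIGMA_T, L_CAP, E_FIELD)
s_late = mobility_sigma(t_late, SIGMA_T, L_CAP, E_FIELD)
ratio = s_early / s_late
print(ratio)
```

Under these assumptions the ratio depends only on the two migration times, so the several-fold change in mobility-space σ across the second region holds regardless of the hypothetical capillary length and field.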

We recognize that the number of bins in the SOT-estimated migration-time and mobility distributions is not optimal, because the bin number is calculated from the total number of observed peaks (see eq 8). Ideally, the bin number should be calculated from the total number of peaks, but of course this is unknown prior to the SOT estimation. Fortunately, Figure 6 shows the error in the estimated distributions is small.

Unfortunately, our procedure cannot be extended with high precision to chromatography, except in special cases. Consider gas and liquid chromatography, wherein one might find (for example) 300 observed peaks in a 30-min separation. According to eq 8, these separations would partition among 9 bins of duration X = 3.33 min. For peak standard deviations σ of 2 – 4 s, σ/X would range from 0.01 to 0.02. Even the minimum CV of a bin would vary from 32 to 45, as evaluated from eq 4b. Much smaller peak widths would be required for a precise application.

Supplementary Material

1_si_001

Acknowledgments

E.A.A. is supported through NIH grant AG20866.

Contributor Information

Joe M. Davis, Department of Chemistry and Biochemistry, Southern Illinois University at Carbondale, Carbondale, IL 62901 USA.

Edgar A. Arriaga, Department of Chemistry, University of Minnesota, Minneapolis, MN 55455 USA.

Literature Cited

  • 1. Tsoneva IC, Tomov TC. Bioelectrochemistry and Bioenergetics. 1984;12:253–258.
  • 2. Radko SP, Chrambach A. J Chromatogr B. 1999;722:1–10. doi: 10.1016/s0378-4347(98)00307-7.
  • 3. Duffy CF, Gafoor S, Richards DP, Ahmadzadeh H, O’Kennedy R, Arriaga EA. Anal Chem. 2001;73:1855–1861. doi: 10.1021/ac0010330.
  • 4. Fuller KM, Arriaga EA. J Chromatogr B. 2004;806:151–159. doi: 10.1016/j.jchromb.2004.03.050.
  • 5. Pysher MD, Hayes MA. Langmuir. 2005;21:3572–3577. doi: 10.1021/la0473097.
  • 6. Chen Y, Arriaga EA. Langmuir. 2007;23:5584–5590. doi: 10.1021/la0633233.
  • 7. Duffy CF, McEathron AA, Arriaga EA. Electrophoresis. 2002;23:2040–2047. doi: 10.1002/1522-2683(200207)23:13<2040::AID-ELPS2040>3.0.CO;2-3.
  • 8. Xiong G, Aras O, Shet A, Key NS, Arriaga EA. Analyst. 2003;128:581–588. doi: 10.1039/b301035j.
  • 9. Ahmadzadeh H, Andreyev D, Arriaga EA, Thompson LV. J Gerontol Series A: Biol Sci Med Sci. 2006;61A:1211–1218. doi: 10.1093/gerona/61.12.1211.
  • 10. Fuller KM, Arriaga EA. Anal Chem. 2003;75:2123–2130. doi: 10.1021/ac026476d.
  • 11. Davis JM, Arriaga EA. J Chromatogr A. 2009;1216:6335–6342. doi: 10.1016/j.chroma.2009.07.001.
  • 12. Johnson RD, Navratil M, Poe BG, Xiong G, Olson KJ, Ahmadzadeh H, Andreyev D, Duffy CF, Arriaga EA. Anal Bioanal Chem. 2007;387:107–118. doi: 10.1007/s00216-006-0689-6.
  • 13. Chen Y, Xiong G, Arriaga EA. Electrophoresis. 2007;28:2406–2415. doi: 10.1002/elps.200600628.
  • 14. Davis JM, Giddings JC. Anal Chem. 1983;55:418–424.
  • 15. Martin M, Herman DP, Guiochon G. Anal Chem. 1986;58:2000–2007.
  • 16. Felinger A, Pasti L, Dondi F. Anal Chem. 1990;62:1846–1853.
  • 17. Pietrogrande MC, Dondi F, Felinger A, Davis JM. Chemom Intell Lab Syst. 1995;28:239–258.
  • 18. Dondi F, Bassi A, Cavazzini A, Pietrogrande MC. Anal Chem. 1998;70:766–773.
  • 19. Herman DP, Gonnord MF, Guiochon G. Anal Chem. 1984;56:995–1003.
  • 20. Davis JM, Pompe M, Samuel C. Anal Chem. 2000;72:5700–5713. doi: 10.1021/ac000613u.
  • 21. Felinger A. Anal Chem. 1995;67:2078–2087.
  • 22. Dondi F, Pietrogrande MC, Felinger A. Chromatographia. 1997;45:435–440.
  • 23. Felinger A. Anal Chem. 1997;69:2976–2979. doi: 10.1021/ac970241y.
  • 24. Davis JM. Anal Chem. 1997;69:3796–3805.
  • 25. Nagels LJ, Creten WL, Vanpeperstraete PM. Anal Chem. 1983;55:216–220.
  • 26. Nagels LJ, Creten WL. Anal Chem. 1985;57:2706–2711.
  • 27. Dondi F, Kahie YD, Lodi G, Remelli M, Reschiglian P, Bighi C. Anal Chim Acta. 1986;191:261–273.
  • 28. Pietrogrande MC, Cavazzini A, Dondi F. Rev Anal Chem. 2000;19:123–156.
  • 29. El Fallah MZ, Martin M. Chromatographia. 1987;24:115–122.
  • 30. Davis JM, Carr PW. Anal Chem. 2009;81:1198–1207. doi: 10.1021/ac801728k.
  • 31. Rowe K, Davis JM. Chemom Intell Lab Syst. 1997;38:109–126.
  • 32. Cheng YF, Wu S, Chen DY, Dovichi NJ. Anal Chem. 1990;62:496–503.
  • 33. Dondi F, Gianferrara T, Reschiglian P, Pietrogrande MC, Ebert C, Linda P. J Chromatogr. 1990;485:631–645.
  • 34. Press WH, Teukolsky SA, Vetterling W, Flannery BP. Numerical Recipes in FORTRAN. Cambridge University Press; New York: 1992.
  • 35. Terrell GT, Scott DW. J Am Stat Assoc. 1985;80:209–214.
  • 36. Davis JM. Anal Chem. 1994;66:735–746.
  • 37. Whiting CE, Arriaga EA. J Chromatogr A. 2007;1157:446–453. doi: 10.1016/j.chroma.2007.04.065.
