Author manuscript; available in PMC: 2021 Aug 30.
Published in final edited form as: J Chromatogr A. 2020 May 31;1626:461266. doi: 10.1016/j.chroma.2020.461266

Estimation of low-level components lost through chromatographic separations with finite detection limits

Nicole M Devitt a,1, Joe M Davis b,2,*, Mark R Schure c,*
PMCID: PMC7748966  NIHMSID: NIHMS1612429  PMID: 32797862

Abstract

The search for biomarkers allowing the assessment of disease by early diagnosis is facilitated by liquid chromatography. However, it is not clear how many components are lost due to being present in concentrations below the detection limit and/or being obscured by chromatographic peak overlap.

First, we extend the study of missing components undertaken by Enke and Nagels, who employed the log-normal probability density function (pdf) for the distribution of signal intensities (and concentrations) of three mixtures. The Weibull and exponential pdfs, which have a higher probability of small-concentration components than the log-normal pdf, are also investigated. Results show that assessments of the loss of low-intensity signals by curve fitting are ambiguous.

Next, we simulate synthetic chromatograms to compare the loss of peaks from superposition (overlap) with neighboring peaks to the loss arising from lying below the limit of detection (LOD) imposed by a finite signal-to-noise ratio (SNR). The simulations are made using amplitude pdfs based on the Enke-Nagels data as functions of relative column efficiency, i.e., saturation, and SNR.

Results show that at the highest efficiencies, the lowest-amplitude peaks are lost below the LOD. However, at small and medium efficiencies, peak overlap is the dominant loss mechanism, suggesting that low-level components will not be found easily in liquid chromatography with single channel detectors regardless of SNR. A simple treatment shows that a multichannel detector, e.g., a mass spectrometer, is necessary to expose more low-level components.

Keywords: Biomarkers, Detection, Peak overlap, Trace analysis

1. Introduction

The search for biomarkers indicative of diseases is one of the most intensively investigated aspects of modern biomedical research. For example, cardiovascular disease biomarkers [1,2] are sought that can distinguish healthy individuals from those with early developing disease. Cancer biomarkers derived from the plasma proteome [3,4] can be very diverse and may be present at exceedingly small concentrations, specifically in the picogram per milliliter range, i.e., parts per trillion (ppt) [4]. The concentration range of proteins often exceeds 11 orders of magnitude [4]. Many of these molecules may be present at less than a ppt, and these can be exceedingly difficult to analyze. Furthermore, dynamic range limitations inherent in the detection process affect the useful concentration range that can be studied [5], although high-concentration proteins can be removed by affinity methods [6-8]. This large dynamic range poses a great problem in identifying cancer biomarkers in the proteome, especially when intact proteins are analyzed by the so-called “top-down” methods [9,10]. These methods employ very high-resolution detection by mass spectrometry (MS), often with tandem mass spectrometry stages (i.e., MS/MS) of detection.

High-resolution liquid chromatography (LC) is most often used as a separation technique prior to MS detection for biomarker-related research [11]. LC methods are extremely powerful. However, the selectivity of LC is insufficient to separate such complex samples because of finite peak capacity; tens of thousands of possible compounds pose a nearly impossible separation task for both one-dimensional (1DLC) and two-dimensional liquid chromatography (2DLC). Even with a four-dimensional separation [10], the human proteome has far too many components to be utilized directly for biomarker detection, let alone at the concentration levels that are thought to be present.

In the “bottom-up” approach to proteomics, proteins are digested so that specific peptide fragments can be selectively detected and identified by database lookup. This eases the difficulties of limited peak capacity in LC column methodology, although the problem remains extremely complex. One of the most selective approaches appears to be the analysis of the carbohydrate component of glycoproteins, which offers a number of possibilities for cancer detection [12-18]. This selectivity arises because specific cellular processes are often associated with the glycan part of the protein.

As is the case for top-down (intact) protein studies, bottom-up proteomics also requires unraveling of the complex mixture that defines the sample. This is most often accomplished using liquid chromatography with mass spectrometry detection (LCMS) [19-21]. Both bottom-up and top-down analyses place severe demands on the separation stage. In many cases, especially where the sample volume is extremely small, capillary techniques [22,23] are utilized in spite of typically long run times, because the small dilution volume allows for higher sensitivity and very high resolution. In LCMS searches for biomarkers and in analyses of other naturally occurring mixtures, not enough attention has been given to trace-level components that may be important for specific identification, such as disease recognition, or for other key properties. Two aspects of this problem are the data processing step [24-26], where digital filtering may obliterate low-level signals, and the understanding of MS detector noise characteristics [27-32]. For single channel detectors such as a UV detector, high resolution is extremely important because these samples characteristically have large peak overlap, which can mask very low-level signals. For example, if a low-concentration peak is partially overlapped by a high-concentration peak, the low-concentration peak can be essentially indistinguishable. This overlap also causes problems for MS detection and quantitation due to ion suppression [33].

A number of studies have looked at specific samples found in nature and have attempted to characterize the concentration distribution, especially that which is low-level and may be below the limit of detection (LOD). Nagels and coworkers [34] have looked at the sample concentration distribution of plant extracts and concluded that the relative peak areas are exponentially distributed, suggesting that low-level components, perhaps below the LOD, are most prevalent. Other reports exist on the exponential-like distributions of chromatographic peak areas [35] and amplitudes [36] from plant extracts. Enke and Nagels [37] have further examined datasets of extracellular metabolites, light crude oil and plant extracts for insight into the distribution of peak intensities produced by complex mixtures of natural origin. These authors suggest that these complex naturally occurring systems adhere to a log-normal distribution in signal intensities. They showed through curve fitting that these samples do indeed fit a log-normal distribution.

Enke and Nagels also claim that by seeking to model the responses and thus to determine the log-normal parameters of the distribution, it is theoretically possible to predict the “degree of analytical selectivity and dynamic range that would be required to detect any additional fraction of the components present” [37]. Further work on this problem [38] examined the response factors of the signals in an attempt to statistically correlate concentration and signal intensity factors. These observations may be very sample dependent.

These works are both insightful and important. However, a problem may arise on fitting a distribution to measured signals, because unmeasured low-level signals do not influence the fit. For example, both exponential and log-normal distributions have been considered, as discussed above; however, the density of the former is maximal at zero concentration, whereas that of the latter approaches zero. This uncertainty exists in part because the low-level signals cannot be measured to clarify the distribution. Consequently, the log-normal distribution of Enke and Nagels may be only one of several distributions that fit the data. In general, it is difficult to be certain of the statistical distribution of components or their relative number below the LOD. This suggests that a more general approach is necessary and that further work is needed to understand this intriguing problem, especially in light of its implications for the search for low-concentration biomarkers.

In this paper, we consider the problem from two distinct but related approaches. In the first approach, presented in the first part of the Results section, we expand on the efforts of Enke and Nagels [37] by fitting their datasets for light crude oil and plant extracts to two additional models of amplitude distribution. Here, amplitude refers to the detector signal, which may be experimental or, as in this study, computer-generated. This paper deals exclusively with signal amplitudes; however, the link between concentration and amplitude is discussed below. Accordingly, the models used in this study include the log-normal [37-41], Weibull [41-44] and exponential [41,45] probability density functions (pdfs).

These pdfs can be similar at large amplitudes, thus providing good fits to data, but have different densities at low amplitudes. The curve-fitting parameters are given in terms of the mean, variance, prefactor, root-mean square error and adjusted coefficient of determination, noting that the prefactors are used to equate normalized pdfs to experimental data that is not normalized. The curve-fit coefficients for the log-normal pdf are those determined by Enke and Nagels [37].

In the second approach, we extend the study of low-level signal detection to chromatographic separations, using parameters calculated from the Enke and Nagels datasets [37]. To study this problem, we utilize synthetic (i.e., computer-generated) chromatograms to compute the probability of peak loss from two different sources. The first is chromatographic peak overlap, in which peaks of large amplitude obscure those of small amplitude because of insufficient separation. The second is finite detection limits, in which noise obscures peaks of small amplitude. The latter assessment is similar but not identical to previous statistical studies of determination limits, in which probabilities were assigned to the accuracy of quantitative determinations [34,35,46-49]. Estimates of the fraction of components lost, as a function of relative column efficiency and finite signal-to-noise ratio (SNR), are given in detail. It will be shown that, for a single channel detector such as a UV detector or a total ion current (TIC) MS signal, chromatographic separations tend to lose the lowest-level components to peak overlap.

2. Theory

2.1. Chromatographic signal synthesis

In this section we describe the construction of synthetic chromatograms, which is the main tool used in the second part of this investigation. To avoid confusion, we first distinguish between a single component peak (SCP) and a peak. An SCP is the signal that is produced in a chromatogram by a single mixture constituent, whereas a peak is a chromatographic signal (i.e., not noise) with a maximum. A peak can be either one resolved SCP or multiple unresolved SCPs that overlap to produce one maximum.

The chromatographic signal is specified by distributions of SCP amplitudes, retention times, functional types, and widths. To this signal we can add noise, as specified by the signal-to-noise ratio. In this treatment, the following specifications apply. SCP amplitudes are random values selected from log-normal, Weibull and exponential amplitude pdfs. SCP retention times are uniformly random. All SCPs are approximated as Gaussian functions. The SCP width is specified by a Gaussian standard deviation which is assumed to be constant here for all SCPs, as is commonly assumed with gradient elution LC.

Given chromatographic parameters, we formed synthetic chromatograms by summing Gaussian SCPs to model the signal produced by a single-channel detector. The jth SCP, $g_j$, is composed of discrete data points having index i at equally spaced times $t_i$:

$$g_j(h_j, t_i, t_{R_j}, \sigma_j) = h_j \exp\left[-\frac{(t_i - t_{R_j})^2}{2\sigma_j^2}\right] \tag{1}$$

where $h_j$ and $\sigma_j$ are the amplitude and standard deviation of the jth SCP. The maximum signal of $g_j$ is the amplitude $h_j$ and occurs at retention time $t_{R_j}$.

The vector index j of the SCPs is such that j ∈ [1, m], where m is the number of SCPs, and the randomly generated retention times are ordered (i.e., sorted) so that $t_{R_j} < t_{R_{j+1}}$.

Noiseless, pure signal chromatograms $H_S(t_i)$ are composed of the superposition of m SCPs:

$$H_S(t_i) = \sum_{j=1}^{m} g_j(h_j, t_i, t_{R_j}, \sigma_j) \tag{2}$$

On superposition, p peaks (observable maxima) are produced. We note that $p \le m$ because of peak overlap. The addition of noise to Eq. (2) is discussed at length below.
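A minimal Python sketch of Eqs. (1) and (2) follows. It is not the authors' MATLAB program; the function names, the uniform sampling interval `dt`, and the use of the standard-library `random` module are illustrative assumptions. Amplitudes are passed in, so any of the pdfs described below can supply them.

```python
import math
import random

def gaussian_scp(h, t, t_r, sigma):
    """Eq. (1): value at time t of a Gaussian SCP with amplitude h and retention time t_r."""
    return h * math.exp(-((t - t_r) ** 2) / (2.0 * sigma ** 2))

def synthesize(amplitudes, t_start, t_stop, sigma, dt, seed=0):
    """Eq. (2): noiseless chromatogram H_S(t_i) as the sum of m Gaussian SCPs.

    Retention times are uniformly random in [t_start, t_stop] and sorted so that
    t_R[j] < t_R[j+1]; all SCPs share the same standard deviation sigma.
    Returns (times, signal, retention_times).
    """
    rng = random.Random(seed)
    t_r = sorted(rng.uniform(t_start, t_stop) for _ in amplitudes)
    n_points = int(round((t_stop - t_start) / dt)) + 1
    times = [t_start + i * dt for i in range(n_points)]
    signal = [sum(gaussian_scp(h, t, tr, sigma) for h, tr in zip(amplitudes, t_r))
              for t in times]
    return times, signal, t_r
```

Because the amplitudes are drawn independently of the retention times, pairing them with the sorted times does not bias the result. The double loop costs O(m × points), which is acceptable for a sketch; a vectorized version would be preferable at m = 6,000.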

2.2. Amplitude distribution

The amplitude distributions used here for determining the hj factors in Eq. (1) are the log-normal, Weibull and exponential pdfs. The functional form of these distributions, their means and variances, and code for calculating hj as random deviates, are given in Table 1. The parameters of these distributions are first determined by fitting the Enke-Nagels data and then transformed to produce a common mean among distributions while preserving the original coefficients of variation. The transformation is discussed in the Results section.

Table 1.

The functional forms, means, variances, and random deviates of the pdfs used in this study. Detailed notes are given beneath table.

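| pdf name | pdf form (a) | mean, $\bar{h}$ | variance, υ | random deviate |
|---|---|---|---|---|
| log-normal (b) | $\frac{1}{h\sigma\sqrt{2\pi}}\exp\left[-\frac{(\ln h-\mu)^2}{2\sigma^2}\right]$ | $e^{\mu+\sigma^2/2}$ | $(e^{\sigma^2}-1)\,e^{2\mu+\sigma^2}$ | MATLAB lognrnd |
| Weibull (c) | $\frac{k}{\lambda}\left(\frac{h}{\lambda}\right)^{k-1}\exp\left[-\left(\frac{h}{\lambda}\right)^{k}\right]$ | $\lambda\Gamma_1$ | $\lambda^2(\Gamma_2-\Gamma_1^2)$ | MATLAB wblrnd |
| exponential (d) | $\frac{1}{\lambda}e^{-h/\lambda}$ | $\lambda$ | $\lambda^2$ | $\xi_e=-\lambda\ln\xi_u$ |

(a) pdfs are functions of the random variable h.

(b) For the log-normal pdf, μ is the location parameter and σ is the scale parameter; the latter is the traditional symbol and differs from the SCP standard deviation. Given mean $\bar{h}$ and variance υ: $\mu=\ln\left(\frac{\bar{h}^2}{\sqrt{\upsilon+\bar{h}^2}}\right)=\ln\left(\frac{\bar{h}}{\sqrt{\upsilon/\bar{h}^2+1}}\right)$ and $\sigma=\sqrt{\ln\left(1+\upsilon/\bar{h}^2\right)}$. The random deviate equals $\exp(\mu+\xi_z\sigma)$, where $\xi_z$ is a normal deviate with zero mean and unit standard deviation.

(c) For the Weibull pdf, k is the shape parameter and λ is the scale parameter. The functions $\Gamma_1$ and $\Gamma_2$ are $\Gamma_1=\Gamma(1+1/k)$ and $\Gamma_2=\Gamma(1+2/k)$, where Γ is the gamma function. On dividing the variance υ by the square of the mean $\bar{h}$, one sees that k is obtained by numerically finding the root of the equation $1+\upsilon/\bar{h}^2-\Gamma_2/\Gamma_1^2=0$. Then λ is calculated from $\lambda=\bar{h}/\Gamma_1$. The random deviate equals $\lambda(-\ln\xi_u)^{1/k}$, where $\xi_u$ is a uniform random variable between 0 and 1.

(d) The exponential pdf equals the Weibull pdf for k = 1. $\xi_e$ is the exponential deviate and $\xi_u$ is a uniform random variable between 0 and 1.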
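The parameter conversions and random deviates in notes (b)-(d) of Table 1 can be sketched in Python as below. This is an assumed stand-in for MATLAB's `lognrnd` and `wblrnd`, not the code used in the study: the root of the Weibull equation in note (c) is found by plain bisection (the bracket 0.1 ≤ k ≤ 50 is an arbitrary choice), and the helper names are hypothetical.

```python
import math
import random

def lognormal_params(mean, var):
    """Table 1, note (b): location mu and scale sigma from the mean and variance."""
    cv2 = var / mean ** 2
    mu = math.log(mean / math.sqrt(cv2 + 1.0))
    sigma = math.sqrt(math.log(1.0 + cv2))
    return mu, sigma

def weibull_params(mean, var, k_lo=0.1, k_hi=50.0, tol=1e-12):
    """Table 1, note (c): shape k from 1 + var/mean^2 - Gamma2/Gamma1^2 = 0, then lambda.

    The residual increases monotonically with k, so bisection is sufficient here.
    """
    cv2 = var / mean ** 2

    def residual(k):
        g1 = math.gamma(1.0 + 1.0 / k)
        g2 = math.gamma(1.0 + 2.0 / k)
        return 1.0 + cv2 - g2 / g1 ** 2

    lo, hi = k_lo, k_hi
    if residual(lo) > 0.0 or residual(hi) < 0.0:
        raise ValueError("coefficient of variation outside the bracketed k range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    lam = mean / math.gamma(1.0 + 1.0 / k)
    return k, lam

def draw_amplitudes(pdf, mean, var, n, rng):
    """Random SCP amplitudes h_j from one of the three pdfs of Table 1."""
    if pdf == "log-normal":
        mu, sigma = lognormal_params(mean, var)
        # exp(mu + xi_z * sigma), xi_z a standard normal deviate
        return [math.exp(mu + rng.gauss(0.0, 1.0) * sigma) for _ in range(n)]
    if pdf == "Weibull":
        k, lam = weibull_params(mean, var)
        # lambda * (-ln xi_u)^(1/k); 1 - random() lies in (0, 1], so log() is safe
        return [lam * (-math.log(1.0 - rng.random())) ** (1.0 / k) for _ in range(n)]
    if pdf == "exponential":
        # Inverse transform sampling: xi_e = -lambda ln xi_u with lambda = mean
        # (the variance of this pdf is fixed at mean^2, so var is not used)
        return [-mean * math.log(1.0 - rng.random()) for _ in range(n)]
    raise ValueError(f"unknown pdf: {pdf}")
```

As a usage example under the same assumptions, rescaling a fitted distribution to unity mean while preserving its coefficient of variation amounts to calling these helpers with mean 1 and variance $\upsilon/\bar{h}^2$, e.g. `lognormal_params(1.0, 9.411e-3 / 2.844e-2 ** 2)` for the crude-oil log-normal fit in Table 2.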

These distributions are only models chosen for their differences among low-level amplitudes. We do not claim that actual amplitudes follow them, nor do we consider in this paper means to measure the distribution of actual amplitudes.

2.3. Concentration model

In this treatment, we assign values of amplitude hj to all SCPs but do not consider SCP concentrations. However, we outline the relevant issues for future study.

The signal dictated by the vector of SCP amplitudes, hj, can be related to a vector of maximum SCP concentrations, Cmax, j. To accomplish this, it must be realized that the signal is dependent on the instrument response factor [38] and the chromatographic efficiency [50]. The duality between the signal and concentration of chromatographic SCPs is illustrated in Fig. 1, along with the noise discussed below.

Fig. 1.


The relationship between the maximum concentration Cmax,j and signal amplitude hj of the jth SCP as embodied in Eq. (4). The blue labels apply to concentration; the black labels apply to signal amplitude. The signal-to-noise ratio is 500. Inset: The error bar represents the range of the noise standard deviation, σN, centered on zero amplitude (the dashed line). Two limits of detection are represented by arrows having amplitudes of 3σN and 5σN.

The maximum concentration Cmax, j of component j in the detector is [50]:

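$$C_{\max,j} = \frac{m_{\mathrm{inj},j}}{V_{\mathrm{dil},j}} \tag{3}$$

where $m_{\mathrm{inj},j}$ is the injected mass of component j and $V_{\mathrm{dil},j}$ is its dilution volume. The response factor $R_j$ of component j relates amplitude $h_j$ to $C_{\max,j}$:

$$h_j = C_{\max,j} R_j \tag{4}$$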

where Rj has units of signal amplitude per unit concentration. The response factor includes contributions from detectors, electronics and other parts of the signal chain. For this study all Rj are assumed unknown; hence, all Cmax, j are unknown. It has previously been reported [38] that response factors may follow the same distribution (log-normal) as the amplitude pdf under study.

The signal processing software for chromatographic instruments typically determines the areas of peaks. For a Gaussian SCP with standard deviation σj, the area Aj is

$$A_j = h_j \sigma_j \sqrt{2\pi} = C_{\max,j} R_j \sigma_j \sqrt{2\pi} \tag{5}$$

where the right-hand side is obtained on substituting Eq. (4) for $h_j$. Thus, for a given component (fixed $R_j$ and $\sigma_j$), $h_j$, $A_j$ and $C_{\max,j}$ are directly proportional. Eq. (5) also shows that, for constant $\sigma_j$, the distributions of SCP amplitudes $h_j$ and areas $A_j$ have the same shape, differing only by a constant scale factor.
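Eqs. (3)-(5) chain together as in the following Python sketch, which also checks Eq. (5) by trapezoidal integration of a finely sampled Gaussian SCP. The input values and function names are arbitrary illustrations in consistent units, not data or code from this study.

```python
import math

def scp_amplitude_and_area(m_inj, v_dil, response, sigma):
    """Eqs. (3)-(5): C_max = m_inj / V_dil, h = C_max * R, A = h * sigma * sqrt(2 pi)."""
    c_max = m_inj / v_dil
    h = c_max * response
    area = h * sigma * math.sqrt(2.0 * math.pi)
    return c_max, h, area

def trapezoid_area(h, sigma, half_width=10.0, n=20001):
    """Numerically integrate a Gaussian SCP of amplitude h over +/- half_width * sigma."""
    a, b = -half_width * sigma, half_width * sigma
    dt = (b - a) / (n - 1)
    ys = [h * math.exp(-((a + i * dt) ** 2) / (2.0 * sigma ** 2)) for i in range(n)]
    return dt * (sum(ys) - 0.5 * (ys[0] + ys[-1]))

c_max, h, area = scp_amplitude_and_area(2.0, 4.0, 3.0, 0.5)   # c_max = 0.5, h = 1.5
```

The numerical area agrees with the closed form to well below 10⁻⁶ here, which is a quick sanity check that amplitude and area differ only by the factor $\sigma_j\sqrt{2\pi}$.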

2.4. Signal-to-noise ratio (SNR)

The SNR can be defined in a number of ways, and these are discussed in the detector and signal processing literature [24-32,51,52]. Most differences exist in defining the signal. The first of the two most common approaches is to take an amplitude, usually a voltage, and square this voltage to calculate a power by assuming a certain load resistance (1 Ω) and dividing by the noise variance [52]. The other way, and the approach used in mass spectrometry-based investigations of noise [25-32], is to record a signal amplitude from a known injected compound, usually one with a large response factor. Then the background signal is sampled and the noise standard deviation determined. The SNR is then defined as the ratio of signal amplitude to noise standard deviation. Variations of this scheme are known.

In this work, we need an SNR metric that is characteristic of the entire chromatogram, not just one SCP. The mean SCP amplitude, $\bar{h}$, is specified a priori by the amplitude distribution, and the noise variance is then scaled to this mean level. Hence, the SNR is defined here as the ratio of the mean SCP amplitude to the noise standard deviation $\sigma_N$:

$$\mathrm{SNR} = \frac{\bar{h}}{\sigma_N} \tag{6}$$

Rearranging Eq. (6) gives the noise standard deviation as

$$\sigma_N = \frac{\bar{h}}{\mathrm{SNR}} \tag{7}$$

As shown in Eq. (7), given $\bar{h}$ and a specified SNR, one can obtain $\sigma_N$.

Noise can be explicitly added to the synthetic signal at different times ti by producing a vector of random Poisson, Gaussian or mixed noise deviates ξi with mean zero and standard deviation σN. By adding the signal vector HS(ti), Eq. (2), to this noise vector, the total signal H(ti) is calculated:

$$H(t_i) = H_S(t_i) + \xi_i \tag{8}$$

The addition of uncorrelated, random Gaussian deviates is often referred to as additive white Gaussian noise (AWGN), and its power spectrum is flat. This form of noise is often used as a model for noise in electronic circuits. However, Poisson noise is more characteristic of the random arrival of ions [32] in an MS detector. An illustration of how noise appears for low-level signals is shown in the inset to Fig. 1.
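Although noise is not added explicitly in this study, Eqs. (7) and (8) for the Gaussian (AWGN) case can be sketched as below. Only Gaussian deviates are shown; the function name and seed handling are illustrative assumptions.

```python
import random

def add_awgn(signal, mean_amplitude, snr, seed=0):
    """Eqs. (7)-(8): H(t_i) = H_S(t_i) + xi_i, with Gaussian xi_i of sigma_N = mean / SNR."""
    sigma_n = mean_amplitude / snr                              # Eq. (7)
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma_n) for s in signal]        # Eq. (8)
```

Applied to a flat zero baseline with $\bar{h} = 1$ and SNR = 10, the added deviates should have a sample standard deviation close to 0.1.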

In this study, we do not explicitly add noise to the signal, as Eq. (8) would suggest, but instead designate a constant level, referred to as a “noise gate”, above which peaks are considered detectable and below which peaks are assumed to be lost. This approach avoids digital filtering, which would add further complications to the study.

2.5. Limit of detection (LOD)

The LOD in signal terms [53,54] is given as

$$\mathrm{LOD}_S = 3\sigma_N \tag{9}$$

noting that the baseline is essentially zero or nulled to zero. Eq. (9) states that the minimum detectable signal, $\mathrm{LOD}_S$, which can be detected with a reliability that depends on the noise type, is three times the noise standard deviation, assuming the noise has a mean of zero. Other treatments have used a constant of five instead of three [53,54]. The limit of quantitation (LOQ) is often associated with 10 standard deviations of the noise. We utilize the $3\sigma_N$ criterion here.

By combining Eqs. (7) and (9), we find the value of the noise gate, above which peaks are detectable and below which they are lost, is

$$\mathrm{LOD}_S = \frac{3\bar{h}}{\mathrm{SNR}} \tag{10}$$
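The noise gate of Eq. (10) reduces to a threshold comparison. A minimal sketch with hypothetical helper names splits a list of amplitudes into detectable and lost sets; the multiplier defaults to 3 as in Eq. (9), and 5 can be passed for the stricter criterion.

```python
def lod_signal(mean_amplitude, snr, k=3.0):
    """Eq. (10): noise gate LOD_S = k * mean / SNR, with k = 3 as in Eq. (9)."""
    return k * mean_amplitude / snr

def split_by_noise_gate(amplitudes, mean_amplitude, snr, k=3.0):
    """Separate amplitudes at or above the gate (detectable) from those below it (lost)."""
    gate = lod_signal(mean_amplitude, snr, k)
    detectable = [h for h in amplitudes if h >= gate]
    lost = [h for h in amplitudes if h < gate]
    return gate, detectable, lost
```

For example, with $\bar{h} = 1$ and SNR = 500 (the value used in Fig. 1), the gate is 0.006; for an exponential amplitude pdf with unit mean, the expected fraction of amplitudes below it would be $1 - e^{-0.006} \approx 0.006$, before any loss to overlap is considered.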

2.6. Assessments of SCP loss

SCPs are lost in chromatograms through two mechanisms. The first is by peak overlap, wherein unresolved SCPs overlap to form a single maximum. The second is by lack of detection, wherein peaks with amplitudes less than LODS are assumed to be undetectable.

To assess loss by peak overlap, the retention times of all maxima in synthetic chromatograms are compared to the retention times of their constituent SCPs. For each maximum, the SCP having the retention time closest to the retention time of the maximum is judged as found, whereas the other SCPs in the maximum are judged as lost. This assessment also works for a resolved SCP, as the maximum and SCP retention times coincide. Of the found SCPs, those with amplitudes less than LODS are assumed to be undetectable. SCPs already lost to peak overlap are not considered, even if their amplitudes are less than LODS. Computational details are given in the Software section below.
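The bookkeeping described above can be sketched as follows, assuming the retention times of the observed maxima are already known (for example, from a zero-crossing search of the derivative). The nearest-SCP search uses `bisect` on the sorted retention times; the function name and string labels are hypothetical.

```python
import bisect

def classify_scps(t_r, amplitudes, t_max, lod):
    """Label each SCP as 'found', 'overlap' (lost to peak overlap) or 'lod' (lost below LOD_S).

    t_r: SCP retention times sorted in ascending order; amplitudes: matching h_j values;
    t_max: retention times of the observed maxima; lod: the noise gate LOD_S.
    """
    found = set()
    for tm in t_max:
        i = bisect.bisect_left(t_r, tm)
        # The nearest SCP is one of the two neighbours of the insertion point.
        neighbours = [j for j in (i - 1, i) if 0 <= j < len(t_r)]
        found.add(min(neighbours, key=lambda j: abs(t_r[j] - tm)))
    labels = []
    for j, h in enumerate(amplitudes):
        if j not in found:
            labels.append("overlap")   # hidden inside another SCP's maximum
        elif h < lod:
            labels.append("lod")       # found at infinite SNR but below the noise gate
        else:
            labels.append("found")
    return labels
```

For instance, with SCPs at 1.0, 1.1, 5.0 and 9.0, maxima at 1.02, 5.0 and 9.0, and amplitudes 1.0, 0.2, 0.004 and 2.0 against a gate of 0.006, the labels come out as found, overlap, lod, found. The SCP at 1.1 is counted as lost to overlap without being checked against the gate, as the text specifies.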

2.7. Statistical overlap theory (SOT)

A statistical model of chromatography was developed by Davis and Giddings [55] a number of years ago and was reviewed recently by two of us [56]. The theory originally considered only a single-channel noiseless detector but was modified [57] and subsequently refined [58] to include a threshold below which detection is impossible. To study the model’s dependence on the SCP amplitude distribution, computer calculations are utilized here.

As given in ref. 55, the fraction of observable peaks, γ, in a chromatogram of randomly distributed SCPs is

$$\gamma = \frac{p}{m} = e^{-\alpha_e \bar{R}_s} \tag{11}$$

where $\bar{R}_s$ is the average minimum resolution, discussed below, and $\alpha_e$ is the effective saturation [59,60]:

$$\alpha_e = \frac{4m\sigma}{t_{R_m} - t_{R_1}} = \frac{m}{n_c R_s} \tag{12}$$

In Eq. (12), the peak capacity, $n_c$, is the number of equi-spaced SCPs that can be put within a designated temporal or spatial window. For Gaussian SCPs with equal temporal standard deviation σ, it is commonly defined as [61,62]

$$n_c = \frac{t_{R_m} - t_{R_1}}{4\sigma R_s} \tag{13}$$

where $t_{R_m}$ and $t_{R_1}$ are the last and first retention times in the chromatogram, and $R_s$ is the resolution between two neighboring SCPs that is sufficient for separation. One sees that $R_s$ cancels out on substitution of Eq. (13) into Eq. (12); $n_c$ depends on one’s choice of $R_s$, but $\alpha_e$ depends only on the chromatographic attributes m, σ and $t_{R_m} - t_{R_1}$.

Thus, $\alpha_e$ is a metric of the relative efficiency of separation, equal to the number m of SCPs requiring separation divided by the number $n_c$ of contiguous intervals available for separation at $R_s = 1$. Small $\alpha_e$ values signify high relative efficiencies, whereas large $\alpha_e$ values signify low ones.

In contrast to the freely chosen $R_s$, the average minimum resolution $\bar{R}_s$ in Eq. (11) is not a free parameter. In the simplest SOT, $\bar{R}_s$ may equal $R_s$, but for peaks that are maxima, as in this study, $\bar{R}_s$ has a complicated dependence [56]. In particular, it depends on the SCP amplitude distribution, with a calculable limiting value as $\alpha_e$ approaches zero [57,63]. The product of $\alpha_e$ and $\bar{R}_s$ is called saturation [59].

Eq. (11) is used here to compare SOT predictions to the number of peak maxima in synthetic chromatograms. All values used for $\bar{R}_s$ are given in Part 1 of the Supporting Information, as is the derivation of $\bar{R}_s$ for the Weibull amplitude pdf, which has not been published previously. Also derived there is a closed-form equation for the distribution of minimum resolution for the log-normal pdf.
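Eqs. (11)-(13) are simple enough to check numerically. In the sketch below, the average minimum resolution (written `rs_bar`) passed to `fraction_observable` is a placeholder input; the values actually used depend on the amplitude pdf and are tabulated in the Supporting Information, not reproduced here. Function names are illustrative.

```python
import math

def effective_saturation(m, sigma, t_first, t_last):
    """Eq. (12): alpha_e = 4 m sigma / (t_Rm - t_R1); independent of the chosen R_s."""
    return 4.0 * m * sigma / (t_last - t_first)

def peak_capacity(sigma, t_first, t_last, rs=1.0):
    """Eq. (13): n_c = (t_Rm - t_R1) / (4 sigma R_s) for the chosen resolution R_s."""
    return (t_last - t_first) / (4.0 * sigma * rs)

def fraction_observable(alpha_e, rs_bar):
    """Eq. (11): gamma = p / m = exp(-alpha_e * rs_bar), rs_bar = average minimum resolution."""
    return math.exp(-alpha_e * rs_bar)

# Illustrative numbers: m = 600, sigma = 0.245 s, retention window 60-5940 s
alpha_e = effective_saturation(600, 0.245, 60.0, 5940.0)        # about 0.10
n_c = peak_capacity(0.245, 60.0, 5940.0, rs=1.5)
```

Evaluating `600 / (n_c * 1.5)` returns the same $\alpha_e$ as the direct formula, which is the cancellation of $R_s$ noted in the text. With a placeholder `rs_bar = 1.0`, γ would be $e^{-0.1} \approx 0.905$; the pdf-specific value changes this number.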

2.8. Software

The intensities of the crude oil and plant extracts discussed by Enke and Nagels [37] were fit to the Weibull and exponential amplitude distributions using a Levenberg-Marquardt nonlinear least-squares algorithm [64-66] contained in MATLAB (Mathworks, Natick, Massachusetts). The program for the synthesis and analysis of synthetic chromatograms was also written in MATLAB. This program makes extensive use of spreadsheets for parameter input and calculated output. Random values of amplitude hj in Eq. (1) for different amplitude pdfs were generated with functions inherent in MATLAB, as shown in Table 1, except for the exponential pdf, for which hj was generated explicitly with the inverse transform sampling method [64,67].

The chromatograms were calculated and then analyzed, given the following input parameters contained in the spreadsheet: SCP standard deviation σ, the number of SCPs m, the start and stop times $t_1$ and $t_m$ of the chromatogram, and the mean and standard deviation of the amplitude pdf. The retention times of SCPs were computed using a uniform random number generator. For given $t_1$, $t_m$, and m, different effective saturations $\alpha_e$ were obtained by changing σ, per Eqs. (12) and (13) ($R_s = 1$). The chromatograms produced for visualization in Fig. 3 were generated with starting times of 60 s, ending times of 5940 (= 6000 − 60) s, and m = 600, with ≈100 points per SCP. To obtain the fractions of SCPs that are found, lost to overlap, and lost below the detection limit at specific SNRs, shown in Figs. 4 and 5, single chromatograms were synthesized with starting times of 60 s, ending times of 59940 (= 60000 − 60) s, and m = 6,000. Using a larger m improves statistical accuracy and avoids the need to average the results of multiple chromatograms.
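Inverting Eq. (12) gives the SCP standard deviation needed for a target effective saturation. The sketch below assumes $t_{R_m} - t_{R_1}$ is approximated by the chromatogram window $t_m - t_1$, which is close for uniformly random retention times but not exact; the function name is hypothetical.

```python
def sigma_for_saturation(alpha_e, m, t_start, t_stop):
    """Eq. (12) solved for sigma: sigma = alpha_e * (t_stop - t_start) / (4 m).

    Assumes t_Rm - t_R1 is approximated by the chromatogram window t_stop - t_start.
    """
    return alpha_e * (t_stop - t_start) / (4.0 * m)

# Windows quoted above (seconds), at alpha_e = 0.10
sigma_fig3 = sigma_for_saturation(0.10, 600, 60.0, 5940.0)      # about 0.245 s
sigma_fig45 = sigma_for_saturation(0.10, 6000, 60.0, 59940.0)   # about 0.2495 s
```

The two setups need almost the same σ for a given $\alpha_e$ because m and the window were both scaled by roughly a factor of 10.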

Fig. 3.


Comparison of chromatograms synthesized with different amplitude pdfs and effective saturations $\alpha_e$. Under each chromatogram is a loss map, wherein a lost SCP is shown as a red line and a found SCP nearest to a recognizable maximum is shown as a green line. The amplitude scale for all chromatograms is the same. Mean SCP amplitude $\bar{h} = 1$; number of SCPs m = 600.

Fig. 4.


Density of SCP (blue), found (green) and lost (red) peak amplitude distributions as a function of peak amplitude, as constructed from synthetic chromatograms with different amplitude pdfs and $\alpha_e$. Mean SCP amplitude $\bar{h} = 1$.

Fig. 5.


Cumulative fractions as a function of peak amplitude for the SCP (blue), found (green) and lost (red) peak amplitude distributions. Figures A-C show fractions for the three amplitude pdfs at $\alpha_e = 0.10$. Figures A and D-F show fractions for the log-normal pdf with progressively larger $\alpha_e$. Vertical dashed lines represent $\mathrm{LOD}_S = 3\bar{h}/\mathrm{SNR}$ at different SNRs. Mean SCP amplitude $\bar{h} = 1$.

Part of the analysis consisted of detecting peaks (maxima) and determining their retention times from the zero crossing of the derivative signal, which was calculated by finite difference of the signal $H_S(t_i)$, Eq. (2). In addition, the difference between the amplitudes of the two points used for the zero-crossing determination had to exceed a threshold of 10⁻⁶ to prevent roundoff error from triggering a false peak. Because the summation of SCPs in Eq. (2) can slightly displace the retention time of a peak maximum from that of its SCP, a 2% range window was used to determine whether peaks were matched with the correct SCPs.
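One possible reading of this peak-detection rule is sketched here: a maximum is declared where the forward-difference derivative changes sign from positive to non-positive, the crossing time is found by linear interpolation between the two derivative points, and crossings whose two derivative values differ by less than 10⁻⁶ are discarded. Applying the threshold to the derivative values, and the interpolation itself, are assumptions rather than details confirmed by the text; the function name is hypothetical.

```python
import math

def find_maxima(times, signal, threshold=1e-6):
    """Retention times of maxima from + to - zero crossings of the forward-difference derivative."""
    d = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]
    t_mid = [0.5 * (times[i] + times[i + 1]) for i in range(len(signal) - 1)]
    maxima = []
    for i in range(1, len(d)):
        # Sign change of the derivative, ignoring round-off-sized changes
        if d[i - 1] > 0.0 and d[i] <= 0.0 and (d[i - 1] - d[i]) > threshold:
            frac = d[i - 1] / (d[i - 1] - d[i])          # linear interpolation of the zero
            maxima.append(t_mid[i - 1] + frac * (t_mid[i] - t_mid[i - 1]))
    return maxima

# Example: two resolved Gaussian SCPs (sigma = 0.3) sampled every 0.01 time units
times = [0.01 * i for i in range(1001)]
signal = [math.exp(-(t - 3.0) ** 2 / 0.18) + 0.5 * math.exp(-(t - 7.0) ** 2 / 0.18)
          for t in times]
maxima = find_maxima(times, signal)   # approximately [3.0, 7.0]
```

Two equal SCPs closer than 2σ produce a single maximum, so only one retention time is returned for them; the nearest-SCP rule then decides which of the two is counted as found.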

The retention times of maxima were compared to the retention times of their constituent SCPs. For each maximum, the SCP having the retention time closest to the retention time of the maximum was judged as found, i.e., identified as the primary contributor to the peak and detectable at infinite SNR. All other SCPs not so identified were considered to be lost due to superposition, i.e. SCP overlap. The amplitudes of found peaks also were compared to a series of LODS values to determine how many were below the noise gate and therefore lost to detection limits. The statistical analysis of the retention time and amplitude deviation due to overlap is calculated and discussed at length in Part 2 of the Supporting Information.

For visual display of lost and found SCPs, loss maps were prepared, with SCPs lost to overlap displayed in red. Found peak maxima are displayed in green. The synthesized chromatograms, loss maps, and lost and found SCP densities (both with and without consideration of LODS) produced via these procedures are shown below.

3. Results

3.1. Natural mixtures dataset analysis

Enke and Nagels [37] have proposed that the components present in a natural complex mixture follow a “natural law” in terms of their instrumental response, and they concluded that this response is a log-normal distribution. The data used by these authors is taken from Table 2 of ref. 37 (light crude oil, as analyzed by high-resolution MS) and Table 1 of their Supporting Information (plant extracts, as analyzed by LC). As shown in Fig. 2A and 2B for both the component density and logarithm of the component density, this data is plotted along with the curve fits given by these authors for the log-normal distribution [37]. Note that these fits were made to data that was not normalized to unity, as the data is in the form of the number of components (or log of the number of components) between two response bounds, and is a function of instrumental response. The lack of normalization means the log-normal pdf must be multiplied by a prefactor.

Table 2.

Summary of the parameter estimates for the Enke and Nagels data [37]. The prefactor is the scalar multiplier of pdfs; other parameters are given in Table 1. Eqs. (14) and (15) define the goodness-of-fit metrics, RMSE and $R_a^2$. All log-normal parameters including the prefactor are taken from ref. 37; other curve fits were obtained with the Levenberg-Marquardt algorithm.

Crude oil data (n = 42 data points):

| distribution | mean, $\bar{h}$ | variance, υ | prefactor | other parameters | RMSE | $R_a^2$ |
|---|---|---|---|---|---|---|
| log-normal | 2.844 × 10⁻² | 9.411 × 10⁻³ | 8.726 | μ = −4.828, σ = 1.593 | 24.78 | 0.9949 |
| Weibull | 5.429 × 10⁻³ | 3.002 × 10⁻⁵ | 5.940 | λ = 5.408 × 10⁻³, k = 0.9908 | 27.07 | 0.9940 |
| exponential | 5.000 × 10⁻³ | 2.500 × 10⁻⁵ | 6.000 | λ = 5.000 × 10⁻³ | 42.71 | 0.9850 |

Plant extract data (n = 25 data points):

| distribution | mean, $\bar{h}$ | variance, υ | prefactor | other parameters | RMSE | $R_a^2$ |
|---|---|---|---|---|---|---|
| log-normal | 6.137 × 10⁻¹ | 3.204 | 98.76 (a) | μ = −1.614, σ = 1.501 | 8.904 | 0.9764 |
| Weibull | 3.609 × 10⁻¹ | 2.239 × 10⁻¹ | 100.0 | λ = 0.3101, k = 0.7716 | 1.213 | 0.9996 |
| exponential | 2.222 × 10⁻¹ | 4.938 × 10⁻² | 77.77 | λ = 0.2222 | 6.654 | 0.9868 |

(a) Determined in the Supporting Information of ref. 37.

Fig. 2.


Enke and Nagels’ datasets for crude oil (circles) and plant extracts (squares) along with curve fits based on the log-normal, Weibull, and exponential distributions. Graph A shows the component density and graph B shows the logarithm of the component density; both graphs A and B have a logarithmic response axis. Graph C shows the distributions on a linear response axis. Graph D shows the result of transforming the distributions to unity mean while preserving the coefficient of variation (CV). The curve-fit parameters are given in Table 2 for the untransformed data and in Table 4 for the CV-transformed data.

Although these authors focused on the log-normal distribution, a significant limitation of their approach is that the data define only part of the proposed model distribution, because components below the LOD contribute no data. Despite evidence supporting the log-normal fit, it is generally accepted that extrapolating a fitted curve beyond the range of the data introduces uncertainty [65]. This invites comparison with other known functions, especially those that behave differently as the instrumental response tends toward zero.

Also shown in Figs. 2A and B are the curve fits of the Weibull and exponential distributions to these unnormalized datasets produced by the Levenberg-Marquardt algorithm. The functional forms of these distributions are given in Table 1, and the numerical summary of the curve fit parameters is given in Table 2. As before, all pdfs required multiplication by prefactors.

There are several reasons for fitting the Weibull and exponential distributions, as well as the log-normal distribution. The first is that all three distributions are relevant in physics, chemistry, materials science, engineering, hydrology and other disciplines [37,43,45], and the physical processes affecting responses may have analogs in these fields. The second is that the three distributions behave differently as the response approaches zero. Specifically, the log-normal density always approaches zero; the Weibull density, depending on the shape parameter k, approaches infinity (0 < k < 1), a constant (k = 1), or zero (k > 1); and the exponential density always approaches a constant, the exponential distribution being equal to the Weibull distribution for k = 1 (the exponential pdf in Table 1 is written in a slightly unconventional form, i.e., relative to the reciprocal of parameter λ, to emphasize its relation to the Weibull distribution). This variability in the low-response region allows us to assess how well data of intermediate responses define that region.
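As a rough numerical illustration of these limiting behaviors, the Python sketch below evaluates the three densities at progressively smaller responses. It assumes the standard textbook forms (log-normal in μ and σ; Weibull with scale λ and shape k; exponential written with λ as its mean, consistent with Table 2, where the exponential mean equals λ), since Table 1 is not reproduced here. The unity-mean plant-extract parameters of Table 4 are used purely as example inputs, and the function names are hypothetical.

```python
import math

def lognormal_pdf(h, mu, sigma):
    """Log-normal density in the usual (mu, sigma) parameterization."""
    return math.exp(-(math.log(h) - mu) ** 2 / (2.0 * sigma ** 2)) / (
        h * sigma * math.sqrt(2.0 * math.pi))

def weibull_pdf(h, lam, k):
    """Weibull density with scale lam and shape k."""
    return (k / lam) * (h / lam) ** (k - 1.0) * math.exp(-((h / lam) ** k))

def exponential_pdf(h, lam):
    """Exponential density written with lam as the mean (Weibull with k = 1)."""
    return math.exp(-h / lam) / lam

# Unity-mean plant-extract parameters from Table 4 (example inputs only).
for h in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"h = {h:.0e}: log-normal {lognormal_pdf(h, -1.126, 1.501):.3e}, "
          f"Weibull {weibull_pdf(h, 0.8591, 0.7716):.3e}, "
          f"exponential {exponential_pdf(h, 1.0):.3e}")
```

With k = 0.7716 < 1 the Weibull density keeps rising as h shrinks, the log-normal density collapses toward zero, and the exponential density levels off at 1/λ, which is the divergence in low-response behavior that intermediate-response data cannot discriminate.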

The results of curve fitting by Enke and Nagels, and by us, are shown in Table 2 along with two goodness-of-fit metrics, the root-mean-square error (RMSE) and the adjusted coefficient of determination (Ra²), which are defined in Eqs. (14) and (15) as:

$$\mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^{n}\left(y_i-y_{\mathrm{calc},i}\right)^2}{df}} \tag{14}$$

and

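$$R_a^2=1-\frac{n-1}{df}\,\frac{\sum_{i=1}^{n}\left(y_i-y_{\mathrm{calc},i}\right)^2}{\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}=1-\frac{\mathrm{RMSE}^2}{s^2} \tag{15}$$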

where n is the number of data points, yi is the ith of the n observed values of the dependent variable (here, component density), ȳ and s are the mean and sample standard deviation (computed with n − 1 in the denominator, so that the two forms of Eq. (15) agree) of all yi values, and ycalc,i is the least-squares prediction corresponding to yi. The term df in Eq. (14) is the number of degrees of freedom, i.e., n minus the number of fitting coefficients (3 for the log-normal and Weibull distributions; 2 for the exponential distribution). The Ra² metric is not to be confused with the linear correlation coefficient used in least-squares fits of straight lines [68]. An attractive feature of Ra² is that it is dimensionless, which facilitates comparison of distributions having different prefactors. Good fits are indicated by small RMSE values and Ra² values near unity.
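Eqs. (14) and (15) can be sketched in Python as follows. The function name and the four-point data set are hypothetical illustrations, not the authors' code; n_params is the number of fitting coefficients subtracted from n to give df.

```python
import math

def goodness_of_fit(y, y_calc, n_params):
    """Return (RMSE, adjusted R^2) as defined in Eqs. (14) and (15)."""
    n = len(y)
    df = n - n_params                          # degrees of freedom
    sse = sum((yi - yc) ** 2 for yi, yc in zip(y, y_calc))
    y_bar = sum(y) / n
    sst = sum((yi - y_bar) ** 2 for yi in y)
    rmse = math.sqrt(sse / df)                 # Eq. (14)
    s2 = sst / (n - 1)                         # sample variance of the y_i
    ra2 = 1.0 - rmse ** 2 / s2                 # right-hand form of Eq. (15)
    return rmse, ra2

# Hypothetical toy data: 4 points fitted by a model with 2 coefficients.
rmse, ra2 = goodness_of_fit([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8], n_params=2)
print(f"RMSE = {rmse:.4f}, Ra^2 = {ra2:.4f}")  # prints RMSE = 0.2236, Ra^2 = 0.9700
```

Because df appears in both metrics, a three-coefficient fit (log-normal, Weibull) is penalized slightly more than a two-coefficient fit (exponential) for the same residuals.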

For each mixture, the prefactors in Table 2 have similar values (roughly 6 to 9 for crude oil; roughly 80 to 100 for plant extract), signifying that the pdfs in Table 1 require similar scaling. The prefactor for the plant extract data is greater despite the lower density, because the response range is larger. For the crude oil data, the log-normal distribution fits the data best over the total range studied (smallest RMSE and largest Ra²). Based on RMSE and Ra², the Weibull distribution is a slightly poorer fit, whereas the exponential distribution is a much poorer fit. Heavy-tailed distributions such as power laws and the log-normal pdf [69-71] are known to be complicated by the large fluctuations that occur in the tail of the distribution, where large but rare events occur. Because the log-normal tail is governed by such large but rare events (i.e., responses), examining other distributions is justified. Even though the qualities of fit differ, the Weibull and exponential fitting coefficients for the oil data are about the same: the Weibull k, 0.9908, is almost one, and the Weibull and exponential parameters λ are similar. For the plant extract data, the Weibull distribution is by far the best fit, with the exponential distribution and then the log-normal distribution providing modest fits. These results suggest that the log-normal distribution is not the only one capable of representing these data sets and that the other functions are useful for examining the lower- and higher-response data.

As shown in Figs. 2A and 2B, the Weibull and exponential distributions fit better in the high-density, lower-response region than in the low-density, higher-response region. Nonetheless, all the fits look good visually. However, the low-response regions of the fits are very different. For both the crude oil and plant extracts, the Weibull parameter k is less than one, so the density approaches infinity as the response approaches zero. In contrast, the densities of the log-normal and exponential distributions approach zero and a non-zero constant, respectively. Evidently, predicting the density of the low-response region is challenging when regression is performed using intermediate responses, and one must be cautious in inferring attributes outside the bounds of the fit. Indeed, one of us (J.M.D.) has previously demonstrated a similar problem with regressions in statistical overlap theory, given that different functions can fit the same chromatographic attributes [72].

3.2. Distribution transformation to unity mean

The fits to the Enke-Nagels data are shown on a linear response scale in Fig. 2C. They are difficult to compare, because they have different prefactors and means (Table 2). To facilitate comparison, we can ignore the prefactors (they simply measure relative response) and transform the underlying pdfs to a common mean. However, the relative variation, or spread, of each pdf about this mean must remain the same as in the original distribution. To this end, the coefficient of variation (CV) is introduced and used to transform the pdfs.

The CV is defined as [73]:

$$\mathrm{CV}=\frac{D}{E}=\frac{\sqrt{\upsilon}}{\bar{h}} \tag{16}$$

where D is the standard deviation of a pdf and E is its mean. E is also known as the expectation value or first moment, and D is the square root of the second central moment (variance). The right-hand side of Eq. (16) shows that the values needed to calculate the CV are already available from the curve fits whose results are given in Table 2.
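As a worked check of Eq. (16), the crude-oil means and variances in Table 2 reproduce the CVs listed in Table 4 to rounding:

```latex
\begin{aligned}
\mathrm{CV}_{\text{log-normal}} &= \frac{\sqrt{9.411\times10^{-3}}}{2.844\times10^{-2}} \approx \frac{0.09701}{0.02844} \approx 3.41 \\
\mathrm{CV}_{\text{Weibull}} &= \frac{\sqrt{3.002\times10^{-5}}}{5.429\times10^{-3}} \approx \frac{5.479\times10^{-3}}{5.429\times10^{-3}} \approx 1.009 \\
\mathrm{CV}_{\text{exponential}} &= \frac{\sqrt{2.500\times10^{-5}}}{5.000\times10^{-3}} = \frac{5.000\times10^{-3}}{5.000\times10^{-3}} = 1
\end{aligned}
```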

The CV expresses the variability of a distribution about its mean. Consequently, the fits can be transformed to a common mean with preservation of similarity if their CVs are unchanged by the transformation. The log-normal, Weibull, and exponential pdfs have closed-form CVs, which makes the transformation simple. General equations for the CVs, derived from the expressions for the means and variances of the pdfs in Table 1, are given in Table 3.

Table 3.

The coefficient of variation (CV) for the three amplitude pdfs and the transformed parameters, μ and λ, that produce mean amplitude h̄.

pdf name | CV | transformed parameters, general expression | transformed parameters for h̄ = 1
log-normal | (exp(σ²) − 1)^(1/2) | μ = ln h̄ − σ²/2 | μ = −σ²/2
Weibullᵃ | (Γ2 − Γ1²)^(1/2)/Γ1 | λ = h̄/Γ1 | λ = 1/Γ1
exponential | 1 | λ = h̄ | λ = 1

ᵃ Γ1 = Γ(1 + 1/k) and Γ2 = Γ(1 + 2/k), where Γ is the gamma function.

As shown in Table 3, the log-normal CV depends only on the shape parameter σ, whereas the Weibull CV depends only on the shape parameter k. Thus, the σ and k values in Table 2 must be kept in the transformed fits, but the location and scale parameters μ and λ can be changed. In contrast, the exponential CV equals 1 and is independent of parameter λ. General equations relating these parameters to the desired mean are given in Table 3.

Eq. (16) shows that the standard deviation equals the CV when the mean equals unity. We chose this mean for our scaling; the resulting numerical parameter values are given in Table 4. The standard deviations of the pdfs calculated from these parameters and the variance expressions in Table 1 equal the CVs in Table 4.
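The closed-form expressions in Table 3 are simple to evaluate. The Python sketch below (function names are hypothetical) computes each CV and the parameter that sets the requested mean, using math.gamma for Γ; run with the σ and k values of Table 2, it agrees with the Table 4 entries to within rounding in the last digit.

```python
import math

def lognormal_unit_mean(sigma, mean=1.0):
    """CV and location parameter mu giving the requested mean (Table 3)."""
    cv = math.sqrt(math.exp(sigma ** 2) - 1.0)
    mu = math.log(mean) - sigma ** 2 / 2.0
    return cv, mu

def weibull_unit_mean(k, mean=1.0):
    """CV and scale parameter lam giving the requested mean (Table 3)."""
    g1 = math.gamma(1.0 + 1.0 / k)
    g2 = math.gamma(1.0 + 2.0 / k)
    cv = math.sqrt(g2 - g1 ** 2) / g1
    return cv, mean / g1

def exponential_unit_mean(mean=1.0):
    """The exponential CV is always 1, and lam equals the mean."""
    return 1.0, mean

for label, sigma, k in (("crude oil", 1.593, 0.9908), ("plant extract", 1.501, 0.7716)):
    cv_ln, mu = lognormal_unit_mean(sigma)
    cv_wb, lam = weibull_unit_mean(k)
    print(f"{label}: log-normal CV {cv_ln:.3f}, mu {mu:.3f}; "
          f"Weibull CV {cv_wb:.3f}, lambda {lam:.4f}")
```

Only μ and λ change with the target mean; σ and k pass through untouched, which is exactly the CV-preservation condition described above.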

Table 4.

Transformed pdf parameters that give unity mean while preserving the coefficient of variation (CV) of the fits to the Enke-Nagels data [37]. CVs were calculated from the means and variances in Table 2 using Eq. (16).

crude oil data
pdf name CV parameters after transformation
log-normal 3.412 μ = −1.268; σ = 1.593 (same as in Table 2)
Weibull 1.009 λ = 0.9961; k = 0.9908 (same as in Table 2)
exponential 1 λ = 1
plant extract data
pdf name CV parameters after transformation
log-normal 2.917 μ = −1.126; σ = 1.501 (same as in Table 2)
Weibull 1.311 λ = 0.8591; k = 0.7716 (same as in Table 2)
exponential 1 λ = 1

Fig. 2D shows the six transformed pdfs with unity mean. Once the means are fixed at the same value, the log-normal fits for the crude oil and plant extract are very similar, especially at large responses. The exponential crude oil and plant extract pdfs are identical, because the single parameter, λ, equals the common mean. The Weibull pdf for the crude oil is essentially the exponential pdf, except at very low responses, but it differs from the Weibull pdf for the plant extract; both Weibull pdfs, however, rise to higher probability density as the response tends toward zero. Because of these similarities between the pdfs for the two mixtures, we chose the transformed crude oil data as the single benchmark for mimicking amplitude distributions in chromatographic signal synthesis and for examining how that signal is lost to peak overlap and to finite SNR.

3.3. Fractions of peaks lost in synthetic chromatograms due to peak overlap

First, we consider peak loss due to overlap in chromatograms having an infinite signal-to-noise ratio (SNR). Fig. 3 shows a comparison of synthetic chromatograms constructed with the log-normal, exponential and Weibull amplitude pdfs. The mean and standard deviation of the pdfs were calculated from the raw data on crude oil extracts [37] after transforming the pdf means to unity using the parameters in Table 4, as discussed above. Also shown below each chromatogram is a loss map, with SCPs lost to overlap coded in red and SCPs closest to peak maxima in retention time coded in green. Synthetic chromatograms and loss maps are shown in the Supporting Information for a full range of log-normal pdfs with effective saturation values up to 6.

In Fig. 3, two values of effective saturation are used, αe = 0.10 and αe = 1.00. The peak capacities nc of these chromatograms can be calculated from the last expression in Eq. (12); for m = 600 and Rs = 1, nc equals 6,000 and 600 for these αe values, respectively. The larger nc value exceeds that attainable with current technology, but chromatograms can still be synthesized for it.
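Rearranging the dimensionless combination αe = m/(nc Rs), the form used later in the Discussion and assumed here to be the last expression in Eq. (12), gives the two peak capacities quoted above:

```latex
n_c = \frac{m}{\alpha_e R_s}: \qquad
\frac{600}{0.10 \times 1} = 6000, \qquad
\frac{600}{1.00 \times 1} = 600
```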

At the same αe, the peaks (maxima) and their amplitude variations visibly differ among the three pdfs. For example, the log-normal pdf appears to produce more large-amplitude peaks than the Weibull and exponential pdfs. This is expected, because the log-normal pdf is heavy-tailed [69-71]; large amplitudes are more probable than in the other two distributions. The Weibull pdf is also heavy-tailed [70] when the shape parameter k is in the range 0 < k < 1. Here, however, k = 0.9908 and the heavy tail is modest. In fact, the variations of peak amplitude for the Weibull and exponential pdfs look very similar, since k is almost unity. Both appear to produce a more compressed range of peak amplitudes than the log-normal pdf, but this may be due to their having more small-amplitude SCPs that do not readily appear in the chromatograms.

For the αe = 0.10 chromatograms in Fig. 3, few SCPs are lost to overlap, as seen in the loss map. For αe = 1.00, however, the loss of distinct SCPs is far more noticeable. The fraction of SCPs appearing as maxima is designated γobs; therefore, the fraction of SCPs lost to overlap is 1 - γobs, which is listed in Table 5 for these and other αe values. For all three amplitude pdfs, 1 - γobs increases with increasing αe because of decreasing peak capacity (see Eq. (12)). At any αe, the log-normal values of 1 - γobs are larger than their Weibull and exponential counterparts, which are almost the same as each other, but all three are similar.
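The measurement of γobs described above can be mimicked with a short NumPy sketch. It is not the authors' simulation code, and several choices are assumptions: Gaussian SCPs of equal standard deviation σ, retention times drawn uniformly at random over the window (the random placement assumed in statistical overlap theory), a peak capacity nc = X/(4σ) at unit resolution so that αe = m/nc, infinite SNR, and maxima counted as strict local maxima on a grid of spacing σ/10. Amplitudes use the unity-mean crude-oil parameters of Table 4; all helper names are hypothetical.

```python
import numpy as np

def gamma_obs(m, alpha_e, draw_amplitudes, rng, pts_per_sigma=10):
    """Fraction of m Gaussian SCPs that appear as maxima at infinite SNR."""
    x_len = 1.0                            # retention window (arbitrary units)
    n_c = m / alpha_e                      # peak capacity at R_s = 1
    sigma = x_len / (4.0 * n_c)            # assumed: n_c = X / (4 sigma)
    dx = sigma / pts_per_sigma
    n_grid = int(np.ceil(x_len / dx)) + 1
    x = np.arange(n_grid) * dx
    signal = np.zeros(n_grid)

    t_r = rng.uniform(0.0, x_len, size=m)  # random retention times
    heights = draw_amplitudes(rng, m)      # SCP amplitudes
    half = 8 * pts_per_sigma               # evaluate each Gaussian over +/- 8 sigma
    for t, h in zip(t_r, heights):
        c = int(round(t / dx))
        lo, hi = max(c - half, 0), min(c + half + 1, n_grid)
        signal[lo:hi] += h * np.exp(-0.5 * ((x[lo:hi] - t) / sigma) ** 2)

    mid = signal[1:-1]
    is_max = (mid > signal[:-2]) & (mid > signal[2:])
    # Discard negligible ripples (~1e-14 relative) created by truncating each Gaussian.
    is_max &= mid > 1e-9 * signal.max()
    return np.count_nonzero(is_max) / m

def exponential_amplitudes(rng, n):
    return rng.exponential(1.0, size=n)                      # unity mean

def lognormal_amplitudes(rng, n):
    return rng.lognormal(mean=-1.268, sigma=1.593, size=n)   # Table 4, crude oil

def weibull_amplitudes(rng, n):
    return 0.9961 * rng.weibull(0.9908, size=n)              # Table 4, crude oil

rng = np.random.default_rng(0)
for alpha_e in (0.10, 1.00):
    for name, draw in (("log-normal", lognormal_amplitudes),
                       ("Weibull", weibull_amplitudes),
                       ("exponential", exponential_amplitudes)):
        g = gamma_obs(600, alpha_e, draw, rng)
        print(f"alpha_e = {alpha_e:.2f}, {name}: 1 - gamma_obs = {1.0 - g:.3f}")
```

For comparison, Table 5 lists 1 - γobs between 0.064 and 0.072 at αe = 0.10 and between 0.473 and 0.528 at αe = 1.00. A single run of this sketch fluctuates by a few hundredths, and its values need not match the table exactly, because its peak-capacity convention and maximum detection are assumptions rather than the paper's exact procedure.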

Table 5.

For the three amplitude pdfs and different effective saturation αe: the fraction of SCPs lost to overlap in synthetic chromatograms, 1 - γobs, and calculated from Eq. (11), 1 - γ; the cumulative fraction of SCPs lost below LODS, Φ, as a function of the signal-to-noise ratio (SNR); and the ratio of fraction of SCPs lost to overlap to cumulative fraction of SCPs lost below LODS at SNR = 100.

pdf αe 1−γobs 1−γ Φ∣SNR=5000 Φ∣SNR=1000 Φ∣SNR=500 Φ∣SNR=100 (1−γobs)/Φ∣SNR=100
log-normal 0 0 0 5.498·10⁻⁵ 2.162·10⁻³ 7.834·10⁻³ 7.996·10⁻² 0
0.10 0.072 0.074 * 4.233·10⁻³ 8.033·10⁻³ 7.008·10⁻² 1.027
0.25 0.175 0.175 * * 2.967·10⁻³ 5.108·10⁻² 3.426
0.50 0.316 0.319 * * * 2.158·10⁻² 14.64
0.75 0.436 0.438 * * * 2.167·10⁻³ 201.2
1.00 0.528 0.536 * * * 2.167·10⁻³ 243.7
1.50 0.656 0.684 * 1.667·10⁻⁴ 1.667·10⁻⁴ 2.000·10⁻³ 328.0
2.00 0.734 0.785 * 1.667·10⁻⁴ 1.667·10⁻⁴ 2.917·10⁻³ 251.6
Weibull 0 0 0 6.386·10⁻⁴ 3.159·10⁻³ 6.286·10⁻³ 3.062·10⁻² 0
0.10 0.067 0.070 * 2.867·10⁻³ 5.317·10⁻³ 2.758·10⁻² 2.429
0.25 0.161 0.166 * * 2.183·10⁻³ 1.867·10⁻² 8.623
0.50 0.293 0.304 * 1.667·10⁻⁴ 1.667·10⁻⁴ 7.750·10⁻³ 37.81
0.75 0.391 0.420 * 1.667·10⁻⁴ 1.667·10⁻⁴ 5.000·10⁻⁴ 782.0
1.00 0.473 0.516 * * * 1.667·10⁻⁴ 2837
1.50 0.598 0.663 * * * 6.667·10⁻⁴ 897.0
2.00 0.683 0.766 * 1.667·10⁻⁴ 1.667·10⁻⁴ 8.333·10⁻⁴ 819.6
exponential 0 0 0 5.941·10⁻⁴ 2.984·10⁻³ 5.975·10⁻³ 2.955·10⁻² 0
0.10 0.064 0.070 * 2.733·10⁻³ 5.133·10⁻³ 2.625·10⁻² 2.438
0.25 0.151 0.166 * * 2.033·10⁻³ 1.817·10⁻² 8.310
0.50 0.283 0.304 * * * 6.000·10⁻³ 47.17
0.75 0.387 0.419 * * * 5.000·10⁻⁴ 774.0
1.00 0.473 0.516 * 1.667·10⁻⁴ 1.667·10⁻⁴ 8.333·10⁻⁴ 567.6
1.50 0.593 0.663 * * * 8.333·10⁻⁴ 711.6
2.00 0.678 0.765 * 1.667·10⁻⁴ 1.667·10⁻⁴ 1.000·10⁻³ 678.0
* Entries marked with an asterisk are below 1.667·10⁻⁴. Mean amplitude of SCPs h̄ = 1.

The amount of overlap is about the same for all amplitude pdfs, because they differ markedly only over a narrow range of small amplitudes. It may be surprising that the log-normal pdf, which approaches zero with decreasing amplitude, produces the most overlap. However, Fig. 2D shows this pdf has a high density of small amplitudes over a wide range, and Table 4 shows it has the largest CV, indicating a wide spread of amplitudes. As is shown below, small-amplitude SCPs are easily lost in overlap with large-amplitude SCPs.

Also listed in Table 5 are values of 1 - γ, with γ predicted from SOT and Eq. (11). The Rs values needed to calculate γ are given in Part 1 of the Supporting Information. In all cases, the predicted values equal or overestimate the 1 - γobs results from the synthetic chromatograms but agree with them within 10% as long as αe ≤ 1. The difference between 1 - γ and 1 - γobs increases with αe, because the simple theory used here for Rs is correct only as αe approaches zero [57,63].
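As a consistency check, assume that Eq. (11) has the single-channel form of Eq. (19), γ = exp(−αe Rs). The 1 - γ column of Table 5 can then be inverted for the implied Rs. For the log-normal pdf, this back-calculation gives nearly the same value at every αe; these inferred numbers only illustrate the internal consistency of the column and are not the Supporting Information values:

```latex
R_s = -\frac{\ln\gamma}{\alpha_e}: \qquad
\alpha_e = 0.10:\ -\frac{\ln(1-0.074)}{0.10} \approx 0.769, \qquad
\alpha_e = 1.00:\ -\ln(1-0.536) \approx 0.768, \qquad
\alpha_e = 2.00:\ -\frac{\ln(1-0.785)}{2.00} \approx 0.769
```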

The distribution of lost and found peak amplitudes is shown in Fig. 4 for the two αe values used in Fig. 3. To clarify, the lost peak amplitude distribution (or more simply, the lost distribution) describes amplitudes of SCPs lost in overlap, whereas the found peak amplitude distribution (or found distribution) describes amplitudes of observed maxima at infinite SNR. Both distributions vary with αe. The SCP amplitude distribution (or SCP distribution) describes the SCP amplitudes used in constructing the chromatograms. It is independent of αe and, in principle, the same as the amplitude pdf, differing only because of limited sampling. All these distributions are scaled relative to the number of SCPs. Thus, the area under the SCP distribution is unity, and the sum of the areas under the lost and found distributions is unity.

At the same effective saturation αe, the general shapes of the SCP, lost, and found distributions are similar for the three amplitude pdfs. Because the p maxima amplitudes make up the found distribution and the remaining m − p SCP amplitudes make up the lost distribution, as one distribution increases the other decreases.

Several trends are observed in Fig. 4 for all three amplitude pdfs, and all occur because peak overlap is much more prevalent for low-amplitude SCPs than for high-amplitude SCPs, as expected. In other words, small-amplitude SCPs are much more susceptible to losing a maximum by overlap with neighboring SCPs, so low-amplitude SCPs are selectively removed. At low αe, little overlap exists, and Figs. 4A-C show that the SCP and found distributions are similar, whereas the lost distribution is small except at low amplitude. However, the selective removal of SCPs by overlap increases with αe, as shown in Figs. 4D-F, causing the found and lost distributions to decrease and increase, respectively, with decreasing amplitude. The lost distribution grows markedly at low amplitude and approaches the SCP distribution, whereas the found distribution passes through a maximum and then tends toward zero. In contrast to these low-amplitude changes, the larger amplitudes are little affected by peak overlap at both αe values, such that the found and SCP distributions merge at the larger amplitudes, whereas the lost distribution tends to zero more rapidly than they do.

3.4. Fractions of peaks lost in synthetic chromatograms due to the limit of detection

We now consider peak losses resulting from superimposing a noise gate on the synthetic chromatograms. At αe = 0, all SCPs are infinitely narrow and there is no peak overlap. In this case, the cumulative fraction ΦSNR of SCPs below the LODS specified by Eq. (10) at different SNRs (i.e., LODS = 3h̄/SNR, with h̄ = 1 in the synthetic chromatograms) can be computed [57] from the cumulative distribution function of the SCP amplitude pdfs in Table 1:

$$\Phi_{\mathrm{SNR}}=\int_{0}^{\mathrm{LOD}_S} p(h)\,dh \tag{17}$$

where p(h) is the pdf and h is the SCP amplitude.
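For the αe = 0 case, Eq. (17) has closed forms for all three pdfs: the standard normal CDF of (ln LODS − μ)/σ for the log-normal pdf, 1 − exp(−(LODS/λ)^k) for the Weibull pdf, and 1 − exp(−LODS/λ) for the exponential pdf. The Python sketch below evaluates them with the unity-mean crude-oil parameters of Table 4 and LODS = 3h̄/SNR with h̄ = 1; the function names are hypothetical. Its results agree with the αe = 0 rows of Table 5 to within a few percent, the small residual differences presumably reflecting rounding of the tabulated parameters.

```python
import math

def lod_s(snr, mean_amp=1.0):
    """Detection threshold LODS = 3 * mean amplitude / SNR."""
    return 3.0 * mean_amp / snr

def phi_lognormal(snr, mu=-1.268, sigma=1.593):
    z = (math.log(lod_s(snr)) - mu) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2.0))      # standard normal CDF at z

def phi_weibull(snr, lam=0.9961, k=0.9908):
    return 1.0 - math.exp(-((lod_s(snr) / lam) ** k))

def phi_exponential(snr, lam=1.0):
    return 1.0 - math.exp(-lod_s(snr) / lam)

for snr in (5000, 1000, 500, 100):
    print(f"SNR = {snr}: log-normal {phi_lognormal(snr):.3e}, "
          f"Weibull {phi_weibull(snr):.3e}, exponential {phi_exponential(snr):.3e}")
```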

In contrast, the cumulative fractions ΦSNR for αe > 0 are calculated by analysis of the synthetic chromatograms. Figs. 5A-C show the cumulative fractions of the SCP, lost, and found distributions at αe = 0.10 as a function of peak amplitude for the three amplitude pdfs, together with the LODS values at SNRs of 100, 500, 1000 and 5000. In addition to Fig. 5A, Figs. 5D-F show these fractions for the log-normal pdf over a wide range of αe values. The cumulative fraction is the area under the curves in Fig. 4 from zero to a given peak amplitude; the amplitude range plotted in Fig. 5 does not exceed 0.05. The value of ΦSNR for αe > 0 is evaluated from Fig. 5 as the intersection of the vertical dashed lines representing the LODS with the green curves representing the found distributions. This makes sense if one recalls that the found distribution describes the resolved peaks at infinite SNR; those falling below LODS, whose fraction is ΦSNR, are now assumed to be lost. In principle, the same method could be used to evaluate ΦSNR at αe = 0 by replacing the green curves of the found distribution by the blue curves of the SCP distribution. However, Eq. (17) is more statistically robust. Values of ΦSNR are summarized in Table 5 for a wide range of αe values and all three amplitude pdfs.

The most obvious trend in Table 5 is that the cumulative fraction ΦSNR of peak loss due to a finite limit of detection is much smaller than the fraction 1 - γobs of peak loss from overlap, except for αe values very near zero. The maximum ΦSNR for any amplitude pdf occurs at αe = 0 and decreases with increasing SNR. At SNR = 5000, ΦSNR is only 0.0055%, 0.064%, and 0.059% of the total number of SCPs for the log-normal, Weibull, and exponential pdfs, respectively. Even at SNR = 100, ΦSNR is only 8.0%, 3.1%, and 3.0% for these pdfs, respectively. In contrast, the fractional loss 1 - γobs from peak overlap almost equals or exceeds these values at αe = 0.10, and exceeds them even more at larger αe.

As αe increases, ΦSNR decreases rapidly because increasing peak overlap obscures SCPs of low amplitude, which otherwise might be lost below LODS. In other words, many of the small-amplitude SCPs that potentially could be lost to finite detection limits are already lost in peak overlap. The ΦSNR values scatter at large αe, because the small number of found peaks with small amplitudes limits the statistics. The smallest non-zero value of ΦSNR in the synthetic chromatograms of m = 6000 SCPs is 1/6000 = 1.667·10⁻⁴, and Table 5 reports many such ΦSNR values at large αe and SNR. This is not a problem for the αe = 0 case, because ΦSNR is then calculated from Eq. (17), which is not statistically limited, and the integration is available in closed form for all three pdfs.

Also listed in Table 5 is the ratio (1 − γobs)/Φ∣SNR=100, i.e., the fraction of SCPs lost to overlap divided by the fraction of SCPs lost to detection limits at SNR = 100. The cumulative fraction Φ∣SNR=100 was chosen for this ratio because it is the largest evaluated here. Even so, losses from detection limits are comparable to losses from peak overlap only in the most efficient chromatograms (e.g., at αe = 0.10, the ratio is 1.027 for the log-normal pdf). In contrast, even in the modestly efficient chromatograms synthesized at αe = 0.50, the ratios for all amplitude pdfs are about 15 or more (14.64, 37.81, and 47.17 for the log-normal, Weibull, and exponential pdfs, respectively). As αe increases, the disparity grows further. The ratio is somewhat scattered at large αe, for the same reason that ΦSNR is scattered.

The origin of the dependence of ΦSNR on αe in Table 5 is shown in Figs. 5A and D-F, which depict variations among the lost and found cumulative fractions for the log-normal pdf (the variations for the Weibull and exponential pdfs are similar). As shown in Fig. 5A, at low αe the lost SCPs comprise a relatively small fraction compared to the found peaks. As αe is increased, the fraction of lost low-amplitude SCPs increases relative to the fraction of found peaks. At αe = 0.50 in Fig. 5E, lost SCPs dominate the fraction having amplitudes below ≈ 0.015, with almost no found peaks having this or smaller amplitudes. At αe = 1.00 in Fig. 5F, almost all SCPs with amplitudes below 0.05 are lost. This shows that effective saturation determines in a profound manner the number of accessible low-level peaks that are available for analysis.

Overall, our findings show that the fraction of SCPs lost below the detection limit is exceedingly small, except at very low αe. Therefore, we are comfortable with the following assertion: in chromatographic systems with single channel detectors, it is incomplete separation, and not the limit of detection, that is the major factor controlling one’s ability to find low-level components, e.g., biomarkers, in complex mixtures.

Nagels and co-workers made this same assertion a number of years ago in a statistical study on determination limits [34]. However, no evidence was given to support it. Our work differs from it and related studies [34,35,46-49] in its evaluation of the relative and separate contributions of peak overlap and detection limits to peak loss for different SNRs and amplitude pdfs. In contrast, these previous studies showed the probability of successful determination was intimately dependent on peak overlap, such that evaluating the separate consequences of overlap and determination was somewhat difficult.

3.5. Assessment of assumptions

We have made two assumptions in interpreting the synthetic chromatograms. The first is that a peak (maximum) is representative of the constituent SCP whose retention time is closest to its own (i.e., the primary SCP). This assumption is used in constructing the found distribution from peak amplitudes instead of primary SCP amplitudes. We assessed it by comparing the retention times of peaks and their primary SCPs, and by comparing the distributions of peak amplitudes and primary SCP amplitudes. These differences were found to have small consequences and are discussed in Part 2 of the Supporting Information. The second assumption is that our interpretations do not change if the densities and cumulative fractions in Figs. 4 and 5 are normalized to the number of peaks instead of the number of SCPs. This renormalization was found to have a rather small effect and is also discussed in Part 2 of the Supporting Information.

4. Discussion

This work clearly demonstrates that the loss of low-level components in chromatograms with single channel detection systems is critically dependent on chromatographic efficiency. It also demonstrates that the functional form of the amplitude pdf is not critical with regard to the detection limit, as the fraction of components lost this way is very small for three different amplitude pdfs that have infinite, finite, and zero densities as the amplitude approaches zero. The findings, although emphasized in this paper for LC, apply equally well to gas chromatography, capillary electrophoresis and other peak-based separation techniques.

It is important to realize that this study would have been impossible to do by experiment. Because of LOD and overlap losses, we could never have known exactly the numbers of SCPs in chromatograms, the amplitude distribution of SCPs, the retention times and amplitudes of found and lost peaks, and the numbers of found peaks below the LOD. Furthermore, some matters easily addressed by simulation, e.g., the comparison of SCP and maxima retention times used to determine found peaks, would have been compromised by experimental irreproducibility. Ours is an in silico study by necessity.

The values of 1 – γobs and Φ∣SNR in Table 5 were evaluated from synthetic chromatograms having fixed m and different nc at unit resolution Rs. For given amplitude and retention-time pdfs, these dimensionless loss metrics depend only on the dimensionless variable combination, m/(ncRs) = αe. Eq. (11) shows this is true for γ, and it must be true for Φ∣SNR. This can be seen on study of the synthetic chromatograms in Fig. 3. Aside from random variation in retention time and peak amplitude, any one chromatogram looks the same from beginning to end because αe is constant. If that chromatogram were broken into a series of smaller chromatograms, the characteristics of each would not change. In particular, their γ and Φ∣SNR values would be the same (aside from statistical fluctuations), even though their m and nc values would differ.

We have not assumed explicit knowledge of the response factor here, because we use a signal-based assumption about the amplitude distribution and are not concerned with the concentration distribution. Relating the concentration model to the signal model requires additional information, including the sensitivity distribution inherent in the response factor given in Eq. (4). However, the response factor is extremely difficult to ascertain and depends on a multitude of instrumental factors, perhaps too many to predict and/or measure accurately.

Many detection systems are well filtered to mask baseline noise, an effect known in chromatography [24-26] to add a zone-broadening-like contribution that increases peak width. The noise is thereby masked, and its absence makes the signal look better, but both signal and noise are typically reduced. Because filtering affects the total signal chain, filters must be used carefully when working near the limit of detection, as low-level components can be obliterated in a manner similar to that of decreasing column efficiency (increasing effective saturation), which increases overlap and reduces the number of peaks. The inclusion of noise in the synthetic chromatograms with subsequent digital filtering has been strictly avoided in this study, as has the inclusion of peak tailing. Their presence would introduce additional degrees of freedom, making the study unduly complicated and more difficult to interpret. Furthermore, the results show that so few peaks are lost to detection limits, relative to their loss to overlap, that the exact noise and filtering models are not important.

Detecting signals on separate channels should drastically reduce the signal crowding caused by a single channel detector. In particular, mass spectrometry should be effective as a multichannel detector in reducing the effective saturation αe, when the ionization efficiency and selectivity are sufficient. For multichannel detectors with CMC independent, uncorrelated channels, one can deduce that

$$\alpha_e^{\mathrm{MC}}=\frac{\alpha_e}{C_{\mathrm{MC}}} \tag{18}$$

by assuming that the number of channels, CMC, multiplicatively increases the peak capacity in Eq. (13), giving the simple theoretical relationship (cf. Eq. (11)):

$$\gamma=e^{-\alpha_e^{\mathrm{MC}} R_s} \tag{19}$$

Note that some studies have suggested that MS detectors have an associated peak capacity arising from the multichannel nature of the detector [74-77], which may equal the number of available masses.
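Eqs. (18) and (19) are easy to explore numerically. In the Python sketch below (hypothetical function names), Rs is passed in explicitly because it depends on the amplitude pdf; the value 0.77 used in the loop is purely illustrative and is not taken from the Supporting Information.

```python
import math

def alpha_e_mc(alpha_e, c_mc):
    """Eq. (18): effective saturation reduced by C_MC independent channels."""
    return alpha_e / c_mc

def gamma_mc(alpha_e, c_mc, r_s):
    """Eq. (19): expected fraction of SCPs observed as maxima."""
    return math.exp(-alpha_e_mc(alpha_e, c_mc) * r_s)

# r_s = 0.77 is an illustrative value only.
for c_mc in (1, 10, 100):
    print(f"C_MC = {c_mc:3d}: gamma = {gamma_mc(1.0, c_mc, 0.77):.3f}")
```

Under the idealized independence assumption, even ten effective channels would raise γ at αe = 1 from roughly one half to above 0.9; correlated mass channels would make the effective CMC, and hence the gain, smaller.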

The assumption of independent, uncorrelated channels is needed so that signals and noise on each channel are unique and separate sources of information. The number CMC is effectively less than the number of mass channels in MS due to correlation between mass channels, statistical occupancy weighting and limits in incidence detection for temporal analysis of correlation. A statistical theory of mass spectrometry [78] can be expanded into the framework of Eq. (18), allowing further understanding of the LCMS peak overlap problem. However, this approach requires LCMS data to define the unique sample requirements.

Our model of detection limits assumes that Eq. (9) for LODS is valid. This model has been referred to as “the N sigma rule” [31], which we use here for a single channel detector. However, Tsybin and coworkers have examined very low-level signals with high-resolution MS methods and have shown the noise amplitude distribution, along with the signal distribution. Their petroleomics data [31,79] clearly shows the delineation of noise and signal, and the signal amplitude density appears as a log-normal function although curve fits to this data were not made. A noise model similar to that used in ref. 31 can easily replace Eq. (9) to determine the cumulative fraction of lost signals as a function of the noise level.

We have used the least-squares method in this study and have shown different amplitude pdfs fit the data of Enke and Nagels. In contrast, a number of studies have used maximum likelihood estimation (MLE) for parameter estimation of power law and heavy-tailed distribution data [71,80]. In some cases [71], systems that appeared to obey power laws were shown not to fit power law statistics, when parameter estimation was made by MLE. Although MLE may give different parameters than the least-squares method, the differences would probably not change the outcome of this study, as chromatographic overlap appears to be a much stronger and dominating effect in suppressing the presence of smaller signals as opposed to the signal being detection-limited.

One single channel analysis of increasing importance is the fluorescence detection of glycans via HILIC separations [81-85], in which glycans are labeled with a small fluorescent molecule. Fluorescence detection is very sensitive, and because HILIC resolution is high, trace-level components are likely detected as well.

On a more fundamental level, little work has been conducted into depolymerization models [86-88] describing a natural distribution of molecular weights on polymer decomposition induced by temperature, shear stress or ionizing radiation. These distributions, some of which look like log-normal functions, provide a basis for study of the functional form of a natural distribution. However, most complex mixtures are probably the result of degradation of groups or homologous series of compounds, and the complexity of these mixtures is hard to ascertain with a natural mixture from a biological or an environmentally processed origin, e.g., from a petroleomics study. Interestingly, one of the first papers on the statistics of peak overlap used the theory of depolymerization [89] as a starting basis.

We have used the Enke and Nagels [37] petroleomics dataset as a model for low-level biomarkers. This data was similar to the plant extract data when shifted to unit mean with preservation of the CV. However, most complex samples have a unique distribution, although the distribution may look like a log-normal function in amplitude density. Further experimental data from a wider range of samples should be gathered to further assess the validity of the hypothesis that the log-normal distribution is a natural law describing mixtures of a natural origin.

The results in this paper indicate that chromatographic experiments, unless run at the highest efficiency, may seriously distort the amplitude distribution, especially at lower amplitudes. This suggests that LC should not be used to assess amplitude distributions, especially when evaluating low-level signals. We believe that multichannel detectors such as MS are essential for determining component amplitude distributions. For this reason, we used the crude-oil data determined by high-resolution MS, rather than the plant-extract data determined by LC, as the basis for the synthetic chromatograms.
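
The preferential loss of low-amplitude peaks can be illustrated with a deliberately simple, noise-free overlap simulation in Python. It is not the authors' MATLAB program, and its assumptions are ours:

- uniformly random (Poisson) retention times;
- Gaussian peaks of equal width σ;
- a peak capacity n_c = 1/(4σ) on a unit time axis, so that the saturation is α = m/n_c (plain saturation, not the effective saturation used in the paper);
- unit-mean log-normal amplitudes;
- a component counted as found if it is the component nearest to some local maximum of the summed signal.

The function name and all numerical settings are hypothetical.

```python
import numpy as np


def simulate_overlap(m, n_c, cv=2.0, points_per_sigma=8, rng=None):
    """Hypothetical noise-free 1-D overlap simulation (not the paper's code).

    m components with uniform random retention on [0, 1], Gaussian peaks of
    standard deviation sigma = 1 / (4 n_c) (so saturation alpha = m / n_c),
    and log-normal amplitudes with unit mean and the given CV.
    Returns (amplitudes, detected): detected[i] is True if component i is the
    nearest component to at least one local maximum of the summed signal.
    """
    rng = np.random.default_rng() if rng is None else rng
    sigma = 1.0 / (4.0 * n_c)
    s2 = np.log1p(cv ** 2)
    amps = rng.lognormal(mean=-0.5 * s2, sigma=np.sqrt(s2), size=m)
    pos = rng.random(m)

    t = np.arange(0.0, 1.0, sigma / points_per_sigma)
    signal = np.zeros_like(t)
    for a, x0 in zip(amps, pos):  # loop keeps memory small for large m
        signal += a * np.exp(-0.5 * ((t - x0) / sigma) ** 2)

    # Interior local maxima of the summed chromatogram.
    is_max = (signal[1:-1] > signal[:-2]) & (signal[1:-1] >= signal[2:])
    t_max = t[1:-1][is_max]

    detected = np.zeros(m, dtype=bool)
    for tm in t_max:
        detected[np.argmin(np.abs(pos - tm))] = True
    return amps, detected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_c = 500
    for alpha in (0.1, 0.5, 1.0):
        amps, det = simulate_overlap(m=int(alpha * n_c), n_c=n_c, rng=rng)
        low = amps <= np.quantile(amps, 0.25)
        high = amps >= np.quantile(amps, 0.75)
        print(f"alpha={alpha}: found {det.mean():.2f} of all, "
              f"{det[low].mean():.2f} of lowest quartile, "
              f"{det[high].mean():.2f} of highest quartile")
```

The per-quartile fractions show whether small peaks are lost preferentially as saturation increases. Under these assumptions, they tend to become shoulders on larger neighbors rather than maxima of their own.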

In many scenarios pertinent to biomarkers, such as top-down proteomics, capillary liquid chromatography is run under conditions of high effective saturation, even with long columns and very long run times [90,91]. Even with MS detection, the mass spectra at specific time slices are very crowded [90]. Strategies such as high-resolution MS and MS/MS are needed to “thin out” the overlapped components. Even so, a substantial boost in chromatographic efficiency (and resolution) would help the interpretation of mass spectral data. Coupling multichannel detection to efficient chromatography may help to disentangle complex separations and reduce peak loss in the search for low-abundance biomarkers and other components that correlate with specific properties.

5. Conclusion

When complex samples from diverse sources, such as those studied in proteomics and petroleomics, are characterized by chromatographic means, the low-level constituents are very difficult to determine. Chromatographic peak overlap obscures their peak maxima. We studied this effect with peak simulations in which the chromatographic efficiency was varied through the effective saturation. Our analysis shows that peak overlap, and not the detection limit of the detector, limits chromatography-based analysis of low-level components.
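
A rough, back-of-the-envelope comparison of the two loss mechanisms fits in a few lines of Python. Under assumptions of ours, the normal CDF of ln(LOD) gives the fraction of unit-mean log-normal amplitudes below a detection limit. The classic statistical-overlap estimate of Davis and Giddings [55], p = m exp(-α), gives a fraction 1 - exp(-α) of components that do not appear as separate maxima. The CV, the LOD = 3 × noise convention, and the SNR values are hypothetical, and the two mechanisms are treated independently. This sketch is not a substitute for the simulations reported here.

```python
import math


def fraction_below_lod(cv, lod):
    """Fraction of unit-mean log-normal amplitudes below lod (in units of the mean)."""
    s2 = math.log1p(cv ** 2)
    mu, sigma = -0.5 * s2, math.sqrt(s2)
    z = (math.log(lod) - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF of z


def fraction_lost_to_overlap(alpha):
    """Statistical-overlap estimate: p = m exp(-alpha) observed peaks,
    so a fraction 1 - exp(-alpha) of components is not seen as separate maxima."""
    return 1.0 - math.exp(-alpha)


if __name__ == "__main__":
    cv = 2.0  # hypothetical amplitude CV
    for snr_of_mean in (10.0, 100.0, 1000.0):
        lod = 3.0 / snr_of_mean  # assumed LOD = 3 x noise, noise = mean / SNR
        print(f"SNR {snr_of_mean:g}: {fraction_below_lod(cv, lod):.3f} below LOD")
    for alpha in (0.1, 0.5, 1.0, 2.0):
        print(f"alpha {alpha}: {fraction_lost_to_overlap(alpha):.3f} lost to overlap")
```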

Studies of complex samples by one-dimensional, single-channel LC are severely compromised by peak overlap, and future increases in peak capacity are unlikely to change this situation. For good results, LC must be paired with a multichannel detector and/or practiced in a multidimensional mode, e.g., by off-line collection of aliquots for subsequent analysis. Biomarker studies typically are run at such high effective saturation that the additional signals, e.g., mass spectra, contain information on more than one component in a single acquisition, even with multichannel detection. Thus, sophisticated multichannel algorithms are also required for subsequent interpretation. The problem is immensely challenging and will require a combination of methods to solve.
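
One simple way to see why multichannel detection helps is to extend the statistical-overlap estimate p/m = exp(-α) to several detector channels. Assume, hypothetically, that each component is assigned at random to one of K perfectly selective channels (for example, m/z windows) and that p/m = exp(-α) holds within each channel. Random assignment thins the Poisson retention pattern, so each channel has saturation α/K and the expected observed fraction becomes exp(-α/K). This toy model, written in Python with a hypothetical function name, is not necessarily the treatment used in this paper. It also ignores the multi-component spectra noted above.

```python
import math


def observed_fraction(alpha, n_channels=1):
    """Expected fraction of components seen as separate maxima.

    Hypothetical model: each component goes to one of n_channels perfectly
    selective channels at random, which splits a Poisson retention pattern
    into independent Poisson patterns with saturation alpha / n_channels.
    Within each channel the statistical-overlap result p / m = exp(-alpha)
    is assumed to hold.
    """
    if n_channels < 1:
        raise ValueError("n_channels must be >= 1")
    return math.exp(-alpha / n_channels)


if __name__ == "__main__":
    alpha = 3.0  # hypothetical, strongly saturated single-channel separation
    for k in (1, 10, 100, 1000):
        print(f"K={k:>4}: observed fraction {observed_fraction(alpha, k):.3f}")
```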

Acknowledgements

We gratefully acknowledge the support of the National Institutes of Health under grant R44-GM108122-02. This paper is in partial fulfillment of the senior thesis research of Nicole M. Devitt within the department of Chemical and Biomolecular Engineering at the University of Delaware.

For those interested in a copy of the MATLAB program used to simulate peaks and analyze the synthetic chromatograms, please email inquiries to Mark.Schure@GMail.com.

Footnotes

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.chroma.2020.461266.

References

  • [1].Gerszten RE, Wang TJ, The search for new cardiovascular biomarkers, Nature 451 (2008) 949–952.
  • [2].Anderson L, Candidate-based proteomics in search of biomarkers of cardiovascular disease, J. Physiology 563 (2009) 23–60.
  • [3].Polanski M, Anderson NL, A list of candidate cancer biomarkers for targeted proteomics, Biomarker Insights 1 (2006) 1–48.
  • [4].Liu T, Qian W-J, Gritsenko MA, Xiao W, Moldawer LL, Kaushal A, Monroe ME, Varnum SM, Moore RJ, Purvine SO, Maier RV, Davis RW, Tompkins RG, Camp DG, Smith RD, High dynamic range characterization of the trauma patient plasma proteome, Molecular & Cellular Proteomics 5 (2006) 1899–1913.
  • [5].Jacobs JM, Adkins JN, Qian W-J, Liu T, Shen Y, Camp II DG, Smith RD, Utilizing human blood plasma for proteomic biomarker discovery, J. Proteome Res. 4 (2005) 1073–1085.
  • [6].Wang YY, Cheng P, Chan DW, A simple affinity spin tube filter method for removing high-abundant common proteins or enriching low-abundant biomarkers for serum proteomic analysis, Proteomics 3 (2003) 243–248.
  • [7].Plavina T, Wakshull E, Hancock WS, Hincapie M, Combination of abundant protein depletion and multi-Lectin affinity chromatography (M-LAC) for plasma protein biomarker discovery, J. Proteome Res 6 (2007) 662–671.
  • [8].Tu C, Rudnick PA, Martinez MY, Cheek KL, Stein SE, Slebos RJC, Liebler DC, Depletion of abundant plasma proteins and limitations of plasma proteomics, J. Proteome Res 9 (2010) 4982–4991.
  • [9].Bogdanov B, Smith RD, Proteomics by FTICR mass spectrometry: top down and bottom up, Mass spectrometry reviews 24 (2005) 168–200.
  • [10].Tran JC, Zamdborg L, Ahlf DR, Lee JE, Catherman AD, Durbin KR, Tipton JD, Vellaichamy A, Kellie JF, Li M, Wu C, Sweet SMM, Early BP, Siuti N, LeDuc RD, Compton PD, Thomas PM, Kelleher NL, Mapping intact protein isoforms in discovery mode using top-down proteomics, Nature 480 (2011) 254–258.
  • [11].Weng N, Overview of targeted quantitation of biomarkers and its applications, in: Weng N, Jian W (Eds.), Targeted biomarker quantitation by LC-MS, Wiley, Hoboken, 2017.
  • [12].Stahl-Zeng J, Lange V, Ossola R, Eckhardt K, Krek W, Aebersold R, Domon B, High sensitivity detection of plasma proteins by multiple reaction monitoring of N-glycosites, Molecular & Cellular Proteomics 6 (2007) 1809–1817.
  • [13].Zhao J, Patwa TH, Lubman DM, Simeone DM, Protein biomarkers in cancer: Natural glycoprotein microarray approaches, Curr. Opin. Mol. Ther 10 (2008) 602–610.
  • [14].Taylor AD, Hancock WS, Hincapie M, Taniguchi N, Hanash SM, Towards an integrated proteomic and glycomic approach to finding cancer biomarkers, Genome Medicine 1 (6) (2009) 1–10 Article 57.
  • [15].Adamczyk B, Tharmalingam T, Rudd PM, Glycans as cancer biomarkers, Biochimica et Biophysica Acta 1820 (2012) 1347–1353.
  • [16].Kuzmanov U, Musrap N, Kosanam H, Smith CR, Batruch I, Dimitromanolakis A, Diamandis EP, Glycoproteomic identification of potential glycoprotein biomarkers in ovarian cancer proximal fluids, Clin. Chem. Lab. Med 51 (2013) 1467–1476.
  • [17].Drake P, Schilling B, Gibson B, Fisher S, Elucidation of N-glycosites within human plasma glycoproteins for cancer biomarker discovery, Mass spectrometry of glycoproteins: methods and protocols, Methods in Molecular Biology 951 (2013) 307–322.
  • [18].Hua S, Jeong HN, Dimapasoc LM, Kang I, Han C, Choi J-S, Lebrilla CB, An HJ, Isomer-specific LC/MS and LC/MS/MS profiling of the mouse serum N-glycome revealing a number of novel sialylated N-glycans, Anal. Chem 85 (2013) 4636–4643.
  • [19].McCormack AL, Schieltz DM, Goode B, Yang S, Barnes G, Drubin D, Yates JR III, Direct analysis and identification of proteins in mixtures by LC/MS/MS and database searching at the low-femtomole level, Anal. Chem 69 (1997) 767–776.
  • [20].Qian W-J, Jacobs JM, Liu T, Camp II DG, Smith RD, Review: Advances and challenges in liquid chromatography-mass spectrometry-based proteomics profiling for clinical applications, Molecular & Cellular Proteomics 5 (10) (2006) 1727–1744.
  • [21].Karpievitch YV, Polpitiya AD, Anderson GA, Smith RD, Dabney AR, Liquid Chromatography Mass Spectrometry-Based Proteomics; Biological and Technological Aspects, The annals of applied statistics 4 (2010) 1797–1823.
  • [22].Chervet JP, Ursem M, Salzmann JP, Instrumental requirements for nanoscale liquid chromatography, Anal. Chem 68 (1996) 1507–1512.
  • [23].Blue LE, Franklin EG, Godinho JM, Grinias JP, Grinias KM, Lunn DB, Moore SM, Recent advances in capillary ultrahigh pressure liquid chromatography, J. Chromatogr. A 1523 (2017) 17–39.
  • [24].Dyson N, Chromatographic Integration Methods, second edition, RSC Chromatography Monographs, The Royal Society of Chemistry, London, 1998.
  • [25].Andreev VP, Rejtar T, Chen H-S, Moskovets EV, Ivanov AR, Karger BL, A universal denoising and peak picking algorithm for LC-MS based on matched filtration in the chromatographic time domain, Anal. Chem 75 (2003) 6314–6326.
  • [26].Wahab MF, Dasgupta PK, Kadjo AF, Armstrong DW, Sampling frequency, response times and embedded signal filtration in fast, high efficiency liquid chromatography: a tutorial, Anal. Chimica Acta 907 (2016) 31–44.
  • [27].Wong CCL, Cociorva D, Venable JD, Xu T, Yates JR, Comparison of different signal thresholds on data dependent sampling in Orbitrap and LTQ mass spectrometry for the identification of peptides and proteins in complex mixtures, J. Am. Soc. Mass Spectrom 20 (2009) 1405–1414.
  • [28].Petkovic M, Schiller J, Müller J, Müller M, Arnold K, Arnhold J, The signal-to-noise ratio as the measure for the quantification of lysophospholipids by matrix-assisted laser desorption/ionisation time of flight mass spectrometry, Analyst 126 (2001) 1042–1050.
  • [29].Chen L, Cottrell CE, Marshall AG, Effect of signal-to-noise ratio and number of data points upon precision in measurement of peak amplitude, position and width in Fourier transform spectrometry, Chemomet. Intell. Lab. Systems 3 (1986) 51–58.
  • [30].Makarov A, Denisov E, Lange O, Horning S, Dynamic range of mass accuracy in LTQ Orbitrap hybrid mass spectrometer, J. Am. Soc. Mass Spectrom 17 (2006) 977–982.
  • [31].Zhurov K, Kozhinov AN, Fornelli L, Tsybin YO, Distinguishing analyte from noise components in mass spectra of complex samples: where to cut the noise? Anal. Chem 86 (2014) 3308–3316.
  • [32].Du P, Stolovitzky G, Horvatovich P, Bischoff R, Lim J, Suits F, A noise model for mass spectrometry based proteomics, Bioinformatics 24 (2008) 1070–1077.
  • [33].Mallet CR, Lu Z, Mazzeo JR, A study of ion suppression effects in electrospray ionization from mobile phase additives and solid-phase extracts, Rapid Commun. Mass Spectrom 18 (2004) 49–58.
  • [34].Nagels LJ, Creten WL, Vanpeperstraete PM, Determination limits and distribution function of ultraviolet absorbing substances in liquid chromatographic analysis of plant extracts, Anal. Chem 55 (1983) 216–220.
  • [35].Nagels LJ, Creten WL, Evaluation of the glassy carbon electrochemical detector selectivity in high-performance liquid chromatographic analysis of plant material, Anal. Chem 57 (1985) 2706–2711.
  • [36].Dondi F, Kahie YD, Lodi G, Remelli M, Reschiglian P, Bighi C, Evaluation of the number of components in multi-component liquid chromatograms of plant extracts, Anal. Chimica Acta 191 (1986) 261–273.
  • [37].Enke CG, Nagels LJ, Undetected components in natural mixtures: How many? What concentrations? Do they account for chemical noise? What is needed to detect them? Anal. Chem 83 (2011) 2539–2546.
  • [38].Gundlach-Graham A, Enke CG, Effect of response factor variations on the response distribution of complex mixtures, Eur. J. Mass Spectrom 21 (2015) 471–479.
  • [39].Log-normal distribution: https://en.wikipedia.org/wiki/Log-normal_distribution.
  • [40].Raabe O, Particle size analysis using grouped data and the log-normal distribution, Aerosol Science 2 (1971) 289–303.
  • [41].Krishnamoorthy K, Handbook of Statistical Distributions with Applications (Statistics: A Series of Textbooks and Monographs), fourth ed, Chapman and Hall/CRC, Boca Raton, 2015.
  • [42].Rinne H, The Weibull distribution: A handbook, CRC Press, Boca Raton, 2009.
  • [43].Weibull distribution: https://en.wikipedia.org/wiki/Weibull_distribution.
  • [44].Brown WK, Wohletz KH, Derivation of the Weibull distribution based on physical principles and its connection to the Rosin–Rammler and lognormal distributions, J. Appl. Phys 78 (1995) 2758–2763.
  • [45].Exponential distribution: https://en.wikipedia.org/wiki/Exponential_distribution.
  • [46].Nagels LJ, Creten WL, Quantitative evaluation of chromatographic analysis of complex mixtures by establishing limits of determination, Anal. Chim. Acta 169 (1985) 299–307.
  • [47].Nagels LJ, Creten WL, van Haverbeke L, Determination limits in high-performance liquid chromatography of plant phenolic compounds with an ultraviolet detector, Anal. Chim. Acta 173 (1985) 185–192.
  • [48].Nagels LJ, Creten WL, Parmentier F, Statistical model for organic chromatographic trace analysis of complex samples. A case study: plant extracts, Intern. J. Environ. Anal. Chem 25 (1986) 173–186.
  • [49].El Fallah MZ, Martin M, Quantitative determination limit in chromatography: computer-based simulations, J. Chromatogr 557 (1991) 23–37.
  • [50].Karger BL, Martin M, Guiochon G, Role of column parameters and injection volume on detection limits in liquid chromatography, Anal. Chem 46 (1974) 1640–1647.
  • [51].Cover TM, Thomas JA, Elements of Information Theory, John Wiley & Sons, New York, 1991.
  • [52].Signal-to-noise ratio: https://en.wikipedia.org/wiki/Signal-to-noise_ratio.
  • [53].Currie LA, Limits for qualitative detection and quantitative determination: application to radiochemistry, Anal. Chem 40 (1968) 586–593.
  • [54].Long GL, Winefordner JD, Limit of detection: A closer look at the IUPAC definition, Anal. Chem 55 (1983) 712A–724A.
  • [55].Davis JM, Giddings JC, Statistical theory of component overlap in multicomponent chromatograms, Anal. Chem 55 (1983) 418–424.
  • [56].Schure MR, Davis JM, The Simple Use of Statistical Overlap Theory in Chromatography, LCGC North America Magazine 33 (s4) (2015) 10–14.
  • [57].Davis JM, Computation of distribution of minimum resolution for log-normal distribution of chromatographic peak heights, J. Chromatogr. A 1218 (2011) 7841–7849.
  • [58].Davis JM, New theory for distribution of minimum resolution in multicomponent separations with noise/detection limits, J. Chromatogr. A 1251 (2012) 1–9.
  • [59].Davis JM, Carr PW, Effective saturation: a more informative metric for comparing peak separation in one- and two-dimensional separations, Anal. Chem 81 (2009) 1198–1207.
  • [60].Davis JM, Dependence on effective saturation of numbers of singlet peaks in one- and two-dimensional separations, Talanta 83 (2011) 1068–1073.
  • [61].Giddings JC, Maximum number of components resolvable by gel filtration chromatography, Anal. Chem 39 (1967) 1027–1028.
  • [62].Grushka E, Chromatographic peak capacity and the factors influencing it, Anal. Chem 42 (1970) 1142–1147.
  • [63].Felinger A, Critical peak resolution in multicomponent chromatograms, Anal. Chem. 69 (1997) 2976–2979.
  • [64].Hamming RW, Numerical Methods for Scientists and Engineers, second ed, Dover Publications, Mineola, New York, 1987.
  • [65].Seber GAF, Wild CJ, Nonlinear regression, John Wiley & Sons, Hoboken, 2003.
  • [66].Press WH, Teukolsky SA, Vetterling WT, Flannery BP, Numerical Recipes in Fortran, second ed, Cambridge Univ. Press, New York, 1992.
  • [67].Devroye L, Non-uniform Random Variate Generation, Springer-Verlag, New York, 1986.
  • [68].Coefficient of determination: https://en.wikipedia.org/wiki/Coefficient_of_determination.
  • [69].Schroeder M, Fractals, Chaos, Power Laws, Dover Publications, Mineola, New York, 2009.
  • [70].Foss S, Korshunov D, Zachary S, An introduction to heavy-tailed and subexponential distributions, second edition, Springer, New York, 2013.
  • [71].Clauset A, Shalizi CR, Newman MEJ, Power-law distributions in empirical data, SIAM Review 51 (2009) 661–703.
  • [72].Pietrogrande MC, Dondi F, Felinger A, Davis JM, Statistical study of peak overlapping in multicomponent chromatograms: Importance of the retention pattern, Chemometr. Intell. Lab. Syst 28 (1995) 239–258.
  • [73].Renyi A, Probability Theory, Dover, Mineola, New York, 1998, p. 116.
  • [74].Lewis KC, Opiteck GJ, Jorgenson JW, Comprehensive on-line RPLC-CZE-MS of peptides, J. Am. Soc. Mass Spectrom 8 (1997) 495–500.
  • [75].Shen Y, Tolić N, Zhao R, Paša-Tolić L, Li L, Berger SJ, Harkewicz R, Anderson GA, Belov ME, Smith RD, High-throughput proteomics using high-efficiency multiple-capillary liquid chromatography with on-line high-performance ESI FTICR mass spectrometry, Anal. Chem 73 (2001) 3011–3021.
  • [76].Valentine SJ, Kulchania M, Srebalus Barnes CA, Clemmer DE, Multidimensional separations of complex peptide mixtures: a combined high-performance liquid chromatography/ion mobility/ time-of-flight mass spectrometry approach, Int. J. Mass. Spectrom 212 (2001) 97–109.
  • [77].Ruotolo BT, Gillig KG, Stone EG, Russell DH, Peak capacity of ion mobility mass spectrometry: Separation of peptides in helium buffer gas, J. Chromatogr. B 782 (2002) 385–392.
  • [78].Faccin M, Bruscolini P, MS/MS Spectra Interpretation as a statistical–mechanics problem, Anal. Chem 85 (2013) 4884–4892.
  • [79].Zhurov KO, Kozhinov AN, Tsybin YO, Evaluation of high-field orbitrap Fourier transform mass spectrometer for petroleomics, Energy and Fuels 27 (2013) 2974–2983.
  • [80].Pawitan Y, In all likelihood, Oxford University Press, Oxford, 2013.
  • [81].Melmer M, Stangler T, Premstaller A, Lindner W, Comparison of hydrophilic-interaction, reversed-phase and porous graphitic chromatography for glycan analysis, J. Chromatogr. A 1218 (2011) 118–123.
  • [82].Wang C, Yuan J, Wang Z, Huang L, Separation of one-pot procedure released O-glycans as 1-phenyl-3-methyl-5-pyrazolone derivatives by hydrophilic interaction and reversed-phase liquid chromatography followed by identification using electrospray mass spectrometry and tandem mass spectrometry, J. Chromatogr. A 1274 (2013) 107–117.
  • [83].Michael C, Rizzi AM, Quantitative isomer-specific N-glycan fingerprinting using isotope coded labeling and high performance liquid chromatography-electrospray ionization-mass spectrometry with graphitic carbon stationary phase, J. Chromatogr. A 1383 (2015) 88–95.
  • [84].Kozak RP, Tortosa CB, Fernandes DL, Spencer DLR, Comparison of procainamide and 2-aminobenzamide labeling for profiling and identification of glycans by liquid chromatography with fluorescence detection coupled to electrospray ionization-mass spectrometry, Anal. Biochem 486 (2015) 38–40.
  • [85].Mechref Y, Peng W, Huang Y, A brief review of recent advances in isomeric N- and O-glycomics, Current Trends in Mass Spectrometry 17 (2019) 23–31.
  • [86].Montroll EW, Simha R, Theory of depolymerization of long chain molecules, J. Chem. Phys 8 (1940) 721–727.
  • [87].Ziff RM, McGrady ED, Kinetics of polymer degradation, Macromol. 19 (1986) 2513–2519.
  • [88].Yashin VV, Isayev AI, On the theory of radical depolymerization: A rigorous solution, J. Polym. Sci. Part B, Poly. Phys 41 (2003) 965–982.
  • [89].Martin M, Guiochon G, Analogy between the depolymerization and separation processes. Application to the statistical evaluation of complex chromatograms, Anal. Chem 57 (1985) 289–295.
  • [90].Shen Y, Tolić N, Piehowski PD, Shukla AK, Kim S, Zhao R, Qu Y, Robinson E, Smith RD, Paša-Tolić L, High-resolution ultrahigh-pressure long column reversed-phase liquid chromatography for top-down proteomics, J. Chromatogr. A 1498 (2017) 99–110.
  • [91].Xiang P, Yang Y, Zhao Z, Chen A, Liu S, Experimentally validating open tubular liquid chromatography for peak capacity of 2000 in three hours, Anal. Chem 91 (2019) 10518–10523.
