Author manuscript; available in PMC: 2009 Jun 22.
Published in final edited form as: J Chromatogr A. 2006 Oct 27;1137(2):163–172. doi: 10.1016/j.chroma.2006.10.024

Fast Gradient Elution Reversed-Phase Liquid Chromatography with Diode-Array Detection as a High-throughput Screening Method for Drugs of Abuse II. Data Analysis

Sarah E G Porter 1, Dwight R Stoll 2, Changyub Paek 2, Sarah C Rutan 1,*, Peter W Carr 2
PMCID: PMC2699672  NIHMSID: NIHMS28909  PMID: 17070534

Abstract

In Part I of this work, we developed a method for the detection of drugs of abuse in biological samples based on fast gradient elution liquid chromatography coupled with diode array spectroscopic detection (LC-DAD). In this part of the work, we apply the chemometric method of target factor analysis (TFA) to the chromatograms. This algorithm identifies the target compounds present in chromatograms based on a spectral library, resolves nearly co-eluting components, and differentiates between drugs with similar spectra. The ability to resolve highly overlapped peaks using the spectral data afforded by the DAD is what distinguishes the present method from conventional library searching methods. Our library has a mean list length of 1.255 and a discriminating power of 0.997 when both retention index and spectral factors are considered. The algorithm compares a library of 47 different compounds of toxicological relevance to unknown samples and identifies which compounds are present based on spectral and retention index matching. The application of a corrected retention index for identification rather than raw retention times compensates for long-term and column-to-column retention time shifts and allows for the use of a single library of spectral and retention data. Training data sets were used to establish the search and identification parameters of the method. A validation data set of 70 chromatograms was used to calculate the sensitivity (correct identification of positives) and specificity (correct identification of negatives) of the method, which were found to be 92% and 94%, respectively.

1. Introduction

1.1. Drug Screening

Forensic drug and toxicology laboratories have an on-going need for rapid, simple assays for screening biological samples suspected of containing drugs and metabolites of toxicological interest. Current techniques for such analyses include enzyme immunoassay (EIA) [1-3] and liquid chromatography with diode array detection (LC-DAD) [4], whereas liquid chromatography with mass spectrometric detection (LC-MS) and gas chromatography – mass spectrometry (GC-MS) are typically used as confirmatory methods [2, 5]. A recent review by Maurer [6] gives an overview of chromatographic techniques commonly applied in toxicological testing. While EIA is usually used only in screening for substances with similar properties or structures (i.e., drug classes such as amphetamines or benzodiazepines) [7], mass spectrometric detection in chromatography is useful for reliable identification of analytes in biological matrices such as blood and urine. In fact, many recent publications dealing with LC in forensic toxicology are specifically discussions of LC-MS methods [8-11].

As an alternative to MS, absorbance detectors (including DAD) are much less expensive and relatively simple to use. We will show that LC-DAD is a fast and robust method for screening biological samples in conjunction with a library search algorithm to quickly identify those samples that require confirmatory testing. Numerous methods for using LC-DAD as a screening method have been published and were recently reviewed by Pragst [4]. Because a DAD can collect an entire spectrum at each time point in a chromatogram, the resultant data are information rich and more selective than single wavelength chromatograms. Herzler et al. [12] showed that DAD data can be used to selectively identify abused substances in spectrochromatograms based on comparison to a library of over 2500 “toxicologically relevant” substances. Their method relied on the calculation of a ‘similarity index’ (related to the correlation coefficient) to determine the similarity between a spectrum in an unknown chromatogram and a library spectrum. In addition to spectral matching, a relative retention time was also used to identify the substances of interest.

We have adopted a similar technique using the chemometric method of target factor analysis (TFA) to match spectra contained within a spectrochromatogram to a library. The main distinctions between our method and previously published work are (i) the use of high-speed, gradient elution liquid chromatography (discussed in Part I) [13], (ii) the use of a corrected retention index rather than retention times for library matching, and (iii) the use of a factor analysis algorithm that can readily detect the presence of library compounds within severely overlapped chromatographic data. This gradient elution method boasts high retention time reproducibility between runs (±0.002 min), narrow peak widths (0.045 min), and a total cycle time of less than four minutes, which includes the gradient re-equilibration time between runs [13]. Another advantage of this high speed method is that it allows for more frequent running of controls and calibration standards. The application of the retention index allows us to account for longer term (13 months) variation in retention times, and permits the application of a single set of library retention indices to all subsequent analyses.

This part of the work focuses on the application of TFA to the data collected in part I [13] as well as a collection of both drug-free blood samples and blood samples spiked with drugs contained in the library. By using TFA to match the peaks present in a sample chromatogram, overlapped peaks can be distinguished from one another. This improvement over traditional library searching algorithms creates a fast and accurate method to determine which samples should go on for a confirmatory test, using a method such as LC-MS or GC-MS. If no compounds of interest are detected, the analysis is complete in only a fraction of the time required for the full analysis. The results of the LC analysis provide important information for the selection of the appropriate chromatographic and mass spectrometric conditions for subsequent confirmatory analyses. The LC-DAD screening test is fast enough that a second LC analysis on an “orthogonal” stationary phase would be quite feasible, which would further improve the selectivity of the method [14, 15].

1.2. Fast Gradient Liquid Chromatography

One of the main shortcomings in gradient elution reversed-phase LC (RPLC) has been the time required to re-equilibrate the system between runs. As a consequence, the throughput of an otherwise fast method can be significantly reduced. However, recent improvements in the methodology for ultra-fast gradient LC and significant reductions in required re-equilibration times have greatly improved the speed and efficiency by which these separations can be carried out. Schellinger et al. [16-18] have recently investigated the minimum essential re-equilibration time in gradient elution needed to provide excellent retention time reproducibility (±0.004 min) between runs without the requirement for full column equilibration. As a result of this work, a gradient LC-DAD method has been developed with a gradient time of less than three minutes and a re-equilibration time of less than one minute for a total cycle time of four minutes. An instrument modification employing two pumping systems further reduces the total cycle time of a single analysis to only 2.8 minutes. The development and demonstrated reproducibility of this method are discussed extensively in Part I of this series [13].

2. Theory

The data obtained when using hyphenated instruments are multivariate data. A single LC-DAD experiment produces two-way data, which can be represented mathematically by [19]

$\mathbf{X} = \mathbf{A} \cdot \mathbf{B}^{\mathrm{T}}$  (1)

where X is the data matrix (the spectrochromatogram of a mixture sample), A contains the chromatographic profiles of the individual components of the mixture, and BT is the transpose of the spectral profiles of the individual components. The dimensions of A and B are a × n and b × n, respectively, where a is the number of chromatographic time points (also the number of rows in the data matrix), b is the number of wavelengths in each spectrum (also the number of columns in the data set), and n is the number of components in the mixture. At each time point in the chromatogram, there is a corresponding spectrum that may be compared to library spectra. This concept is the basis of TFA [20] and is also applied in many other chemometric methods, including multivariate curve fitting algorithms such as alternating least squares (ALS) [21] and parallel factor analysis (PARAFAC) [22].
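Although the analysis in this work was carried out in Matlab, the bilinear model of Eq. (1) is straightforward to sketch in any matrix language. The following Python fragment builds a synthetic two-component spectrochromatogram from invented Gaussian peak and band shapes; all dimensions and values are illustrative only, not taken from the paper's data.

```python
import numpy as np

# Bilinear model of Eq. (1): X = A·B^T, with invented Gaussian shapes.
a, b, n = 200, 101, 2                # time points, wavelengths, components
t = np.arange(a)[:, None]
w = np.arange(b)[:, None]

# A (a x n): chromatographic profiles (Gaussian peaks at invented positions)
A = np.exp(-0.5 * ((t - np.array([80, 120])) / 6.0) ** 2)

# B (b x n): spectral profiles (broad Gaussian bands at invented positions)
B = np.exp(-0.5 * ((w - np.array([30, 60])) / 15.0) ** 2)

X = A @ B.T                          # a x b spectrochromatogram
print(X.shape, np.linalg.matrix_rank(X))
```

Because the noise-free mixture contains two components, the rank of X equals n, which is the property that SVD-based methods such as TFA exploit.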

2.1. Target Factor Analysis

TFA is a technique used in many analytical chemistry applications for determining the presence or absence of a known spectrum within a set of overlapped spectra [20, 23]. With this method, a series of individual reference spectra (targets) from a library are compared to the abstract spectra from the spectrochromatogram matrix to determine if a target spectrum is present in the sample. The first step in this approach is the application of a principal component analysis method called singular value decomposition (SVD), which decomposes the spectrochromatogram into abstract spectra, abstract chromatograms, and a diagonal matrix of singular values [24]. The first n columns of the abstract spectral matrix V are represented by V̄. By selecting only the first n columns of V, those contributions to the data that are not significant (i.e., the noise) are excluded. The abstract spectra are aptly named; they do not have any chemical meaning and are merely mathematical representations of the data. In order to compare this set of spectra to a chemically meaningful reference spectrum from the library, target transformation is carried out. A linear combination of the n abstract spectra is tested against the reference (target) spectrum, L, to create a transformation matrix T (also called a rotation matrix), according to the equation

$\mathbf{T} = \bar{\mathbf{V}}^{+} \cdot \mathbf{L}$  (2)

where V̄⁺ is the pseudo-inverse of V̄. T is used to predict the spectrum of the component, according to Eq. (3),

$\hat{\mathbf{L}} = \bar{\mathbf{V}} \cdot \mathbf{T}$  (3)

where L̂ is the predicted spectrum. This transformation finds the linear combination of the first n abstract spectra that most closely represents the target spectrum. The spectrum obtained from the rotation, L̂, is now a chemically meaningful spectrum that can be directly compared to the target. The advantage of using the TFA algorithm rather than simply comparing the spectrum at the top of a chromatographic peak to a library spectrum is that a target spectrum can be successfully matched with the components in the abstract spectral matrix even in the case of severe peak overlap.

The degree of correlation between L and L̂ is indicated by the angle theta (θ):

$\theta = \cos^{-1}\rho$  (4)

where ρ is the square root of the correlation coefficient between the predicted spectrum and the target spectrum [25]. The two spectra can be represented as vectors in b-dimensional space (where b is the number of wavelengths in the spectrum). For two spectra that are identical (ρ = 1.000), the angle between them (θ) is 0° and is independent of the magnitude of the vectors. However, due to the presence of noise, it is more likely that two independently obtained spectra will have a small but non-zero value of θ. Lohnes and Wentzel [26] suggest that a value of θ less than 10° indicates a high probability that two spectra are the same; however this variable is highly dependent on the amount of noise present in the system.
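A minimal, self-contained Python sketch of the TFA steps in Eqs. (2)-(4) is shown below, using synthetic data with invented peak positions and spectra. For simplicity, ρ is computed here as the cosine of the angle between the target and predicted spectral vectors; the numerical details are illustrative, not the authors' implementation.

```python
import numpy as np

# Synthetic spectrochromatogram: two overlapped peaks, invented spectra
rng = np.random.default_rng(0)
t = np.arange(200)[:, None]
w = np.arange(101)[:, None]
A = np.exp(-0.5 * ((t - np.array([80, 88])) / 6.0) ** 2)    # overlapped profiles
B = np.exp(-0.5 * ((w - np.array([30, 60])) / 15.0) ** 2)   # pure spectra (b x n)
X = A @ B.T + 1e-4 * rng.standard_normal((200, 101))

L_target = B[:, 0]                   # library (target) spectrum of component 1

# SVD and truncation to the n significant abstract spectra
n = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Vbar = Vt[:n].T                      # first n abstract spectra, shape (b, n)

T = np.linalg.pinv(Vbar) @ L_target  # Eq. (2): rotation vector
L_hat = Vbar @ T                     # Eq. (3): predicted spectrum

# Eq. (4): angle between target and predicted spectra (rho = cosine similarity)
rho = L_target @ L_hat / (np.linalg.norm(L_target) * np.linalg.norm(L_hat))
theta = np.degrees(np.arccos(np.clip(rho, -1.0, 1.0)))
print(theta)                         # small angle: target is present
```

Despite the severe chromatographic overlap of the two peaks, θ for the true component is close to 0°, whereas a spectrum absent from the mixture would yield a large angle.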

For determination of the rank of the data sets (n), we calculated the residual standard deviation (RSD) as described by Malinowski [23], which is computed from the eigenvalues of the data matrix and compared to the experimental error. The eigenvalues (λi) can be determined by

$\lambda_i = s_{ii}^{2}$  (5)

where sii are the diagonal elements of the singular value matrix, S, determined from the SVD analysis [24]. The RSD is calculated according to the following equation

$\mathrm{RSD} = \left( \frac{\sum_{i=n+1}^{b} \lambda_i^{0}}{a(b-n)} \right)^{1/2}$  (6)

where n is the number of factors being considered and λi0 are the eigenvalues attributed to noise rather than real chemical information. The RSD is determined for a single factor (n = 1) and compared to the estimated experimental error in the data. If the calculated RSD is greater than the estimated error inherent in the data, more factors are required to adequately describe the data. The calculation is repeated for two factors, then three, and so on until all of the eigenvalues have been included (up to n = b − 1). Once the RSD is approximately equal to the estimated random error contributed by the instrument, the correct number of factors has been determined. Although this method of rank determination is not exact unless the noise level of the data is well known, it is a simple way to estimate the number of components present.
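The rank-estimation loop described above can be sketched as follows; the rank-3 test matrix and the error estimate are invented for illustration, and a real application would substitute the instrument's known noise level.

```python
import numpy as np

def estimate_rank(X, error_estimate):
    """Malinowski-style rank estimation from Eqs. (5)-(6)."""
    a, b = X.shape
    lam = np.linalg.svd(X, compute_uv=False) ** 2       # Eq. (5): eigenvalues
    for nfac in range(1, b):
        # Eq. (6): RSD over the eigenvalues attributed to noise
        rsd = np.sqrt(lam[nfac:].sum() / (a * (b - nfac)))
        if rsd <= error_estimate:
            return nfac
    return b

# Invented rank-3 data matrix plus low-level Gaussian noise
rng = np.random.default_rng(1)
sigma = 1e-3
X = rng.standard_normal((150, 3)) @ rng.standard_normal((3, 80)) \
    + sigma * rng.standard_normal((150, 80))
print(estimate_rank(X, 1.5 * sigma))
```

With the noise level known, the loop stops at the first n for which the residual eigenvalues look like pure noise, recovering the true number of components.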

2.2. Evaluation of the Screening Method

A useful library search algorithm should produce a minimum of false positive and false negative results. A screening method should quickly and accurately identify any samples that ought to be tested further. In the case of a false positive result, the sample will be further examined by a more reliable confirmatory method (LC-MS or GC-MS) and the slower, more sophisticated assay will resolve the error. However, if the rate of false positives is too high, the benefit gained from using the fast screening method will be lost. In the case of a false negative result, the sample will not be further analyzed; thus, this type of error is more serious.

To quantitatively measure the effectiveness of our screening method, a 2 × 2 contingency table was used to evaluate the results for two qualitative variables, outcome and consequence [27]. The possible outcomes of a screening test are a positive test (target compounds identified, P) and a negative test (no target compounds identified, N). The consequence of a positive test is that the sample will go on for confirmatory testing, while the consequence of a negative test is that the sample will not go on for further testing. The sensitivity and specificity of the method are calculated by [27]

$\mathrm{Sensitivity} = 100\left(\frac{TP}{C_1}\right)$  (7)
$\mathrm{Specificity} = 100\left(\frac{TN}{C_2}\right)$  (8)

where TP is the number of true positives (samples correctly identified as containing library substances), TN is the number of true negatives (blank samples correctly identified as negative) and C1 and C2 are equal to the total number of analyzed chromatograms that contained drug peaks and the total number of blank chromatograms, respectively. These two values allow the outright comparison between methods, or in our case between algorithm parameters, by their tendency to correctly detect positives (sensitivity) and negatives (specificity).

The positive predictive value (PPV) and the negative predictive value (NPV) are additional parameters that can measure the effectiveness of a method. These values are calculated by Eqs. (9) and (10) [27],

$\mathrm{PPV} = 100\left(\frac{TP}{TP+FP}\right)$  (9)
$\mathrm{NPV} = 100\left(\frac{TN}{FN+TN}\right)$  (10)

where FN represents the number of false negatives and FP represents the number of false positives. If the analysis method indicates that a target compound is present, the PPV is the conditional probability that that compound is actually present in the sample. Conversely, the NPV is an estimate of the probability that a sample indicated as containing no target compounds is truly blank. In other words, PPV and NPV are estimates of the ability of the method to detect true positives and true negatives.
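Eqs. (7)-(10) reduce to a few lines of code. The counts below are invented for illustration and are not the paper's results.

```python
def screening_metrics(TP, FP, TN, FN):
    """Contingency-table metrics of Eqs. (7)-(10)."""
    C1 = TP + FN          # chromatograms that truly contained target drugs
    C2 = TN + FP          # truly blank chromatograms
    return {
        "sensitivity": 100.0 * TP / C1,        # Eq. (7)
        "specificity": 100.0 * TN / C2,        # Eq. (8)
        "PPV": 100.0 * TP / (TP + FP),         # Eq. (9)
        "NPV": 100.0 * TN / (FN + TN),         # Eq. (10)
    }

# Hypothetical counts for a batch of 30 screened chromatograms
m = screening_metrics(TP=9, FP=2, TN=18, FN=1)
print({k: round(v, 1) for k, v in m.items()})
```

Note that sensitivity/specificity are normalized by the true composition of the sample set (C1, C2), whereas PPV/NPV are normalized by the test outcome, which is why the two pairs answer different questions.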

Two other commonly reported measures of selectivity for screening methods are the mean list length (MLL) [28] and the discriminating power (DP) [29]. The DP is a measure of the probability that two compounds in the library can be distinguished by the method. The list length (LL) for a compound i represents the number of compounds, n, in the library (including the compound itself) that are indistinguishable from compound i, and the MLL is simply the mean of the LL values over all the compounds in the library. These parameters are calculated using the following equations:

$\mathrm{DP} = 1 - \frac{2p}{q(q-1)}$  (11)
$\mathrm{MLL} = \frac{\sum_{i}^{q} n_i}{q}$  (12)

where p is the number of indistinguishable substance pairs, and q is the total number of substances in the library.
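A sketch of the DP and MLL calculation (Eqs. (11) and (12)) for a toy library is shown below. The pairing rule (retention indices within a window AND the same spectral class) mirrors the combined criterion described later in section 4.2, but the four library entries are invented.

```python
import itertools

def dp_mll(library, ri_window=12.0):
    """DP (Eq. 11) and MLL (Eq. 12) for a list of (retention index, class)."""
    q = len(library)
    n = [1] * q                      # each list length includes the compound itself
    p = 0                            # number of indistinguishable pairs
    for i, j in itertools.combinations(range(q), 2):
        (ri_i, cls_i), (ri_j, cls_j) = library[i], library[j]
        if abs(ri_i - ri_j) <= ri_window and cls_i == cls_j:
            p += 1
            n[i] += 1
            n[j] += 1
    dp = 1.0 - 2.0 * p / (q * (q - 1))     # Eq. (11)
    mll = sum(n) / q                       # Eq. (12)
    return dp, mll

toy = [(100.0, "A"), (105.0, "A"), (300.0, "B"), (500.0, "C")]
print(dp_mll(toy))
```

In the toy library only the first two entries collide (same class, indices 5 units apart), so p = 1, giving DP = 5/6 and MLL = 1.5; a perfectly selective library would give DP = 1 and MLL = 1.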

3. Experimental

3.1. Materials and Reagents

All solvents were obtained from J.T. Baker (Mallinckrodt Baker, Inc., Phillipsburg, NJ, USA), drug free blood was obtained from UTAK Laboratories Inc. (Valencia, CA, USA), and the drug standards were obtained from Cerilliant Corporation (Round Rock, TX, USA). The Clean Screen DAU columns used for the solid phase extraction (SPE) were obtained from United Chemical Technologies, Inc. (Bristol, PA, USA). Deionized water was used to prepare all aqueous solutions.

3.2. Preparation of blood samples

Whole blood specimens from volunteers at the Minnesota Bureau of Criminal Apprehension (BCA) were collected in blood collection tubes containing 100 mg of sodium fluoride and 20 mg of potassium oxalate. The procedure for extraction of the samples was adapted from Telepchak et al. [30, 31]. Blood samples were stored at −20 °C until analyzed, warmed to room temperature, and mixed on a tube rocker for at least 2 minutes before extraction. A 1.0 mL aliquot of each blood sample was added to a screw-cap tube, the tube was mixed by vortexing, and 4 mL of water was added; the sample was then allowed to stand for 5 minutes. The samples were centrifuged for 10 minutes at 3000 rpm, and 2 mL of 0.1 M phosphate buffer (pH 6.0) was added to the supernatant. The pH was adjusted (if needed) to 6.0 ± 0.5 with 0.1 M mono- or dibasic sodium phosphate.

The samples were extracted using a Zymark RapidTrace automated SPE system (Zymark Corp., Hopkinton, MA, USA). The SPE columns used were 3 mL Clean Screen DAU columns with a 200 mg sorbent bed. The RapidTrace was programmed with the following parameters: the SPE cartridges were conditioned with 3 mL of methanol, followed by 3 mL of water, followed by 1 mL of 0.1 M phosphate buffer (pH 6.0), at a flow rate of 12 mL/min. After conditioning, 7.2 mL of the sample was loaded onto the column at 1 mL/min. The RapidTrace cannula was purged with 6 mL of water followed by 6 mL of methanol at 12 mL/min, then the SPE cartridges were rinsed with 3 mL of water followed by 1 mL of 0.1 M acetic acid at 12 mL/min. The columns were dried under nitrogen for 2 minutes, and then rinsed with 2 mL hexane at 12 mL/min. The samples were eluted with 3 mL of a 50/50 hexane/ethyl acetate eluent at 2 mL/min followed by a rinse with 3 mL of methanol at 12 mL/min (acidic fraction). The columns were again dried under nitrogen for 5 minutes, and the second fraction was eluted with 3 mL of a methylene chloride/isopropanol/ammonium hydroxide (78:20:2) solvent at 1 mL/min (basic fraction).

Following the SPE procedure, the organic solvent was evaporated from the acidic and basic fractions under nitrogen at 37 °C until approximately 300 μL of the solvent remained; 10 μL of a 1% HCl in isopropanol (v/v) solution was added, and the samples were evaporated to dryness under nitrogen. The samples were then resuspended in 40 μL of a water/acetonitrile (95/5) mixture containing 10 mg/L of uracil and vortexed for 5 seconds. The samples were warmed with a heat gun for about 5 seconds until condensation formed around the tube about 2 cm above the residue. The samples were centrifuged at low speed (1000 rpm) for 2 minutes and the supernatant was transferred into autosampler inserts. The inserts were centrifuged at high speed (13,000 – 14,000 rpm) for 20 to 60 minutes and the dried extracts were reconstituted using 50 μL of the initial mobile phase used in the gradient elution (10/90 (% v/v) acetonitrile/20 mM perchloric acid in water). Finally, the basic fraction was analyzed by the LC system described in ref. [13]; the injection volume for LC analysis was 10 μL.

Blood samples from volunteers were pooled to prepare samples for matrix studies. The pooled samples were spiked with varying levels of target drugs and used to determine the limits of detection of the TFA method with respect to a matrix background. These spiked samples were included as part of data set B as described below.

3.3. Collection of chromatograms

All chromatograms were collected on one of the fast LC-DAD systems described in Part I of this work [13]. Table 1 summarizes the data sets used to develop and validate the analysis method, including the reference data set for the library. The compounds included in this study are shown in Table 2. A set of chromatograms was collected containing single component peaks of each drug in the library in order to determine the reference retention times and retention indices for each compound, which are denoted in Table 2 as tR° and I°, respectively. A spectral library was created by obtaining the UV-visible spectrum for each of the drugs from 201 to 301 nm. Rather than independently collecting the spectra, they were extracted from the chromatographic peak maxima of the pure components. The chromatograms in the three data sets outlined in Table 1 were pooled together, and then separated into a training set and a validation set. The training data were used to determine the method parameters and contained various combinations of overlapped peaks, low abundance peaks, and spectrally similar peaks in order to determine the effectiveness of the method under these different conditions.

Table 1.

Summary of data sets evaluated. All data sets included blank chromatograms.

Data Set Compounds included Purpose
Reference all Establish reference values for spectra and retention times
A various mixtures Evaluate overlapped peaks and peaks from same spectral class
B amitriptyline, oxycodone, zolpidem Evaluate low intensity peaks in the presence of gradient background and blood matrix
C amphetamine, MDA, hydrocodone, zolpidem Evaluate different concentration ratios of highly overlapped peaks

Table 2.

Spectral class, standard retention index, and standard retention time for 47 library compounds.

Class Compound Name I° tR°
A 2-hydroxyethylflurazepam 441.41 1.6104
B 6-Acetylmorphine 279.82 0.6094
C 7-Aminoclonazepam 218.59 0.3937
U 7-Aminoflunitrazepam 259.24 0.5369
D Alprazolam 455.52 1.7078
U Amitriptyline * 546.47 2.2809
E Amphetamine 278.57 0.6050
U Benzoylecgonine * 336.05 0.9126
U Bromazepam 326.01 0.8480
F Cathinone 233.18 0.4451
U Chlordiazepoxide 394.18 1.2869
U Clobazam 373.54 1.1540
D Clonazepam * 465.61 1.7775
B Codeine 245.87 0.4898
U Cyclobenzaprine HCl 532.1 2.1987
A Desalkylflurazepam 439.08 1.5943
G Diazepam * 428.86 1.5237
E Ephedrine 246.44 0.4918
D Estazolam 430.7 1.5364
D Flunitrazepam 484.32 1.9067
D Flurazepam 465.64 1.2590
H Hydrocodone 301.32 0.6890
H Hydromorphone 198.44 0.3262
C Lometazepam 530.55 2.1898
C Lorazepam 475.28 1.8443
I MBDB 343.42 0.9601
E Methamphetamine 302.75 0.6982
F Methcathinone 248.42 0.4988
I Methylenedioxyamphetamine (MDA) 289.72 0.6443
I Methylenedioxyethylamphetamine (MDEA) 327.27 0.8561
I Methylenedioxymethamphetamine (MDMA) 306.4 0.7217
D Midazolam 455.3 1.7063
B Morphine * 140.59 0.2518
U Nitrazepam 377.05 1.1766
U Nordiazepam 382.28 1.2103
C Oxazepam 450.71 1.6746
H Oxycodone * 276.53 0.5978
H Oxymorphone 162.91 0.2805
J Paramethoxyamphetamine (PMA) 302.03 0.6936
J Paramethoxymethamphetamine (PMMA) 315.81 0.7823
E Phenylpropanolamine 219.73 0.3977
G Prazepam 549.41 2.2977
E Pseudoephedrine 244.71 0.4857
H Sertraline * 571.11 2.4219
C Temazepam * 498.89 2.0073
D Triazolam 494.14 1.9745
U Zolpidem hemitartrate * 404.21 1.3535
* Compound used as a secondary standard

3.4. Data Analysis

All data analysis was performed in the Matlab® programming environment (The Mathworks, Natick, MA, USA), version 7.0.4. A macro provided by Agilent (Agilent Technologies; Wilmington, DE, USA) was used to convert the data collected in Chemstation (version A.10.01) into a comma separated variable (CSV) file format that could be loaded into Matlab as a variable. The SVD algorithm used was a built-in Matlab function. All other Matlab functions, including the TFA algorithm, were written in-house. The ALS algorithm was written in-house as reported previously [21]. Analysis was carried out on a Dell® Optiplex GX280 with a Pentium 4 3.2 GHz processor and 2 GB of RAM.

4. Results and Discussion

4.1. Creation of retention index and spectral libraries

The library used in this study contained 47 different compounds, which we refer to as target compounds. These compounds are listed in Table 2. The compounds were judiciously chosen to include frequently abused drugs and drugs of toxicological relevance and several metabolites. Chromatograms of each target compound were collected in order to establish standard values for the retention indices of the pure components. These results constituted the retention index library. Run-to-run, day-to-day, and column-to-column shifts in retention times must be accounted for in order to apply a retention library to a searching algorithm. We chose to use a standard retention index (shown in Table 2 as I°) for each drug rather than standard retention times (also shown in Table 2) [13, 32]. This method uses neutral primary standards (nitroalkanes) and secondary drug standards (indicated by an asterisk in Table 2) as external retention time references and calculates a corrected retention index (Icorr) based on the difference between the retention time of the target analyte and the primary and secondary standards eluting immediately before and immediately after the peak of interest. The application of the retention index calculation and its effect on the repeatability of between-column and between-day measurements are discussed extensively in Part I of this work [13]. The use of this system greatly improves the reliability of the method and produces a measure of retention that is much less sensitive to perturbations in operating conditions. The reference retention times (and hence, retention indices) of the target compounds were determined from pure component chromatograms, and used for the duration of the study as the reference values.
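The exact Icorr calculation is given in Part I [13]. Purely as a hypothetical illustration of the bracketing idea, a retention index can be assigned by linear interpolation between the standards eluting immediately before and after the peak of interest; the standard table below is invented and the actual correction may differ in detail.

```python
def retention_index(t_r, standards):
    """Assign a retention index by linear interpolation between the
    bracketing standards. `standards` is a list of (retention time,
    assigned index) pairs for the reference compounds."""
    standards = sorted(standards)
    for (t0, i0), (t1, i1) in zip(standards, standards[1:]):
        if t0 <= t_r <= t1:
            return i0 + (i1 - i0) * (t_r - t0) / (t1 - t0)
    raise ValueError("retention time outside the standard range")

# Invented standard table: retention time (min) -> index value
stds = [(0.20, 100.0), (0.60, 300.0), (1.40, 450.0), (2.40, 570.0)]
print(retention_index(1.0, stds))
```

Because the index is anchored to co-injected standards rather than to absolute retention times, a uniform drift in the chromatogram shifts the standards and the analyte together and largely cancels out of the assigned index.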

A matrix containing the spectra for all 47 target compounds was created that constituted the spectral library. The spectra in the spectral library were used as the target spectra as described in section 2.1. A correlation matrix for the spectral library was used to determine the occurrence of similar spectra. The library contained 20 unique spectra, where highly correlated spectra (ρ > 0.98) were considered to be identical. Based on the results of this analysis, each target compound was either assigned to one of ten spectral classes (classes designated by ‘A’–‘J’ in Table 2), or was one of 10 compounds that exhibited a unique spectrum (designated by a ‘U’ in Table 2). Because our method discriminates based on retention indices as well as spectra, the spectral class assignment is only significant when two compounds in the same class are overlapped chromatographically. Such cases affected the calculation of the selectivity of the method (discussed below) and were included in both the training and validation data sets.

4.2. Discriminating Power and Mean List Length

The DP [29] and MLL [28] of the retention index library and the spectral library were calculated using Eqs. (11) and (12) to characterize the selectivity of the method. Compounds with a spectral correlation coefficient greater than 0.98, or compounds with a retention index within ±12 retention index units of one another, were considered indistinguishable by the method. The value of 12 retention index units corresponds to three times the long term (13 months) standard deviation in retention index reported for the LC method in Part I [13]. The error window of three times the standard deviation is consistent with that used by Maier and Bogusz [33]. For our library of 47 compounds, the DP was determined to be 0.94 and the MLL was calculated to be 3.72 based solely on spectral criteria. The DP and the MLL when only retention index is considered are 0.95 and 3.26 respectively. When both the retention indices and the spectra of the compounds are considered, the DP and MLL are improved to 0.997 and 1.255, respectively. The number of indistinguishable pairs (p) drops from 53 to 6 (out of a total 1081 possible pairs) when both data dimensions are included. In the combined case, only those drugs with retention indices within 12 retention index units of one another and in the same spectral class were considered indistinguishable by our method. A MLL value of close to one indicates that on average only one compound will be identified with a given retention index and spectral result; the MLL increases as the number of indistinguishable pairs increases, up to a maximum value of q. Conversely, DP is less than one as long as p is greater than q. DP approaches one as p approaches 0; that is, when all compounds in the library are unique. This ideal situation indicates that each library compound is only indistinguishable from itself.

While the parameters DP and MLL are a reasonable indication of how well a method distinguishes compounds in a library, they are also strongly correlated with q. Therefore the size of the library should be considered when comparing DP and MLL between two different methods. Maier and Bogusz [33] reported a DP and MLL of 0.96 and 4.04 for a library of 56 acidic drugs when considering retention index and UV absorbance maxima. We are reporting a significant improvement in DP and MLL for a similar size library by using all of the spectral information afforded by using the DAD. In comparison, Herzler et al. [12] reported a DP of 0.9999 and a MLL of 1.253 for a library of over 2500 compounds.

4.3. Optimization of algorithm parameters

The TFA algorithm was evaluated based on the production of false positive and false negative results from the test chromatograms. Training samples were used to optimize the algorithm parameters to minimize the occurrence of errors. The parameters that required optimization were the maximum angle requirement for θ (i.e., the maximum angle between the predicted and the target spectra that can be obtained and still be considered a match) and the allowed difference between Icorr and I°. A 3² factorial design was used to evaluate these parameters based on the contingency table results. The levels and the results for the training data set are summarized in Table 3. A sample was considered a true positive (TP) if any of the drug peaks in it were identified correctly; hence the consequence of a TP result is that the sample will go on for confirmatory testing. Blank samples were considered true negatives (TN) if no drug peaks were identified, the consequence being that the sample will not go on for confirmatory testing. In addition to the results for the chromatograms, the samples were analyzed on a per peak basis, where every single drug peak was accounted for. In this analysis, only true positives, false positives, and false negatives were counted. As discussed in Part I [13], the demonstrated long term reproducibility of Icorr was about 1.0% (relative standard deviation), or about 4 retention index units when all factors contributing to retention shift were considered. Accordingly, retention index windows of ±1σ, ±2σ, and ±3σ around the reference value were tested, corresponding to 4, 8, and 12 retention index units, respectively. Thresholds for θ of 5°, 7.5°, and 10° were also tested as part of the experimental design.

Table 3.

Results of the 3² factorial experiment for the training data set.

Level RI Window Theta (°) SENa SPECa PPVa NPVa SENb PPVb
1 4 10 97% 91% 90% 97% 80% 75%
2 4 7.5 97% 91% 90% 97% 79% 80%
3 4 5 93% 97% 96% 94% 63% 88%
4 8 10 100% 85% 85% 100% 81% 62%
5 8 7.5 100% 85% 85% 100% 80% 70%
6 8 5 97% 91% 90% 97% 64% 83%
7 12 10 100% 76% 78% 100% 83% 48%
8 12 7.5 100% 76% 78% 100% 81% 59%
9 12 5 97% 85% 85% 97% 66% 75%
Validation 4 7.5 92% 94% 94% 91% 69% 77%
(a) Sensitivity, specificity, PPV and NPV calculated according to equations 7–10, by chromatogram.

(b) Sensitivity (equation 7) and PPV (equation 9) calculated for individual target compound peaks present.

As seen in Table 3, both the angle θ and the retention index window have a significant effect on the results; all of the figures of merit (sensitivity, specificity, PPV, and NPV) must be taken into account when choosing the appropriate levels. For example, the chromatogram-level results for levels 4, 5, 7, and 8 all show a sensitivity of 100%, indicating that no false negative results were obtained; however, the specificities for these levels are all below 90%, indicating a high rate of false positives. The last two columns of Table 3 summarize the results of the per peak analysis described above. Again, levels 4, 5, 7, and 8 have the highest sensitivities; however, all of their PPV values are 70% or lower (as low as 48% for level 7). The consequence of a very low PPV is that many extraneous peaks are identified as target compounds, which complicates the application of confirmatory methods such as LC-MS where selected ion monitoring or selected reaction monitoring is employed. Level 3 has the highest per peak PPV; however, its sensitivity is low in both the per peak and the chromatogram analyses. Because all of these values should ideally approach 100%, it was logical to choose the level with the highest average over all of the figures of merit for both the chromatogram and the per peak analyses.

The level with the highest average was level 2; although it did not have the highest value for any single figure of merit, it offered the best overall compromise. By applying the factorial design we determined that an unknown compound must have a value of θ less than 7.5° and an Icorr within 4 retention index units of the target compound in order to be considered a match. This requirement for Icorr corresponds to a window of approximately ±1%, which is consistent with the results reported in Part I [13]. These parameters were then applied to the validation set; the 2 × 2 contingency table for the validation set is shown in Table 4. The fact that the validation results were consistent with the training sets confirms that the parameters are robust and not dependent on the particular data being analyzed.

Table 4.

2 × 2 contingency table for the validation data set, where TP = true positive, FP = false positive, FN = false negative, and TN = true negative.

Drugs present Drugs absent Row totals
Positive testa (confirmatory testing) TP = 33 FP = 2 R1 = 35
Negative testa (no confirmatory testing) FN = 3 TN = 32 R2 = 35
Column totals C1 = 36 C2 = 34 70
a

The test results indicate whether confirmatory testing will be done (positive test) or not (negative test)
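The figures of merit in Tables 3 and 4 follow the standard 2 × 2 contingency-table definitions (equations 7–10 of the text). As a check, the counts in Table 4 reproduce the reported validation results:

```python
def screening_metrics(tp, fp, fn, tn):
    """Standard 2 x 2 contingency-table figures of merit (equations 7-10)."""
    return {
        "sensitivity": tp / (tp + fn),   # fraction of positives caught
        "specificity": tn / (tn + fp),   # fraction of negatives rejected
        "ppv": tp / (tp + fp),           # positive predictive value
        "npv": tn / (tn + fn),           # negative predictive value
    }

# counts from Table 4 (validation data set)
m = screening_metrics(tp=33, fp=2, fn=3, tn=32)
print({k: f"{100 * v:.0f}%" for k, v in m.items()})
# → {'sensitivity': '92%', 'specificity': '94%', 'ppv': '94%', 'npv': '91%'}
```

These rounded values match the validation row of Table 3.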

Too many false positive results would make the screening method useless for eliminating samples that do not need confirmatory testing, negating the increase in throughput obtained by screening. A simple calculation can confirm whether or not the screening method significantly increases the throughput of the laboratory. The total analysis time without any screening (Ttotal) is the number of samples times the cycle time of the confirmatory method; that is, all samples are evaluated using the longer confirmatory method. Using the screening method on all samples and the confirmatory method on only those samples that test positive, the total analysis time (Ttotal,screen) is the total number of positive samples (TP + FP) times the cycle time of the confirmatory method plus the total number of samples times the cycle time of the screening method. When Ttotal = Ttotal,screen, the screening method is no longer advantageous in saving time; the factor that determines this crossover point is the ratio of total positive samples (TP + FP) to the total number of samples. For the validation data set analyzed in this work, the cycle time of the confirmatory method was assumed to be 30 minutes (typical for a GC/MS analysis), the cycle time of the screening method was 4 minutes (including data analysis time), and the total number of samples was 70. The total number of positive screening tests for the validation data set was 35, giving a positive rate of 50%. At this rate, where Ttotal,screen/Ttotal = 0.63, use of the screening method nearly doubles the throughput of the laboratory.
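This break-even arithmetic can be stated compactly: screening stops paying off once the positive rate reaches (tconfirm − tscreen)/tconfirm, which is about 87% for the cycle times used here. A short sketch with the validation numbers:

```python
def throughput_ratio(n_samples, n_screen_positive, t_confirm, t_screen):
    """Ratio of total analysis time with screening to without.

    Values below 1 mean the screening step saves time; the break-even
    positive rate is (t_confirm - t_screen) / t_confirm.
    """
    t_total = n_samples * t_confirm
    t_total_screen = n_samples * t_screen + n_screen_positive * t_confirm
    return t_total_screen / t_total

# validation data set: 70 samples, 35 screen positives, 30 min GC/MS
# confirmation, 4 min screen (including data analysis)
print(round(throughput_ratio(70, 35, 30, 4), 2))   # → 0.63
```

At the observed 50% positive rate the ratio is 0.63, i.e., the screen nearly doubles throughput, and it would remain advantageous up to a positive rate of 26/30 ≈ 87%.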

In order to carry out the data analysis of the chromatograms more efficiently, the peak integration tables obtained from the data acquisition software were used to identify the regions of the chromatograms that should be target tested. This method allows the direct comparison of retention times found in the data collection software, converted to Icorr, to the standard retention indices of the target analytes and eliminates the need for the analysis algorithm to identify retention times. A disadvantage to this method, particularly for Hewlett Packard/Agilent LC users utilizing Chemstation, is that a single wavelength must be chosen to do the integration. In other software, such as Waters Millennium 32®, maximum absorbance plots can be used, possibly allowing lower detection limits for compounds that have their λmax at different wavelengths. Another potential problem occurs when peaks are overlapped, causing the observed retention time to be shifted from the expected retention time.

We have also explored the application of window target testing factor analysis (WTTFA), introduced by Lohnes et al. [26]. This algorithm is as effective as ours in identifying drug peaks and resolving overlapping peaks (results not shown); however, it is significantly more computationally demanding, which limits its applicability as a fast data analysis method. The analysis of the training data set (70 samples) chromatograms took hours rather than minutes. We determined that it was much more efficient to analyze sections of the chromatogram defined by the retention times reported in Chemstation, which did not compromise the sensitivity or specificity results.

4.4. Examples of Results from Data Sets A, B, and C

Several different data sets were analyzed to evaluate the efficacy of the method (summarized in the Experimental section and Table 1). Data set A contained chromatograms with various mixtures of overlapping compounds, including drug/metabolite pairs, pairs with the same or similar spectra, and pairs with different spectra. This data set also contained individual chromatograms of the drugs of interest. Because the presence of overlapping peaks in a chromatogram may shift the apparent retention time (and hence Icorr) that is reported, chromatograms with highly overlapped peaks were included to assess the effect of this phenomenon on the results. Analyzing these sets of partially and highly overlapped target compounds allowed us to determine (a) if the TFA algorithm was successful at resolving the spectra of the overlapped peaks, and (b) what the effect on the allowable retention index window would be.

An example of one of the chromatograms from data set A is shown in Fig. 1. The single wavelength chromatogram in Fig. 1 illustrates the issue of overlapping or indistinguishable retention indices for drugs with different spectra. The peak shown actually comprises three target compounds, which is not evident from the single wavelength chromatogram. The sample contained ephedrine (I° = 246.4), codeine (I° = 245.9) and methcathinone (I° = 248.4); the calculated Icorr for this peak was 244.6. Two issues are apparent here: the observed spectrum and the retention index. The observed spectrum for this peak is a linear combination of the three spectra, which the TFA algorithm can resolve and individually identify. Without this step, traditional library searching algorithms (such as those available in Chemstation) would not be able to identify any of the target compounds in this peak because its spectral profile is distorted relative to the library. However, it is also clear that the presence of overlapping peaks within a chromatogram will degrade the quality of a match to the standard retention index values (I°). Codeine and methcathinone were easily identified in this chromatogram. Although in this case the retention indices were all within 4 RI units of I°, the spectrum of ephedrine was not identifiable even using the TFA algorithm. The spectrum of ephedrine (spectral class E, cf. Table 2) has no maxima at selective wavelengths, and thus the distortion due to the overlap with the other compounds is too severe for even the TFA algorithm to resolve. Using a larger threshold for θ would allow the identification of all three drugs; however, as discussed, too large a threshold creates too many false positive results.

Fig. 1.

Fig. 1

Example chromatogram at 210 nm from data set A illustrating overlapping peaks. This chromatogram contains ephedrine (I° = 246.4), codeine (I° = 245.9) and methcathinone (I° = 248.4). The calculated Icorr for this peak is 244.6.
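To see why a conventional library search fails on such a peak, compare the apex spectrum of a simulated two-component overlap directly against one pure library spectrum: the direct angle exceeds the 7.5° threshold even though the compound is present, whereas the TFA projection described above is insensitive to the admixture. The Gaussian band shapes below are hypothetical stand-ins, not the measured spectra of codeine or methcathinone.

```python
import numpy as np

def angle_deg(a, b):
    """Angle in degrees between two spectra treated as vectors."""
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

wl = np.linspace(200, 400, 101)
codeine_like = np.exp(-0.5 * ((wl - 245) / 12) ** 2)   # hypothetical band shape
methcat_like = np.exp(-0.5 * ((wl - 262) / 10) ** 2)   # hypothetical band shape
apex = 0.6 * codeine_like + 0.4 * methcat_like         # observed apex spectrum
print(angle_deg(apex, codeine_like) > 7.5)   # → True: direct match fails
```

The apex spectrum is a linear combination of the two pure spectra, so any method that compares it directly to a single library entry sees a distorted profile and rejects the match.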

Data set B was designed to determine if the TFA algorithm could detect low intensity peaks in a chromatogram, particularly in the presence of a changing baseline typically seen in gradient chromatography and in the presence of overlapping peaks from a blood matrix. Oxycodone, zolpidem, and amitriptyline were chosen as test compounds for this data set as these three drugs have retention times at the beginning, middle and end of the gradient. A range of concentrations was tested from 0.2 μg/mL to 20 μg/mL in both “clean” matrix and spiked in blood matrix. Chromatograms of these drugs in both matrices at 0.2 μg/mL (the lowest level tested) are shown in Fig. 2. In the clean matrix sample (Fig. 2B), the amitriptyline (peak 3) is severely overlapped with a gradient background peak that is most likely due to a mobile phase impurity. Although the major gradient background peak in Fig. 2B is atypical (i.e., in practice fresh mobile phase would be prepared to eliminate the large background), we include the analysis of this chromatogram here because it showcases the power of the TFA algorithm in the analysis of these chromatograms. Particularly with low abundance components, such an overlap can distort the apparent spectrum of the peak relative to the library spectrum. This distortion is illustrated in Fig. 2C, where the library spectrum of amitriptyline (dashed line) is overlaid with the actual spectrum from the apex of peak 3 (solid line) in Fig. 2B. Using the TFA algorithm resolves the component from the background and allows identification of the peak where traditional library search algorithms would fail. The spectrum of the background (and any other interfering peaks) is resolved by SVD and therefore the TFA algorithm can readily distinguish drug peaks from background without any need for background subtraction. A comparison of Figs. 2A and 2B also shows the significant shift in retention time that can occur over long time scales and on different instruments. 
The chromatogram in Fig. 2A was collected on the two-pump system described in Part I, and the chromatogram in Fig. 2B was collected on the one-pump system [13]. The difference in retention times between the amitriptyline peaks (peak 3 in both figures) is clear; however, the application of the retention index method gave consistent Icorr values for both peaks.

Fig. 2.

Fig. 2

Oxycodone (peak 1), zolpidem (peak 2), and amitriptyline (peak 3) in (A) blood matrix, and (B) clean matrix at 0.2 μg/mL. (C) Comparison of the library spectrum (dashed line) of amitriptyline and the spectrum of peak 3 in the chromatogram in 2B (solid line).

The third data set (data set C) included chromatograms of mixtures of amphetamine, methylenedioxyamphetamine (MDA), hydrocodone, and zolpidem. Amphetamine, MDA, and hydrocodone are highly overlapped, as evidenced by their retention times in Table 1. A single wavelength chromatogram (210 nm) is shown in Fig. 3. In this case only one peak was integrated by Chemstation, and thus the retention indices of the two earlier peaks were significantly different from the library values. If a large enough error window were allowed, all three compounds could be identified, but under the final chosen conditions (±4 retention index units) only MDA and hydrocodone were identified in most of the chromatograms in this data set. The sample was still flagged for confirmatory testing, however, so this result can still be considered a true positive. Different concentration ratios of these three drugs were evaluated to determine the effect on the reported retention index, and what allowed deviation between Icorr and I° would be needed to identify the drugs. While an error window for Icorr could be chosen that would identify all of the drugs in these mixtures, such a large window adversely affected the other results and produced too many false positives.

Fig. 3.

Fig. 3

Single wavelength chromatogram (210 nm) selected from data set C containing amphetamine, MDA, and hydrocodone.

Finally, in addition to the drug mixtures discussed, all three data sets contained “blank” chromatograms. These chromatograms included blanks of the buffer used in the mobile phase, certified drug-free blood and urine blanks, blood samples obtained from volunteers at the Minnesota BCA, and samples that contained compounds not included in the library. The blank samples were used in determining the rate of false positive results obtained by our method.

4.5. Application of ALS algorithm to selected data

As discussed, and as illustrated in Figs. 1 and 3, areas of high peak overlap can cause problems in identifying target compounds based on spectra or retention index, even when the demonstrated reproducibility of the method is very good (1% relative standard deviation). The application of a curve resolution algorithm with flexible constraints (ALS) [21] can alleviate some of these issues by resolving the retention profiles and the spectral profiles, allowing the calculation of a more accurate Icorr for the individual components. Also, the resolved spectral profiles often show a better match to the target spectrum than simply using the linear combination of the SVD abstract spectra as in the TFA algorithm.
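The core of such a curve resolution step can be sketched as an alternating least squares iteration with a single non-negativity constraint, applied to synthetic rank-two data; the published ALS algorithm [21] supports a much richer set of flexible constraints, so this is a minimal illustration under stated assumptions, not that implementation.

```python
import numpy as np

def als_resolve(D, C0, n_iter=200):
    """Minimal non-negativity-constrained ALS (MCR-ALS style) iteration.

    Models D ≈ C @ S, where the columns of C are elution profiles and the
    rows of S are spectra. Alternates least-squares solutions for S and C,
    clipping negative values to zero after each step.
    """
    C = C0.copy()
    for _ in range(n_iter):
        S = np.linalg.lstsq(C, D, rcond=None)[0]          # solve for spectra
        S = np.clip(S, 0.0, None)
        C = np.linalg.lstsq(S.T, D.T, rcond=None)[0].T    # solve for profiles
        C = np.clip(C, 0.0, None)
    return C, S

# synthetic overlapped two-component region with a perturbed initial estimate
t = np.linspace(0.0, 1.0, 60)
wl = np.linspace(200.0, 400.0, 101)
C_true = np.vstack([np.exp(-0.5 * ((t - 0.45) / 0.08) ** 2),
                    np.exp(-0.5 * ((t - 0.55) / 0.08) ** 2)]).T
S_true = np.vstack([np.exp(-0.5 * ((wl - 240) / 10) ** 2),
                    np.exp(-0.5 * ((wl - 290) / 15) ** 2)])
D = C_true @ S_true
rng = np.random.default_rng(0)
C, S = als_resolve(D, C_true * rng.uniform(0.8, 1.2, C_true.shape))
print("relative residual:", np.linalg.norm(C @ S - D) / np.linalg.norm(D))
```

The resolved columns of C give the individual retention profiles, from which a more accurate apex position (and hence Icorr) can be read for each overlapped component.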

Although the application of the curve resolution technique would potentially improve the overall success of the screening method, the limitation lies in the difficulty of automating the ALS algorithm to analyze a large number of samples at once. ALS would need to be applied to each section of the chromatogram where a target drug was detected, and it would be necessary for the analyst to individually examine the results obtained from the curve resolution algorithm for each sample to determine the appropriate set of constraints and initial estimates that best fit the data. The added time required to carry out the ALS algorithm would negate the potential of the method for high-throughput screening. For this reason, it is not yet feasible to include the ALS analysis as part of the fast screening method.

5. Conclusions

We have shown in Part I [13] of this work that fast gradient chromatography can be used as an effective screening method for drugs of abuse in blood and urine. The total cycle time for a gradient LC-DAD method was reduced to less than 4 minutes for the traditional system and less than 3 minutes for the modified instrument, a significant improvement over previous work [12]. In addition, the application of a corrected retention index calculation for drug identification allows the comparison of new samples to a library and compensates for day-to-day and column-to-column variability in retention times.

Using the data obtained with the fast LC-DAD method described in Part I [13] in combination with a TFA algorithm facilitates the identification of those samples that require confirmatory testing. Our method showed a low rate of false positive and false negative results, with a validation data set resulting in a sensitivity of 92% and a specificity of 94%. The TFA algorithm uses spectroscopic information to enhance the resolution of highly overlapped chromatographic peaks, which means that complete chromatographic resolution of all the drugs in the library is not necessary. By requiring both spectral and retention index matches, the method more effectively identifies samples that contain drug peaks, and can specify which drugs are present. The ability to more specifically identify the target compounds present in a sample is a key improvement in our method relative to traditional immunoassays. Knowledge of the potential analytes will make the application of a confirmatory method such as LC-MS more efficient by allowing the analyst to tailor the detector parameters for the compounds found in the screening phase.

Future work in this area will involve the inclusion of a multivariate curve resolution algorithm (such as ALS) in order to more accurately identify the retention times of overlapped peaks. We would also ultimately like to incorporate our data analysis method as a macro into mainstream data acquisition software packages, eliminating the need for independent Matlab analysis by forensic toxicologists.

Acknowledgments

The authors wish to acknowledge funding received from the Research Corporation (grant # RA-0344, SCR), and the National Institutes of Health (grant # 5R01GM054585-09, PWC). This work was also funded by the National Institute of Justice (PWC), through the Midwest Forensics Resource Center at Ames Laboratory under interagency agreement number 2002-LP-R-083. The Ames Laboratory is operated for the US Department of Energy by Iowa State University, under contract No. W-7405-Eng-82. The authors also graciously acknowledge Kate Fuller from the Minnesota BCA for her assistance with the blood extraction methods and Glenn Hardin from the Minnesota BCA for his assistance in developing the library of compounds used in this study.


References

1. Kroener L, Musshoff F, Madea B. J Anal Toxicol. 2003;27:205. doi:10.1093/jat/27.4.205.
2. Cooper G, Wilson L, Reid C, Baldwin D, Hand C, Spiehler V. J Forensic Sci. 2005;50:928.
3. Molina DK, Dimaio VJ. Am J Forensic Med Pathol. 2005;26:303. doi:10.1097/01.paf.0000188089.10062.f4.
4. Pragst F, Herzler M, Erxleben T. Clin Chem Lab Med. 2004;42:1325. doi:10.1515/CCLM.2004.251.
5. Laloup M, Tilman G, Maes V, De Boeck G, Wallemacq P, Ramaekers J, Samyn N. Forensic Sci Int. 2005;153:29. doi:10.1016/j.forsciint.2005.04.019.
6. Maurer HH. Clin Chem Lab Med. 2004;42:1310. doi:10.1515/CCLM.2004.250.
7. Stimpfl T, Vycudilik W. Forensic Sci Int. 2004;142:115. doi:10.1016/j.forsciint.2004.02.014.
8. Thieme D, Sachs H. Anal Chim Acta. 2003;492:171.
9. Kudo K, Tsuchihashi H, Ikeda N. Anal Chim Acta. 2003;492:83.
10. Kratzsch C, Tenberken O, Peters FT, Weber AA, Kraemer T, Maurer HH. J Mass Spectrom. 2004;39:856. doi:10.1002/jms.599.
11. Mueller CA, Weinmann W, Dresen S, Schreiber A, Gergov M. Rapid Commun Mass Spectrom. 2005;19:1332. doi:10.1002/rcm.1934.
12. Herzler M, Herre S, Pragst F. J Anal Toxicol. 2003;27:233. doi:10.1093/jat/27.4.233.
13. Stoll DR, Paek C, Carr PW. J Chromatogr A. 2006, submitted. doi:10.1016/j.chroma.2006.10.017.
14. Pellett J, Lukulay P, Mao Y, Bowen W, Reed R, Ma M, Munger RC, Dolan JW, Wrisley L, Medwid K, Toltl NP, Chan CC, Skibic M, Biswas K, Wells KA, Snyder LR. J Chromatogr A. 2006;1101:122. doi:10.1016/j.chroma.2005.09.080.
15. Van Gyseghem E, Jimidar M, Sneyers R, Redlich D, Verhoeven E, Massart DL, Vander Heyden Y. J Chromatogr A. 2005;1074:117. doi:10.1016/j.chroma.2004.05.034.
16. Schellinger AP, Stoll DR, Carr PW. J Chromatogr A. 2005;1064:143. doi:10.1016/j.chroma.2004.12.017.
17. Schellinger AP, Carr PW. J Chromatogr A. 2006;1109:253. doi:10.1016/j.chroma.2006.01.047.
18. Schellinger AP, Stoll DR, Carr PW. J Chromatogr A. 2006, submitted.
19. Smilde A, Bro R, Geladi P. Multi-way Analysis with Applications in the Chemical Sciences. John Wiley & Sons, Ltd; Hoboken, NJ: 2004.
20. McCue M, Malinowski ER. Appl Spectrosc. 1983;37:463.
21. Bezemer E, Rutan SC. Chemom Intell Lab Syst. 2002;60:239.
22. Tauler R. Chemom Intell Lab Syst. 1995;30:133.
23. Malinowski ER. Factor Analysis in Chemistry. John Wiley & Sons, Inc; New York: 1991.
24. Otto M. Chemometrics. Wiley-VCH; New York, NY: 1999.
25. Huber L, George SA. Diode Array Detection in HPLC. M. Dekker; New York, NY: 1993.
26. Lohnes MT, Guy RD, Wentzell PD. Anal Chim Acta. 1999;389:95.
27. Massart DL, Vandeginste BGM, Buydens LMC, de Jong S, Lewi PJ, Smeyers-Verbeke J. Handbook of Chemometrics and Qualimetrics: Part A. Elsevier; New York: 1997.
28. Schepers PG, Franke JP, de Zeeuw RA. J Anal Toxicol. 1983;7:272. doi:10.1093/jat/7.6.272.
29. Moffat AC, Smalldon KW, Brown C. J Chromatogr. 1974;90. doi:10.1016/s0021-9673(01)94768-5.
30. Fuller K. Personal Communication. Minnesota Bureau of Criminal Apprehension; 2006.
31. Telepchak MJ, August TF, Chaney G. Forensic and Clinical Applications of Solid Phase Extraction. Humana Press; Totowa, NJ: 2004.
32. Bogusz MJ. In: Smith RM, editor. Retention and Selectivity in Liquid Chromatography. Elsevier; New York: 1995. p. 171.
33. Maier RD, Bogusz M. J Anal Toxicol. 1995;19:79. doi:10.1093/jat/19.2.79.
