Abstract
We compared inductively coupled plasma–mass spectrometry (ICP-MS) test results for the analysis of heavy metals (As, Ba, Cd, Hg, Pb, and Se) in pet foods and routine veterinary diagnostic specimens using intralaboratory and interlaboratory comparisons. Four laboratories, 1 principal laboratory and 3 collaborating laboratories, conducted instrument comparison (limit of detection [LOD], limit of quantification [LOQ], and linear dynamic range [LDR] on 24 data sets), in-house method comparison (accuracy and precision on 120 data sets), and interlaboratory comparison (reproducibility on 528 data sets using Horwitz equation analysis). Matrices tested included 2 types of pet food jerky treats (chicken and sweet potato), bovine blood, and bovine liver and kidney. The instrument comparison study confirmed that ICP-MS provided the sensitivity necessary for the analysis of all heavy metals tested at concentrations below the level of concern for routine diagnostic testing. The “in-house” method comparison samples, spiked at low (0.04 µg/g), medium (0.4 µg/g), and high (8.0 µg/g; note: the high validation level spike for mercury was 2 µg/g) concentration levels, indicated that ICP-MS can meet U.S. FDA acceptance criteria for both accuracy (90–105% recovery) and precision (< 6% coefficient of variation). The interlaboratory comparison studies showed that ICP-MS is a reproducible method for the analysis of heavy metals (HorRat value of 0.5–2.0) except for mercury in one laboratory, which used a different sample preparation method (open block rather than microwave digestion). Overall, our study showed that ICP-MS is a reproducible method for the analysis of heavy metals in spite of minor differences in methodology.
Keywords: animal diagnostic specimens, animal feeds, heavy metals, inductively coupled plasma–mass spectrometry, interlaboratory comparison
Introduction
To better address routine and emergency response testing of animal feeds and drugs, as well as diagnostic specimens, existing methodologies must be extended to the testing of new matrices from animal diagnostic specimens. Rapid detection of chemical contamination in animal foods or feeds, either as unintentional contaminants or intentionally introduced adulterants, augments nationwide public health by ensuring the health of food-producing animals and improving human food safety and security. Furthermore, chemical contamination of animal feeds is often a forewarning for contamination of the human food chain as illustrated in the late 2000s by multiple melamine contamination events in both animal and human foods (World Health Organization. Toxicological and health aspects of melamine and cyanuric acid, 2009. Available at: https://www.who.int/foodsafety/publications/chem/Melamine_report09.pdf).2 A survey of veterinary diagnostic laboratories conducted in 2011–2012 by the California Animal Health and Food Safety Laboratory revealed over 50 contamination events nationwide attributed to trace elements and heavy metals such as copper, sulfur, lead, arsenic, zinc, selenium, and phosphorus (Poppenga RH. Chemical and biological feed/drug contamination prevention in veterinary diagnostic laboratories in North America. FDA-VetLIRN Report, Sept 2012). Furthermore, animal feed recalls by the U.S. FDA have included contamination events from zinc, potassium, copper, and molybdenum. Other potentially hazardous contaminants in animal feed and feed ingredients include heavy metals and metalloids such as arsenic, cadmium, chromium, lead, mercury, and selenium.3 Concentrations of cadmium, mercury, and selenium in animal feeds > 10 ppm are considered highly toxic; for barium, cobalt, copper, lead, molybdenum, tungsten, and vanadium, concentrations > 40 ppm are considered toxic.9
Interlaboratory comparisons (ILCs) for existing analytical methodologies for the determination of heavy metal levels in new matrices are needed to better address routine and emergency response testing. There is a critical need to expand the scope of these testing methodologies to animal feeds and diagnostic matrices from animals (blood, fluids, and tissues). The evaluation of method performance with a new matrix is intended to ensure that the method will continue to produce accurate and reliable results. Expansion of test methods for heavy metals in animal diagnostic specimens will address needs for rapid, accurate, and consistent chemical testing across a wide range of diagnostic laboratories when responding to animal food and drug contamination events.
The biggest challenge in implementing consistent heavy metal testing across independent diagnostic laboratories is the lack of standardized methods. Diagnostic laboratories across North America routinely test for heavy metals in animal feed and diagnostic specimens by mass spectrometry instrumentation but utilize slightly different methodologies (e.g., differences in sample digestion). Thus, the evaluation that we report herein uses ILC testing (the organization, performance, and evaluation of measurements or tests on the same items by 2 or more laboratories in accordance with predetermined conditions) to demonstrate precision (closeness of agreement between independent test results obtained under specified conditions) across multiple active diagnostic laboratories in North America. The fundamental experimental design used standardized instrumentation (inductively coupled plasma–mass spectrometry [ICP-MS]) and compared testing performance at the instrument, in-house method, and ILC levels from data collected by 4 independent laboratories using homogeneous and stable samples. Performance parameters evaluated for the instrument were limit of detection (LOD), limit of quantification (LOQ), and linear dynamic range (LDR). Performance parameters evaluated for the in-house method included accuracy, precision, and repeatability on spiked samples. ILCs were single-blinded studies using a simultaneous participation scheme, measuring reproducibility of test results. We compared specific test results by a formal statistical demonstration of similarity, the Horwitz ratio analysis, an index of method performance with respect to precision.4
An important consideration in assessing the suitability of ICP-MS for veterinary diagnostic applications is the sensitivity of the method relative to concentrations of concern for the various metals in representative sample matrices. Published LOQ values for ICP-MS are frequently an order of magnitude lower than the levels of concern, indicating that the method is sufficiently sensitive for diagnostic purposes.8
Quantitative evaluation of ILC studies requires calculation of both within-laboratory precision (repeatability) and between-laboratory precision (reproducibility). Repeatability in this context reflects the internal precision of a method (i.e., same analyst with the same instruments in the same laboratory). Reproducibility includes both the within- and between-laboratory variances and allows evaluation of the impact of minor changes in the method (i.e., comparison of results obtained by different analysts with different instruments in different laboratories). These parameters must be calculated for each metal–matrix combination analyzed by the participating laboratories (132 data sets per lab for a total of 528 in our study). The AOAC InterLaboratory Workbook (https://www.aoac.org/resources/aoac-interlaboratory-study-workbook-blind-unpaired-replicates-excel-file-xls/) provides a convenient tool for accomplishing each of these calculations.
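As an illustration of how these two precision measures relate, the following is a minimal sketch of the balanced one-way analysis of variance commonly used to separate within-laboratory and between-laboratory variance. It is not the AOAC workbook itself; the function name `precision_components` and the example results are hypothetical, and the sketch assumes every laboratory reports the same number of replicates.

```python
from statistics import mean


def precision_components(results_by_lab):
    """Estimate repeatability and reproducibility from a balanced design.

    results_by_lab: list of per-laboratory replicate lists (equal lengths).
    Returns (grand_mean, rsd_r, rsd_R), with both RSDs in percent.
    """
    p = len(results_by_lab)       # number of laboratories
    n = len(results_by_lab[0])    # replicates per laboratory
    if any(len(lab) != n for lab in results_by_lab):
        raise ValueError("this sketch assumes a balanced design")

    lab_means = [mean(lab) for lab in results_by_lab]
    grand_mean = mean(lab_means)

    # Within-laboratory mean square = repeatability variance
    ss_within = sum(
        (x - m) ** 2 for lab, m in zip(results_by_lab, lab_means) for x in lab
    )
    ms_within = ss_within / (p * (n - 1))

    # Between-laboratory mean square
    ms_between = n * sum((m - grand_mean) ** 2 for m in lab_means) / (p - 1)

    var_r = ms_within
    # Between-laboratory variance component, floored at zero
    var_lab = max(0.0, (ms_between - ms_within) / n)
    # Reproducibility variance includes both components
    var_R = var_r + var_lab

    rsd_r = 100 * var_r ** 0.5 / grand_mean
    rsd_R = 100 * var_R ** 0.5 / grand_mean
    return grand_mean, rsd_r, rsd_R


# Hypothetical triplicate results (µg/g) from 4 laboratories for one
# metal-matrix combination; these are NOT data from the study.
example = [
    [4.9, 5.0, 5.1],
    [5.3, 5.4, 5.2],
    [4.6, 4.7, 4.8],
    [5.0, 5.1, 4.9],
]
grand_mean, rsd_r, rsd_R = precision_components(example)
print(f"mean = {grand_mean:.2f} µg/g, RSDr = {rsd_r:.1f}%, RSDR = {rsd_R:.1f}%")
```

Because the reproducibility variance is the sum of the repeatability variance and a non-negative between-laboratory component, RSDR can never be smaller than RSDr in this formulation.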
Materials and methods
Sample acquisition and preparation
Samples of chicken jerky treats were obtained from a large U.S. supplier of pet foods (Milo’s Kitchen; Big Heart Pet Brands, San Francisco, CA); the sweet potato jerky was purchased from a local retail store (Golden Rewards Sweet Potato with Chicken; Walmart, Bentonville, AR). Approximately 1,000 g of sample material was dried at 50–60°C for ~3 d or until the dog treats became brittle. The dried treats were then broken into smaller pieces, either by hand or by placing in a plastic bag and using a hammer to crush. Sample particle size was further reduced (Ultra centrifugal mill; Retsch, Haan, Germany), first to pass a 2.0-mm sieve and then a 0.75-mm sieve.
Samples of bovine liver and kidney were obtained from three 6-mo-old Holstein steers being used as normal control animals (untreated) on a separate research project that were submitted live for autopsy and sedated with intramuscular xylazine followed by euthanasia using the captive bolt method. A postmortem examination of the whole carcass and histologic examination of the collected liver and kidney revealed no significant gross or microscopic lesions. Samples were lyophilized (Freezone benchtop freeze dry system; Labconco, Kansas City, MO) and then processed with the Retsch mill fitted with a 2-mm ring sieve. The sieved samples were then dried at 60°C for 24 h. Complete sample dryness was verified by weighing the entire sample, drying for an additional 24 h, and reweighing to confirm a constant weight. The third diagnostic matrix—bovine blood—was obtained from the steers described above. The blood samples were preserved with sodium citrate and stored under refrigerated conditions. Prior to analysis, the refrigerated blood samples at each collaborating laboratory (CL) were mixed on a rocker table for 2 h.
General experimental design
We followed guidelines for the validation of chemical methods as described in the U.S. FDA Food and Veterinary Medicine (FVM) Program (level 3: multi-laboratory validation; Guidelines for the validation of chemical methods in food, feed, cosmetics, and veterinary products. 3rd ed. FDA, 2019. Available at: https://www.fda.gov/media/81810/download). The validation approach involved 3 phases: instrument performance, in-house method performance, and an ILC. Instrument and in-house method-performance parameters included LDR, LOD, LOQ, method accuracy, and method precision (FDA. Methods, method verification and validation, 2014. Document ORA-LAB.5.4.5. Available at: https://www.fda.gov/media/73920/download). The third phase utilized an ILC study in which the 4 independent laboratories analyzed spiked sample matrices prepared and verified for homogeneity and stability by the principal laboratory. As with the first 2 phases, the ILC study was designed to evaluate the reproducibility and similarity of the established sample digestion and ICP-MS methods in use at each laboratory. Samples were analyzed following protocols in use for routine diagnostic work in the principal laboratory and in each of the 3 CLs. Each laboratory utilized a sample preparation procedure involving either an open block or microwave digestion system followed by metal analysis using ICP-MS (Table 1).
Table 1.
Summary of method parameters used by principal and collaborating laboratories.
| Laboratory | Sample | Sample preparation | Instrumentation |
|---|---|---|---|
| Principal laboratory | 0.25 g (dry weight): jerky treats, freeze-dried tissues; 1.0 g (wet weight): whole blood | Concentrated nitric acid; open vessel digestion block; 70°C 1 h, ramp to 120°C, hold 6 h; dilute with type 1 water and 1% HCl | Agilent 7500 cx ICP-MS, with collision cell: He mode for As and Ba; H mode for Se; no collision cell for Cd, Pb, and Hg |
| Collaborating laboratory 1 | 0.25 g (dry weight): jerky treats, freeze-dried tissues; 1.0 g (wet weight): whole blood | 70% nitric acid + 30% hydrogen peroxide; microwave digestion (MARS system); 100% power, 180°C, 20 min; dilute with 1% HCl, 200 ppb Au in type 1 water | Agilent 7700 ICP-MS, no collision cell |
| Collaborating laboratory 2 | 0.25 g (dry weight): jerky treats, freeze-dried tissues; 0.50 g (wet weight): whole blood (Hg only); 1.0 g (wet weight): whole blood | Jerky treats and tissues*: concentrated nitric acid; microwave digestion (MARS system); 100% power, 180°C, 20 min; dilute with type 1 water. Whole blood and Hg in jerky treats and tissues: concentrated nitric acid; TuffTainers in drying oven, 95°C (100°C for Hg), minimum of 4 h; dilute with type 1 water | Agilent 7900 cx ICP-MS, with collision cell: He mode for As and Ba; H mode for Se; no collision cell for Cd, Pb, and Hg |
| Collaborating laboratory 3 | 0.25 g (dry weight): jerky treats, freeze-dried tissues; 5 g (wet weight): whole blood | Concentrated nitric and HCl acids + 1,000 ppb Au; microwave digestion (MARS system); 100% power, 180°C, 15 min; dilute with Nanopure water | Agilent 7900 cx ICP-MS, with collision cell: He mode for As and Cd; H mode for Se; no collision cell for Ba, Pb, and Hg |

*Jerky treats and tissues stored in 100°C drying oven for 24 h prior to digestion for Hg analysis.
Instrument performance
The LDR for each metal was developed using concentrations that bracket the expected analyte levels routinely encountered in diagnostic laboratories. Instrument linearity was evaluated using the coefficient of determination (r²) for each calibration curve. Further verification of the LDR was accomplished by determining instrument accuracy and precision at concentrations associated with the highest and lowest calibrators.
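The linearity check can be illustrated with a short sketch that fits an ordinary (unweighted) least-squares line to calibrator responses and reports the coefficient of determination. The calibrator concentrations and counts below are hypothetical, and the use of an unweighted fit is an assumption; individual laboratories may use weighted regression or instrument software defaults.

```python
def linear_fit_r2(conc, signal):
    """Ordinary least-squares fit of signal vs. concentration.

    Returns (slope, intercept, r2), where r2 is the coefficient of
    determination of the fitted calibration line.
    """
    n = len(conc)
    mean_x = sum(conc) / n
    mean_y = sum(signal) / n
    sxx = sum((x - mean_x) ** 2 for x in conc)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc, signal))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(conc, signal))
    ss_tot = sum((y - mean_y) ** 2 for y in signal)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical calibrators (µg/L) and detector counts; not study data.
conc_ug_per_l = [0.5, 1, 5, 10, 50, 100]
counts = [610, 1190, 6050, 11900, 60400, 119500]

slope, intercept, r2 = linear_fit_r2(conc_ug_per_l, counts)
print(f"slope = {slope:.1f} counts per µg/L, r2 = {r2:.5f}")
```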
LODs were determined from the variance associated with analysis of 7 reagent blanks. Statistical significance was evaluated at the 99% confidence level, which closely equates to a signal-to-noise ratio of 3:1 (Wells G, et al. Signal, noise, and detection limits in mass spectrometry. Technical note. Agilent Technologies, 2011. Available at: https://www.agilent.com/cs/library/technicaloverviews/public/5990-7651EN.pdf). LOQs were defined as the level above which quantitative results may be determined with acceptable accuracy and precision. It was recognized that LOQ acceptance criteria are dependent upon the concentration being evaluated. For the purposes of our study, with LOQs of 0.01–0.1 µg/g, acceptable accuracy was defined as spike recoveries of 80–110% and acceptable precision as relative standard deviations (RSDs) < 11% (FDA. Guidelines for the validation of chemical methods for the FDA FVM Program, 2015. 2nd ed.).
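One common way to turn 7 replicate blanks into a detection limit at the 99% confidence level is to multiply their standard deviation by the one-sided Student's t value for 6 degrees of freedom (3.143). The sketch below assumes that approach; the exact calculation used by each laboratory is not stated here. The blank readings, the final digest volume, and the helper names are hypothetical.

```python
from statistics import stdev

# One-sided Student's t, 99% confidence, 6 degrees of freedom (7 replicates)
T_99_DF6 = 3.143


def lod_from_blanks(blank_results, t_value=T_99_DF6):
    """Detection limit in the same units as the blank readings."""
    if len(blank_results) != 7:
        raise ValueError("the default t value applies to exactly 7 replicates")
    return t_value * stdev(blank_results)


def to_sample_basis(limit_ug_per_l, final_volume_ml, sample_mass_g):
    """Convert a solution limit (µg/L) to a sample limit (µg/g)."""
    return limit_ug_per_l * (final_volume_ml / 1000) / sample_mass_g


# Hypothetical reagent-blank readings (µg/L); not study data.
blanks = [0.010, 0.014, 0.008, 0.012, 0.011, 0.009, 0.013]
lod_solution = lod_from_blanks(blanks)

# Hypothetical 25-mL final digest volume; sample masses follow Table 1.
lod_dry = to_sample_basis(lod_solution, final_volume_ml=25, sample_mass_g=0.25)
lod_wet = to_sample_basis(lod_solution, final_volume_ml=25, sample_mass_g=1.0)
print(f"LOD: {lod_solution:.4f} µg/L, {lod_dry:.5f} µg/g dw, {lod_wet:.5f} µg/g ww")
```

Under these assumptions, the dry-weight limit is 4 times the wet-weight limit simply because the dry sample mass (0.25 g) is one-quarter of the wet sample mass (1.0 g).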
Selectivity is based on the concept of unequivocal identification by means of physical–chemical properties unique to the chemical element. Selectivity for the ICP-MS method was confirmed by conducting measurements at a unique mass-to-charge ratio for each metal. In addition, interferences from isobaric and polyatomic ions, matrix effects, and instrument drift were addressed by a combination of collision cell technology and multiple internal standards covering the same mass range as the elements to be determined (Yamada T, Yamada N. Operating principles of the Agilent Octopole Reaction Cell. Agilent ICP-MS Journal 2002;(13):2. Available at: https://www.agilent.com/cs/library/periodicals/Public/5988_7502ENE_low.pdf).
In-house method performance
In-house method accuracy was established across the linear range by spiking the 5 sample matrices at each of 3 concentrations: a low validation level (LVL) near the LOQ, a middle validation level (MVL) near the mid-point of the LDR, and a high validation level (HVL).
The HVL was selected based on metal concentrations that would be considered critical for interpretation in routine veterinary diagnostic testing rather than the true upper limit of the LDR.8 Concentrations for the LVL, MVL, and HVL were selected at 0.04 µg/g, 0.4 µg/g, and 8 µg/g, respectively (note: the HVL spike for mercury was 2 µg/g). Accuracy was calculated as the percent recovery of the known metal spike in each sample. For in-house method performance, each CL spiked the 5 sample matrices, which were provided to each laboratory by the principal laboratory as described in the sample acquisition and preparation section above.
Method precision was evaluated using replicate analyses of each sample matrix spiked at the LVL, MVL, and HVL. Within-day repeatability (within-day precision) was determined by replicates analyzed on the same day, and between-day repeatability (between-day precision) was determined by replicates analyzed on 3 nonconsecutive days within the same laboratory. Precision was quantified by calculating the coefficient of variation (CV) of the replicates determined under the conditions described above. All accuracy and precision determinations were conducted in triplicate.
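The accuracy and precision calculations described above can be sketched as follows. The replicate values are hypothetical, and subtracting the unspiked (background) result before computing recovery is an assumption of this sketch rather than a stated step of the method.

```python
from statistics import mean, stdev


def percent_recovery(spiked_result, unspiked_result, spike_added):
    """Recovery (%) of a known spike, corrected for the unspiked background."""
    return 100 * (spiked_result - unspiked_result) / spike_added


def percent_cv(values):
    """Coefficient of variation (%) of replicate results."""
    return 100 * stdev(values) / mean(values)


# Hypothetical triplicate results (µg/g) for one matrix spiked at the
# MVL (0.4 µg/g), with a hypothetical background of 0.01 µg/g.
spike = 0.4
background = 0.01
replicates = [0.41, 0.39, 0.40]

recoveries = [percent_recovery(r, background, spike) for r in replicates]
mean_recovery = mean(recoveries)
cv = percent_cv(replicates)

# MVL acceptance criteria cited in this study: 80-110% recovery, CV < 11%
print(f"mean recovery = {mean_recovery:.1f}%, CV = {cv:.1f}%")
print("accuracy acceptable:", 80 <= mean_recovery <= 110)
print("precision acceptable:", cv < 11)
```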
Interlaboratory comparison
Samples for the ILC study were prepared by the principal laboratory following standard methods for preparation of a reference material (International Atomic Energy Agency. Development and use of reference materials and quality control materials, 2003. Document IAEA-TECDOC-1350. Available at: https://www-pub.iaea.org/MTCD/publications/PDF/te_1350_web.pdf),1,6 which included both homogeneity and stability testing. As mentioned earlier, the solid samples (jerky, liver, and kidney) were prepared and analyzed on a dry-weight basis to reduce variability associated with sample moisture. This allowed our study to focus on variability introduced by laboratory methods and procedures during the ILC rather than sample variability. Bulk samples of each sample matrix were thoroughly homogenized, spiked with the 6 metals at concentrations near the MVL, and thoroughly homogenized again prior to preparing aliquots for the study. An ICP-MS calibration standard (Inorganic Ventures, Christiansburg, VA) was used as the source of metal spiking. A second calibration standard (AccuStandard, New Haven, CT) was used for the mercury spike. Homogeneity of the samples was evaluated by selecting random 0.25-g aliquots of each bulk matrix, analyzing 5 replicates, and determining the CV of each metal following ICP-MS analysis. Stability of the spiked samples stored at 0–6°C was evaluated by conducting replicate metals analyses in each matrix over a 4-wk period. This time period was consistent with the turnaround time for each laboratory involved in our study.
Each CL received a set of unmarked samples, consisting of 3 spiked and 1 non-spiked samples, for each of the 5 matrices. A standard reference material (freeze-dried bovine liver) was also included in the sample set. This in-house–generated reference material was fully characterized by the principal laboratory using over 5 y of quality control data and trend monitoring. The reference material was periodically verified against a certified reference material (TORT-2 Lobster Hepatopancreas Reference Material for Trace Metals; National Research Council of Canada [NRC-CNRC], Ottawa, Canada).
The ILC samples were presented as unknowns (single-blind) and coded in a random pattern, along with a set of instructions and a reporting template. A subset of analysts in the principal laboratory not involved in the sample preparation received the samples in the same single-blind format. The data remained blind to the analysts at the principal laboratory and CLs and were unblinded only after receipt of results by the principal investigators.
Data compilation and statistical analyses were completed by the principal laboratory. Precision of the analytical methods was measured as both repeatability (within-laboratory precision) and reproducibility (interlaboratory precision). The repeatability relative standard deviation (RSDr), reproducibility relative standard deviation (RSDR), and HorRat ratios were calculated using the AOAC Inter-laboratory Study Workbook for Blind (Unpaired) Replicates (https://www.aoac.org/resources/aoac-interlaboratory-study-workbook-blind-unpaired-replicates-excel-file-xls/). The HorRat ratio, an additional similarity statistic, is the ratio of the RSDR calculated from the data of the laboratory to the RSD predicted from the Horwitz equation, which is a relationship of the mean CV, expressed as powers of 2, with the mean concentration measured, expressed as powers of 10, independent of the determinative method.5 The HorRat ratio is a normalized performance parameter indicating the acceptability of methods of analysis with respect to interlaboratory precision (reproducibility). It is independent of analyte, matrix, and method. The HorRat ratio should be 0.5–2.0 under reproducibility conditions.5
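For reference, the Horwitz relationship and the HorRat ratio described above are usually written as follows. The symbols are introduced here for illustration: C is the mean measured concentration expressed as a dimensionless mass fraction (e.g., 1 µg/g = 10⁻⁶), and PRSD_R is the predicted reproducibility relative standard deviation in percent.

$$
\mathrm{PRSD}_R\,(\%) = 2^{\,\left(1 - 0.5\,\log_{10} C\right)} \approx 2\,C^{-0.1505},
\qquad
\mathrm{HorRat} = \frac{\mathrm{RSD}_R}{\mathrm{PRSD}_R}
$$

The two forms of the predicted value are equivalent because $2^{-0.5\log_{10} C} = C^{-0.5\log_{10} 2}$ and $0.5\log_{10} 2 \approx 0.1505$.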
Results
Instrument performance
Linear dynamic range
Instrument validation studies conducted by each laboratory demonstrated acceptable levels of linearity, accuracy, and precision at concentrations that bracket the expected analyte levels routinely encountered in diagnostic laboratories (Table 2). Acceptable linearity was defined as a coefficient of determination (r²) > 0.995 and this was easily achieved by each laboratory (data not presented). In addition, instrument performance was verified by determining the accuracy and precision of reagent spikes (fortified blanks). Acceptable accuracy was defined as recoveries of 80–110% of the known concentration and acceptable precision as RSD < 11% (FDA. Guidelines for the validation of chemical methods for the FDA FVM Program, 2015. 2nd ed.). The accuracy criteria were achieved for concentrations near the LOQ in all cases except for 2 Hg recoveries (113% and 129% for CL 1 and CL 3, respectively) and 1 Cd recovery (111% for CL 1; Table 2). Accuracy and precision near the highest calibrator (upper limit of LDR) were also well within the acceptance criteria, with recoveries of 91–104% and RSDs < 2% (data not shown). It should be noted that the analytical range of ICP-MS can be extended well beyond the highest calibrator concentrations that we investigated. However, it was considered impractical to investigate the upper concentration range far beyond what would be encountered commonly in a diagnostic laboratory.
Table 2.
Inductively coupled plasma–mass spectrometry analytical limits for 6 heavy metals.
| Parameter | Unit | As | Ba | Cd | Pb | Se | Hg |
|---|---|---|---|---|---|---|---|
| Principal lab | |||||||
| LDR | µg/L | 0.2–5000 | 0.5–5000 | 0.1–5000 | 0.1–5000 | 0.2–5000 | 10–200 |
| dw LOD | µg/g | 0.0012 | 0.0050 | 0.0002 | 0.0003 | 0.0007 | 0.0086 |
| ww LOD | µg/g | 0.0003 | 0.0013 | 0.0001 | 0.0001 | 0.0002 | 0.0022 |
| dw LOQ | µg/g | 0.008 | 0.020 | 0.004 | 0.004 | 0.008 | 0.400 |
| ww LOQ | µg/g | 0.002 | 0.005 | 0.001 | 0.001 | 0.002 | 0.100 |
| Accuracy at LOQ | % | 98 | 109 | 100 | 97 | 98 | 110 |
| Precision at LOQ | % | 6.2 | 3.1 | 8.6 | 2.7 | 6.5 | 3.2 |
| Collaborating lab 1 | |||||||
| LDR | µg/L | 1.0–2000 | 1.0–2000 | 1.0–2000 | 1.0–2000 | 1.0–2000 | 1.0–2000 |
| dw LOD | µg/g | 0.0320 | 0.0120 | 0.0160 | 0.0120 | 0.0600 | 0.0040 |
| ww LOD | µg/g | 0.0080 | 0.0030 | 0.0040 | 0.0030 | 0.0150 | 0.0010 |
| dw LOQ | µg/g | 0.040 | 0.040 | 0.040 | 0.040 | 0.400 | 0.040 |
| ww LOQ | µg/g | 0.010 | 0.040 | 0.010 | 0.010 | 0.100 | 0.010 |
| Accuracy at LOQ | % | 109 | 100 | **111** | 107 | 99 | **113** |
| Precision at LOQ | % | 3.2 | 4.9 | 5.0 | 5.2 | 5.9 | 4.3 |
| Collaborating lab 2 | |||||||
| LDR | µg/L | 3.85–38.5 | 3.85–38.5 | 0.96–38.5 | 0.96–38.5 | 3.85–38.5 | 0.5–100 |
| dw LOD | µg/g | 0.0038 | 0.0908 | 0.0011 | 0.0062 | 0.0090 | 0.0036 |
| ww LOD | µg/g | 0.0019 | 0.0454 | 0.0005 | 0.0031 | 0.0045 | 0.0018 |
| dw LOQ | µg/g | 0.154 | 0.154 | 0.038 | 0.038 | 0.154 | 0.019 |
| ww LOQ | µg/g | 0.077 | 0.077 | 0.019 | 0.019 | 0.077 | 0.009 |
| Accuracy at LOQ | % | 90 | 93 | 91 | 97 | 93 | NA |
| Precision at LOQ | % | 6.6 | 7.3 | 4.0 | 2.5 | 7.2 | NA |
| Collaborating lab 3 | |||||||
| LDR | µg/L | 0.05–50 | 0.5–50 | 0.05–50 | 0.05–50 | 0.05–50 | 0.01–10 |
| dw LOD | µg/g | 0.0003 | 0.0006 | 0.0001 | 0.0006 | 0.0004 | 0.0001 |
| ww LOD | µg/g | 0.0001 | 0.0003 | 0.0001 | 0.0003 | 0.0002 | 0.0001 |
| dw LOQ | µg/g | 0.002 | 0.020 | 0.002 | 0.002 | 0.002 | 0.004 |
| ww LOQ | µg/g | 0.001 | 0.010 | 0.001 | 0.001 | 0.001 | 0.002 |
| Accuracy at LOQ | % | 103 | 102 | 96 | 106 | 102 | **129** |
| Precision at LOQ | % | 6.1 | 1.1 | 1.1 | 1.7 | 1.2 | 1.0 |
dw = dry weight; LDR = linear dynamic range; LOD = limit of detection; LOQ = limit of quantitation; NA = not available; ww = wet weight. Boldface indicates that result exceeds FDA guidelines.
Limits of detection and quantitation
The minimum metal concentration that could be reliably detected (LOD) ranged from 0.0001 µg/g for Cd to 0.091 µg/g for Ba; the minimum concentrations that could be quantified with acceptable accuracy and precision (LOQ) ranged from 0.001 µg/g for Cd, Pb, and Se, to 0.4 µg/g for Hg (Table 2).
In-house method performance
Method accuracy and precision
The ICP-MS method provided acceptable accuracy, as determined from percent recovery of 5 spiked sample matrices, for the 6 heavy metals across the concentration range covered by the LVL, MVL, and HVL (Fig. 1). The results summarized in Figure 1 were generated by the principal laboratory; similar results were obtained by each of the CLs (data not shown). Acceptance criteria for accuracy were 60–115% recovery at the LVL and 80–110% at the MVL and HVL (https://www.fda.gov/media/81810/download). Twenty-eight of 30 observations in the method accuracy study met the LVL acceptance criteria, with Pb in blood and Hg in kidney and blood as outliers (117% and 57%, respectively; Fig. 1A). Similar results were obtained for accuracy at the MVL (note the tighter acceptance limits) with one As and one Ba result slightly exceeding the limit and one Hg observation below the limit (Fig. 1B). Accuracy studies at the HVL showed one Hg result again below the limit in the kidney matrix (Fig. 1C).
Figure 1.

Within-day accuracy (% recovery) of inductively coupled plasma–mass spectrometry analysis of 6 metals in 5 sample matrices: A. low validation level; B. middle validation level; C. high validation level. Each bar represents the average of triplicate analyses from the principal laboratory. Error bars are 2 SDs (very small). Acceptance criteria are from the FDA Guidelines for the Validation of Chemical Methods in Food, Feed, Cosmetics, and Veterinary Products (https://www.fda.gov/media/81810/download).
Within-day precision was determined by replicate analysis of the 6 heavy metals in each of the 5 matrices in the same analytical run. These results met acceptance criteria at all concentration ranges (CV < 22%, 11%, and 8% for LVL, MVL, and HVL, respectively) for all metals in each of the 5 matrices (Fig. 2A–2C; https://www.fda.gov/media/81810/download). The results summarized in Figure 2 were generated by the principal laboratory; similar results were obtained by each of the CLs.
Figure 2.

Within-day precision (%CV) of inductively coupled plasma–mass spectrometry analysis of 6 metals in 5 sample matrices: A. low validation level; B. middle validation level; C. high validation level. Each bar represents the average of triplicate analyses from the principal laboratory. Error bars are 2 SDs (very small). Acceptance criteria are from the FDA Guidelines for the Validation of Chemical Methods in Food, Feed, Cosmetics, and Veterinary Products (https://www.fda.gov/media/81810/download).
Figure 3.

Interlaboratory comparison results for the principal laboratory (PL) and collaborating laboratories (CLs): A. chicken jerky; B. sweet potato jerky; C. lyophilized bovine liver; D. lyophilized bovine kidney; E. whole bovine blood. Each result bar represents the mean of triplicate analyses, and the error bars are 2 SDs about the mean.
Between-day precision was determined through analysis of spiked chicken jerky and sweet potato jerky on 3 separate days of analysis by pooling data across the 3 analytical runs for the LVL, MVL, and HVL (Table 3). As was the case with the within-day precision measurements, the between-day precision results met the acceptability criteria at each concentration range for all metals in each matrix. In both types of measurements, there was a clear trend of improved precision (lower %CV) as the spike concentration increased from the LVL (nominal value of 0.04 µg/g) to the MVL and HVL (nominal values of 0.4 µg/g and 8 µg/g, respectively).
Table 3.
Between-day precision (%CV) of inductively coupled plasma–mass spectrometry analysis of 6 metals in chicken jerky and sweet potato jerky sample matrices.
| Matrix/Statistic | As | Ba | Cd | Pb | Se | Hg |
|---|---|---|---|---|---|---|
| Chicken jerky | ||||||
| %CV at LVL | 5 | 7 | 2 | 13 | 6 | 20 |
| %CV at MVL | 0 | 1 | 1 | 1 | 2 | 3 |
| %CV at HVL | 2 | 2 | 2 | 2 | 3 | 5 |
| Sweet potato jerky | ||||||
| %CV at LVL | 5 | 14 | 11 | 14 | 10 | 17 |
| %CV at MVL | 4 | 9 | 10 | 9 | 5 | 4 |
| %CV at HVL | 2 | 3 | 3 | 3 | 2 | 2 |
CV = coefficient of variation; LVL, MVL, HVL = low, middle, and high validation level, respectively. Each result represents the average of triplicate analyses pooled from 3 nonconsecutive analytical days (9 observations in total).
Interlaboratory comparison
Prior to conducting the ILC study, both homogeneity and stability of the ILC samples were verified using repeatability measurements. Analysis of 5 replicates of each spiked sample matrix confirmed that the samples were homogeneous with respect to the metals investigated (Table 4). The CV for the replicate analysis was 0.6–1.5% for the spiked chicken jerky and 1.0–3.8% for the spiked bovine blood. In all cases, the repeatability results met the FDA guidelines for method validation (i.e., CV < 11% for results in the concentration range of 1–10 µg/g).7 Stability of the ILC samples was verified by determining the metal concentrations in each spiked matrix over the course of 4 wk (the amount of time required by the CLs to complete the analyses of their ILC sample set). Consistent results (no significant changes in concentration) for each metal in each matrix were confirmed by weekly measurements over the 4-wk period (Table 5).
Table 4.
Homogeneity testing of samples used for the interlaboratory comparison studies.
| Matrix | As (%) | Ba (%) | Cd (%) | Pb (%) | Se (%) | Hg (%) |
|---|---|---|---|---|---|---|
| Chicken jerky | 1.4 | 1.1 | 0.6 | 1.2 | 1.5 | 4.8 |
| Sweet potato jerky | 2.8 | 1.7 | 1.8 | 1.4 | 1.8 | 0.7 |
| Liver | 2.7 | 5.8 | 2.7 | 2.9 | 2.3 | 3.3 |
| Kidney | 2.9 | 3.0 | 2.8 | 3.4 | 2.4 | 3.4 |
| Blood | 1.2 | 1.5 | 1.1 | 1.0 | 3.8 | 2.9 |
Results represent repeatability of inductively coupled plasma–mass spectrometry analysis of 6 metals in 5 sample matrices using 5 replicates expressed as coefficient of variation percent.
Table 5.
Stability testing of samples used for the interlaboratory comparison studies.
| Analysis date | Ba (µg/g) | As (µg/g) | Cd (µg/g) | Pb (µg/g) | Se (µg/g) | Hg (µg/g) |
|---|---|---|---|---|---|---|
| Chicken jerky | ||||||
| Week 1 | 6.3 | 4.9 | 4.8 | 4.2 | 5.5 | 0.43 |
| Week 2 | 7.5 | 5.2 | 4.8 | 4.4 | 5.3 | 0.42 |
| Week 3 | 7.6 | 4.9 | 4.8 | 4.2 | 5.3 | 0.44 |
| Week 4 | 7.3 | 4.9 | 4.4 | 4.1 | 5.6 | 0.43 |
| Sweet potato jerky | ||||||
| Week 1 | 11.0 | 4.9 | 4.9 | 4.5 | 5.6 | 0.44 |
| Week 2 | 12.9 | 5.1 | 4.9 | 4.6 | 5.3 | 0.45 |
| Week 3 | 11.8 | 4.8 | 4.8 | 4.4 | 5.5 | 0.46 |
| Week 4 | 13.1 | 4.9 | 4.5 | 4.4 | 5.7 | 0.46 |
Results represent means of triplicate metal analyses determined at weekly intervals.
Repeatability (RSDr) met the FDA acceptance criteria of < 11% for all metal–matrix combinations except for Cd, Pb, and Se in blood (Table 6). Further investigation indicated that this was the result of relatively high within-laboratory variance for these 3 analyses in 1 of the 3 CLs. Except for these 3 exceedances, the repeatability results were typically ≤5% RSD and confirm that ICP-MS is capable of excellent within-laboratory precision. Reproducibility (RSDR) results exhibited a very wide range, from 2.4% for As in blood to 61.1% for Hg in blood. The poor reproducibility (high RSDR) for Hg between the laboratories is likely the result of differences in preparation methods (open digestion block vs. microwave digestion). It is also important to note that the predicted RSDR is strongly dependent on the concentration of the metal being measured. For example, for metal concentrations of 1%, the predicted RSDR should be near 4%. When the metal concentration decreases to 1 µg/g and to 1 µg/kg, expected RSDR increases to 16% and 45%, respectively.7
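The concentration dependence of the predicted RSDR can be checked numerically. The sketch below evaluates the Horwitz equation, predicted RSDR (%) = 2^(1 − 0.5 log10 C) with C as a mass fraction, at the three concentrations mentioned above, and then computes a HorRat ratio for an illustrative case. The function names are hypothetical, and the 7 µg/g concentration used in the HorRat example is an illustrative input, not a reported ILC mean.

```python
import math


def horwitz_prsd(mass_fraction):
    """Predicted reproducibility RSD (%) from the Horwitz equation.

    mass_fraction: concentration as a dimensionless mass fraction
    (1% = 1e-2, 1 µg/g = 1e-6, 1 µg/kg = 1e-9).
    """
    if mass_fraction <= 0:
        raise ValueError("mass fraction must be positive")
    return 2 ** (1 - 0.5 * math.log10(mass_fraction))


def horrat(observed_rsd_R, mass_fraction):
    """HorRat: observed RSDR divided by the Horwitz-predicted RSDR."""
    return observed_rsd_R / horwitz_prsd(mass_fraction)


for label, c in [("1%", 1e-2), ("1 µg/g", 1e-6), ("1 µg/kg", 1e-9)]:
    print(f"{label}: predicted RSDR = {horwitz_prsd(c):.0f}%")

# Illustrative case: an observed RSDR of 10.3% at roughly 7 µg/g
ratio = horrat(10.3, 7e-6)
print(f"HorRat = {ratio:.2f}, acceptable: {0.5 <= ratio <= 2.0}")
```

The three predicted values agree with the approximately 4%, 16%, and 45% quoted in the text, which is why an RSDR that would be poor at percent-level concentrations can still be acceptable at trace levels.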
Table 6.
Repeatability relative standard deviations (RSDr) and reproducibility relative standard deviations (RSDR) for the interlaboratory comparison studies.
| Matrix/Statistic (%) | Ba | As | Cd | Pb | Se | Hg |
|---|---|---|---|---|---|---|
| Chicken jerky | ||||||
| RSDr | 3.4 | 5.3 | 5.6 | 3.8 | 4.1 | 4.1 |
| RSDR | 10.3 | 20.2 | 9.7 | 21.8 | 9.1 | 13.4 |
| Sweet potato jerky | ||||||
| RSDr | 5.7 | 5.0 | 5.7 | 6.7 | 5.7 | 6.4 |
| RSDR | 6.2 | 15.7 | 7.8 | 27.1 | 9.5 | 11.8 |
| Liver | ||||||
| RSDr | 5.0 | 4.4 | 3.8 | 4.0 | 3.0 | 4.3 |
| RSDR | 5.9 | 11.1 | 4.8 | 5.5 | 9.9 | 52.9 |
| Kidney | ||||||
| RSDr | 2.1 | 2.6 | 2.2 | 2.3 | 2.3 | 1.7 |
| RSDR | 4.5 | 11.8 | 3.7 | 5.2 | 7.7 | 50.5 |
| Blood | ||||||
| RSDr | 2.5 | 1.3 | 16.2 | 12.6 | 19.1 | 3.6 |
| RSDR | 7.4 | 2.4 | 24.7 | 27.7 | 27.8 | 61.1 |
RSDr from triplicate within-laboratory analysis; RSDR from between-laboratory comparison from mean of triplicate within-laboratory analysis.
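As a minimal sketch of the two statistics in Table 6, the example below follows the footnote's definitions: RSDr from the pooled within-laboratory standard deviation of triplicates, and RSDR from the standard deviation of the laboratory means. The triplicate values and laboratory labels are hypothetical, and the authors' exact computation is not stated (for example, whether a within-laboratory variance component was added to RSDR), so this is illustrative only.

```python
import math
from statistics import mean, stdev, variance

# Hypothetical triplicate results (µg/g) for one metal-matrix pair from 4 laboratories
results = {
    "PL":  [4.9, 5.1, 5.0],
    "CL1": [5.3, 5.2, 5.4],
    "CL2": [4.7, 4.8, 4.6],
    "CL3": [5.0, 5.2, 5.1],
}

lab_means = [mean(v) for v in results.values()]
grand_mean = mean(lab_means)

# Repeatability: pooled within-laboratory SD (equal replicates per laboratory)
pooled_sd = math.sqrt(mean(variance(v) for v in results.values()))
rsd_repeat = 100 * pooled_sd / grand_mean

# Reproducibility as described in the footnote: SD of the laboratory means
rsd_repro = 100 * stdev(lab_means) / grand_mean

print(f"RSDr = {rsd_repeat:.1f}%, RSDR = {rsd_repro:.1f}%")
```

Pooling the variances with a simple mean is only valid here because every laboratory contributes the same number of replicates; unequal replicate counts would need a degrees-of-freedom weighting.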
HorRat values for our study met the acceptance criterion in 26 of 30 metal–matrix combinations (Table 7). The HorRat results confirm that the ICP-MS methods provided acceptable precision among the participating laboratories for analysis of Ba, As, Cd, Pb, and Se. However, method performance was outside the acceptable HorRat range for Hg in 3 of the 5 matrices tested (bovine liver, kidney, and blood). The Hg results from each participating laboratory provide further evidence of this poor between-laboratory agreement (Table 8). These results show a pronounced low bias in Hg analysis by the principal laboratory compared to the 3 CLs. This low bias was particularly evident in the liver, kidney, and blood matrices, which were associated with high HorRat values (Table 7).
Table 7.
Summary of HorRat values obtained from the interlaboratory comparison.
| Matrix | Ba | As | Cd | Pb | Se | Hg |
|---|---|---|---|---|---|---|
| Chicken jerky | 0.86 | 1.6 | 0.76 | 1.7 | 0.73 | 0.72 |
| Sweet potato | 0.57 | 1.3 | 0.61 | 2.1 | 0.76 | 0.63 |
| Liver | 0.45 | 0.86 | 0.37 | 0.42 | 0.80 | 2.9 |
| Kidney | 0.38 | 1.0 | 0.31 | 0.44 | 0.69 | 3.0 |
| Blood | 0.56 | 0.18 | 1.8 | 2.0 | 2.1 | 3.2 |
Data represent results of inductively coupled plasma–mass spectrometry analysis of 6 heavy metals in 5 matrices by 4 laboratories. The HorRat value is the ratio of the reproducibility relative standard deviation (RSDR) calculated from the laboratory data to the RSDR predicted from the Horwitz equation.
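To make the HorRat definition concrete, the sketch below divides an observed RSDR by the Horwitz-predicted value and checks the result against the 0.5–2.0 acceptance range used in this study. It takes the Hg-in-liver RSDR from Table 6 and, as an assumption of this sketch, uses the mean of the four laboratory means in Table 8 as the analyte concentration. The function names are illustrative, and this is not the authors' code.

```python
import math

def predicted_rsd_r(mass_fraction: float) -> float:
    """Horwitz-predicted reproducibility RSD (%) for a dimensionless mass fraction."""
    return 2 ** (1 - 0.5 * math.log10(mass_fraction))

def horrat(observed_rsd_r: float, conc_ug_per_g: float) -> float:
    """Ratio of observed RSDR (%) to the Horwitz-predicted RSDR (%)."""
    # Convert µg/g to a dimensionless mass fraction before applying the equation
    return observed_rsd_r / predicted_rsd_r(conc_ug_per_g * 1e-6)

# Hg in liver: observed RSDR from Table 6; concentration assumed to be the
# mean of the laboratory means in Table 8 (an assumption of this sketch)
hg_liver_conc = (0.13 + 0.52 + 0.50 + 0.47) / 4
value = horrat(52.9, hg_liver_conc)
acceptable = 0.5 <= value <= 2.0
print(f"HorRat = {value:.1f}, acceptable = {acceptable}")
```

Under these assumptions the result is about 2.9, in line with the Hg value reported for liver in Table 7.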
Table 8.
Mercury results for 5 sample matrices from 3 laboratories collaborating in the interlaboratory comparison studies.
| Matrix | Principal laboratory | Collaborating laboratory 1 | Collaborating laboratory 2 | Collaborating laboratory 3 |
|---|---|---|---|---|
| Chicken jerky | 0.33 | 0.41 | 0.42 | 0.38 |
| Sweet potato jerky | 0.31 | 0.36 | 0.42 | 0.38 |
| Liver | 0.13 | 0.52 | 0.50 | 0.47 |
| Kidney | 0.22 | 0.90 | 0.84 | 0.80 |
Results represent the mean mercury value (μg/g) from triplicate analyses in each laboratory.
Discussion
Our instrument performance study confirmed that ICP-MS provides the sensitivity necessary for the analysis of heavy metals at concentrations that are useful for routine diagnostic testing. The LOQ values obtained in our study are at least an order of magnitude lower than representative critical concentration levels used for interpretation of metal-testing results in liver, kidney, and blood samples (Table 9).
Table 9.
Relationship of limits of quantification (LOQs) to diagnostic levels of concern.
| | As (µg/g) | Ba (µg/g) | Cd (µg/g) | Pb (µg/g) | Se (µg/g) | Hg (µg/g) |
|---|---|---|---|---|---|---|
| Instrument LOQ* | 0.002 | 0.005 | 0.001 | 0.001 | 0.002 | 0.100 |
| Diagnostic levels of concern† | ||||||
| Liver | ||||||
| Bovine | 0.004 | 0.01 | 0.02 | 0.10 | 0.12 | 0.0007 |
| Equine | 0.4 | NA | 0.01 | 0.003 | 0.2 | 0.1 |
| Ovine | 0.01 | NA | 0.02 | 0.03 | 0.15 | 0.01 |
| Kidney | ||||||
| Bovine | 0.018 | 0.01 | 0.05 | 0.20 | 0.4 | 0.008 |
| Equine | 0.4 | NA | 0.05 | 0.002 | 0.35 | 0.1 |
| Ovine | 0.01 | NA | 0.06 | 0.10 | 0.7 | 0.01 |
| Blood | ||||||
| Bovine | 0.03 | 0.01 | 0.001 | 0.01 | 0.06 | 0.1 |
| Equine | NA | NA | 0.003 | 0.01 | 0.16 | 0.01 |
| Ovine | 0.01 | NA | 0.004 | 0.02 | 0.05 | 0.1 |
The values for each metal represent the lowest concentrations considered to be critical for interpretation in routine veterinary diagnostic testing. NA = data not available.
*LOQ values were determined by the principal laboratory and are expressed on a wet-weight basis using a digestion ratio of 1-g sample diluted to a final volume of 10 mL.
†Values represent the low end of the normal or marginal range according to Puls (1994).8
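The wet-weight LOQ values in Table 9 depend on the digestion ratio given in the footnote (1 g of sample diluted to a final volume of 10 mL). A minimal sketch of that conversion follows; the function name and the example solution-level LOQ of 0.2 ng/mL are assumptions chosen for illustration, not values reported by the authors.

```python
def loq_wet_weight_ug_per_g(loq_solution_ng_per_ml: float,
                            sample_mass_g: float = 1.0,
                            final_volume_ml: float = 10.0) -> float:
    """Convert a LOQ in the digest solution (ng/mL) to a sample wet-weight basis (µg/g)."""
    if sample_mass_g <= 0:
        raise ValueError("sample mass must be positive")
    ng_per_g = loq_solution_ng_per_ml * final_volume_ml / sample_mass_g
    return ng_per_g / 1000.0  # ng/g -> µg/g

# Hypothetical solution-level LOQ of 0.2 ng/mL at the 1 g : 10 mL digestion ratio
print(loq_wet_weight_ug_per_g(0.2))
```

The defaults encode the digestion ratio stated in the footnote, so a laboratory using a different sample mass or final volume would pass its own values.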
The in-house method performance studies of 5 sample matrices spiked at 3 concentration levels indicate that ICP-MS can meet FDA acceptance criteria for both accuracy and within-laboratory precision. The interlaboratory comparison studies confirm that, in most cases, ICP-MS is a reproducible method for the analysis of heavy metals, providing acceptable results from the 4 independent participating laboratories.
The HorRat values ≥2.0 in the interlaboratory comparison for Hg in bovine liver and kidney, and for Pb, Se, and Hg in bovine blood, prompted further investigation of the protocols used by each laboratory, focusing on sample preparation (digestion method) and instrumentation (ICP-MS). Results from the instrument-validation phase of the study indicated satisfactory recovery of Hg at both the LOQ and the upper limit of the LDR by the principal laboratory. Similar recovery results were obtained by the 3 CLs, suggesting that the different ICP-MS instruments did not produce significant differences in the analysis of spiked aqueous samples. The most likely cause of the unacceptable Hg results in the interlaboratory study samples was the different digestion methods used by the participating laboratories: the principal laboratory used an open-vessel digestion block, whereas the 3 CLs used closed-vessel microwave digestion. Thus, the low bias is most readily explained as loss of mercury through volatilization in the open-vessel digestion system.
The marginal Pb and Se HorRat values in the interlaboratory comparison were attributed to the high within-laboratory variance of Cd, Pb, and Se in blood in one CL. This laboratory was recruited as a CL late in the collaboration and received and tested the single-blinded, spiked reference material (all 5 matrices) 1 y after the other CLs. Unlike the freeze-dried jerky treats and bovine tissue material, spiked blood was preserved only by storage at 4°C, and limited matrix stability could have contributed to the high within-laboratory and between-laboratory variance with the blood matrix at this particular laboratory.
Acknowledgments
We thank Andriy Tkachenko and Renate Reimschuessel of the FDA Veterinary Laboratory Investigation and Response Network program office for their consultation and constructive evaluation throughout implementation of the project.
Footnotes
Declaration of conflicting interests: The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding: This project was supported by the Department of Health and Human Services, Food and Drug Administration, Research Demonstration Cooperative Agreement (1U18FD005011).
ORCID iD: Timothy Baszler https://orcid.org/0000-0002-5350-1605
References
- 1. Brookman B, Walker R. Guidelines for in-house production of reference materials. Teddington, UK: Laboratory of the Government Chemist, 1997.
- 2. Brown CA, et al. Outbreaks of renal failure associated with melamine and cyanuric acid in dogs and cats in 2004 and 2007. J Vet Diagn Invest 2007;19:525–531.
- 3. Ekelman KB. Potentially hazardous contaminants in animal feed and feed ingredients. Washington, DC: Food and Drug Administration, 2006.
- 4. Horwitz W, Albert R. The Horwitz ratio (HorRat): a useful index for method performance with respect to precision. J AOAC Int 2006;89:1095–1109.
- 5. Horwitz W, et al. Quality assurance in the analysis of foods and trace constituents. J Assoc Off Anal Chem 1980;63:1344–1354.
- 6. International Organization for Standardization (ISO). Reference materials—good practice in using reference materials. ISO Guide 33:2015. Geneva, Switzerland: ISO, 2015.
- 7. Nelsen TC, Wehling P. Collaborative studies for quantitative chemical analytical methods. Cereal Foods World 2008;53:285.
- 8. Puls R. Mineral Levels in Animal Health: Diagnostic Data. 2nd ed. Clearbrook, Canada: Sherpa International, 1994.
- 9. Rama PJ, et al. Contaminants and toxins in foods and feeds. Int J Environ Sci Technol 2016;2:82–89.
