Canadian Journal of Veterinary Research. 2003 Jan;67(1):60–63.

Evaluation of the reliability and repeatability of automated milk urea nitrogen testing

P Arunvipas 1, J A VanLeeuwen 1, I R Dohoo 1, G P Keefe 1
PMCID: PMC227029  PMID: 12528831

Abstract

The purpose of this study was to evaluate the reliability (precision and accuracy) and repeatability of an infrared method of determining milk urea nitrogen. The reference method used for the reliability assessment was a wet-chemistry, enzymatic determination of milk urea nitrogen. Reliability and repeatability, as measured by concordance correlation coefficients, were 0.97 and 0.98, respectively.


The most precise method for determining milk and serum urea concentration is a wet-chemistry, enzymatic determination. In this procedure, the amount of urea present in the sample is estimated by measuring the change in the pH of the sample following the addition of urease to convert the urea to ammonia. This method has generally been accepted as the gold standard for urea determination in North America (1). However, the cost of this method in a commercial, large-scale setting is prohibitive, due to the labor involved in handling the samples (2). As a result, it has been impractical to measure milk urea concentration in dairy herds on a large-scale basis using wet chemistry.

Recently, rapid methods of milk urea nitrogen (MUN) determination, based on infrared technology, have become available (3). One machine uses infrared technology to produce an estimate of the urea content of milk samples (Fossomatic 4000 MilkoScan analyzer, FOSS4000; Foss North America, Brampton, Ontario). Infrared light is passed through a filter to produce a beam of a specific wavelength for the milk component being measured. The beam is subsequently passed through a milk sample and the amount of light absorbed by the sample is recorded. A computer algorithm then adjusts this estimate of urea concentration for the concentrations of other milk components, known as interfering substances, which are known to also absorb some light at the urea wavelength.

Use of this infrared technology within Dairy Herd Improvement (DHI) laboratories has proven to be cost-effective (4). The same instrument can be used for determination of milk fat, milk protein, lactose, somatic cell count, and MUN content. Therefore, no separate handling of samples is needed. However, when a new method is introduced, it is necessary to evaluate whether the new method can consistently reproduce the results obtained with the historically accepted method (5). It is important to compare the test results from identical samples tested with both the new test and the gold standard test. It is also important to determine the repeatability of the method, that is, the ability of the method to consistently produce the same results from the same samples upon repeated testing. The Prince Edward Island Milk Quality Laboratory (PEIMQL) recently acquired an addition to its FOSS4000 to analyze milk samples for MUN concentrations. The objective of this study was to evaluate the repeatability, validity, and precision of the MUN values from the FOSS4000 at the PEIMQL. This study is the first component of a large investigation to determine the factors affecting MUN levels in dairy cows.

A total of 161 composite milk samples from individual cows were selected by PEIMQL staff to represent a large number of herds, and a broad range of MUN concentrations. They were selected during 5 different time periods of the year (June 1999, August 1999, October 1999, February 2000, and May 2000), with approximately 32 samples per time period. Each sample was preserved with a bronopol tablet (6 mg: 2-Bromo-2-Nitro-Propane-1,3 Diol/tablet — 1 tablet per sample) to inhibit the growth of bacteria and yeast. Each milk sample was split to create 2 identical replicate samples. One of each of the duplicate samples was analyzed for MUN (mg/dL) using the infrared FOSS4000 according to standard procedures (FOSS4000; Foss North America) at the PEIMQL. The second duplicate sample was analyzed for MUN (mg/dL) using standard procedures from an enzymatic test (Eurochem CL 10; Foss North America) at the laboratory of the Ontario Dairy Herd Improvement Corporation (ODHIC). Staff members at the ODHIC lab were blinded to the results from the PEIMQL. There was only 1 d between the testing of the 2 methods.

Descriptive statistics for the MUN results of the 161 samples, analyzed by each method, were calculated. Significant differences between MUN mean values and standard deviations were determined using the paired t-test. A scatter plot was created by plotting the MUN results from the infrared assessment against those from the enzymatic method, and a regression line was fitted to the data (Figure 1). A line of perfect agreement (hatched line at 45° and intercept zero) was superimposed on this figure.
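The paired comparison of means described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' Stata procedure; the function name `paired_t` and the sample values are hypothetical. It returns the t statistic and degrees of freedom only, so the P value would still have to be looked up in a t distribution.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Return (t, df) for a paired t-test on two equal-length result lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    # Standard error of the mean difference (sample SD with n - 1 denominator).
    se = stdev(diffs) / math.sqrt(n)
    return mean(diffs) / se, n - 1

# Hypothetical MUN values (mg/dL) for the same four samples by two methods.
infrared = [1.0, 2.0, 3.0, 4.0]
enzymatic = [0.0, 2.0, 2.0, 4.0]
t_stat, df = paired_t(infrared, enzymatic)
print(t_stat, df)
```

Pairing matters here because both methods measured splits of the same milk samples, so the test is applied to the within-sample differences, not to the two sets of results as independent groups.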


Figure 1. Scatter plot of milk urea nitrogen (MUN) results from the enzymatic method (CL10) and infrared method (FOSS4000).

Agreement between the 2 tests was assessed using 2 statistical methods. A concordance correlation coefficient (CC) was calculated to determine the overall level of agreement between the 2 methods of measuring MUN. The CC is computed as the product of a measure of accuracy (the bias correction factor, BCF), and a measure of precision (the Pearson correlation coefficient, PCC) (CC = BCF × PCC) (5,6). The BCF measures how close a regression line through the data points falls to a line of perfect agreement (45° line), while the PCC measures how closely the data points are clustered around the regression line. The second statistical method was the Bland and Altman method of assessing agreement. It plots the mean of the paired measurements (x-axis) against their difference (y-axis). The 95% limits of agreement were computed as the mean difference plus or minus 1.96 times the standard deviation of the difference (6,7). All analyses were carried out using software (Stata, release 6; Stata Corporation, College Station, Texas) (8).
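The two agreement measures described above can be sketched in a few lines. The code below is an illustration only and is not the Stata routine used in the study; the function names and the sample data are hypothetical. It assumes Lin's estimator with 1/n variances and covariance, and obtains the BCF as CC divided by PCC, which follows directly from CC = BCF × PCC.

```python
import math
from statistics import mean, stdev

def concordance(x, y):
    """Return (cc, pcc, bcf) for paired measurements x and y."""
    n = len(x)
    mx, my = mean(x), mean(y)
    # Lin's estimator uses population (1/n) variances and covariance.
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    pcc = sxy / math.sqrt(sxx * syy)                 # precision
    cc = 2 * sxy / (sxx + syy + (mx - my) ** 2)      # overall agreement
    bcf = cc / pcc                                   # accuracy (bias correction)
    return cc, pcc, bcf

def limits_of_agreement(x, y):
    """Return (mean difference, lower limit, upper limit) per Bland and Altman."""
    diffs = [a - b for a, b in zip(x, y)]
    d_bar = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    return d_bar, d_bar - 1.96 * sd, d_bar + 1.96 * sd

# Hypothetical MUN values (mg/dL), not data from the study.
infrared = [8.0, 10.0, 12.0, 14.0, 16.0]
enzymatic = [8.5, 9.5, 12.5, 13.5, 16.5]
print(concordance(infrared, enzymatic))
print(limits_of_agreement(infrared, enzymatic))
```

Because the BCF can never exceed 1, the CC is always at most as large as the PCC in absolute value; a high PCC with a low CC indicates results that are tightly correlated but systematically shifted or scaled relative to the reference method.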

For repeatability testing, a total of 200 composite milk samples from individual cows were selected by PEIMQL staff, again to represent a large number of herds and a broad range of MUN concentrations. These samples were split and preserved as described above. All samples were analyzed using the FOSS4000. Both replicates of each sample were tested in the same batch on the same day, after standardization of the equipment. The PEIMQL staff were blinded to the identity of the samples. The statistical approach used in the previous section was also used here, along with the calculation of the coefficient of variation (CV), a measure of test precision for the infrared machine.
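The text does not state how the CV was computed from the duplicate results. One common approach for duplicates, shown here purely as an assumption, is to estimate the within-sample standard deviation as the square root of the sum of squared differences divided by 2n, and to express it as a percentage of the overall mean. The function name `duplicate_cv` and the sample values are hypothetical.

```python
import math

def duplicate_cv(first, second):
    """Return the within-sample CV (%) from duplicate measurements.

    Assumed formula: s_w = sqrt(sum(d_i^2) / (2n)), CV = 100 * s_w / grand mean.
    """
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    n = len(first)
    s_w = math.sqrt(sum((a - b) ** 2 for a, b in zip(first, second)) / (2 * n))
    grand_mean = (sum(first) + sum(second)) / (2 * n)
    return 100 * s_w / grand_mean

# Hypothetical duplicate MUN results (mg/dL), not data from the study.
run_1 = [9.0, 11.0]
run_2 = [11.0, 9.0]
print(duplicate_cv(run_1, run_2))
```

Other definitions exist (for example, averaging the CV of each pair), and they can give slightly different values, so the figure reported in the study should not be assumed to follow this exact formula.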

As shown in Table I, the MUN concentrations from the infrared method had significantly lower standard deviations, but not significantly different means, than those from the enzymatic method (P < 0.05). Comparisons, by individual test day, of mean MUN values and their standard deviations showed no differences (P > 0.05) or trends between the 2 tests. When MUN values from the infrared method were plotted against those of the enzymatic test (Figure 1), points tended to cluster at, or near, the line of perfect agreement. The CC was 0.972 (95% confidence interval = 0.964, 0.980). The BCF, which measures how far the best-fit line deviates from the 45° line (a measure of accuracy), was 0.998, which showed that the line of best fit was very close to the line of perfect agreement. The PCC was 0.973, showing very good precision. Using the Bland and Altman method of assessing agreement, the mean difference between the 2 tests was 0.051 mg/dL (standard deviation = 1.18). The 95% limits of agreement were −2.29 and 2.19 mg/dL, respectively, indicating that 95% of pairs of results differed by less than approximately ± 2.2 mg/dL (Figure 2).

Table I.


Figure 2. Difference between the milk urea nitrogen (MUN) concentrations from the enzymatic and infrared methods plotted against the mean value from the 2 methods, with horizontal lines showing the 95% limits of agreement.

The 2 sets of MUN concentrations from the repeatability trial had significantly different means, but not significantly different standard deviations (Table II). When MUN values from the 2 infrared tests were plotted against each other (figure not shown), points clustered even more closely around the line of perfect agreement. The CC was 0.983 (95% confidence interval = 0.978, 0.988). The PCC was 0.988, consistent with excellent precision, and the BCF was 0.995, which showed that the best-fit line was very close to the line of perfect agreement. Using the Bland and Altman method of assessing agreement, the mean difference between the 2 tests was 0.29 mg/dL (standard deviation = 0.49). The 95% limits of agreement were −0.68 and 1.27 mg/dL, respectively, indicating that 95% of pairs of results differed by less than approximately ± 1 mg/dL. The coefficient of variation for the FOSS4000 was 2.2%, which was excellent.
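As a quick arithmetic check, the repeatability figures can be recombined using the relationships given in the methods (CC = BCF × PCC, and limits of agreement = mean difference ± 1.96 × standard deviation). The inputs below are the rounded values reported in the text, so the recomputed limits can differ from the published ones in the last digit.

```python
# Rounded values reported for the repeatability trial.
pcc = 0.988        # Pearson correlation coefficient (precision)
bcf = 0.995        # bias correction factor (accuracy)
mean_diff = 0.29   # mean difference between duplicate runs, mg/dL
sd_diff = 0.49     # standard deviation of the differences, mg/dL

cc = pcc * bcf                       # concordance correlation coefficient
lower = mean_diff - 1.96 * sd_diff   # lower 95% limit of agreement
upper = mean_diff + 1.96 * sd_diff   # upper 95% limit of agreement

print(round(cc, 3), round(lower, 2), round(upper, 2))
# prints: 0.983 -0.67 1.25
```

The recomputed CC matches the reported 0.983, and the recomputed limits (−0.67 and 1.25 mg/dL) agree with the reported −0.68 and 1.27 mg/dL to within what rounding of the inputs would explain.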

Table II.


In this study, we used advanced statistical analyses to evaluate a relatively new commercial MUN test procedure against a validated, accepted test for MUN, providing an example of appropriate evaluation of diagnostic test procedures. The concordance correlation coefficient between the FOSS4000 and the CL10 was 0.972 (possible range −1 to 1), which indicated that there was very good overall agreement between the 2 methods. The major axis of the data effectively fell on the line of perfect concordance, reflecting excellent accuracy in the infrared results when compared with the enzymatic method. This CC was higher than that reported by Godden et al (1), who reported a CC of 0.86 (based on 89 samples).

In 2000, D. Lefebvre (personal written communication) conducted a comparison of similar infrared and enzymatic methods at 3 different laboratories: ODHIC, Programme d'Analyse des Troupeaux Laitiers du Québec (PATLQ), and Eastern Lab Services. The FOSS4000 at the PEIMQL had a smaller mean difference (0.05 mg/dL versus −0.10, 0.07, and −0.54, respectively) and a smaller standard deviation of the differences (1.18 mg/dL versus 1.51, 1.33, and 1.73, respectively) compared to the 3 listed laboratories.

Also in 2000, P. Sauve (personal written communication) evaluated the reliability of 15 infrared MUN analyzers around North America compared to an enzymatic method. Twelve samples were divided into duplicate samples and 1 of each duplicate sample was analyzed by each method. In his study, only 2 of 15 infrared analyzers had standard deviations of the differences that were less than 1.5 mg/dL, and large regional differences in quality control of MUN testing were found. Nine of 15 infrared laboratories had mean differences greater than 0, showing a positive bias in results of the infrared method. However, there was virtually no bias in results from the PEIMQL infrared method compared to the enzymatic method.

The standard deviations within each month were not significantly different, statistically. However, in the whole data set (Table I), the standard deviations were significantly different due to the larger sample size. While this difference was statistically significant, it was small in magnitude and not biologically important.

The reasons for this superior quality control at the PEIMQL are probably numerous, but are likely to include excellent initial calibration and maintenance. All of the main components of milk interfere spectrophotometrically with each other, such that variation in the concentration of any 1 component will affect all of the others in the infrared method. A calibration equation is used to correct for this interference. The slope, bias, and coefficient values produced by the interference filter and used in this equation differ from instrument to instrument. The quality of the original calibration of the equipment, and its continued maintenance, will determine how well the infrared method can account for, and remove, the effects of these interfering factors when producing a urea estimate (3). As a result, it is important to compare infrared assessments to reference methods, not to another laboratory that uses the infrared method.

In conclusion, the FOSS4000 infrared MUN analysis at the PEIMQL had excellent reliability (precision and accuracy) when compared with the gold standard enzymatic test method (CL10), and excellent repeatability when duplicate samples were both tested on the FOSS4000. The FOSS4000 at the PEIMQL provides a low-cost method for determining precise, accurate and repeatable MUN results. However, the accuracy and precision information presented from this study should not be extrapolated to MUN data produced by other DHI laboratories, due to potential differences in calibration procedures.

Footnotes

Address all correspondence and reprint requests to Dr. P. Arunvipas; telephone: (902) 566-0995; fax: (902) 566-0823; e-mail: parunvipas@upei.ca

Received January 22, 2002. Accepted May 22, 2002.

References

1. Godden SM, Lissemore KD, Kelton DF, Lumsden JH, Leslie KE, Walton JS. Analytic validation of an infrared milk urea assay and effects of sample acquisition factors on milk urea results. J Dairy Sci 2000;83:435–442.
2. Gustafsson AH. Current status concerning the use of milk urea concentration: analytical methods, use in different countries and how to utilize the results. Proceedings of the 30th Biennial Session of ICAR, Veldhoven, The Netherlands, 1996:223–228.
3. Grappin R, Packard VS, Ginn RE. Repeatability and accuracy of dye-binding and infrared methods for analyzing protein and other milk components. J Food Prot 1980;43:374–375.
4. Gustafsson AH, Carlsson J. Effects of silage quality, protein evaluation systems and milk urea content on milk yield and reproduction in dairy cows. Livest Prod Sci 1993;37:91–105.
5. Lin Li-K. A concordance correlation coefficient to evaluate reproducibility. Biometrics 1989;45:255–268.
6. Steichen TJ, Cox NJ. Concordance correlation coefficient to evaluate reproducibility. Stata Technical Bulletin 1998:137–143.
7. Bland JM, Altman DG. Comparing methods of measurement: why plotting difference against standard method is misleading. Lancet 1995;346:1085–1087.
8. Stata Corp. Stata statistical software. Release 6. College Station, Texas: Stata Corporation, 1999.
