Author manuscript; available in PMC: 2011 Apr 1.
Published in final edited form as: J Clin Densitom. 2010 Mar 27;13(2):210–218. doi: 10.1016/j.jocd.2010.01.003

Cross-Calibration and Comparison of Variability in Two Bone Densitometers in a Research Setting: The Framingham Experience

David R Gagnon 1,2, Robert R McLean 3,4, Marian T Hannan 3,4,5, L Adrienne Cupples 1, Mary Hogan 3, Douglas P Kiel 3,4
PMCID: PMC2908922  NIHMSID: NIHMS169298  PMID: 20347371

Abstract

New technology introduced over time results in changes in densitometers during longitudinal studies of bone mineral density (BMD). This requires that a cross-calibration process be completed to translate measurements from the old densitometer to the new one. Previously described cross-calibration methods for research settings have collected single measures on each densitometer and used linear regression to estimate cross-calibration corrections. Thus, these methods may produce corrections that have limited precision and underestimate the variability in converted BMD values. Furthermore, most prior studies have included small samples recruited from specialized populations. Increasing the sample size, obtaining multiple measures on each machine, and utilizing linear mixed models to account for between- and within-subject variability may improve cross-calibration estimates. The purpose of this study was to conduct an in vivo cross-calibration of a Lunar DPX-L with a Lunar Prodigy densitometer using a sample of 249 healthy volunteers who were scanned twice on each densitometer, without repositioning, at both the femur and spine. Scans were analyzed using both automated and manual placement of regions of interest. Wilcoxon rank-sum tests and Bland-Altman plots were used to examine possible differences between repeat scans within and across densitometers. We used linear mixed models to determine the cross-calibration equations for the femoral neck, trochanter, total hip and lumbar spine (L2-L4) regions. Results using automated and manual placement of the regions of interest did not differ significantly. The DPX-L exhibited larger median absolute differences in repeat scans for femoral neck [0.016 vs. 0.012, p=0.1] and trochanter [0.011 vs. 0.009, p=0.06] BMD values compared to the Prodigy. The Bland-Altman plots revealed no statistically significant linear relation between the difference in paired measures between machines and mean BMD.
In our large sample of healthy volunteers we did detect systematic differences between the DPX-L and Prodigy densitometers. Our proposed cross-calibration method, which includes acquiring multiple measures and using linear mixed models, provides researchers with a more realistic estimate of the variance of cross-calibrated BMD measures, potentially reducing the chance of making a type I error in longitudinal studies of changes in BMD.

Keywords: cross-calibration, densitometer, bone mineral density, DXA, mixed models, Framingham Osteoporosis Study

Introduction

During a longitudinal research study of bone mineral density (BMD), there often arises both the opportunity and the need to replace a bone densitometer. Most notably, advances in technology create an opportunity to increase the precision of BMD measurement, although this is not often realized with current equipment. Upgrading to new equipment often reduces scan time, which enables more subjects to be seen in a shorter time. Finally, acquiring new equipment circumvents the potential problem of manufacturers discontinuing support for older models. When a system is upgraded, even by the same manufacturer, it is likely that the old and new machines are calibrated differently. Thus, cross-calibration is necessary to be confident that observed changes in BMD values between scans on the old and new machines for research study samples are real and not due to the change in machines. According to a recent report from the International Society for Clinical Densitometry (1), “although cross-calibration of DXA systems is possible using in vivo procedures, it is impractical for most DXA centers.” It has thereby been recommended that clinicians establish a new baseline on the newer machine and not rely on the old machine's values. This method can, however, be problematic for clinicians as it disregards the important information present in the historical results with respect to trends over time and possible abrupt changes in bone loss patterns. If clinicians wish to monitor their patients' BMD utilizing measures from the old and new machines after a technology upgrade, a cross-calibration must be performed. Recently, Shepherd et al. have developed the “generalized least significant change” (GLSC) method of cross-calibration for clinical applications (2,3).

While the above recommendations may be suitable for a clinical setting, they may be inappropriate for longitudinal research studies of large populations, where precise measurements of the actual change in BMD over time are needed for regression modeling (e.g. identification of risk factors) and proper estimation of standard errors is needed to assess the significance of study findings. Thus, if one intends to conduct longitudinal research studies involving more than one device, it becomes necessary to undergo a cross-calibration process that facilitates not only comparison of measurements across densitometers for estimating changes over time, but also calculation of appropriate standard errors.

Most previous cross-calibration studies have been conducted by acquiring one measurement per subject on each densitometer, and have used simple linear regression to obtain a regression equation to correct measures collected on the old machine to match those collected on the new machine. Acquisition of multiple measurements on each machine during a relatively short period of time, however, has advantages over a single measurement. Repeated measurements per subject allow one to quantify both the within- and between-subject variability, facilitating more realistic estimation of the slope and intercept of the cross-calibration equation. Utilizing more than one measurement generally yields more precise estimates with smaller standard errors, giving researchers greater power to examine the association of risk factors with change in BMD over time.

Furthermore, the use of simple linear regression to establish cross-calibration corrections often underestimates the variance in the measurements when converting values from older to newer equipment. Simple linear regression assumes that the subjects used in creating the cross-calibration equation are the only subjects that the equation will be used on. Random effects (mixed) models take into account that the subjects in the cross-calibration experiment are only a sample from a larger universe of subjects on whom the equation may be used. Thus, simple linear regression in this setting will usually underestimate the standard errors in the cross-calibration equation.

Over the course of the Framingham Osteoporosis Study, which began bone density testing in 1988 with a Lunar DP3 scanner, there have been two bone densitometer equipment upgrades during longitudinal follow up of our study's cohorts, each time necessitating a cross-calibration study. During the first upgrade to the DPX-L in 1992, we applied a cross-calibration equation obtained from simple linear regression to older DP3 data to facilitate comparison to measures collected on the new DPX-L system (4). Most recently we upgraded from the DPX-L pencil beam to the Lunar Prodigy fan beam densitometer in 2001. Many centers using Lunar devices have made similar upgrades in recent times, and three previous in vivo cross-calibration studies suggest that there may be differences in BMD measures between these two devices, yet results for all bone sites have not been consistent (5-7). Furthermore, these previous studies were limited in that they all used small samples selected from specialized populations, collected only single scans on each densitometer, repositioned subjects between scans on the same machine, and used simple linear regression analysis to compare densitometers.

The goal of our current in vivo cross-calibration study was to obtain multiple femur and lumbar spine scans on both the Lunar DPX-L and Prodigy densitometers from a large sample of non-selected adult volunteers. We further aimed to determine whether there are systematic differences between BMD measures acquired with the DPX-L and the Prodigy devices, taking advantage of statistical methods that utilize random effects models to account for individual subject variability, which have not been previously employed by cross-calibration studies. The results of this study will allow future longitudinal studies of our research population to utilize measures collected on both densitometers.

In addition, we were interested in the effects of different approaches to scan analysis in this context. Densitometry requires proper designation of anatomical areas and while densitometers are capable of automatically choosing an area, the quality control process may sometimes require an operator to manually override the automated placement of the region of interest. In the context of a single densitometer, this quality control generally involves looking at reproduction of the original positioning, as well as examining pairs of scans for placement of the regions of interest. With a subject being positioned on another densitometer, considerations must also be given to the comparison of scans on two machines, which results in a greater chance of detecting errors. Any additional manual adjustment requires considerable effort; thus we were interested in the effect of this additional quality control step upon our cross-calibration process. As a second aim of this paper, we examined results from automated analysis alone and from visual inspection-driven, manual region of interest placement to determine the value of this additional effort.

Materials and Methods

Subjects

Between October 2001 and January 2003, 249 volunteer subjects underwent scans of the proximal femur and lumbar spine using both a Lunar DPX-L densitometer and a GE Lunar Prodigy densitometer. Subjects were recruited via Institutional Review Board (IRB) approved posters and flyers from several sources, including employees, friends and family associated with the Framingham Study, the local medical building housing the densitometer (Framingham, MA) and Hebrew SeniorLife in Boston, a geriatric research and long-term care facility. Additionally, the study was listed in the monthly newsletter of the Harvard Cooperative Program on Aging, a registry of more than 1300 individuals in the Greater Boston area who are interested in participating in research studies, most of whom are over age 60. Subjects who agreed to participate in this study provided signed informed consent. The study was approved by the IRB of Hebrew SeniorLife.

Densitometry

Participants underwent either scans on the DPX-L first, or the Prodigy first, based on convenience and availability of the instrument. On each densitometer, participants had two scans of each skeletal site (femur and spine) performed without repositioning. Thus, each participant was measured a total of eight times with two femur and two spine scans on each of the two densitometers. It is generally recommended that subjects stand up between scans on the same machine to account for variability in positioning between scans over time. This is important for precision assessment of a single machine since the goal is to determine the ability of the machine to reproduce the same measurement on an individual at multiple assessments. The goal of cross-calibration, however, is to estimate what measures on the old machine would have been had they been obtained on the new machine. We did not reposition between scans on the same machine in order to minimize the within-scanner variability. Due to time constraints and limited availability of the bone densitometers, the first 37 of the 249 subjects were measured once at each bone site on both the DPX-L and Prodigy. Thus, the total number of scans available for this study was 1,844. One technician performed all DPX-L and Prodigy scans and each subject was measured on both densitometers on the same day.
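The total of 1,844 scans follows directly from the protocol described above. The short Python check below reproduces that arithmetic; the constant and function names are illustrative and do not come from the study.

```python
# Scan-count check using only figures stated in the text.
TOTAL_SUBJECTS = 249
SINGLE_SCAN_SUBJECTS = 37   # first 37 subjects: one scan per site per machine
SITES = 2                   # femur and spine
MACHINES = 2                # DPX-L and Prodigy


def scans_per_subject(repeats: int) -> int:
    """Scans for one subject given the repeat scans per site per machine."""
    return repeats * SITES * MACHINES


double_scan_subjects = TOTAL_SUBJECTS - SINGLE_SCAN_SUBJECTS
total_scans = (double_scan_subjects * scans_per_subject(2)
               + SINGLE_SCAN_SUBJECTS * scans_per_subject(1))
print(total_scans)  # 1844
```

The 212 fully scanned subjects contribute 8 scans each and the first 37 subjects contribute 4 each.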

Standard positioning for the femur and spine scans on both densitometers followed the manufacturer's recommendations from the DPX-L operator's manual. The right femur was scanned unless there was a history of previous fracture or hip joint replacement. For these individuals, the left side was scanned. DPX-L scans were acquired using the “medium” mode with Lunar DPX-L software version 1.35. Prodigy scans were acquired using enCORE software versions 4.00.145, 5.60.003 and 7.53.002. According to the manufacturer (personal communication), scan acquisition does not differ among these software versions. All Prodigy femur scans were acquired using the “standard” mode. Prodigy spine scans were acquired using the mode automatically selected by the enCORE software (thin, standard, or thick), which is based on an individual subject's combined height and weight.

DPX-L scans were analyzed with Lunar DPX-IQ software version 4.6b and Prodigy scans were analyzed using enCORE version 7.53.002. The regions of interest examined were the neck, trochanter and total hip for the femur, and L2-L4 for the spine. The DPX-L short-term precision assessment has been previously described for the femoral neck, trochanter and spine, and the percent coefficients of variation (%CV) were 1.7, 2.5, and 0.9, respectively (8).

As part of the current study, we assessed the precision of the Prodigy following the International Society for Clinical Densitometry guidelines; we measured 20 volunteers three times at both the femur and spine, with repositioning after each scan. The %CVs for the Prodigy device were 1.8, 2.3, 1.2 and 1.1 for the femoral neck, trochanter, total hip and L2-4 regions, respectively.
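For readers unfamiliar with precision assessment, the sketch below shows one common way a %CV can be computed from repeated scans: a root-mean-square of the per-subject coefficients of variation. The text states only that the guidelines were followed, so the exact formula is an assumption here, and the function name and data are hypothetical.

```python
import math
import statistics


def rms_percent_cv(repeat_scans):
    """Root-mean-square %CV across subjects.

    repeat_scans: list of per-subject lists of repeated BMD values (g/cm2).
    Assumed formula, not confirmed by the source.
    """
    squared_cvs = []
    for values in repeat_scans:
        sd = statistics.stdev(values)              # within-subject sample SD
        cv = 100.0 * sd / statistics.mean(values)  # per-subject %CV
        squared_cvs.append(cv ** 2)
    return math.sqrt(sum(squared_cvs) / len(squared_cvs))


# Hypothetical data: three subjects, three repositioned scans each.
example = [
    [0.900, 0.910, 0.890],
    [0.750, 0.760, 0.755],
    [1.100, 1.120, 1.110],
]
print(round(rms_percent_cv(example), 2))
```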

Quality control modes

We adopted two quality control approaches to reviewing scans prior to data analysis. The first approach was intended to allow the automated software to place the regions of interest and to only override this analysis when there was a clear error in the placement. Each scan was reviewed independently and without reference to the other scan on the same system or to the scans acquired on the alternate system. We will refer to this approach as the “automated” mode. In the second approach, every participant's first scan performed on the DPX-L was visually inspected by an investigator certified by the International Society for Clinical Densitometry alongside the companion first scan performed on the Prodigy. Re-analysis of scans was undertaken for regions of interest that were judged to differ between scanners. The same procedure was performed for every participant's second scan on the DPX-L and Prodigy. A single technician performed all re-analyses of scans that required alteration of the automated analysis. We will refer to this approach as the “manual” mode.

Covariates

Age was calculated at the time of densitometry measurement. Height, weight, body mass index, and body thickness (anterior/posterior distance measured at the umbilicus with the subject supine on the densitometer) were recorded for each subject on the same day as the BMD measurement.

Statistical Analysis

Evaluation of automated and manual review modes

The impact of automated vs. manual scan modes on comparisons of BMD values between densitometers was assessed with enumeration of the number of scans altered by the extended review process. For each bone site, we calculated the mean difference between BMD measures across densitometers for both the automated and manual analyses. We then calculated the mean, standard deviation and range of the difference in mean BMD between the two analysis modes for all observations.

Comparisons of BMD measures within and between densitometers

With paired observations on each densitometer, we quantified the variability in BMD within each densitometer by calculating the absolute difference between the pair for each densitometer. Given the non-Gaussian nature of the data, we compared the median absolute differences in BMD for each region of interest by densitometer using the Wilcoxon signed rank test.
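The within-densitometer summary described above can be sketched as follows. The scan pairs are hypothetical and the function name is illustrative; the significance test itself is not reproduced here because it would require a statistics library beyond the Python standard library.

```python
import statistics


def median_abs_difference(pairs):
    """Median of |scan1 - scan2| over subjects scanned twice on one machine."""
    return statistics.median(abs(first - second) for first, second in pairs)


# Hypothetical repeat-scan pairs (g/cm2) for three subjects per machine.
dpxl_pairs = [(0.901, 0.885), (0.760, 0.772), (1.020, 1.003)]
prodigy_pairs = [(0.890, 0.878), (0.748, 0.757), (1.005, 0.995)]

print(round(median_abs_difference(dpxl_pairs), 3))
print(round(median_abs_difference(prodigy_pairs), 3))
```

A smaller median absolute difference indicates closer agreement between repeat scans on that machine.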

For each region of interest, we calculated the difference in BMD measures for the two scanners (Prodigy – DPX-L) as the outcome and used linear mixed models to calculate the mean difference in BMD between scanners, and the 95% confidence intervals around the mean difference. This mixed model approach was conducted for both the automated and manual analysis modes. In addition, Bland-Altman plots (9) were produced for the manual analysis data to visually compare the differences in BMD measures (Prodigy – DPX-L) as a function of the mean BMD [0.5*(Prodigy + DPX-L)] for each region.
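The quantities plotted in a Bland-Altman comparison are simple to compute. The sketch below derives the per-pair difference (Prodigy – DPX-L), the per-pair mean, the overall mean difference, and an ordinary least-squares slope of difference on mean. It uses hypothetical values and plain averaging; the study itself estimated the mean difference and its confidence interval with linear mixed models, which this sketch does not attempt.

```python
import statistics


def bland_altman(prodigy, dpxl):
    """Return per-pair means and differences, the mean difference, and the
    least-squares slope of difference on mean BMD."""
    diffs = [p - d for p, d in zip(prodigy, dpxl)]
    means = [0.5 * (p + d) for p, d in zip(prodigy, dpxl)]
    mean_diff = statistics.mean(diffs)
    mean_of_means = statistics.mean(means)
    sxx = sum((m - mean_of_means) ** 2 for m in means)
    sxy = sum((m - mean_of_means) * (d - mean_diff)
              for m, d in zip(means, diffs))
    slope = sxy / sxx
    return means, diffs, mean_diff, slope


# Hypothetical paired BMD values (g/cm2).
prodigy = [0.880, 0.720, 1.010, 0.930]
dpxl = [0.895, 0.735, 1.020, 0.940]

means, diffs, mean_diff, slope = bland_altman(prodigy, dpxl)
print(round(mean_diff, 4), round(slope, 4))
```

A negative mean difference corresponds to lower Prodigy values, and a slope near zero suggests the difference does not depend on the BMD level.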

Cross-Calibration

To determine the equation that will establish the relation between BMD measurements obtained from the two densitometers, we analyzed the repeated scans for each subject using a linear mixed model (PROC MIXED, SAS version 9.1) (10) with subject treated as a random effect and all observations for a given subject assumed to be correlated with a compound symmetry structure. Scans were paired in the order that they were taken on each machine. Measures taken on the newer Prodigy system were considered as the dependent variable and measures on the older DPX-L system were treated as an independent predictor variable; thus our model will translate the older DPX-L measures into those obtained using the Prodigy. Parameter estimates for the intercepts and slopes, their 95% confidence intervals and the correlations between the intercepts and slopes were obtained for BMD measures from the femoral neck, trochanter, total femur and L2-L4 regions.
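One way to write the model described above is given below, for subject i and scan pair j. The notation (the random subject effect b_i and the variance symbols) is an assumption introduced for illustration; only the fixed part, intercept plus slope times the DPX-L measure, is stated in the text. A random subject intercept of this kind implies a compound symmetry covariance among a subject's observations.

```latex
\begin{aligned}
\mathrm{BMD}_{\mathrm{Prodigy},ij} &= \beta_0 + \beta_1\,\mathrm{BMD}_{\mathrm{DPX\text{-}L},ij} + b_i + \varepsilon_{ij},\\
b_i &\sim N(0,\sigma_b^2), \qquad \varepsilon_{ij} \sim N(0,\sigma_e^2),\\
\operatorname{Cov}\!\left(\mathrm{BMD}_{\mathrm{Prodigy},ij},\,\mathrm{BMD}_{\mathrm{Prodigy},ik}\right) &=
\begin{cases}
\sigma_b^2 + \sigma_e^2, & j = k,\\
\sigma_b^2, & j \neq k.
\end{cases}
\end{aligned}
```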

BMD conversion

To highlight the difference in the results obtained when using the standard (simple linear regression) versus our current method (mixed model regression) of cross-calibration, we chose the median value from the femoral neck BMD distribution collected on the old DPX-L machine and imputed the predicted new Prodigy BMD value a single time using both a simple linear regression model and a linear mixed model.

In our longitudinal studies, we plan to translate the old DPX-L measures to the Prodigy measures several times in order to generate multiple imputed datasets because a singly imputed BMD value does not adequately represent the variability generated by the calibration process. To demonstrate how multiple imputation accounts for this variability, we used the previously chosen DPX-L BMD value to impute 5 new Prodigy BMD values. We obtained the parameter estimates, and their accompanying standard errors, from both the linear regression and mixed-model cross-calibration equations that were used in the above described single imputation. Five different sets of slopes and intercepts were generated using a bivariate normal distribution that used independent random normal[0,1] values and the means and standard errors for the estimated slope and intercept as well as their correlation. The estimates of slope, intercept and correlation were obtained from the linear regression and mixed model cross-calibration equations. These slopes and intercepts from the linear regression and mixed models were then used to impute new Prodigy values from the selected DPX-L BMD value.
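The draw of correlated slopes and intercepts from independent standard normal values can be sketched as below. The construction shown is a standard way to build a bivariate normal pair and is assumed rather than taken from the authors' code. The inputs are the mixed regression estimates and standard errors reported in Table 5 and the femoral neck correlation of -0.985 reported in the Results; the seed and function name are arbitrary.

```python
import math
import random


def draw_intercept_slope(mean_a, se_a, mean_b, se_b, rho, rng):
    """Draw one (intercept, slope) pair from a bivariate normal distribution
    built from two independent standard normal values."""
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    intercept = mean_a + se_a * z1
    # Mixing z1 and z2 gives the slope correlation rho with the intercept.
    slope = mean_b + se_b * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
    return intercept, slope


rng = random.Random(2010)   # arbitrary seed, for reproducibility only
DPXL_BMD = 0.898            # g/cm2, the femoral neck value used in the text

imputed = []
for _ in range(5):
    a, b = draw_intercept_slope(0.03175, 0.00609, 0.9584, 0.00668, -0.985, rng)
    imputed.append(a + b * DPXL_BMD)   # imputed Prodigy value

print([round(v, 5) for v in imputed])
```

Because the intercept and slope are strongly negatively correlated, the imputed values vary far less than either parameter does on its own.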

Results

Automated vs. manual review modes

Descriptive statistics for all subjects in the cross-calibration study are presented in Table 1. The age range (28 to 86 years) and femoral neck BMD range (0.519 to 1.407 g/cm2) of the study sample were similar to those of the Framingham Osteoporosis Study participants (age range 29 to 87 years; femoral neck BMD range 0.484 to 1.493 g/cm2). When the automated scan analysis was completed, the study population included 52 men and 197 women. Initial review of the 1,844 available scans identified 19 femur scans (16 DPX-L; 3 Prodigy) and 20 spine scans (11 DPX-L; 9 Prodigy) of insufficient quality to allow for valid analysis. These scans were thus excluded from analyses, resulting in a total of 1,805 (903 femur; 902 spine) useable scans. The trochanter (n=1) and total hip (n=2) regions of interest were excluded from scans of subjects that did not include the entire region of interest area, yielding femoral neck, trochanter and total hip data from 903, 902 and 901 scans, respectively. Furthermore, 10 L2-L4 regions of interest were excluded due to the presence of artifacts in the region area, leaving 892 scans with valid L2-L4 regions.

Table 1.

Characteristics of the 249 volunteers participating in the cross-calibration of the Lunar Prodigy to the Lunar DPX-L

| Characteristic | Mean | Standard Deviation |
|---|---|---|
| Age (years) | 60.3 | 12.7 |
| Weight (kg) | 73.4 | 16.2 |
| Height (cm) | 163.8 | 8.2 |
| Body Thickness (cm) | 22.5 | 4.1 |
| BMI (kg/m2) | 27.3 | 5.4 |
| Sex (% male) | 20.9 | |
| DPX-L BMD (g/cm2)^a | | |
| Femoral Neck | 0.898 | 0.158 |
| Trochanter | 0.759 | 0.154 |
| Total Hip | 0.944 | 0.168 |
| Spine L2-L4 | 1.196 | 0.227 |
| Prodigy BMD (g/cm2)^a | | |
| Femoral Neck | 0.883 | 0.153 |
| Trochanter | 0.723 | 0.152 |
| Total Hip | 0.919 | 0.166 |
| Spine L2-L4 | 1.181 | 0.223 |

Abbr: BMD, bone mineral density

^a Values presented are for manual mode analysis

The extended manual analysis mode resulted in 1669 scans that remained unchanged and 136 re-analyzed scans with some BMD values changed. BMD measurements associated with the femur (neck, trochanter and total hip) were different in 45 to 48 scans, while there were 87 scans where the L2-L4 BMD was altered by re-analysis (Table 2). When comparing the differences obtained using the automatic mode to those obtained using the manual mode, the greatest corrections were seen in the lumbar spine and femoral neck (0.099 g/cm2 and -0.124 g/cm2, respectively), though at all sites the mean differences were essentially zero (Table 2).

Table 2.

Difference between BMD values obtained on the DPX-L and the Prodigy when either the automated or the manual mode was used

| BMD Site | Scans | Scans Altered | Mean Difference (SD)^a | Minimum | Maximum |
|---|---|---|---|---|---|
| Femoral Neck | 903 | 48 | −0.001 (0.010) | −0.124 | 0.059 |
| Trochanter | 902 | 45 | 0.000 (0.001) | −0.009 | 0.014 |
| Total Hip | 901 | 48 | 0.000 (0.002) | −0.010 | 0.021 |
| Spine L2–L4 | 892 | 87 | 0.000 (0.006) | −0.060 | 0.099 |

Abbr: BMD, bone mineral density; SD, standard deviation

^a This represents the mean difference between DPX-L and Prodigy BMD (g/cm2) when using the automated mode minus the mean difference between scanners when using the manual mode

Comparison of BMD measures within each densitometer

The median absolute differences in BMD between repeated scans performed on the same densitometer using the automatic mode for various skeletal sites were generally smaller for the Prodigy compared to the DPX-L machine. The DPX-L exhibited larger median absolute differences in repeat scans for femoral neck [0.016 vs. 0.012, p=0.1] and trochanter [0.011 vs. 0.009, p=0.06] BMD values compared to the Prodigy. Differences in total hip BMD [0.0070 vs. 0.0074] and L2-L4 BMD [0.0110 vs. 0.0107] were less notable.

Comparison of BMD measures between densitometers

Simple differences between paired observations on the Prodigy and DPX-L systems (Prodigy – DPX-L) resulted in a mean difference of -0.0111 for the femoral neck [SD = 0.0386], -0.0311 for the trochanter [SD = 0.0370], -0.0193 for the total hip [SD = 0.0271] and -0.0181 for the L2-L4 [SD = 0.0381] region. Similarly, mean differences in BMD between machines (Prodigy – DPX-L) from the mixed models [BMD measure as a function of densitometer] were consistently negative, indicating higher reported values on the DPX-L scanner (Table 3), with the trochanter showing the largest mean difference [−0.031] and the femoral neck showing the smallest mean difference [−0.011]. There did not appear to be any systematic differences in BMD values resulting from automated and manual analysis modes.

Table 3.

Means^a (95% CI) for differences between DPX-L and Prodigy BMD measures (g/cm2) for subjects scanned twice on each densitometer, using automated and manual analysis modes

| Region of interest | Automated Mode | Manual Mode |
|---|---|---|
| Femoral Neck | -0.010 (-0.013, -0.006) | -0.011 (-0.014, -0.008) |
| Trochanter | -0.031 (-0.034, -0.028) | -0.031 (-0.034, -0.028) |
| Total Hip | -0.020 (-0.022, -0.018) | -0.019 (-0.022, -0.017) |
| Spine L2-L4 | -0.018 (-0.021, -0.015) | -0.018 (-0.021, -0.015) |

Abbr: CI, confidence interval; BMD, bone mineral density

^a Prodigy – DPX-L from mixed models

Visual inspection of the Bland-Altman plots (Figure 1) showed no readily identifiable patterns of concern for any of the BMD measures, i.e., there was no evidence of differences in dispersion or any non-linear relationship between the difference measures and average BMD values. There was evidence of borderline significant negative slopes in the Bland-Altman plots, most notably in the lumbar spine (β=-0.021, p=0.07) and femoral neck (β=-0.035, p=0.07).

Fig. 1.

Bland Altman plots comparing (A) femoral neck, (B) trochanter, (C) total hip and (D) lumbar spine (L2-L4) bone mineral density (BMD) measures acquired on the Lunar Prodigy to those acquired on the Lunar DPX-L (reference line at zero). Note that positive values indicate greater values for Prodigy BMD measures.

Cross-Calibration

The parameter estimates from the linear mixed regression models predicting Prodigy BMD measures from DPX-L measures are presented in Table 4. These models resulted in significant positive slopes for all BMD sites and significant positive intercepts for femoral neck and L2-L4 spine BMD measures. Intercepts for models of trochanter and total hip, while positive, were not significantly different from zero. Positive slopes and intercepts in these models suggest that significant differences exist at all values of BMD, and that these differences increase with greater BMD.

Table 4.

Parameter estimates^a from the linear mixed model calculating the relation between DPX-L and Prodigy BMD measures for subjects scanned twice on each densitometer^b

| Region of interest | Intercept Estimate (95% CI) | Slope Estimate (95% CI) |
|---|---|---|
| Femoral Neck | 0.032 (0.020, 0.044) | 0.958 (0.945, 0.971) |
| Trochanter | 0.009 (-0.002, 0.020) | 0.968 (0.954, 0.982) |
| Total Hip | 0.011 (0.002, 0.020) | 0.978 (0.969, 0.988) |
| Spine L2-L4 | 0.012 (0.000, 0.025) | 0.982 (0.972, 0.992) |

Abbr: BMD, bone mineral density; CI, confidence interval

^a Mixed model: $\mathrm{BMD}_{\mathrm{Prodigy}} = \beta_0 + \beta_1\,\mathrm{BMD}_{\mathrm{DPX\text{-}L}}$

^b Data analyzed using manual mode

BMD conversion

The results of our BMD conversion are presented in Table 5, comparing the single imputation of new BMD values when using simple linear versus mixed regression. The median of the sample distribution of femoral neck BMD collected on the DPX-L was 0.898 g/cm2. Using this value, the simple linear regression model generated a slope and intercept that were used to calculate the new Prodigy BMD value of 0.89245 with a standard error of 0.0009082. The predicted new BMD value from the mixed model was 0.89239, with a slightly larger standard error of 0.0010562.
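The two predicted values can be reproduced from the intercepts and slopes reported in Table 5 by applying the cross-calibration equation directly. The function name below is illustrative; the standard errors are not recomputed because they depend on the full covariance of the estimates.

```python
def predict_prodigy(dpxl_bmd, intercept, slope):
    """Apply a cross-calibration equation: Prodigy = intercept + slope * DPX-L."""
    return intercept + slope * dpxl_bmd


DPXL_FEMORAL_NECK = 0.898  # g/cm2, the value used in the text

linear = predict_prodigy(DPXL_FEMORAL_NECK, intercept=0.03055, slope=0.95980)
mixed = predict_prodigy(DPXL_FEMORAL_NECK, intercept=0.03175, slope=0.9584)

print(f"{linear:.5f}")  # 0.89245
print(f"{mixed:.5f}")   # 0.89239
```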

Table 5.

Example of cross-calibration for a femoral neck BMD value of 0.898 g/cm2.

Single estimates

| Model | Intercept (SE) | Slope (SE) | Predicted Value (SE) |
|---|---|---|---|
| Linear regression | 0.03055 (0.00524) | 0.95980 (0.00574) | 0.89245 (0.0009082) |
| Mixed regression | 0.03175 (0.00609) | 0.9584 (0.00668) | 0.89239 (0.0010562) |

Multiply imputed data

| Model | Intercept | Slope | Imputed Value |
|---|---|---|---|
| Linear regression | 0.026015 | 0.96494 | 0.89253 |
| | 0.038292 | 0.94962 | 0.89105 |
| | 0.034483 | 0.95466 | 0.89177 |
| | 0.034499 | 0.95411 | 0.89129 |
| | 0.029693 | 0.95938 | 0.89122 |
| Mixed regression | 0.026476 | 0.96438 | 0.89249 |
| | 0.040755 | 0.94655 | 0.89076 |
| | 0.036325 | 0.95242 | 0.89160 |
| | 0.036344 | 0.95178 | 0.89104 |
| | 0.030754 | 0.95791 | 0.89096 |

The results of our multiple imputations of new Prodigy BMD values using both the linear regression and mixed models are listed in Table 5. For the 5 sets of slopes and intercepts generated for each BMD site, the correlations between the intercepts and slopes for both linear regression and mixed models were -0.985, -0.980, -0.985 and -0.983 for the femoral neck, trochanter, total hip and L2-L4 spine, respectively. The variability in the 5 different converted BMD values demonstrates the uncertainty in the cross-calibration process that is not captured by a single imputation.
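The spread of the five imputed values can be summarized directly from the numbers in Table 5. The sketch below computes their mean and sample standard deviation for each model; with only five draws these summaries are rough, and the variable names are illustrative.

```python
import statistics

# Imputed femoral neck values (g/cm2) as listed in Table 5.
linear_imputed = [0.89253, 0.89105, 0.89177, 0.89129, 0.89122]
mixed_imputed = [0.89249, 0.89076, 0.89160, 0.89104, 0.89096]

for label, values in (("linear", linear_imputed), ("mixed", mixed_imputed)):
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD across the five imputations
    print(label, round(mean, 5), round(sd, 5))
```

In this example the mixed model draws are somewhat more dispersed than the linear regression draws, in line with the larger standard errors of the mixed model estimates.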

Discussion

To our knowledge, this is the largest, most comprehensive in vivo cross-calibration evaluation of the differences between the DPX-L and Prodigy densitometers. Repeated BMD measures acquired with the Prodigy fan beam densitometer had better agreement than those obtained using the DPX-L pencil beam scanner among our large sample of volunteers. Prodigy measures were consistently lower than those obtained using the DPX-L; however, there was no evidence that the magnitude of the differences between densitometers varied across the range of BMD values in our sample. Furthermore, manual comparison and re-analysis of paired scans from both densitometers did not significantly change the detected systematic differences.

Considerable effort was expended in extending the quality control/data cleaning beyond what would have been done if only a single densitometer was available. We were also able to make comparisons of multiple scans and perform a manual analysis when visual inspection revealed larger discrepancies. In the context of a cross-calibration study, the goal is to eliminate all possible sources of error in order to isolate the underlying regression equation needed to translate measurements from older to newer densitometer values. If insufficient care is taken in evaluating scans, the random error present in the data values could rise to the level of masking any longitudinal effects when measures from both the old and new densitometers are needed in a study.

In our previous upgrade from a Lunar DP3 to the DPX-L, we used a similar approach of automated versus manual analysis and found in this 1995 study that manual overrides of region of interest placement did result in better agreement between scanners (4). While results of this study show no appreciable difference between our automated and manual analysis modes, it would be difficult for any investigator to advise others not to perform quality assurance procedures to the best of his or her abilities. The effects of the different quality control procedures on the cross-calibration results in Table 4 are consistent, but are not of a large magnitude. Our results show that if a careful visual inspection of scan pairs derived from automated placement of the regions of interest suggests that a manual override is necessary, the additionally expended effort to better match the regions does not significantly influence the difference between scanners. Since manufacturers routinely advocate that technology upgrades result in more precise measurements of BMD, it is heartening that our results suggest that repeated measures acquired on the Prodigy are in higher agreement than those obtained with the DPX-L device.

Our results are consistent with the previous in vivo cross-calibration study by Blake et al. who observed higher measures at the trochanter and spine using a DPX-L device, although femoral neck BMD was similar between the two densitometers (6). Furthermore, similar studies by Oldroyd et al. and Pearson et al. found no differences between the DPX-L and Prodigy at the femoral neck or trochanter (5,7). At the lumbar spine (either L1-L4 or L2-L4), although all three previous in vivo studies found that no correction factor was necessary when comparing the DPX-L to the Prodigy device (5-7), our L2-L4 region cross-calibration equation did have an intercept that was significantly different from zero. Ours is the first study, to our knowledge, to examine the total hip region, which has recently been recommended as the hip region best suited to follow for longitudinal changes in BMD (11).

Inconsistencies among results of cross-calibration studies may have several causes. Our population comprised a large sample of healthy volunteers, while the earlier studies included 20, 133 and 72 subjects who attended a hospital (7) or metabolic bone clinic (6) for routine densitometry, or who had special diseases (5), respectively. The three previous studies collected only single scans on both the old and new densitometers and utilized simple linear regression to model the relation between BMD measurements on the DPX-L and the Prodigy devices. Thus, our study may better account for the variability in BMD within individual subjects.

Our results form the basis for the cross-calibration that will optimize the use of measures from both densitometers in longitudinal analyses. For future longitudinal analyses of BMD, we can calibrate the earlier DPX-L BMD values to the Prodigy values using the parameter estimates and standard errors from our linear mixed regression models. Our mixed model approach will ensure that the imputed BMD measures reflect the systematic differences in densitometer measurements between machines. This was demonstrated in our example using both linear regression and mixed models to obtain a single predicted new BMD value, where the mixed model resulted in a slightly larger standard error for the converted BMD value.
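As a rough illustration of how a single converted value and its standard error might be computed from a fitted cross-calibration equation, the following Python sketch applies an intercept and slope to a DPX-L measurement and propagates the uncertainty in both parameters. The function name, the covariance argument, the optional extra variance term and all numeric values are hypothetical placeholders chosen for illustration; they are not the estimates reported in this study.

```python
import math

def convert_bmd(dpxl_bmd, intercept, slope, var_intercept, var_slope,
                cov_intercept_slope=0.0, extra_var=0.0):
    """Convert a DPX-L BMD value to the Prodigy scale (illustrative only).

    Returns the predicted value and its standard error. The variance of
    intercept + slope * x is
        var(intercept) + x^2 * var(slope) + 2 * x * cov(intercept, slope).
    extra_var is a hypothetical additional variance component (for example
    a residual or between-subject variance) added when predicting for a
    subject outside the calibration sample.
    """
    predicted = intercept + slope * dpxl_bmd
    variance = (var_intercept
                + dpxl_bmd ** 2 * var_slope
                + 2.0 * dpxl_bmd * cov_intercept_slope
                + extra_var)
    return predicted, math.sqrt(variance)

# Hypothetical parameter values, not the published estimates.
pred, se = convert_bmd(1.0, intercept=0.02, slope=0.98,
                       var_intercept=1e-4, var_slope=1e-4,
                       cov_intercept_slope=-5e-5)
print(f"predicted Prodigy BMD = {pred:.3f}, SE = {se:.3f}")
```

Passing a nonzero extra_var enlarges the standard error, which is the qualitative behavior described above for the mixed model relative to simple linear regression.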

While these singly imputed new BMD values could be used in a regression analysis, their standard errors do not reflect the variability generated by the calibration process. We showed that multiple imputation can better account for this variability. The variability in the 5 imputed BMD values is used to generate a more appropriate standard error. Proper analysis of these multiple BMD values within a regression model will incorporate the uncertainty in the calibration process in our study results. Thus, in our longitudinal studies we will translate the DPX-L measures to the Prodigy measures several times in order to generate multiple imputed datasets. While Rubin recommends using about five imputed data sets (12), we will experiment with various numbers of imputed data sets to verify the stability of the results. Separate analyses will be conducted for each imputed data set and results will subsequently be combined, using procedures such as SAS PROC MIANALYZE (10), allowing us to make valid statistical inferences that account for the uncertainty in our imputed BMD values.
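The combining step can be sketched with Rubin's rules, under which the pooled estimate is the mean of the per-data-set estimates and the total variance is the average within-imputation variance plus an inflated between-imputation variance. The Python sketch below is a minimal illustration of that calculation only; the function name and the example coefficients and standard errors are hypothetical, and it does not reproduce the full output of any particular software procedure.

```python
import statistics

def pool_rubin(estimates, variances):
    """Pool per-data-set estimates and their squared standard errors.

    estimates: one regression coefficient per imputed data set
    variances: the squared standard error of each coefficient
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two estimates with matching variances")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)       # average sampling variance
    between = statistics.variance(estimates)   # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total ** 0.5

# Hypothetical coefficients and standard errors from m = 5 imputed data sets.
coefs = [0.110, 0.118, 0.105, 0.121, 0.113]
ses = [0.020, 0.021, 0.019, 0.020, 0.022]
estimate, pooled_se = pool_rubin(coefs, [s ** 2 for s in ses])
print(f"pooled estimate = {estimate:.4f}, pooled SE = {pooled_se:.4f}")
```

Because the between-imputation term is never negative, the pooled standard error is at least as large as the square root of the average within-imputation variance, which is how the calibration uncertainty enters the final inference.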

It is important to distinguish the utility of cross-calibration of densitometers in a research setting from that in a clinical setting. The work of Shepherd et al. (2,3) to derive a clinical cut point that would identify a significant change in BMD, accounting for change in technology, provides a useful tool in a clinical setting. The “generalized least significant change (GLSC)” essentially dichotomizes a single result as significant or insignificant. This is a reasonable approach for clinicians, as the GLSC can be calculated with a relatively small sample of subjects and provides usable, though limited, information for the clinician. The GLSC falls short, however, of what is needed in a research setting. Defining the variable of interest as a dichotomy or threshold limits the statistical power to detect a meaningful change in BMD. Whether used as a predictor or an outcome, a continuous change in BMD is far more useful for researchers.

The ability to properly define the standard error is important for cross-calibration studies. The GLSC incorporates the error into its calculations; however, it is not a useful tool in regression modeling. Individual observations must be used in a regression model, in contrast to the GLSC, which uses cut points of a certain magnitude. Generating multiple data sets in which the corresponding imputed observations differ from each other, with the magnitude of the difference reflecting the standard error, is a usable approach, and the methods for analyzing such data sets are mature and are standard practice in statistical analysis. The data sets generated provide analyzable values that reflect the uncertainty inherent in the calibration process for a research study that models risk factors for changes in BMD when there has been a densitometer upgrade.

In contrast, multiply imputed data sets are not particularly useful in a clinical setting. Singly imputed predicted values with accompanying standard deviations are far easier to interpret in a clinical setting. While one could multiply impute several values for a clinician, one would be reduced to creating confidence bounds on these estimates – something far easier to do with a single value and standard deviation.

Another issue related to standard errors that we have introduced is the use of random effects models to obtain more accurate standard errors. When using simple linear regression to predict what old measures would have been had they been acquired on the new machine, it is assumed that the subjects used in deriving the slope, intercept and their respective standard errors for the cross-calibration equation will be the same subjects for whom new values will be calculated, which is usually not the case for research studies. Thus, the simple linear regression approach will result in an underestimation of the standard errors, as it assumes that predictions are being made for subjects in the calibration sample. This may also be important in clinical settings; thus, we suggest that the GLSC approach be modified to account for potential underestimation of standard errors. Random effects models incorporate the additional error that should be introduced when a cross-calibration equation is applied outside the sample from which it was derived.

Yet another issue related to proper standard error calculation involves the decision not to reposition subjects between scans. In most longitudinal studies that involve a change in densitometers, subjects are measured once on the old machine at baseline and once on the new machine at follow-up. In this cross-calibration study, subjects were measured twice on each densitometer without repositioning in order to minimize intra-machine variability, thereby allowing us to derive a more precise estimate of inter-machine differences. With repositioning occurring only between machines, we appropriately incorporated the usual repositioning experience that occurs in longitudinal studies. Although the use of multiple scans with repositioning on a single densitometer is recognized as a necessary approach when determining the least significant change for that single machine's assessment of individual BMD changes, our use of multiple scans without repositioning within both the old and new machines was more appropriate for cross-calibration, where the goal is to isolate the inter-machine differences.

The lack of repositioning raises the question of why the use of phantoms would not be sufficient for this cross-calibration process. Possible differences in the effect of varying soft tissue and bone composition on the cross-calibration equation should be incorporated into the standard error, as such error is inherent to the densitometers and not properly accounted for by using a few different phantoms. The use of multiple phantoms in ordinary cross-calibration reflects this source of error, which is not subject-specific.

Another aspect of this study that bears discussion is an examination of precision and variability using this approach. Increasing the sample size of the calibration sample results in greater precision as the standard errors of our estimates will decline with increased sample size. The use of repeated observations also results in increased precision as it allows within-subject variability to be removed from the calibration process. In a clinical setting, the within-subject variability may greatly overshadow the error present in the calibration process, making large calibration samples irrelevant. In a research setting, eliminating the within-subject variability and obtaining precise translations of values is paramount.
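The two sources of precision described above can be illustrated with a deliberately simplified calculation. The Python sketch below gives the standard error of an overall mean BMD in a balanced design with a random subject effect, where each of n subjects is scanned k times; it is not the standard error of the cross-calibration slope or intercept, and the between-subject and within-subject variances used are hypothetical values chosen only to show the direction of the effect.

```python
import math

def se_of_mean(n_subjects, scans_per_subject, between_var, within_var):
    """Standard error of the overall mean in a balanced repeated-measures design.

    Averaging k repeat scans shrinks the within-subject variance to
    within_var / k, and averaging over n subjects divides the resulting
    per-subject variance by n.
    """
    if n_subjects < 1 or scans_per_subject < 1:
        raise ValueError("n_subjects and scans_per_subject must be at least 1")
    per_subject_var = between_var + within_var / scans_per_subject
    return math.sqrt(per_subject_var / n_subjects)

# Hypothetical variance components (in squared BMD units), for illustration only.
BETWEEN_VAR = 0.03
WITHIN_VAR = 0.01

# Sample sizes of the three earlier studies and of the present study.
for n in (20, 72, 133, 249):
    for k in (1, 2):
        se = se_of_mean(n, k, BETWEEN_VAR, WITHIN_VAR)
        print(f"n = {n:3d}, scans per subject = {k}: SE = {se:.4f}")
```

In this sketch the standard error falls both when n grows and when k increases from one to two, although the gain from repeat scans depends on how large the within-subject variance is relative to the between-subject variance.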

The use of the mixed model, in contrast, does not increase precision when compared to simple linear regression. The purpose of using a mixed model approach is to ensure that estimates are not overly precise and that they properly reflect the uncertainty of the estimates introduced by using a separate calibration data set to obtain estimates that will be used on another population. Similarly multiple imputation, in contrast to single imputation, does not increase precision but simply ensures that the uncertainty inherent in the calibration process is correctly reflected in any regression models that are subsequently done. Simple linear regression and single imputation would provide overly-precise estimates, resulting in type I errors.

In conclusion, we detected systematic differences between BMD measures acquired using the Lunar DPX-L pencil beam and the Lunar Prodigy fan beam densitometers. Furthermore, manual comparison and re-analysis of paired scans between densitometers did not significantly change these differences. Our large sample with repeated measures on each machine allowed us to calculate cross-calibration equations that better account for the within-individual variability in BMD measures. Future longitudinal analyses of population-level changes in BMD in our cohorts will utilize these equations, further employing multiple-imputation techniques to ensure that our imputed BMD values take account of the uncertainty in our estimates. This approach to analyzing longitudinal BMD data when a densitometer upgrade occurs may be the most conservative. A more accurate estimate of variance would be expected to reduce the chance of a type I error when examining risk factors for longitudinal change in BMD.

Acknowledgments

Support for this work was provided by the National Institute of Arthritis and Musculoskeletal and Skin Diseases and National Institute on Aging (AR/AG41398), and by the American College of Rheumatology Research and Education Foundation Health Professional New Investigator Award to Dr. McLean.

References

  • 1. The Writing Group for the ISCD Position Development Conference. Technical standardization for dual-energy x-ray absorptiometry. J Clin Densitom. 2004;7:27–36. doi: 10.1385/jcd:7:1:27.
  • 2. Shepherd JA, Lu Y. A generalized least significant change for individuals measured on different DXA systems. J Clin Densitom. 2007;10:249–258. doi: 10.1016/j.jocd.2007.05.002.
  • 3. Shepherd JA, Morgan SL, Lu Y. Comparing BMD results between two similar DXA systems using the generalized least significant change. J Clin Densitom. 2008;11:237–242. doi: 10.1016/j.jocd.2008.02.001.
  • 4. Kiel DP, Mercier CA, Dawson-Hughes B, Cali C, Hannan MT, Anderson JJ. The effects of analytic software and scan analysis technique on the comparison of dual X-ray absorptiometry with dual photon absorptiometry of the hip in the elderly. J Bone Miner Res. 1995;10:1130–1136. doi: 10.1002/jbmr.5650100719.
  • 5. Oldroyd B, Smith AH, Truscott JG. Cross-calibration of GE/Lunar pencil and fan-beam dual energy densitometers--bone mineral density and body composition studies. Eur J Clin Nutr. 2003;57:977–987. doi: 10.1038/sj.ejcn.1601633.
  • 6. Blake GM, Harrison EJ, Adams JE. Dual X-ray absorptiometry: cross-calibration of a new fan-beam system. Calcif Tissue Int. 2004;75:7–14. doi: 10.1007/s00223-004-0169-y.
  • 7. Pearson D, Horton B, Green DJ. Cross calibration of DXA as part of an equipment replacement program. J Clin Densitom. 2006;9:287–294. doi: 10.1016/j.jocd.2006.02.006.
  • 8. Hannan MT, Felson DT, Dawson-Hughes B, Tucker KL, Cupples LA, Wilson PW, Kiel DP. Risk factors for longitudinal bone loss in elderly men and women: the Framingham Osteoporosis Study. J Bone Miner Res. 2000;15:710–720. doi: 10.1359/jbmr.2000.15.4.710.
  • 9. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet. 1986;1:307–310.
  • 10. SAS Institute Inc. SAS/STAT® 9.1 User's Guide. North Carolina: SAS Institute Inc.; 2004.
  • 11. Hans DB, Shepherd JA, Schwartz EN, Reid DM, Blake GM, Fordham JN, Fuerst T, Hadji P, Itabashi A, Krieg MA, Lewiecki EM. Peripheral dual-energy X-ray absorptiometry in the management of osteoporosis: the 2007 ISCD Official Positions. J Clin Densitom. 2008;11:188–206. doi: 10.1016/j.jocd.2007.12.012.
  • 12. Rubin DB. Multiple Imputation After 18+ Years. Journal of the American Statistical Association. 1996;91:473–489.
