Abstract
The reliability and validity of six experts’ exposure ratings were evaluated for 64 nickel-exposed and 72 chromium-exposed workers from six Shanghai electroplating plants based on airborne and urinary nickel and chromium measurements. Three industrial hygienists and three occupational physicians independently ranked the exposure intensity of each metal on an ordinal scale (1–4) for each worker's job in two rounds: the first round was based on responses to an occupational history questionnaire and the second round also included responses to an electroplating industry-specific questionnaire. Spearman correlation (rs) was used to compare each rating's validity to its corresponding subject-specific arithmetic mean of four airborne or four urinary measurements. Reliability was moderately-high (weighted kappa range=0.60–0.64). Validity was poor to moderate (rs= -0.37–0.46) for both airborne and urinary concentrations of both metals. For airborne nickel concentrations, validity differed by plant. For dichotomized metrics, sensitivity and specificity were higher based on urinary measurements (47–78%) than airborne measurements (16–50%). Few patterns were observed by metal, assessment round, or expert type. These results suggest that, for electroplating exposures, experts can achieve moderately-high agreement and (reasonably) distinguish between low and high exposures when reviewing responses to in-depth questionnaires used in population-based case-control studies.
Keywords: expert assessment, reliability, validity, nickel, chromium, electroplating industry
Introduction
Exposure assessment in population-based studies often requires experts to review study subjects’ responses to questionnaires designed to collect occupational information to provide exposure estimates for use in epidemiologic analyses 1-2. The reliability and validity of these expert ratings are important to characterize because exposure misclassification can mask exposure-disease associations. Several studies have evaluated factors that can affect the reliability of experts’ ratings within population-based studies 3-14. Of these, only five studies evaluated the experts’ validity 4, 8-9, 11, 14. Validity studies for industry-based studies are somewhat more plentiful 10; for example, one of the earliest evaluated semi-quantitative estimates of methylene chloride and styrene for jobs in a small polyester factory 15. A previous review of these studies found that the experts’ validity, based on kappas or intraclass correlation coefficients, varied widely from poor to excellent, with a median of ~0.6 10. Most validity studies have compared the ratings to airborne measurements despite the fact that multiple routes of exposure are relevant for many agents. To date, we have identified only three studies that have evaluated experts’ ratings compared to urinary measurements 11, 16-17.
No study has evaluated the reliability and validity of experts’ ratings in the electroplating industry. Previous studies of electroplating workers have often reported poor or no correlations between air and urine metal concentrations 18-20, although some studies have observed moderate to high correlations (range: r = 0.48–0.96, median: r = 0.68) 21-24. The relationship between air and urinary measurements likely varies due to the extent of dermal exposure, the use of personal protective equipment, and personal behaviors such as smoking that may transfer the contaminants from hand to mouth, as well as the time of day the urinary measurements are collected in relation to the air measurements 21-22, 25-26. Subject-specific variations in uptake and metabolic/excretion rates could also reduce the correlation with the post work shift urinary concentrations. Therefore, our primary objective was to characterize the reliability and validity of experts’ ratings of nickel and chromium exposures within the context of a case-control study design for workers’ current jobs within an electroplating setting in relation to both airborne and urinary measurements of exposure. Our secondary objective was to evaluate the experts’ ratings in relation to the availability of participants’ responses to two types of questionnaires typically used in population-based studies (an occupational history questionnaire (OH) and an electroplating industry-specific questionnaire (EIQ)) and by type of expert (industrial hygienists vs. occupational physicians).
Materials and methods
Study subjects and self-reported occupational information
We recruited 64 nickel-exposed workers and 72 chromium-exposed workers from six electroplating plants (nickel-exposed workers from plants 1–3; chromium-exposed workers from plants 4–6) in Shanghai, China. Subjects were selected based on their willingness to participate from those who had held their current job for at least six months and were expected to remain in their current job for at least six more months. Because this study was meant to mimic the type of information available to experts within a case-control study, each study subject completed an OH only for his/her current job. The OH included open-ended questions on job title, employment dates, products made or services provided by employer, primary work tasks and activities, tools and equipment used, and chemicals and materials used. Each subject also completed an EIQ for the same job, which asked more detailed questions (predominantly with categorical responses) about specific tasks, time spent in work locations within the plants, proximity to the source of metal aerosols, use of personal protective equipment, presence of ventilation systems and the subjects’ impression of operating efficiencies, and contact with the liquids from the plating tank. The OH and EIQ can be obtained from the corresponding author. Participation was voluntary and undertaken according to protocols approved by the Institutional Review Boards of the National Cancer Institute and the Shanghai Centers for Disease Control (hereafter, Shanghai CDC).
Air and urinary measurements
Personal airborne and urinary nickel and chromium measurements were collected on four occasions per subject over two seasons (summer and winter), from June 2002 through August 2003. The airborne samples collected total particulates in the workers’ breathing zones on mixed cellulose ester filters (pore size = 0.8 μm) with a 37-mm cassette using a portable sampling pump and were analyzed according to National Institute for Occupational Safety and Health (NIOSH) Method 7300 27. Each worker provided one spot (~50 ml) urine sample at the end of the work shift on each measurement day, which was analyzed using NIOSH Method 8310 28. These methods measure all insoluble and soluble forms, and all valences, of aerosols of the respective metals.
Expert exposure ratings
Three industrial hygienists and three occupational physicians affiliated with the Shanghai CDC or the US National Cancer Institute estimated subject-specific average (arithmetic mean) exposure intensity over a work shift to nickel or to chromium (depending on the plant in which the subject worked) for each subject's current job in two separate assessment rounds. The experts were asked to estimate the arithmetic mean because it is generally considered the most relevant metric for chronic health effects 29. In the first assessment round, the experts provided ratings based solely on the OH responses (OH round). In the second assessment round, the experts provided ratings based on both the OH and EIQ responses (OH/EIQ round). In each round, each expert independently assigned an intensity rating for nickel or chromium using an ordinal scale of 1 (very low) to 4 (high) without access to the exposure measurements. The categories were not anchored to specific exposure levels. The experts had no prior measurements available for these worksites with which to anchor their assignments; however, they were advised to consider the categories as approximately <10%, 10–50%, 51–100%, and >100% of the occupational exposure limit (OEL) in place at the time of the evaluation.
Statistical analyses
To evaluate whether the experts were able to accurately estimate the arithmetic mean exposure of each study subject, all comparisons were made to the arithmetic mean of the subject-specific inhalation and urinary concentrations. The subject-specific inhalation concentration was calculated as the arithmetic mean of the four airborne measurements. Similarly, the subject-specific urinary concentration was calculated as the arithmetic mean of the four urinary measurements. All analyses were conducted using Stata 11.1 (StataCorp, College Station, TX, USA).
Descriptive analyses
Descriptive statistics of the subject-specific inhalation and urinary concentrations were calculated, overall and by plant. The Spearman correlation (rs) between the airborne and urinary subject-specific means was also calculated for each metal, overall and by plant. Intraclass correlation coefficients (ICCs), which indicate the contrast in exposures between subjects (between-subject variance/(between-subject variance + within-subject variance)), were calculated from variance components obtained from random-effects models with subject ID included as the random effect.
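The ICC definition above can be illustrated with a minimal Python sketch. It is an illustration only and rests on assumptions: the study fitted random-effects models in Stata, whereas this sketch uses a one-way ANOVA (method-of-moments) estimator for balanced data, and the function names `variance_components` and `icc` are hypothetical.

```python
from statistics import mean


def variance_components(data):
    """Estimate (between-subject, within-subject) variance components.

    `data` maps each subject ID to a list of repeated measurements.
    Assumes a balanced design (same number of repeats per subject) and uses
    a one-way ANOVA estimator, not the Stata random-effects fit of the study.
    """
    groups = list(data.values())
    n_subjects = len(groups)
    n_repeats = len(groups[0])
    if n_subjects < 2 or n_repeats < 2:
        raise ValueError("need at least two subjects and two repeats")
    if any(len(g) != n_repeats for g in groups):
        raise ValueError("balanced design required")

    grand_mean = mean(x for g in groups for x in g)
    subject_means = [mean(g) for g in groups]

    # Mean squares between and within subjects.
    ms_between = (
        n_repeats * sum((m - grand_mean) ** 2 for m in subject_means)
        / (n_subjects - 1)
    )
    ms_within = sum(
        (x - m) ** 2 for g, m in zip(groups, subject_means) for x in g
    ) / (n_subjects * (n_repeats - 1))

    # Negative estimates are truncated at zero.
    between = max((ms_between - ms_within) / n_repeats, 0.0)
    return between, ms_within


def icc(between, within):
    """ICC = between-subject variance / (between + within-subject variance)."""
    return between / (between + within)
```

For example, `icc(0.273, 0.775)` gives approximately 0.26. When the between-subject estimate is truncated at zero, the ICC is also zero, meaning the repeated measurements show no detectable contrast between subjects.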
Inter-expert reliability
Agreement between each pair of the six experts (15 pairs) was calculated using two metrics: proportion of agreement and weighted kappa (κw). We reported the mean and range of each agreement metric observed across the 15 pairs. To interpret the kappa values, we arbitrarily categorized kappa values <0.2 as poor, 0.2–0.4 as fair, >0.4–0.6 as moderate, >0.6–0.8 as moderately-high, and >0.8 as high based on categories originally proposed by Landis and Koch (1977) 30.
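The pairwise agreement calculation can be sketched in Python as follows. The weighting scheme for κw is not stated here, so linear disagreement weights are an assumption of this sketch, and all function names are hypothetical.

```python
from itertools import combinations


def proportion_agreement(ratings_a, ratings_b):
    """Fraction of subjects given the identical category by both experts."""
    return sum(a == b for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)


def weighted_kappa(ratings_a, ratings_b, categories=(1, 2, 3, 4)):
    """Cohen's weighted kappa with linear weights (an assumed scheme)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of the two experts' ratings.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    # Disagreement weight grows linearly with the category distance.
    def weight(i, j):
        return abs(i - j) / (k - 1)

    obs = sum(weight(i, j) * observed[i][j] for i in range(k) for j in range(k))
    exp = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if exp == 0:
        raise ValueError("kappa is undefined when both experts use one category")
    return 1 - obs / exp


def pairwise_kappa_summary(ratings_by_expert):
    """Mean, minimum, maximum and count of kappa over all expert pairs."""
    kappas = [
        weighted_kappa(ratings_by_expert[a], ratings_by_expert[b])
        for a, b in combinations(sorted(ratings_by_expert), 2)
    ]
    return sum(kappas) / len(kappas), min(kappas), max(kappas), len(kappas)
```

With six experts, `pairwise_kappa_summary` iterates over the 15 two-expert combinations, which matches the number of pairs summarized by a mean and range in this analysis.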
Validity of expert ratings
We evaluated the validity of expert ratings (ordinal scale, 1–4) compared to the subject-specific arithmetic means of the airborne and urinary concentrations (used as an approximation of the ‘gold standard’ of each subject's average exposure) using the Spearman correlation measure (rs), which is a non-parametric comparison that does not assume a linear relationship between the expert ratings and the measured subject-specific arithmetic means. For each metal and sample medium, we report the mean and range of the correlations observed across the six experts. We also calculated the Spearman correlation coefficient between the subject-specific means for each metal and sample medium and the arithmetic mean of the six experts’ ratings (‘group rating’; continuous scale, range 1–4), the mean of the three industrial hygienists’ ratings, and the mean of the three occupational physicians’ ratings. 95% confidence intervals (CI) were calculated based on Fisher's transformation. To interpret the correlations, we used the cut points described above.
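The validity calculation can be sketched in Python as follows: a Spearman correlation computed as the Pearson correlation of average ranks, and a 95% CI from Fisher's transformation. The standard error of 1/sqrt(n − 3) is an assumption of this sketch, since the exact variant used is not stated, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(ratings, measured_means):
    """Spearman correlation = Pearson correlation of the average ranks."""
    return pearson(average_ranks(ratings), average_ranks(measured_means))


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for a correlation via Fisher's z transformation.

    Assumes standard error 1 / sqrt(n - 3); requires |r| < 1 and n > 3.
    """
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)
```

Under these assumptions, `fisher_ci(-0.37, 64)` rounds to (−0.56, −0.14), which matches the interval reported in Table 4 for the overall airborne nickel group rating.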
Sensitivity and specificity of a two-category scale
The sensitivity and specificity of the group rating compared to the subject-specific mean concentrations of airborne and urinary nickel and chromium exposure were calculated based on a two-category scale. For this calculation, group ratings ≤2.5 (the mid-point of the 4-category scale) and subject-specific means ≤ median were categorized as ‘low exposed’ and group ratings >2.5 and subject-specific means > median were categorized as ‘high exposed’.
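The dichotomization described above can be sketched in Python as follows. The cut points follow the text (group rating > 2.5 and subject-specific mean > median count as ‘high exposed’), while the function name and the toy inputs in the usage note are hypothetical.

```python
from statistics import median


def dichotomized_sens_spec(group_ratings, measured_means, rating_cut=2.5):
    """Sensitivity and specificity of the two-category group rating.

    The measured subject-specific means, split at their median, serve as
    the reference classification.
    """
    measurement_cut = median(measured_means)
    tp = fp = tn = fn = 0
    for rating, concentration in zip(group_ratings, measured_means):
        rated_high = rating > rating_cut
        truly_high = concentration > measurement_cut
        if truly_high and rated_high:
            tp += 1
        elif truly_high:
            fn += 1
        elif rated_high:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)  # share of high-exposed subjects rated high
    specificity = tn / (tn + fp)  # share of low-exposed subjects rated low
    return sensitivity, specificity
```

For instance, with group ratings `[3.0, 2.0, 2.0, 3.5]` and measured means `[1, 2, 3, 4]`, one of the two high-exposed subjects is rated high and one of the two low-exposed subjects is rated low, so both sensitivity and specificity equal 0.5.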
The reliability and validity analyses described above were stratified by exposure agent, assessment round, and type of expert. Some analyses, identified in the results, were also stratified by plant. Summary measures (mean and range) are reported because we focused on the performance of a ‘typical’ expert rather than a specific expert.
Results
Measurement data
For nickel, the overall arithmetic mean (AM) of the subject-specific means was 7.4 μg m-3 for air and 30.1 μg g-1 creatinine for urine (Table 1). Plant 1 had the highest airborne nickel AM; in contrast, plant 3 had the highest urinary nickel AM. For chromium, the overall AM of the subject-specific means was 3.0 μg m-3 for air and 76.3 μg g-1 creatinine for urine. Plant 5 had the highest airborne and urinary chromium AMs. Poor correlations were observed between the air and urine concentrations overall for nickel (rs = -0.28) and for chromium (rs = 0.09) and by plant for both metals (rs range = -0.39–0.30).
Table 1.
Descriptive statistics of the subject-specific arithmetic means of airborne and urinary nickel and chromium exposure concentrations and the correlations between the airborne and urinary concentrations, overall and by plant.
| Agent | Plant | N subjects | Air AM (μg m−3)ᵃ | Air SD (μg m−3) | Air GM (μg m−3) | Air GSD | Air range (μg m−3) | Urine AM (μg g−1 creatinine)ᵃ | Urine SD (μg g−1 creatinine) | Urine GM (μg g−1 creatinine) | Urine GSD | Urine range (μg g−1 creatinine) | Spearman correlation |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nickel | 1 | 30 | 11.7 | 4.4 | 10.9 | 1.43 | 5.5–22.7 | 22.9 | 7.3 | 21.8 | 1.38 | 13.4–38.2 | 0.11 |
| Nickel | 2 | 23 | 3.5 | 6.7 | 1.9 | 2.37 | 0.76–32.0 | 35.4 | 28.5 | 29.6 | 1.78 | 7.6–155 | 0.06 |
| Nickel | 3 | 11 | 3.8 | 1.7 | 3.4 | 1.61 | 1.2–6.9 | 38.7 | 13.1 | 36.9 | 1.37 | 24.3–68.0 | 0.30 |
| Nickel | Overall | 64 | 7.4 | 6.4 | 4.8 | 2.72 | 0.76–32.0 | 30.1 | 19.5 | 26.6 | 1.60 | 7.6–155 | −0.28 |
| Chromium | 4 | 36 | 2.0 | 0.5 | 1.9 | 1.26 | 1.3–3.1 | 65.0 | 30.4 | 59.7 | 1.49 | 28.3–175 | 0.22 |
| Chromium | 5 | 24 | 4.8 | 5.9 | 3.4 | 2.04 | 1.7–26.2 | 107 | 78.4 | 85.5 | 1.93 | 33.2–344 | −0.05 |
| Chromium | 6 | 12 | 2.3 | 1.1 | 2.1 | 1.56 | 1.2–5.0 | 49.6 | 13.1 | 48.1 | 1.31 | 33.7–72.4 | −0.39 |
| Chromium | Overall | 72 | 3.0 | 3.6 | 2.4 | 1.71 | 1.2–26.2 | 76.3 | 54.5 | 64.9 | 1.69 | 28.3–344 | 0.09 |
AM: arithmetic mean; GM: geometric mean; GSD: geometric standard deviation; N: number of subjects; SD: standard deviation.
ᵃ Descriptive statistics reported here are based on the subject-specific means, calculated as the arithmetic means of the 4 air and 4 urinary measurements collected on each subject.
Between- and within-subject variance components are reported in Table 2, overall and by plant. For airborne concentrations, ICCs ranged from <0.001 to 0.57, indicating poor to moderate contrast in airborne concentrations among subjects overall and within individual plants. For urinary concentrations, ICCs ranged from <0.001 to 0.26, indicating nearly all variability in urinary concentrations was within-subject variability.
Table 2.
Within- and between-subject variance components for airborne and urinary concentrations of nickel and chromium, overall and by plant.
| Agent | Plant | Air: within-subject variance | Air: between-subject variance | Air: ICC | Urine: within-subject variance | Urine: between-subject variance | Urine: ICC |
|---|---|---|---|---|---|---|---|
| Nickel | 1 | 0.775 | 0.273 | 0.260 | 0.922 | <0.001 | <0.001 |
| Nickel | 2 | 0.638 | 0.704 | 0.525 | 1.07 | <0.001 | <0.001 |
| Nickel | 3 | 0.715 | 0.233 | 0.246 | 0.837 | <0.001 | <0.001 |
| Nickel | Overall | 0.718 | 0.940 | 0.566 | 0.984 | <0.001 | <0.001 |
| Chromium | 4 | 0.544 | <0.001 | <0.001 | 0.700 | <0.001 | <0.001 |
| Chromium | 5 | 0.642 | 0.538 | 0.456 | 0.926 | <0.001 | <0.001 |
| Chromium | 6 | 0.445 | 0.349 | 0.440 | 0.457 | 0.163 | 0.263 |
| Chromium | Overall | 0.566 | 0.400 | 0.414 | 0.763 | <0.001 | <0.001 |
ICC, intraclass correlation coefficient (between-subject variance/(between-subject variance + within-subject variance)).
Expert reliability
The overall means of the expert ratings were similar for both metals and for both assessment rounds (Table 3). Based on weighted kappa, moderately-high agreement among experts was observed for both metals in the OH (κw: nickel = 0.60, chromium = 0.64) and OH/EIQ (κw: nickel = 0.60, chromium = 0.61) rounds. Agreement was only fair to moderate for both metals and assessment rounds when evaluated based on the proportion of agreement (means 0.47–0.57). For all measures, the agreement was somewhat higher for chromium than for nickel. No differences were observed by assessment round for either metal. The industrial hygienists had, on average, somewhat higher agreement amongst themselves than that observed amongst the occupational physicians for nickel but not for chromium (Supplementary Tables S1 and S2).
Table 3.
Measures of agreement by metal and assessment round
| Agent/round | Mean of all ratingsᵃ (SD) | Proportion of agreement (%)ᵇ: mean | Proportion of agreement (%)ᵇ: range | Weighted kappa (κw)ᵇ: mean | Weighted kappa (κw)ᵇ: range |
|---|---|---|---|---|---|
| Nickel, OH | 2.7 (1.0) | 47.6 | 28.1–68.8 | 0.60 | 0.08–0.82 |
| Nickel, OH/EIQ | 2.7 (0.9) | 47.4 | 25.0–67.2 | 0.60 | 0.13–0.85 |
| Chromium, OH | 2.6 (0.9) | 55.6 | 34.7–91.7 | 0.64 | 0.48–0.93 |
| Chromium, OH/EIQ | 2.6 (0.9) | 56.7 | 37.5–79.2 | 0.61 | 0.39–0.97 |
OH, occupational history round; OH/EIQ, occupational history and electroplating industry-specific questionnaire round; SD, standard deviation.
ᵃ Arithmetic mean of 6 experts × 64 ratings for nickel and 6 experts × 72 ratings for chromium.
ᵇ Agreement between any two experts: arithmetic mean and range of the given statistic from all 2-expert combinations (n=15) of the 6 experts.
Validity of expert ratings
For nickel, the mean Spearman correlation between each expert's rating in the OH round and the subject-specific AM was -0.30 (rs range: -0.48–0.04) for air and 0.38 (rs range: 0.27–0.47) for urine. For chromium, the mean Spearman correlation between each expert's rating in the OH round and the subject-specific AM was -0.02 (rs range: -0.11–0.06) for air and 0.04 (rs range: -0.06–0.15) for urine. Similarly poor correlations were observed in the OH/EIQ round (not shown). The Spearman correlations between the subject-specific means and the group ratings are shown in Table 4 for the OH round. Overall, poor to moderate correlations (rs = -0.37–0.46) were observed and did not vary by assessment round or type of expert. However, plant-specific differences were observed. For airborne nickel, we observed a moderately-high correlation for plant 1 (rs = 0.70), but poor correlations for plant 2 (rs = 0.15) and plant 3 (rs = 0.19). For urinary nickel, poor correlations were observed in all three plants. For chromium, poor correlations (rs range: -0.67– -0.04) were observed for both airborne and urinary concentrations in all three plants. For nickel, the industrial hygienists’ ratings had higher validity than the occupational physicians’ ratings in all plants based on the airborne measurements and in plants 2 and 3 based on urinary concentrations. For chromium, ratings from both types of experts had similarly low validity for both airborne and urinary concentrations.
Table 4.
For the OH round, Spearman correlation coefficients between the group rating (arithmetic mean of the six experts’ ratings) and the subject-specific arithmetic mean of airborne and urinary metal exposures, overall and by plant and type of expert
| Measurement | Type of expert | Nickel, overall | Nickel, plant 1 | Nickel, plant 2 | Nickel, plant 3 | Chromium, overall | Chromium, plant 4 | Chromium, plant 5 | Chromium, plant 6 |
|---|---|---|---|---|---|---|---|---|---|
| Air | Groupᵃ | −0.37 (−0.56 – −0.14) | 0.70 (0.55 – 0.81) | 0.15 (−0.10 – 0.38) | 0.19 (−0.06 – 0.42) | −0.03 (−0.26 – 0.20) | −0.41 (−0.59 – −0.20) | −0.29 (−0.49 – −0.06) | −0.04 (−0.27 – 0.19) |
| Air | Industrial hygienists | −0.38 (−0.57 – −0.15) | 0.75 (0.62 – 0.84) | 0.38 (0.15 – 0.57) | 0.55 (0.35 – 0.70) | −0.01 (−0.24 – 0.22) | −0.35 (−0.54 – −0.13) | −0.29 (−0.49 – −0.06) | −0.05 (−0.28 – 0.18) |
| Air | Occupational physicians | −0.30 (−0.51 – −0.06) | 0.62 (0.44 – 0.75) | −0.11 (−0.35 – 0.14) | 0.19 (−0.06 – 0.42) | 0.05 (−0.18 – 0.28) | −0.38 (−0.56 – −0.16) | −0.27 (−0.47 – −0.04) | 0.01 (−0.22 – 0.24) |
| Urine | Groupᵃ | 0.46 (0.24 – 0.63) | 0.24 (−0.01 – 0.46) | 0.04 (−0.21 – 0.28) | 0.23 (−0.02 – 0.45) | 0.03 (−0.20 – 0.26) | −0.14 (−0.36 – 0.10) | −0.06 (−0.29 – 0.17) | −0.67 (−0.78 – −0.52) |
| Urine | Industrial hygienists | 0.46 (0.24 – 0.63) | 0.15 (−0.10 – 0.38) | 0.07 (−0.18 – 0.31) | 0.57 (0.38 – 0.72) | 0.06 (−0.17 – 0.29) | −0.10 (−0.32 – 0.14) | −0.03 (−0.26 – 0.20) | −0.46 (−0.66 – 0.26) |
| Urine | Occupational physicians | 0.43 (0.21 – 0.61) | 0.34 (0.10 – 0.54) | 0.03 (−0.22 – 0.27) | 0.16 (−0.09 – 0.39) | −0.002 (−0.23 – 0.23) | −0.11 (−0.33 – 0.13) | −0.09 (−0.32 – 0.15) | −0.77 (−0.85 – 0.66) |
Values are Spearman correlations (95% CI). CI, confidence intervals based on Fisher's transformation; OH, occupational history.
ᵃ Industrial hygienists + occupational physicians.
We explored the relationship between the group ratings and the subject-specific means further visually. Figure 1 shows the scatter plot between the group rating and the subject-specific means of airborne nickel exposure in the OH and OH/EIQ rounds for plants 1, 2, and 3 (Figures 1a–c) and each plant's corresponding distribution of the airborne nickel subject-specific AMs (Figures 1d–f). We found that the distribution of the subject-specific AMs for airborne nickel exposure was much wider and more evenly spread for plant 1 (Figure 1d), where we also observed moderately-high validity, than for plants 2 and 3 (Figures 1e and 1f), where we observed poor validity. For urinary nickel and for airborne and urinary chromium, we observed narrow distributions for the subject-specific AMs in each plant (shown in supplemental Figures S1, S2, and S3) that were similar to the distributions shown for plants 2 and 3 for airborne nickel (Figures 1e and 1f).
Figure 1.
Scatter plot, best-fit line, and Spearman correlation statistic between the average rating and the subject-specific arithmetic mean (AM) of airborne nickel exposure in the OH and OH/EIQ rounds for Plants 1, 2, and 3 (a–c) and each plant's corresponding distribution of the airborne nickel subject-specific AMs (d–f). OH = assessment based on the occupational history questionnaire only; OH/EIQ = assessment based on both the occupational history questionnaire and the electroplating industry-specific questionnaire.
For nickel, the two-category scale derived for the group rating had higher sensitivity and specificity when compared to the urinary measurements (sensitivity = 75% in OH round, 78% in OH/EIQ round; specificity = 66% in both rounds) than when compared to the airborne measurements (sensitivity = 25% in OH round, 28% in OH/EIQ round; specificity = 16% in both rounds). For chromium, the two-category scale also had slightly higher sensitivity and specificity with the urinary measurements (sensitivity = 50% in OH round, 47% in OH/EIQ round; specificity = 58% in both rounds) than with the airborne measurements (sensitivity = 47% in OH round, 39% in OH/EIQ round; specificity = 50% in both rounds). No consistent pattern was observed across assessment rounds for these metrics. The two metals showed somewhat opposite patterns, with 9–12% higher sensitivity than specificity for nickel and 8–11% lower sensitivity than specificity for chromium.
Discussion
In this study, experts’ ratings for nickel- and chromium-exposed workers based on OH and OH/EIQ responses were evaluated against repeated airborne and urinary measurements. The experts’ moderately-high agreement amongst themselves (κw = 0.60–0.64) was comparable to the median agreement reported in previous studies 9-10, 12-14, 31-32. Despite their moderately-high reliability, the experts’ ratings had only poor to moderate validity overall in relation to both airborne and urinary measurements, which was lower than in most previous validity studies 9, 11, 17, 33. For example, Hertzman et al. 17 found much higher correlations (> 0.65) between experienced workers’ estimates of exposure and urinary chlorophenate levels in a large cohort study of the lumber industry. However, Tielemans et al. 11 observed similarly poor to fair agreement (kappa < 0.4) between estimates based on expert review of job-specific questionnaires (which have similarities to the EIQ used here) and urinary measurements (i.e., methylhippuric acid, hippuric acid, and chromium) in a population-based study. The validity observed here was also much lower than previously reported for two other Shanghai industries (textile industry, rs = 0.30–0.65; foundry industry, rs = 0.65–0.85) evaluated using nearly the same study design but based solely on airborne measurements 9. The group rating provided minimal improvement to the experts’ validity in this study, unlike previous studies that observed improvements in validity when the estimates of multiple raters were averaged 9, 12, 34-35. Moderate to moderately-high validity, however, was observed when the experts’ four-category rating scale was dichotomized.
The poor overall validity observed here was likely, at least in part, a function of the limited exposure distribution between subjects within the same plant. Our plant-stratified analyses found much higher validity for Plant 1, which was the only plant where the subject-specific means were relatively evenly distributed across a relatively wide range of exposure levels. All other plants had subject-specific means that were skewed and/or were narrowly clustered in the low exposure range (see Figure 1). Thus, the experts assessed workers’ exposures on a four-category scale when, for some plants in this study, there was little contrast (ICCs ranging from 0 to 0.5) among the exposures of most of the assessed workers. This finding is consistent with our previous study that observed poorer validity in the textile industry (with a highly skewed exposure distribution that clustered at low concentrations) than in the foundry industry (with more evenly distributed exposure concentrations) 9. This finding may also point to the difficulty of asking experts to provide exposure ratings in the absence of any exposure measurements on which to anchor their estimates; access to such measurements has previously been shown to improve experts’ validity 8, 10, 15. The availability of at least some measurements for these workplaces in advance of the rater evaluations may have revealed the lack of exposure contrast. Although evaluating whether the experts’ ratings were better associated with air or urine concentrations was part of our primary objective, our findings are inconclusive because of the overall poor validity observed here. The experts’ ratings had better sensitivity and specificity for the urinary concentrations than the airborne concentrations, which provides some, but not conclusive, evidence that the experts may have considered all routes of exposure.
This result was somewhat surprising because dermal exposure is more difficult to assess, has not been as well studied as airborne exposure, and little is known about the contribution of various dermal exposure determinants 36. These analyses also demonstrated that assessing exposure using a two-category scale (low and high) is easier than using a four-category scale when the between-worker difference in exposure is small. However, the sensitivities (nickel: 75–78%; chromium: 47–50%) and specificities (nickel: 66%; chromium: 58%) based on the urinary concentrations remained only moderate to moderately-high for both metals in both assessment rounds, suggesting that the experts were only partly successful in distinguishing exposures based on a two-category scale in this setting. The poor validity on both the four- and two-category scales suggests that even a limited, a priori characterization of exposure may help determine appropriate semi-quantitative categories for the exposure range and the number of exposure categories to use for expert assessment in a workplace or study before requesting experts to provide their exposure estimates. For example, syntheses of the data in the published literature 37-39 or from inspection measurements 40-44 and other exposure databases 45 can provide useful information on exposure variability and help anchor the experts’ ratings to a concentration scale.
The use of the electroplating industry-specific questionnaire (EIQ) responses had little impact on the overall measures of reliability or validity of the experts’ ratings in this study, likely (at least in part) due to the limited contrast in exposure concentrations between participants within the same plant. However, our finding is consistent with our similar evaluations in the textile and foundry industries 9 and with previous studies that showed that additional information did not improve raters’ reliability and validity 12, 34, 46, although some improvements were observed by Tielemans et al. 11 with the use of similar types of questionnaires. The advantage of the EIQ, however, is that within-job differences are systematically captured and thus more easily used in programmable decision rules 47.
No consistent patterns were observed in the reliability and validity of the experts’ ratings by type of rater, similar to our previous findings in the textile and foundry industries 9. Industrial hygienists had somewhat higher reliability than the occupational physicians for both metals in most comparisons. However, both types of experts generally had similar poor-to-fair validity with airborne and urinary nickel and chromium exposures overall. Plant-specific differences were observed, with the industrial hygienists’ ratings having higher validity than the occupational physicians’ ratings in some comparisons. The similarity of the two types of raters likely relates to these occupational physicians’ regular work site visits and familiarity with exposure monitoring and may not be generalizable to occupational physicians without this expertise. For instance, several studies have reported that substantial field experience is critical to provide valid exposure estimates 10, 16, 34, 48.
This study has several limitations. As in Friesen et al. 9, this study design straddled the population-based and industry-based study designs by using questionnaires designed for a typical case-control study but evaluating the performance in a single industry for a single year. Thus, this study cannot predict how well these experts would have performed if asked to estimate these same exposures across a wide time span or across multiple industries with a wide range of exposures. In addition, the small amount of contrast in exposures in these plants limited our ability to assess the validity of experts’ ratings in the electroplating industry. The contrast in case-control studies is likely to vary by the agent, time period and population being assessed. In addition, the airborne and urinary measurements may be ‘alloyed’ gold standards because the measurements did not account for the metal's solubility, valence, form, or particle size and four measurements may not be sufficient to fully characterize the subjects’ average exposures.
Conclusion
In this study, despite moderately-high reliability among experts, experts’ ratings had low validity in relation to airborne and urinary exposure measurements, which was likely attributable to low contrast in exposures. As a result, our evaluations of the influence of responses to industry-specific questionnaires on the validity of the experts’ ratings in relation to airborne and urinary measurements were inconclusive. However, this study provides some insight into the challenges of designing a study to evaluate the validity of exposure ratings. Validity was generally better when there was a wider and more even distribution of exposure and poor when there was low contrast. In addition, this study showed that the experts were reasonably able to distinguish between high and low exposures, but not finer categories. Both findings indicate that prior knowledge of the variability in exposure may be necessary for setting exposure categories for experts’ ratings. In population-based studies, however, information on historical exposure and its variability is generally limited or nonexistent, and thus we will likely need to rely on syntheses of publicly available data to anchor experts’ estimates, even though such data do not necessarily reflect the working conditions of the job being assessed. Our results also point to the continued need for methods to increase the ability of experts to evaluate exposures. For instance, training experts in the interpretation of exposure distributions and determinants 49 and developing transparent, programmable decision rules to systematically capture the exposure differences that can be reviewed and refined by multiple experts 47 may improve the accuracy of expert judgment.
Acknowledgements
This study was funded by the Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health.
Footnotes
Supplementary information
Supplementary information is available at the Journal of Exposure Science and Environmental Epidemiology website.
References
- 1. Siemiatycki J, Day N, Fabry J, Cooper JA. Discovering carcinogens in the occupational environment: A novel epidemiologic approach. J Natl Cancer I. 1981;66:217–225.
- 2. Stewart PA, Stewart WF, Siemiatycki J, Heineman EF, Dosemeci M. Questionnaires for collecting detailed occupational information for community-based case control studies. Am Ind Hyg Assoc J. 1998;59:39–44. doi: 10.1080/15428119891010325.
- 3. Rybicki BA, Peterson EL, Johnson CC, Kortsha GX, Cleary WM, Gorell JM. Intra- and inter-rater agreement in the assessment of occupational exposure to metals. Int J Epidemiol. 1998;27:269–273. doi: 10.1093/ije/27.2.269.
- 4. Benke G, Sim M, Forbes A, Salzberg M. Retrospective assessment of occupational exposure to chemicals in community-based studies: validity and repeatability of industrial hygiene panel ratings. Int J Epidemiol. 1997;26:635–642. doi: 10.1093/ije/26.3.635.
- 5. Siemiatycki J, Fritschi L, Nadon L, Gérin M. Reliability of an expert rating procedure for retrospective assessment of occupational exposures in community-based case-control studies. Am J Ind Med. 1997;31:280–286. doi: 10.1002/(sici)1097-0274(199703)31:3<280::aid-ajim3>3.0.co;2-1.
- 6. McGuire V, Longstreth WT, Nelson LM, Koepsell TD, Checkoway H, Morgan MS, et al. Occupational Exposures and Amyotrophic Lateral Sclerosis. A Population-based Case-Control Study. Am J Epidemiol. 1997;145:1076–1088. doi: 10.1093/oxfordjournals.aje.a009070.
- 7. Goldberg MS, Siemiatycki J, Gerin M. Inter-rater agreement in assessing occupational exposure in a case-control study. Br J Ind Med. 1986;43:667–676. doi: 10.1136/oem.43.10.667.
- 8. Fritschi L, Nadon L, Benke G, Lakhani R, Latreille B, Parent ME, et al. Validation of expert assessment of occupational exposures. Am J Ind Med. 2003;43:519–522. doi: 10.1002/ajim.10208.
- 9. Friesen MC, Coble JB, Katki HA, Ji BT, Xue S, Lu W, et al. Validity and Reliability of Exposure Assessors’ Ratings of Exposure Intensity by Type of Occupational Questionnaire and Type of Rater. Ann Occup Hyg. 2011;55:601–611. doi: 10.1093/annhyg/mer019.
- 10. Teschke K, Olshan AF, Daniels JL, De Roos AJ, Parks CG, Schulz M, et al. Occupational exposure assessment in case–control studies: opportunities for improvement. Occup Environ Med. 2002;59:575–594. doi: 10.1136/oem.59.9.575.
- 11. Tielemans E, Heederik D, Burdorf A, Vermeulen R, Veulemans H, Kromhout H, et al. Assessment of occupational exposures in a general population: comparison of different methods. Occup Environ Med. 1999;56:145–151. doi: 10.1136/oem.56.3.145.
- 12. Steinsvåg K, Bråtveit M, Moen BE, Kromhout H. Inter-rater agreement in the assessment of exposure to carcinogens in the offshore petroleum industry. Occup Environ Med. 2007;64:582–588. doi: 10.1136/oem.2006.030528.
- 13. Rocheleau CM, Lawson CC, Waters MA, Hein MJ, Stewart PA, Correa A, Echeverria D, et al. Inter-Rater Reliability of Assessed Prenatal Maternal Occupational Exposures to Solvents, Polycyclic Aromatic Hydrocarbons, and Heavy Metals. J Occup Environ Hyg. 2011;8:718–728. doi: 10.1080/15459624.2011.627293.
- 14. Mannetje At, Fevotte J, Fletcher T, Brennan P, Legoza J, Szeremi M, et al. Assessing Exposure Misclassification by Expert Assessment in Multicenter Occupational Studies. Epidemiology. 2003;14:585–592. doi: 10.1097/01.ede.0000072108.66723.0f.
- 15. Post V, Kromhout H, Heederik D, Noy D, Duilzentkunst RS. Semiquantitative estimates of exposure to methylene chloride and styrene: the influence of quantitative exposure data. Appl Occup Environ Hyg. 1991;6:197–204.
- 16. Teschke K, Hertzman C, Dimich-Ward H, Ostry A, Blair J, Hershler R. A comparison of exposure estimates by worker raters and industrial hygienists. Scand J Work Environ Health. 1989;15:424–429. doi: 10.5271/sjweh.1831.
- 17. Hertzman C, Teschke K, Dimich-Ward H, Ostry A. Validity and reliability of a method for retrospective evaluation of chlorophenate exposure in the lumber industry. Am J Ind Med. 1988;14:703–713. doi: 10.1002/ajim.4700140609.
- 18. Guillemin MP, Berode M. A study of the difference in chromium exposure in workers in two types of electroplating process. Ann Occup Hyg. 1978;21:105–112. doi: 10.1093/annhyg/21.2.105.
- 19. Kiilunen M, Utela J, Rantanen T, Norppa H, Tossavainen A, Koponen M, et al. Exposure to soluble nickel in electrolytic nickel refining. Ann Occup Hyg. 1997;41:167–173. doi: 10.1016/s0003-4878(96)00032-4.
- 20. Pierre F, Diebold F, Baruthio F. Biomonitoring of two types of chromium exposure in an electroplating shop. Int Arch Occ Env Hea. 2008;81:321–329. doi: 10.1007/s00420-007-0216-x.
- 21. Lumens MEGL, Ulenbelt P, Géron HMA, Herber RFM. Hygienic behaviour in chromium plating industries. Int Arch Occ Env Hea. 1993;64:509–514. doi: 10.1007/BF00381100.
- 22. Tola S, Kilpio J, Virtamo M. Urinary and Plasma Concentrations of Nickel as Indicators of Exposure to Nickel in an Electroplating Shop. J Occup Environ Med. 1979;21:184–188.
- 23. Liu CS, Kuo HW, Lai JS, Lin TI. Urinary N-acetyl-beta-glucosaminidase as an indicator of renal dysfunction in electroplating workers. Int Arch Occ Env Hea. 1998;71:348–352. doi: 10.1007/s004200050291.
- 24. Caglieri A, Goldoni M, Acampa O, Andreoli R, Vettori MV, Corradi M, et al. The Effect of Inhaled Chromium on Different Exhaled Breath Condensate Biomarkers among Chrome-Plating Workers. Environ Health Perspect. 2006;114:542–546. doi: 10.1289/ehp.8506.
- 25. Bavazzano P, Bolognesi R, Cassinelli C, Gori R, Li Donni V, Martellini F, et al. Skin contamination and low airborne nickel exposure of electroplaters. Sci Total Environ. 1994;155:83–86. doi: 10.1016/0048-9697(94)90363-8.
- 26. Makinen M, Linnainmaa M. Dermal exposure to chromium in electroplating. Ann Occup Hyg. 2004;48:277–283. doi: 10.1093/annhyg/meg072.
- 27. NIOSH. Manual of Analytical Methods (NMAM): Method 7300. 4th edn. US Department of Health and Human Services; Cincinnati, OH: 1994.
- 28. NIOSH. Manual of Analytical Methods (NMAM): Method 8310. 4th edn. US Department of Health and Human Services; Cincinnati, OH: 1994.
- 29. Smith TJ, Kriebel D. A biologic approach to environmental assessment and epidemiology. Oxford University Press; New York: 2010. pp. 77–79.
- 30. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33:159–174.
- 31. Ciccone G, Vineis P. Inter-rater agreement in the assessment of occupational exposure to herbicides. Med Lav. 1988;79:363–367.
- 32. Correa A, Min YI, Stewart PA, Lees PS, Breysse P, Dosemeci M, et al. Inter-rater agreement of assessed prenatal maternal occupational exposures to lead. Birth Defects Res A Clin Mol Teratol. 2006;76:811–824. doi: 10.1002/bdra.20311.
- 33. Cherrie JW, Schneider T. Validation of a New Method for Structured Subjective Assessment of Past Concentrations. Ann Occup Hyg. 1999;43:235–245.
- 34. de Cock J, Kromhout H, Heederik D, Burema J. Experts’ subjective assessment of pesticide exposure in fruit growing. Scand J Work Environ Health. 1996;22:425–432. doi: 10.5271/sjweh.163.
- 35. Semple SE, Proud LA, Tannahill SN, Tindall ME, Cherrie JW. A training exercise in subjectively estimating inhalation exposures. Scand J Work Environ Health. 2001;27:395–401. doi: 10.5271/sjweh.632.
- 36. Vermeulen R, Stewart P, Kromhout H. Dermal exposure assessment in occupational epidemiologic research. Scand J Work Environ Health. 2002;28:371–385. doi: 10.5271/sjweh.689.
- 37. Bakke B, Stewart PA, Waters MA. Uses of and Exposure to Trichloroethylene in U.S. Industry: A Systematic Literature Review. J Occup Environ Hyg. 2007;4:375–390. doi: 10.1080/15459620701301763.
- 38. Park D, Stewart PA, Coble JB. A Comprehensive Review of the Literature on Exposure to Metalworking Fluids. J Occup Environ Hyg. 2009;6:530–541. doi: 10.1080/15459620903065984.
- 39. Pronk A, Coble J, Stewart PA. Occupational exposure to diesel engine exhaust: A literature review. J Expos Sci Environ Epidemiol. 2009;19:443–457. doi: 10.1038/jes.2009.21.
- 40. Lavoue J, Friesen MC, Burstyn I. Workplace Measurements by the US Occupational Safety and Health Administration since 1979: Descriptive Analysis and Potential Uses for Exposure Assessment. Ann Occup Hyg. 2013;57:77–97. doi: 10.1093/annhyg/mes055.
- 41. Froines JR, Baron S, Wegman DH, O'Rourke S. Characterization of the airborne concentrations of lead in U.S. industry. Am J Ind Med. 1990;18:1–17. doi: 10.1002/ajim.4700180102.
- 42. Friesen MC, Coble JB, Lu W, Shu XO, Ji BT, Portengen L, et al. Combining a Job-Exposure Matrix with Exposure Measurements to Assess Occupational Exposure to Benzene in a Population Cohort in Shanghai, China. Ann Occup Hyg. 2012;56:80–91. doi: 10.1093/annhyg/mer080.
- 43. Burstyn I, Jonasi L, Wild TC. Obtaining compliance with occupational health and safety regulations: a multilevel study using self-determination theory. Int J Environ Heal R. 2010;20:271–287. doi: 10.1080/09603121003663461.
- 44. Koh DH, Bhatti P, Coble JB, Stewart PA, Lu W, Shu XO, et al. Calibrating a population-based job-exposure matrix using inspection measurements to estimate historical occupational exposure to lead for a population-based cohort in Shanghai, China. J Expos Sci Environ Epidemiol. 2012; e-pub ahead of print 22 August 2012. doi: 10.1038/jes.2012.86.
- 45. Peters S, Vermeulen R, Olsson A, Van Gelder R, Kendzia B, Vincent R, et al. Development of an Exposure Measurement Database on Five Lung Carcinogens (ExpoSYN) for Quantitative Retrospective Occupational Exposure Assessment. Ann Occup Hyg. 2012;56:70–79. doi: 10.1093/annhyg/mer081.
- 46. Stewart PA, Carel R, Schairer C, Blair A. Comparison of industrial hygienists' exposure evaluations for an epidemiologic study. Scand J Work Environ Health. 2000;26:44–51. doi: 10.5271/sjweh.509.
- 47. Pronk A, Stewart PA, Coble JB, Katki HA, Wheeler DC, Colt JS, et al. Comparison of two expert-based assessments of diesel exhaust exposure in a case–control study: programmable decision rules versus expert review of individual jobs. Occup Environ Med. 2012;69:752–758. doi: 10.1136/oemed-2011-100524.
- 48. Kromhout H, Oostendorp Y, Heederik D, Boleij JSM. Agreement between qualitative exposure estimates and quantitative exposure measurements. Am J Ind Med. 1987;12:551–562. doi: 10.1002/ajim.4700120509.
- 49. Logan P, Ramachandran G, Mulhausen J, Hewett P. Occupational Exposure Decisions: Can Limited Data Interpretation Training Help Improve Accuracy? Ann Occup Hyg. 2009;53:311–324. doi: 10.1093/annhyg/mep011.

