Annals of Occupational Hygiene. 2014 Mar 3;58(5):612–624. doi: 10.1093/annhyg/meu012

Systematically Extracting Metal- and Solvent-Related Occupational Information from Free-Text Responses to Lifetime Occupational History Questionnaires

Melissa C Friesen 1,*, Sarah J Locke 1, Carina Tornow 2, Yu-Cheng Chen 1, Dong-Hee Koh 1, Patricia A Stewart 1,3, Mark Purdue 1, Joanne S Colt 1
PMCID: PMC4053931  PMID: 24590110

Abstract

Objectives:

Lifetime occupational history (OH) questionnaires often use open-ended questions to capture detailed information about study participants’ jobs. Exposure assessors use this information, along with responses to job- and industry-specific questionnaires, to assign exposure estimates on a job-by-job basis. An alternative approach is to use information from the OH responses and the job- and industry-specific questionnaires to develop programmable decision rules for assigning exposures. As a first step in this process, we developed a systematic approach to extract the free-text OH responses and convert them into standardized variables that represented exposure scenarios.

Methods:

Our study population comprised 2408 subjects, reporting 11991 jobs, from a case–control study of renal cell carcinoma. Each subject completed a lifetime OH questionnaire that included verbatim responses, for each job, to open-ended questions including job title, main tasks and activities (task), tools and equipment used (tools), and chemicals and materials handled (chemicals). Based on a review of the literature, we identified exposure scenarios (occupations, industries, tasks/tools/chemicals) expected to involve possible exposure to chlorinated solvents, trichloroethylene (TCE) in particular, lead, and cadmium. We then used a SAS macro to review the information reported by study participants to identify jobs associated with each exposure scenario; this was done using previously coded standardized occupation and industry classification codes, and a priori lists of associated key words and phrases related to possibly exposed tasks, tools, and chemicals. Exposure variables representing the occupation, industry, and task/tool/chemicals exposure scenarios were added to the work history records of the study respondents. Our identification of possibly TCE-exposed scenarios in the OH responses was compared to an expert’s independently assigned probability ratings to evaluate whether we missed identifying possibly exposed jobs.

Results:

Our process added exposure variables for 52 occupation groups, 43 industry groups, and 46 task/tool/chemical scenarios to the data set of OH responses. Across all four agents, we identified possibly exposed task/tool/chemical exposure scenarios in 44–51% of the jobs in possibly exposed occupations. Possibly exposed task/tool/chemical exposure scenarios were found in a nontrivial 9–14% of the jobs not in possibly exposed occupations, suggesting that our process identified important information that would not be captured using occupation alone. Our extraction process was sensitive: for jobs where our extraction of OH responses identified no exposure scenarios and for which the sole source of information was the OH responses, only 0.1% were assessed as possibly exposed to TCE by the expert.

Conclusions:

Our systematic extraction of OH information found useful information in the task/chemicals/tools responses that was relatively easy to extract and that was not available from the occupational or industry information. The extracted variables can be used as inputs in the development of decision rules, especially for jobs where no additional information, such as job- and industry-specific questionnaires, is available.

Keywords: cadmium, chlorinated solvents, exposure assessment methodology, lead

INTRODUCTION

Most population-based studies use lifetime occupational history (OH) questionnaires as the starting point for evaluating occupational risk factors. These questionnaires systematically collect information from study subjects for each job held using a series of open-ended questions, including job title, company name, product made or service provided by the company, and the job’s start and stop years. Variants of these questionnaires also ask questions on work frequency patterns, the main tasks and activities performed in the job, the tools and equipment used, and the chemicals and materials handled.

The OH responses serve a dual purpose. First, they may be used to code jobs into standardized occupation and industry classification codes (e.g. SOC and SIC, respectively). These codes can be directly used in epidemiologic analyses of occupation and/or industry or they can be used to link the reported jobs to exposure metrics in population-based job-exposure matrices (JEMs) that can be used in epidemiologic analyses for the relevant agents. However, neither type of analysis captures important differences in tasks and exposure between people with the same job title (Bouyer and Hemon, 1993; Stewart and Stewart, 1994; Kromhout and Vermeulen, 2001; Teschke et al., 2002). Second, OH responses may be used during a job-by-job review by exposure assessors. This expert review can capture more of the within-job variability in exposure than analyses by occupation, industry, or JEMs; however, it is time consuming and the rationale for the exposure decision rules is rarely published.

The utility of the OH responses in the exposure assessment process was recently reported in a study that developed programmable decision rules to assign estimates of occupational diesel exhaust exposure in a population-based case–control study of bladder cancer (Pronk et al., 2012). The resulting estimates of probability, intensity, and frequency developed for diesel exhaust exposure using only the OH responses had moderate to moderately high agreement with estimates based on more detailed occupation- and industry-specific questionnaires that contained diesel exhaust–related questions (2749 jobs, proportion of agreement: 71–84%; weighted kappa: 0.50–0.74), providing evidence for the utility of OH responses in the exposure assessment process. To use the OH responses as inputs in the programmable rules, however, the authors first had to convert the free-text information in the OH responses into standardized variables identifying exposure scenarios (e.g. exposed occupations and industries; exposure sources) where occupational diesel exhaust exposure was likely to have occurred. They derived these variables manually using text-based filters in Microsoft Excel, a process that was neither transparent nor transferable to other studies.

In this study, our aim was to develop a more systematic and transferable approach for extracting free-text OH responses and converting them into standardized exposure variables representing exposure scenarios. This is the first step in the process of developing decision rules that incorporate information from both the occupational histories and the job- and industry-specific modules. The development and application of decision rules for assessing exposures for jobs reported by our study participants will be described separately.

METHODS

Study population and data collection

The study population comprised 2408 subjects, reporting 11991 jobs, who were enrolled in a population-based case–control study of renal cell carcinoma that was conducted in Chicago and Detroit (Colt et al., 2011; Purdue et al., 2011). We describe only the study components related to occupation here. Trained interviewers administered to all subjects a structured interview that included a lifetime OH in which subjects reported all jobs held for a minimum of 12 months from the age of 16 years. The OH comprised open-ended questions that asked, for each job reported, the job title, employer name, job start and stop year, employer’s activities (product made or service provided), work frequency (i.e. hours per day, days per week, months per year), main activities or duties (hereafter, task question), the tools and equipment used (hereafter, tool question), and the chemicals or materials used (hereafter, chemical question). Each job was assigned a four-digit SOC (US Department of Commerce, 1980) and a four-digit SIC (Office of Management and Budget, 1987). Key words in the responses to the OH questions triggered additional questions (modules) if the given job was linked to 1 of 23 occupations (e.g. welders, painters) or 13 industries of interest (e.g. chemical industry, textile industry) and if the job was held for a minimum of 3500 h. The OH questions were asked for all jobs; 9309 jobs (78%) had additional information from completed modules.

Deriving exposure variables representing exposure scenarios from the OH responses

We used a multistep process (A–H) shown in Fig. 1 to derive exposure variables identifying exposure scenarios from the OH responses. Each step is described in detail below.

Fig. 1. Process for extracting free-text OH responses into exposure scenario–related variables linked to the study subjects’ jobs.

  • A. Exposure scenarios in the literature: We reviewed the published literature to identify exposure scenarios where exposure to each of four agents—chlorinated solvents as a group, trichloroethylene (TCE) in particular, lead, and cadmium—could be expected to have occurred. The agents were chosen based on ongoing exposure assessment needs in this study. For each exposure scenario, we identified related occupations, industries, tasks, tools, and chemicals. Examples of exposure scenarios related to chlorinated solvents included the occupations of mechanics, janitors, dry cleaners, and painters; the automobile manufacturing and the textile industries; tasks and tools associated with degreasing, gluing, and printing (e.g. printing press); and chemicals that indicated solvents, chlorinated solvents (e.g. TCE), or solvent-containing products (e.g. paint remover). Some exposure scenarios overlapped but were kept separate because the definitions varied slightly by agent. For example, chemical paint removal was identified with chlorinated solvent exposure, whereas chemical and mechanical paint removal was identified with lead exposure.

  • B and C. Occupation- and industry-related exposure scenarios: Our literature review identified 51 occupation groups and 43 industry groups associated with exposure to one or more of the four agents of interest. We added an additional occupation group to identify administrative jobs because these occupations usually indicate the absence of exposure and thus this designation is expected to be a useful input in future exposure decision rules. We assigned one or more relevant four-digit SOC codes to each occupation group, and one or more relevant four-digit SIC codes to each industry group. We created two spreadsheets, one for occupation and one for industry, with three columns each: the first column listed a unique record identifier, the second column contained a four-digit SOC or SIC code, and the third column listed the occupation or industry exposure scenario group associated with that SOC or SIC code. Only SOCs and SICs associated with administrative occupations or with occupation groups related to exposure scenarios to at least one of the four agents were listed in the spreadsheet; all other SOCs and SICs were excluded.

  • D. Task-, tool-, and chemical-related exposure scenarios: For exposure scenarios related to task, tools, or chemicals, a team of industrial hygienists used their professional judgment to develop a list of character strings that represented words and phrases (hereafter, key words) that may occur in the subjects’ responses. For example, the painting scenario was linked to specific responses, such as spray paint (from the task question), airless sprayer (tool question), and paint remover (chemical question). The team identified additional key words by reviewing lists of comma-parsed free-text OH responses from the occupational histories. Key words included truncated words, complete words, and phrases (e.g. ‘icide’ identified ‘insecticide’, ‘pesticide’, ‘herbicide’). Where truncated words or key words identified activities related to one or more exposure scenarios and/or an unexposed situation (e.g. ‘gas’ identified ‘pumping gas’, ‘read gas meters’, ‘inspect gaskets’), we used phrases to make the link to the appropriate scenario (e.g. ‘pump gas’ and ‘gas fill up’). Misspellings (e.g. aluminum was misspelled as ‘aluminim’, ‘alumiun’, and ‘alumunum’) identified in the lists of comma-parsed free-text OH responses and other permutations (e.g. ‘pumping gas’ and ‘pumped gas’) were included as separate key words.

  • Three separate spreadsheets, one each for the task, tool, and chemical questions, were developed by the industrial hygienists to record the key words and their applicable exposure scenarios. Table 1 provides an excerpt for the ‘chemicals’ question. Complete spreadsheets for all variables can be obtained from the corresponding author. Each spreadsheet had multiple columns: the first column listed the unique record identifier, the second column listed the key word, and the subsequent columns listed all possible exposure scenarios relevant to that OH question (e.g. for the ‘chemicals’ spreadsheet shown in Table 1, the chemical-related exposure scenarios were TCE, chlorinated solvents, solvents, chemical degreasers, paint, glues, inks, lubricants, and pesticides). When a key word was associated with an exposure scenario, a ‘1’ was recorded in the corresponding cell; otherwise, the cell remained empty. Key words could be associated with multiple exposure scenarios within each spreadsheet.

Table 1.

Excerpt of key words, including observed misspellings, and the structure of the spreadsheet developed to search free-text responses to the OH ‘chemicals used’ question for linkage to chemical-related exposure scenarios

ID Key word (a) TCE Chlorinated solvents Solvents Chemical degreasers Paint Glues Inks Lubricants Pesticides
1 TAP-MAJIC 1 1 1
2 TRI CHLORO ETHYLENE 1 1 1
3 TRICHLORETHYLENE 1 1 1
4 CHLOROFORM 1 1
5 DRY CLEANING 1 1
6 FREON 1 1
7 INK CLEANER 1 1
8 TOLUENE 1
9 DECREASER 1
10 METAL WASHING SOLVENTS 1
11 PARTS CLEANER 1
12 ENAMELS 1
13 PAINT 1
14 PRIMERS 1
15 GLUE 1
16 BLUEPRINT 1
17 INK FROM NEWSPAPER 1
18 PLOTTER INK 1
19 AIR TOOL OIL 1
20 HYDRALIC 1
21 245T 1
22 agent orange 1
23 INSECTICIDE 1

The full list can be obtained from the corresponding author.

(a) Misspellings are listed intentionally, to reflect the need to include misspellings in the linkage process.

  • E. Linking exposure scenarios to study subjects to derive exposure variables: A customized SAS macro program using SAS 9.1.2 for Windows (SAS Institute Inc., Cary, NC, USA) was developed to read the search terms (i.e. SOC, SIC, key words) in the spreadsheets. The SAS macro converted each line from the spreadsheets into a conditional statement (if…, then…) to search the data set containing the OH responses and assigned the subject–job record a value of ‘1’ for that exposure scenario if the key word was found. The search was not case sensitive; in addition, the program removed spaces between words when searching for a match (e.g. ‘pump gas’ became ‘pumpgas’). For each exposure scenario listed in the spreadsheets, the macro added corresponding exposure variables (with responses ‘1’ or missing) to the subjects’ work history records. For example, a subject reporting his or her tasks for a given job as ‘cleaning parts and gluing them together’ was assigned a value of ‘1’ for both the degreasing and gluing task variables; no value was added to the other task variables for that job. To capture the absence of information in the subject’s responses, we created a variable that indicated whether there was a ‘none’ response to the tool question, another for a ‘none’ response to the chemical question, and a third that identified when both the tool and chemical questions had a response of ‘none’.

  • F. Exposure variable review: After applying the macro to add exposure scenario–related variables to the data set containing the OH responses, an industrial hygienist reviewed the free-text responses with a ‘1’ in at least one exposure variable, excluding the ‘none’ scenarios, to verify that the reported information was appropriately linked to the assigned exposure scenario. The key word lists were revised based on this review (for example, key words were made more specific to avoid false matches) and the program was rerun. After the final application of the macro, an industrial hygienist conducted a second review of the free-text responses linked to each exposure variable. In this review, false positive identifications were revised directly in the data set of OH responses using Stata/SE v.11.2 for Windows (StataCorp LP, College Station, TX, USA). For example, a job record with the job title ‘IV (intravenous) technician’ that had a response of ‘starting IVs, changing fluids’ to the task question had been assigned by the macro a value of ‘1’ for the scenario ‘vehicle repair’ based on the key word ‘changing fluids’; the positive flag for vehicle repair was removed for this record. The second review identified 117 false positive identifications for chlorinated solvent–related variables and 268 false positive identifications for lead- and cadmium-related variables.

  • G. Merging task, tool, and chemical exposure variables: We reviewed the extracted variables to combine, into a single variable, those task, tool, and chemical exposure variables that were related to the same exposure scenario. For example, to create a single variable for the exposure scenario ‘painting’, we merged all relevant information from variables representing painting from the task question (e.g. painting), the tool question (e.g. paint equipment), and the chemical question (e.g. paints) to form a single task/tool/chemical variable. There were two occasions where we did not merge variables. First, we kept separate any variable considered too generic to be combined with more specific information from other OH questions. For example, the tool-related variables for construction were frequently identified within nonconstruction-related jobs (e.g. a hammer reported by a jewelry maker; a saw reported by an auto technician); as a result, the final construction task/tool/chemical variable was based solely on the task question. Second, we did not combine a chemical variable with its associated task and tool variables when the mentioned chemical could be used in multiple exposure scenarios. For example, the chemical scenario ‘chlorinated solvents’ was associated with multiple tasks and tools (i.e. painting, gluing, and degreasing) and thus was kept separate.

  • H. Final data set: At the end of this process, we had a data set linking each subject–job to 52 occupation, 43 industry, and 46 task/tool/chemical exposure variables related to exposure scenarios associated with the four agents. Descriptions for each variable are provided in Supplementary Table S1 (available at Annals of Occupational Hygiene online).
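The key word search in step E and the merging in step G can be illustrated with a minimal sketch. The study used a SAS macro; the Python below is a stand-in, not the authors' program. The key words ‘icide’, ‘pump gas’, ‘gas fill up’, ‘spray paint’, ‘airless sprayer’, and ‘paint remover’ come from the text, whereas ‘cleaning parts’, ‘glu’, the scenario names, and all function names are assumptions made for illustration.

```python
import re

# Hypothetical key word lists: {OH question: {key word: [scenarios]}}.
# Scenario names and the key words 'cleaning parts' and 'glu' are assumptions.
KEYWORDS = {
    "task": {
        "cleaning parts": ["degreasing"],
        "glu": ["gluing"],
        "pump gas": ["gasoline"],
        "gas fill up": ["gasoline"],
        "spray paint": ["painting"],
    },
    "tool": {"airless sprayer": ["painting"]},
    "chemical": {"icide": ["pesticides"], "paint remover": ["painting"]},
}


def normalize(text):
    """Upper-case and remove all whitespace, e.g. 'pump gas' -> 'PUMPGAS'."""
    return re.sub(r"\s+", "", text.upper())


def flag_job(responses):
    """Return {(question, scenario): 1} for each key word found in a job's responses."""
    flags = {}
    for question, keywords in KEYWORDS.items():
        haystack = normalize(responses.get(question, ""))
        for keyword, scenarios in keywords.items():
            if normalize(keyword) in haystack:
                for scenario in scenarios:
                    flags[(question, scenario)] = 1
    return flags


def none_flags(responses):
    """Indicators for a 'none' response to the tool and chemical questions."""
    tool_none = normalize(responses.get("tool", "")) == "NONE"
    chem_none = normalize(responses.get("chemical", "")) == "NONE"
    return {"tool_none": tool_none, "chemical_none": chem_none,
            "both_none": tool_none and chem_none}


def merge_flags(flags):
    """Collapse question-level flags into one set of scenarios (simplified step G)."""
    return {scenario for (_question, scenario) in flags}


job = {"task": "cleaning parts and gluing them together",
       "tool": "none", "chemical": "none"}
print(flag_job(job))
print(none_flags(job))
print(flag_job({"task": "inspect gaskets"}))  # phrase key words avoid the bare 'gas' match
```

This sketch merges every scenario across the three questions. The process described above was more selective: generic tool variables and chemicals used in several scenarios were kept separate, and false positives were removed by manual review.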

Evaluation of the extracted occupational information

We conducted descriptive and evaluative analyses of the data set using Stata/SE v.11.2 for Windows (StataCorp LP, College Station, TX, USA). Hereafter, we use the term ‘possibly exposed’ to indicate that a job was positively linked to an agent based on an occupation, industry, and/or task/tool/chemical exposure variable. The term does not indicate that an exposure decision was made. Future work, beyond the scope of this paper, will involve developing decision rules to estimate exposure that will consider both the OH responses and the module responses, focusing on the jobs with positive linkages in the current effort.

We began by identifying the occupational variables, industry variables, and task/tool/chemical exposure variables that were associated with the highest number of possibly exposed jobs in our study population. Our next step was to examine each agent separately; we calculated the number of jobs that were linked to each agent by possibly exposed exposure variables based only on occupation, only on industry, only on tasks/tools/chemicals, and finally on all three types of variables combined. To illustrate the variability in the identification of exposure variables within occupations, we then selected 10 possibly exposed occupational groups and examined the proportion of jobs in each group that were identified as possibly exposed to each agent based only on industry variables and only on the task/tool/chemical variables.
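As a rough illustration of this tabulation, the sketch below counts jobs flagged by each variable type, by any type, and by each exclusive combination of types. The analyses in the study were run in Stata; this Python version, the function name `stratum`, and the six job records are hypothetical and serve only to show the bookkeeping.

```python
from collections import Counter


def stratum(occupation, industry, ttc):
    """Label a job by the combination of variable types that flagged it."""
    names = ("occupation", "industry", "task/tool/chemical")
    parts = [name for name, flag in zip(names, (occupation, industry, ttc)) if flag]
    return " + ".join(parts) if parts else "none"


# Hypothetical job records for one agent:
# (occupation flag, industry flag, task/tool/chemical flag).
jobs = [
    (1, 1, 1),
    (1, 1, 0),
    (0, 1, 0),
    (0, 1, 0),
    (0, 0, 1),
    (0, 0, 0),
]

# Marginal counts: a job is counted once for every variable type that flags it.
marginal = {
    "occupation": sum(1 for o, _, _ in jobs if o),
    "industry": sum(1 for _, i, _ in jobs if i),
    "task/tool/chemical": sum(1 for _, _, t in jobs if t),
    "any": sum(1 for job in jobs if any(job)),
}

# Exclusive strata: each job falls in exactly one combination.
exclusive = Counter(stratum(*job) for job in jobs)

for label, n in marginal.items():
    print(f"{label}: {n} ({100 * n / len(jobs):.1f}%)")
for label, n in exclusive.items():
    print(f"{label}: {n}")
```

The marginal counts overlap, so they do not sum to the "any" count, whereas the exclusive strata partition the jobs.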

We then performed a limited comparison of this approach for TCE exposure, for which an industrial hygienist (who was not involved in the work described here) had previously developed estimates of the probability of exposure for each job. Using the OH and, when available, the job- and industry-specific modules, the industrial hygienist had assessed probability using the following categories. We calculated the number and proportion of jobs assessed by the expert as having a probability of TCE exposure ≥1% for each combination of exposure variable types.

RESULTS

The most frequently identified occupation, industry, and task/tool/chemical exposure variables appearing in the OH responses of this study population are listed in Table 2, along with the agents of interest for each variable. The most common occupational group was administrative/clerk (1206 jobs), a variable that identified jobs that were unlikely to be exposed to any of the agents. Similarly, the most frequent task/tool/chemical variables were a ‘none’ response to the tool question (6600 jobs) and to the chemical question (6903 jobs). Among the occupations with possible exposure to one or more of the agents, the most frequent was assembly workers (321 jobs). The most common industry was automotive manufacturing (1277 jobs), which is highly prevalent in Detroit, one of the study sites. The most common task/tool/chemical variables were ‘driving gas-powered vehicles’ (753 jobs), ‘fabricating or machining tasks that may use solvents’ (504 jobs), and ‘fabricating metal parts’ (408 jobs). The most commonly identified chemicals were gasoline (139 jobs), lubricants (126 jobs), and chlorinated solvent–containing chemicals (109 jobs).

Table 2.

Most prevalent occupation-, industry-, and task/tool/chemical-related exposure variables in the study population, and agent of interest for each variable

Exposure variable N jobs identified Agent of interesta
Chlorinated solvents TCE Lead Cadmium
Most common occupations
 Administrative/clerk 1206
 Assembler   321 X X X X
 Engineers and engineering technologists   281 X X X X
 Machine operator, not metals or plastic   263 X X X X
 Machine operator, NEC, solvent exposure   260 X X X X
Most common industries
 Automobile manufacturing 1277 X X X X
 Military, national security   649 X X X
 Surgery or hospital care   536 X X
 On-road vehicle transport   342 X
 Metal fabrication   340 X X X X
Most common task/tool/chemicals
 Tool question: ‘none’ 6600
 Chemical question: ‘none’ 6903
Variables derived from only the task question, or combination of task, tool, and/or chemical questions
 Drive gas-powered vehicles   753
 Fabricate or machine, solvent-related processes   504 X X X X
 Fabricate metal parts   408 X X X X
 Use, mix, test, sell paint   339 X X X X
 Painting   338 X X X X
Variables derived from the chemical questions
 Solvent-containing chemicals   739 X X
 Gasoline   139 X
 Lubricants   126 X X X
 Chlorinated solvent–containing chemicals   109 X X
 Lead-containing chemicals  69 X

NEC, not elsewhere classified. The full list of exposure scenario–related variables and their definitions are provided in Supplementary Table S1 (available at Annals of Occupational Hygiene online).

Table 3 shows the distribution of identified scenarios by type of exposure variable (job, industry, or task/tool/chemical) for each of the four agents. For example, 30.1% of the 11991 jobs reported by study participants were linked to occupation variables with potential exposure to chlorinated solvents, while the corresponding percentages for industry variables and task/tool/chemical variables were 44.0 and 18.5%, respectively. For all four agents, industry variables were consistently linked to more possibly exposed jobs than were the occupation and task/tool/chemical-based exposure variables. Overall, no exposure scenarios related to chlorinated solvents, TCE, lead, or cadmium were identified in 46.8, 46.9, 47.9, and 60.0% of the jobs, respectively.

Table 3.

Number of jobs linked to each agent by occupation, industry, and task/tool/chemical exposure variables

Agent | Type of question identifying possibly exposed variables | N | %
Chlorinated solvent | Exposed occupation | 3609 | 30.1
Chlorinated solvent | Exposed industry | 5279 | 44.0
Chlorinated solvent | Exposed task/tool/chemical | 2225 | 18.5
Chlorinated solvent | Any of the above | 6385 | 53.2
TCE | Exposed occupation | 3608 | 30.0
TCE | Exposed industry | 5249 | 43.8
TCE | Exposed task/tool/chemical | 2084 | 17.4
TCE | Any of the above | 6388 | 53.1
Lead | Exposed occupation | 3530 | 29.5
Lead | Exposed industry | 4915 | 41.0
Lead | Exposed task/tool/chemical | 2469 | 20.5
Lead | Any of the above | 6242 | 52.1
Cadmium | Exposed occupation | 2650 | 22.1
Cadmium | Exposed industry | 3949 | 32.9
Cadmium | Exposed task/tool/chemical | 1634 | 13.6
Cadmium | Any of the above | 4799 | 40.0

N, number; %, proportion of jobs.

The prevalence of identified exposure scenarios, and the type (industry or task/tool/chemical) of information identified, varied by occupation group, demonstrating the heterogeneity in the information from the OH responses (Table 4). Most precision metal workers and painters were identified as also being in possibly exposed industries (all agents: 98–99% for precision metal workers; 89–91% for painters) and associated with possibly exposed task/tool/chemicals (all agents: 81–83% for precision metal workers; 91–96% for painters). The majority of construction laborers were also in possibly exposed industries (all agents, 82%), but only 36–45% of the jobs were identified with possibly exposed tasks/tools/chemicals related to the four agents. Many administrative jobs were in possibly exposed industries (28–39%, varying by agent); however, only 1–5% of the administrative jobs had identifiable possibly exposed task/tool/chemicals. We report the proportion of jobs identified for each occupation, industry, and task/tool/chemical variable for each 3-digit SOC in the study in Supplementary Table S2 (available at Annals of Occupational Hygiene online).

Table 4.

For selected possibly exposed occupation groups, proportion of jobs with possibly exposed exposure variables based on industry or task/tool/chemicals

Occupation group | N jobs | Possibly exposed industry: chlorinated solvent/TCE (%) | Possibly exposed industry: lead (%) | Possibly exposed industry: cadmium (%) | Possibly exposed task/tool/chemical: chlorinated solvent/TCE (%) | Possibly exposed task/tool/chemical: lead (%) | Possibly exposed task/tool/chemical: cadmium (%)
Carpenter | 51 | 90 | 90 | 89 | 73 | 69 | 69
Construction laborer | 56 | 82 | 82 | 82 | 36 | 45 | 39
Dry cleaner | 31 | 52 | 3 | 3 | 16 | 3 | 0
Farmer, agricultural worker | 63 | 86 | 86 | 86 | 21 | 40 | 21
Mechanic | 150 | 65 | 70 | 42 | 82 | 79 | 69
Nurse | 140 | 77 | 4 | 1 | 4 | 0 | 0
Painter | 90 | 90 | 91 | 89 | 96 | 92 | 91
Physician, surgical assistant | 36 | 78 | 6 | 3 | 14 | 3 | 3
Plumber | 42 | 76 | 76 | 76 | 38 | 69 | 31
Precision metal workers | 145 | 99 | 99 | 98 | 83 | 83 | 81
Administration | 1206 | 39 | 33 | 28 | 2 | 5 | 1

N, number.

The number of possibly exposed jobs identified by each unique combination of variable types for TCE is shown in the first two columns of Table 5 (other agents not shown). Varying only slightly by agent, only 3–4% of the jobs were identified as possibly exposed based solely on occupation variables (group #5), 14–18% were identified solely by industry variables (group #7), and 2–3% were identified solely by task/tool/chemical variables (group #8). All three variable types combined (occupation, industry, and task/tool/chemical; group #2) identified possibly exposed jobs for 8–12% of the jobs. For jobs identified as possibly exposed based on occupation variables (groups #2, 3, 4, 5), 44–51% of the jobs also had possibly exposed task/tool/chemical variables identified [(#2+#4)/(#2+#3+#4+#5)]. In contrast, for jobs not identified as being in a possibly exposed occupation, a nontrivial 9–14% of the jobs had possibly exposed task/tool/chemical variables identified [(#6+#8)/(#1+#6+#7+#8)].
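The two bracketed ratios can be written out as a short calculation. The sketch below applies them to the TCE counts listed by group number in Table 5; the function name `share` and the variable names are assumptions. Because the ranges quoted in the text span all four agents and the counts for the other agents are not shown, the single-agent values computed here should not be expected to reproduce those ranges exactly.

```python
# TCE job counts by group number, taken from Table 5.
TCE_COUNTS = {1: 5606, 2: 1326, 3: 1448, 4: 313, 5: 522, 6: 315, 7: 2190, 8: 271}


def share(counts, numerator_groups, denominator_groups):
    """Percentage of jobs in the denominator groups that fall in the numerator groups."""
    numerator = sum(counts[g] for g in numerator_groups)
    denominator = sum(counts[g] for g in denominator_groups)
    return 100.0 * numerator / denominator


# (#2 + #4) / (#2 + #3 + #4 + #5): jobs in possibly exposed occupations
# that also have a possibly exposed task/tool/chemical variable.
in_exposed_occupation = share(TCE_COUNTS, (2, 4), (2, 3, 4, 5))

# (#6 + #8) / (#1 + #6 + #7 + #8): jobs not in possibly exposed occupations
# that have a possibly exposed task/tool/chemical variable.
not_in_exposed_occupation = share(TCE_COUNTS, (6, 8), (1, 6, 7, 8))

print(f"In possibly exposed occupations: {in_exposed_occupation:.1f}%")
print(f"Not in possibly exposed occupations: {not_in_exposed_occupation:.1f}%")
```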

Table 5.

Comparison of possibly exposed TCE exposure variables to an expert’s assignment of TCE exposure from a one-by-one job review, by type of information (occupation, industry, and/or task/tool/chemical variable) identifying possible exposure

Type of possibly exposed exposure variable identified in the OH | N jobs (% of all jobs) | N jobs identified as TCE exposed by an expert (% of jobs in stratum)
1. No variables identified | 5606 (46.7) | 75 (1.3)
2. Occupation, industry, and task/tool/chemical | 1326 (11.1) | 671 (50.6)
3. Occupation and industry | 1448 (12.1) | 435 (30.0)
4. Occupation and task/tool/chemical | 313 (2.6) | 175 (55.9)
5. Occupation only | 522 (4.4) | 109 (20.9)
6. Industry and task/tool/chemical | 315 (2.6) | 115 (36.5)
7. Industry only | 2190 (18.3) | 184 (8.4)
8. Task/tool/chemical only | 271 (2.3) | 52 (19.2)
At least one variable identified (sum of #2–#8) | 6385 (53.2) | 1741 (27.3)

N, number of jobs.

We compared the prevalence of possibly exposed jobs to the prevalence of jobs assessed as TCE exposed by an expert, shown in the last column of Table 5. Overall, expert-assigned TCE ratings (≥1% probability) were rare for jobs for which we found no possibly TCE-exposed variables (group #1, 1.3%) and were more prevalent for jobs with at least one possibly TCE-exposed variable (last row, 27.3%). The highest proportion of nonzero expert-based probability estimates was observed for jobs identified as being possibly exposed from occupation and task/tools/chemical variables (group #4, 55.9%). The lowest proportion of nonzero expert-based probability estimates was observed for possibly exposed jobs identified solely by industry variables (group #7, 8.4%). When restricted to jobs without modules (N = 2682), for which the expert used only the OH responses to assign exposure, only two jobs (both barbers) for which there were no extracted exposure scenarios had been assigned as possibly TCE exposed by the expert (not shown).
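The stratum-level percentages in the last column of Table 5 follow directly from the two count columns. A minimal sketch of that arithmetic is given below, using the counts from Table 5; the dictionary layout and the helper `pct` are assumptions made for illustration.

```python
# (N jobs in stratum, N jobs rated TCE exposed by the expert), from Table 5.
TABLE5 = {
    "1. No variables identified": (5606, 75),
    "2. Occupation, industry, and task/tool/chemical": (1326, 671),
    "3. Occupation and industry": (1448, 435),
    "4. Occupation and task/tool/chemical": (313, 175),
    "5. Occupation only": (522, 109),
    "6. Industry and task/tool/chemical": (315, 115),
    "7. Industry only": (2190, 184),
    "8. Task/tool/chemical only": (271, 52),
}


def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


for label, (n_jobs, n_exposed) in TABLE5.items():
    print(f"{label}: {n_exposed}/{n_jobs} = {pct(n_exposed, n_jobs)}%")

# Pool strata #2 to #8: jobs with at least one possibly exposed variable.
flagged = [counts for label, counts in TABLE5.items() if not label.startswith("1.")]
total_jobs = sum(n for n, _ in flagged)
total_exposed = sum(e for _, e in flagged)
print(f"At least one variable: {total_exposed}/{total_jobs} = {pct(total_exposed, total_jobs)}%")
```

The pooled figure is computed from the summed counts rather than by averaging the stratum percentages, since the strata differ greatly in size.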

DISCUSSION

The derived exposure variables from the SOC and SIC codes and the free-text responses to the task, tool, and chemical OH questions can serve several purposes in retrospective exposure assessment efforts. We developed these variables as a first step in deriving decision rules that incorporate both the OH information and job- and industry-specific module information in future exposure assessment efforts. In contrast, with the exception of Pronk et al. (2012), recent approaches to developing decision rules have focused solely on using the module information (Fritschi et al., 2009; Behrens et al., 2012; MacFarlane et al., 2012; Carey et al., 2014). The derived OH variables can also be used to extract decision rules using statistical learning models, as in Wheeler et al. (2013), to improve the transparency of the exposure decision process.

The prevalence of exposure scenarios varied by type of question. Possibly exposed jobs were most frequently identified from occupation and industry variables, reflecting in part the ease of deriving variables from the previously assigned SOC and SIC codes. Although less prevalent than the occupation and industry variables, the tasks/tools/chemical variables also helped identify possibly exposed jobs. Although the prevalence of possibly exposed tasks/tools/chemical variables was three to five times higher in jobs identified in possibly exposed occupations than in jobs in nonidentified occupations, a nontrivial 9–14% of the jobs in nonexposed occupations had at least one identified task/tool/chemical variable. This finding showed that relying only on occupation, such as is commonly done when JEMs are used, without considering reported tasks, tools, and chemicals, could result in exposure misclassification. The occupation, industry, and task/tool/chemical variables derived from the OH responses can be used together in future exposure assessments.

Support for deriving exposure variables from the tasks/tools/chemicals free-text responses was also found in our comparisons to the expert-assigned TCE estimates. For instance, for jobs without modules, combining the information from the occupation, industry, and task/tool/chemical variables missed capturing only two expert-assigned TCE-exposed jobs. Both of these jobs were barbers, who were assigned a very low probability of exposure to TCE by the expert. Even when module information was available to the expert, only 1% were missed by our process. Similarly, the prevalence of expert-assigned TCE-exposed jobs was generally higher when there was supportive information from the task, tool, and chemical questions compared to when only SOC or SIC was available. Our findings also show that jobs in possibly exposed industries have a much lower prevalence of exposure when there is no supporting information from the occupation or task/tool/chemical question. For instance, the prevalence of TCE-exposed jobs identified by the expert was only 8.4% when only a possibly exposed industry scenario was identified. This is not surprising since many jobs within an industry, such as administrative jobs, are not exposed.

This process has several limitations. First, we can reasonably anticipate, based on subject burden and recall (Teschke et al., 2002), that extracting information from the OH free-text responses captured, at best, only the most common and frequently occurring tasks, tools, and chemicals for a job. Rarely reported scenarios may have been missed; ensuring complete capture would have required reviewing each job's OH responses one by one, although our comparison with the expert ratings suggests that few jobs were missed. Second, although within-job differences in the identification of task/tool/chemical variables reflect the natural within-job variability in activities, they may also reflect subject-specific differences in the completeness and specificity of the OH responses. For example, the ‘solvent-containing chemical’ variable, which included paint and paint strippers, was identified in only 48% of the painter jobs (not shown). This likely underreporting by painters may have occurred because the respondent felt the response was obvious (e.g. a painter uses paints), felt it would repeat information already provided in response to other questions (e.g. a painter mentioning painting for the task question, paint tools for the tool question, or paint for the chemical question), or did not consider ‘paint’ a chemical (perhaps because of its commonness). As a result, one must not interpret the lack of an identified task, tool, or chemical as the absence of exposure for a job without considering underreporting by subjects, lower frequency events that may not have been mentioned, participants’ misunderstanding of what was being asked, and recall bias, all of which must be considered in any exposure assessment approach for population-based studies. Lastly, using the four-digit SOC and SIC codes to identify possibly exposed occupations may result in some misclassification compared with using the actual job title, because the SOCs and SICs are often heterogeneous with regard to exposure.

Our approach serves as a methodological framework when valuable occupational information is present in the form of free-text responses to open-ended questions but does not cover all exposure agents, possibly exposed scenarios, or potential key words. Developing the key word lists to extract the free-text responses was a moderately time-consuming task. Improved efficiency comes from the ability to apply these key word lists to other studies to evaluate the same agent. However, future work is needed to evaluate the resources that would be needed to apply these key word lists to other studies. We expect that future users of these key word lists may need to add other key words and variables that are relevant to their study population. For example, because this study included two urban centers, including Detroit, the study population overrepresented US automobile industry workers and underrepresented rural or other geographic-specific industries. Similarly, future work is needed to evaluate the usefulness of this approach for other exposures. We expect that it will be most useful to extract exposure scenarios from the task/tool/chemical variables when the exposure is reasonably prevalent and for which related activities will be more likely reported by the study subjects (e.g. diesel exhaust, metals, and solvents). In contrast, our approach will be of limited use to extract information for rarer exposures or exposures from infrequent tasks whose exposure scenarios are less likely to be mentioned in the free-text information (e.g. polychlorinated biphenyls).

In summary, in this study, we used SIC and SOC codes and extracted free-text responses to the task/tool/chemical questions in the OH questionnaires to derive exposure variables that can be used in future exposure assessment efforts. This approach serves as a first step in our ability to include the OH responses when developing transparent, programmable decision rules to estimate occupational exposure. In addition, the process will help identify, for jobs with only OH responses, important differences in exposure between people with the same job title that would not be captured by a job exposure matrix. The key word list may also assist investigators of other studies in using the OH responses in their exposure assessment efforts.

SUPPLEMENTARY DATA

Supplementary data can be found at http://annhyg.oxfordjournals.org/.

FUNDING

Intramural Research Program of the Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health (Z01 CP10122-19; Z01 CP010136-19).

ACKNOWLEDGEMENTS

We thank industrial hygienists Elizabeth Boyle, Pabitra Josse, and Susan Viet of Westat for developing the lists of key words, phrases, and strings used to extract occupational information from the free-text responses.

REFERENCES

  1. Behrens T, Mester B, Fritschi L. (2012). Sharing the knowledge gained from occupational cohort studies: a call for action. Occup Environ Med; 69: 444–8.
  2. Bouyer J, Hemon D. (1993). Retrospective evaluation of occupational exposures in population-based case-control studies: general overview with special attention to job exposure matrices. Int J Epidemiol; 22 (Suppl. 2): S57–64.
  3. Carey RN, Driscoll TR, Peters S, et al. (2014). Estimated prevalence of exposure to occupational carcinogens in Australia (2011-2012). Occup Environ Med; 71: 55–62.
  4. Colt JS, Schwartz K, Graubard BI, et al. (2011). Hypertension and risk of renal cell carcinoma among white and black Americans. Epidemiology; 22: 797–804.
  5. Fritschi L, Friesen MC, Glass D, et al. (2009). OccIDEAS: retrospective occupational exposure assessment in community-based studies made easier. J Environ Public Health; 2009: 957023.
  6. Kromhout H, Vermeulen R. (2001). Application of job-exposure matrices in studies of the general population: some clues to their performance. Eur Respir Rev; 11: 80.
  7. Macfarlane E, Benke G, Sim MR, et al. (2012). OccIDEAS: An innovative tool to assess past asbestos exposure in the Australian Mesothelioma Registry. Saf Health Work; 3: 71–6.
  8. Office of Management and Budget. (1987). Standard industrial classification manual. Washington, DC: Executive Office of the President.
  9. Pronk A, Stewart PA, Coble JB, et al. (2012). Comparison of two expert-based assessments of diesel exhaust exposure in a case-control study: programmable decision rules versus expert review of individual jobs. Occup Environ Med; 69: 752–8.
  10. Purdue MP, Colt JS, Graubard B, et al. (2011). A case-control study of reproductive factors and renal cell carcinoma among black and white women in the United States. Cancer Causes Control; 22: 1537–44.
  11. Stewart WF, Stewart PA. (1994). Occupational case-control studies: I. Collecting information on work histories and work-related exposures. Am J Ind Med; 26: 297–312.
  12. Teschke K, Olshan AF, Daniels JL, et al. (2002). Occupational exposure assessment in case-control studies: opportunities for improvement. Occup Environ Med; 59: 575–93; discussion 594.
  13. US Department of Commerce. (1980). Standard occupational classification manual. Washington, DC: Office of Federal Statistical Policy and Standards.
  14. Wheeler DC, Burstyn I, Vermeulen R, et al. (2013). Inside the black box: starting to uncover the underlying decision rules used in a one-by-one expert assessment of occupational exposure in case-control studies. Occup Environ Med; 70: 203–10.

Articles from Annals of Occupational Hygiene are provided here courtesy of Oxford University Press