Abstract
Few studies have evaluated the validity of self-report of work activities because of challenges in obtaining objective measures. In this study, farmers’ recall of the previous day’s agricultural activities was compared to activities observed by field staff during air monitoring.
Recall was assessed in 32 farmers from the Biomarkers of Exposure and Effect in Agriculture Study, a subset of a prospective cohort study. The farmers participated in 56 visits that comprised air monitoring the day before an interview. The answers for 14 agricultural activities were compared to activities observed by field staff during air monitoring (median duration 380 minutes, range 129–486). For each task, evaluated as yes/no, overall agreement, sensitivity, specificity, and kappa were calculated. Median prevalence of the 14 activities was 8% from observation and 13% from participants (range: 2–54%). Agreement was generally good to perfect, with a median overall agreement of 95% (range: 70–100%), median sensitivity of 84% (50–100%), median specificity of 95% (88–100%), and median kappa of 0.65 (0.31–1.0). Reasons for disagreement included activities occurring when the field staff was not present (e.g., milking cows), unclear timing notes that made it difficult to determine whether the activity occurred the day of and/or the day before the interview, definition issues (e.g., a participant included hauling in the definition of harvesting), and difficulty in observing details of an activity (e.g., whether hay was moldy). This study provides support for accurate participant recall the day after activities.
Keywords: occupational questionnaires, farming tasks, recall validity
INTRODUCTION
Most evaluations of subject-reported work information in questionnaires have focused on assessing the repeatability of their responses across multiple administrations of the same questionnaire or comparing the responses to expert judgment (Friesen et al. 2015; Ngabirano et al. 2020; Teschke et al. 2002). Assessment of the responses’ validity has been rare because of challenges in obtaining objective measures for comparison purposes.
The few available studies have generally found good to high accuracy. For example, good accuracy has been observed in reported job title, job start, and job stop dates in comparison to employer or registry records (Teschke et al. 2002; Hobson et al. 2009). Similarly, self-reported responses to questions on hand hygiene (i.e., dirtiness of hands, glove use duration, glove type) were highly correlated with workplace observations (kappa range 0.75–0.97) (Timmerman et al. 2014). Compared to measures based on an accelerometer, questionnaire-derived metrics of physical activity duration and energy expenditure had good to high correlation (Pearson correlation = 0.57 and 0.88, respectively) (Welk et al. 2014). Self-reported exposures to work postures and manual materials handling had good agreement with observations for most dichotomous metrics, but not for duration measures (Wiktorin et al. 1993). The decade of first use and duration of use reported by licensed pesticide applicators for several pesticide active ingredients compared reasonably well with registration information from the US Environmental Protection Agency (Hoppin et al. 2002). In contrast, a study of drivers and nurses found poor agreement with observations for self-reported durations of work tasks, activities, and postures of the trunk (Van der Beek et al. 1994).
This study provided a rare opportunity to evaluate short-term recall of farmers’ agricultural activities using detailed activity logs completed by field staff during air monitoring conducted the day before the farmer participated in a comprehensive interview.
METHODS
The study population included 32 farmers in Iowa who had participated in a previously described bioaerosol air monitoring study (Sauvé et al. 2020). They were a subset of the Biomarkers of Exposure and Effect in Agriculture (BEEA) Study (Hofmann et al. 2015). Their mean age was 59 years (range 50 to 80). They participated in paired two-day visits between September 2015 and October 2016 that comprised bioaerosol air monitoring the day before a home visit, which included a comprehensive interviewer-administered questionnaire and biospecimen collection. Twenty-four farmers (75%) participated twice, once in the fall of 2015 or 2016 and once in the spring of 2016, for a total of 56 paired visits. A typical monitoring day ran from 8:00 to 14:00; because of the distance between the farm and the laboratory, some visits started as early as 6:00 and others ended as late as 17:00 (median duration 380 minutes, range 129 to 486). As a result, most, but not all, of a farmer’s time spent farming was captured; thus, the observations may have been incomplete and were not necessarily a perfect gold standard. During the monitoring day, the research team recorded in an activity log the details of every activity undertaken during the monitoring period and took time-stamped photographs.
The extensive questionnaire included questions on 18 agricultural activities that asked whether the activity was conducted ‘yesterday or today’; for most activities, participants were also asked its duration. The analyses were restricted to the 14 activities with ≥3 self-reports of the activity. Because the question time frame was ‘yesterday or today’, we reviewed detailed interviewer comments regarding whether the activity occurred ‘yesterday’, ‘today’, or ‘both days’ to determine whether the activity occurred on the bioaerosol monitoring day; the activity was considered self-reported for the prior day if it was noted as ‘yesterday’ or ‘both days’. One study team member reviewed the detailed logs of the farmers’ activities completed by the air monitoring staff, as well as the accompanying time-stamped photographs, to identify whether each of these 14 activities was observed (yes/no), independent of the participant response. Then, if the initial review of the observation and the farmer’s response did not agree, four team members re-reviewed the activity log and photos together and came to a consensus on the activity’s occurrence. The overall agreement, sensitivity, and specificity were calculated for each task using the observed activity as the ‘gold standard’. We also calculated kappa, which assumes that neither source is the gold standard.
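The four measures named above can be sketched for a single activity as follows. This is a minimal illustration, not the authors' analysis code: the class and function names are hypothetical, and the example counts are not reported in the paper but were back-calculated from the percentages given in Table 1 for ‘worked with or around stored seed or grain’.

```python
from dataclasses import dataclass
from math import nan


@dataclass
class TwoByTwo:
    """Visits cross-classified by self-report and observation (hypothetical helper)."""
    both_yes: int       # reported and observed
    report_only: int    # reported but not observed
    observed_only: int  # observed but not reported
    both_no: int        # neither reported nor observed


def agreement_metrics(t: TwoByTwo) -> dict:
    """Overall agreement, sensitivity, specificity (observation as reference), and Cohen's kappa."""
    n = t.both_yes + t.report_only + t.observed_only + t.both_no
    observed_yes = t.both_yes + t.observed_only
    observed_no = t.report_only + t.both_no
    reported_yes = t.both_yes + t.report_only
    reported_no = t.observed_only + t.both_no

    agreement = (t.both_yes + t.both_no) / n
    # Sensitivity and specificity treat the observation as the 'gold standard'.
    sensitivity = t.both_yes / observed_yes if observed_yes else nan
    specificity = t.both_no / observed_no if observed_no else nan
    # Kappa corrects agreement for chance and treats both sources symmetrically.
    expected = (reported_yes * observed_yes + reported_no * observed_no) / n**2
    kappa = (agreement - expected) / (1 - expected) if expected < 1 else nan
    return {
        "agreement": agreement,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
    }


# ASSUMPTION: counts back-calculated from Table 1 (19 self-reports, 30 observations, 56 visits).
stored_grain = TwoByTwo(both_yes=16, report_only=3, observed_only=14, both_no=23)
metrics = agreement_metrics(stored_grain)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Under these assumed counts, the values round to the sensitivity (53%), specificity (88%), and kappa (0.41) shown for that activity in Table 1.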
RESULTS
Median prevalence of the 14 activities was 8% from observation and 13% from participant responses (Table 1). Prevalence of observed activities ranged from 2% for ‘working with or around moldy hay or straw’ to 54% for ‘working with or around stored seed or grain’. Across all paired visits, agreement was mostly good to high, with a median overall agreement of 95% (range: 70–100%), median sensitivity of 84% (range: 50–100%), median specificity of 95% (range: 88–100%), and median kappa of 0.65 (range: 0.31–1.0). For 8 activities, the prevalence from self-report was higher than that from expert observation. Five of these activities had sensitivities greater than 90%, indicating that the activities observed by the field staff were also reported by the subjects; thus, the higher prevalence from self-report may reflect unobserved activities. ‘Working with or around stored seed or grain’ was the most prevalent activity by both self-report and expert observation; however, it had only moderate sensitivity (53%) and the lowest specificity of all activities (88%). Moderate sensitivity (50–67%) was also observed for three low-prevalence animal activities (cleaning the swine confinement area; mixing swine feed and feeding swine; grinding animal feed). For activities with less than 80% sensitivity, 25% or more of the participants usually reported that the activity lasted less than 30 minutes. Ten activities had kappa values >0.6 and six activities had kappa values >0.75, indicating good to high agreement between reported and observed activities.
TABLE 1.
Agreement between self-reported and observed activities for 32 farmers with 56 paired visits
| Activity (if yes, proportion reporting activity duration of <30 min) [A] | Self-report, N yes (%) | Expert observation, N yes (%) | Agreement (%) | Sensitivity (%) | Specificity (%) | Kappa |
|---|---|---|---|---|---|---|
| Harvested grain/soybeans/corn-field/corn seed (>0-<30 min: n.a.) | 13 (23) | 12 (21) | 95 | 92 | 95 | 0.85 |
| Hauled grain/soybeans/corn-field/corn seed (>0-<30 min: 0%) | 17 (30) | 13 (23) | 93 | 100 | 91 | 0.82 |
| Baled alfalfa or hay (>0-<30 min: n.a.) | 3 (5.4) | 2 (3.6) | 98 | 100 | 98 | 0.79 |
| Hauled alfalfa or hay (>0-<30 min: 41%) | 9 (16) | 8 (14) | 91 | 75 | 94 | 0.65 |
| Worked with or around stored seed or grain (>0-<30 min: 27%) | 19 (34) | 30 (54) | 70 | 53 | 88 | 0.41 | 
| Cleaned grain bins (>0-<30 min: 16%) | 7 (13) | 5 (8.9) | 93 | 80 | 94 | 0.63 | 
| Spent time in swine confinement area (>0-<30 min: 14%) | 7 (13) | 8 (14) | 98 | 88 | 100 | 0.92 |
| Spent time cleaning the swine confinement area (>0-<30 min: 67%) | 3 (5.4) | 3 (5.4) | 96 | 67 | 98 | 0.65 |
| Spent time mixing swine feed and feeding swine (>0-<30 min: 50%) | 3 (5.4) | 2 (3.6) | 95 | 50 | 96 | 0.37 |
| Ground animal feed (>0-<30 min: 45%) | 3 (5.4) | 3 (5.4) | 96 | 67 | 98 | 0.65 | 
| Milked cows or other animals (>0-<30 min: 77%) | 7 (13) | 3 (5.4) | 89 | 67 | 91 | 0.35 | 
| Cleaned barns, animal confinements or replaced animal bedding in an indoor facility (>0-<30 min: 27%) | 12 (21) | 9 (16) | 95 | 100 | 94 | 0.83 |
| Worked with or around moldy hay or straw (>0-<30 min: 58%) | 5 (8.9) | 1 (1.8) | 93 | 100 | 93 | 0.31 | 
| Worked around wood dust (>0-<30 min: 0%) | 4 (7.1) | 4 (7.1) | 100 | 100 | 100 | 1.0 |
| Median across all activities [B] | | | 95 | 84 | 95 | 0.65 |
[A] Proportion reporting a duration of >0 to <30 minutes; the remainder reported durations >30 minutes.
min = minutes; n.a. indicates that the information was not available in the questionnaire.
[B] The median reported here is the median of the 14 values reported in the rows above.
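As a small worked check of the summary row, the medians can be recomputed from the rounded per-activity values transcribed from the table. This is only a sketch: the variable names are illustrative, and medians computed from rounded values may differ slightly from those computed on unrounded data.

```python
from statistics import median

# Per-activity values transcribed from Table 1, in row order (rounded as published).
table1 = {
    "agreement": [95, 93, 98, 91, 70, 93, 98, 96, 95, 96, 89, 95, 93, 100],
    "sensitivity": [92, 100, 100, 75, 53, 80, 88, 67, 50, 67, 67, 100, 100, 100],
    "specificity": [95, 91, 98, 94, 88, 94, 100, 98, 96, 98, 91, 94, 93, 100],
    "kappa": [0.85, 0.82, 0.79, 0.65, 0.41, 0.63, 0.92, 0.65, 0.37, 0.65, 0.35, 0.83, 0.31, 1.0],
}

# For each measure: (median, minimum, maximum) across the 14 activities.
summary = {name: (median(values), min(values), max(values)) for name, values in table1.items()}

for name, (mid, low, high) in summary.items():
    print(f"{name}: median {mid}, range {low}-{high}")
```

With 14 activities, each median is the mean of the seventh and eighth ordered values; for specificity this gives 94.5 from the rounded entries, which is in line with the reported 95.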
DISCUSSION
This study provides evidence that farmers had good to very good recall of farming activities conducted the prior day, consistent with the above-mentioned previous evaluations using objective measures (Timmerman et al. 2014; Welk et al. 2014; Wiktorin et al. 1993; Hoppin et al. 2002). Activities with lower sensitivities may be in part due to activities that were shorter in duration, which may have poorer recall or may not have been observed; however, there was insufficient power to examine this in detail. True agreement may have been higher, as shown by excellent sensitivities of activities that had higher prevalence in self-report versus observation, since the observation period did not cover the entire farming day.
Determining whether the 14 activities occurred from the free-text activity logs was a laborious process that required substantial expertise in agricultural activities. To provide consistency while reducing effort, one team member reviewed all activity logs, and only discordances between self-report and the initial review were examined by the full team. This approach provided a double-check on whether the activity log may have been mischaracterized and leveraged the extensive agricultural exposure expertise of the entire team. Review of discordances identified several potential explanations suggesting that some were not due to recall. The timing of an activity noted in an interviewer’s comments was occasionally unclear, especially for animal-related activities that were likely conducted daily. For example, an interviewer note of ‘today’ for a frequent activity may have also included ‘yesterday’. There were possible differences in how the participant and study team defined an activity. For example, the study team did not include transport of the crop to storage in the definition of harvesting, but it may have been included in the participants’ definition. Similarly, participants may have included moving seed as hauling, whereas the study team restricted the definition of hauling to moving harvested crop to storage. Some details of an activity may also have been difficult to observe or may not have been noted by field staff (e.g., whether the hay was moldy).
This study has several limitations. First, the incomplete observation period may underestimate true agreement, as shown by excellent sensitivities of activities that had higher self-reported vs. observed prevalence. Second, the population was a small, relatively homogeneous group of male farmers in Iowa of similar ages. Recall may differ by several factors, including gender, age, time since exposure, and socio-demographic characteristics (Friesen et al. 2015). Third, in overall analyses, the observations were treated as independent, despite the repeated visits, because of the small sample size and low prevalence of most activities. However, the median overall agreement (96%) and sensitivity (100%) did not change across visits (not shown). Fourth, only dichotomous metrics were compared, as there was insufficient prevalence to evaluate the reported activity duration that was available for some activities. Fifth, the participants may have had better recall of their previous day’s activities because of being observed during the air monitoring study; however, neither the participants nor the field staff were aware that interview answers would be compared to the field observations. Finally, the evaluations focused on the previous day and may not hold for longer time scales. However, a study of construction workers found that the percentage of time at tasks reported during a six-month interview had a median accuracy of 91% (range 52–100%) when compared to daily activity reports completed within those six months; moreover, estimates of noise exposure based on long-term recall were strongly correlated with estimates derived from the daily activity logs (Reeb-Whitaker et al. 2004). Similarly, the reproducibility of self-reported information on pesticide use on questionnaires administered one year apart had agreement typically ranging from 70 to 90% for ever-/never-use of specific pesticides and moderate agreement (50–60%) for duration, frequency, or decade of first use (Blair et al. 2002).
In conclusion, this study provides evidence that farmers can accurately recall their previous day’s activities using interviewer-assisted questionnaires. Validating the accuracy of recall of work tasks over longer time periods, as well as evaluating the impact of task duration on recall, remains an important but challenging research need.
ACKNOWLEDGMENTS
Amy Miller, Kate Torres, Sarah Woodruff, Marsha Dunn, and other staff at Westat, Inc. (Rockville, MD) contributed to the study coordination and data management; Susan Viet contributed to the expert review of the activity logs. We also thank Anne Taylor from Information Management Services, Inc. (Rockville, MD) for data management support, and the field research team in Iowa, including Charles Lynch, Debra Lande, Debra Podaril, and Jennifer Hamilton. Finally, we gratefully acknowledge the participation of the Biomarkers of Exposure and Effect in Agriculture Study participants, who made this work possible.
Funding for this work was provided by the Intramural Research Program of the National Institutes of Health (Z01CP010119), with support from the National Cancer Institute Director’s Intramural Innovation Award Program. Felicia Hung was supported by Yale School of Public Health summer scholarship funding.
Data availability statement
Data available on request from the authors.
REFERENCES
- Blair A, Tarone R, Sandler D, Lynch CF, Rowland A, Wintersteen W, et al. 2002. Reliability of reporting on life-style and agricultural factors by a sample of participants in the Agricultural Health Study from Iowa. Epidemiol. 13(1):94–99.
- Friesen M, Lavoue J, Teschke K, Van Tongeren M. 2015. Occupational exposure assessment in industry- and population-based epidemiological studies. In: Nieuwenhuijsen MJ (ed.). Exposure Assessment in Environmental Epidemiology. 2nd ed. Oxford, England: Oxford University Press. p. 139–162.
- Hobson AJ, Sterling DA, Emo B, Evanoff BA, Sterling CS, Good L, et al. 2009. Validity and reliability of an occupational exposure questionnaire for parkinsonism in welders. J. Occup. Environ. Hyg. 6(6):324–331.
- Hofmann JN, Beane Freeman LE, Lynch CF, Andreotti G, Thomas KW, Sandler DP, et al. 2015. The Biomarkers of Exposure and Effect in Agriculture (BEEA) study: Rationale, design, methods, and participant characteristics. J. Tox. Environ. Health Part A 78:1338–1347.
- Hoppin JA, Yucel F, Dosemeci M, Sandler DP. 2002. Accuracy of self-reported pesticide use duration information from licensed pesticide applicators in the Agricultural Health Study. J. Expos. Analysis Environ. Epidemiol. 12(5):313–318.
- Ngabirano L, Fadel M, Leclerc A, Evanoff BA, Dale AM, Roquelaure Y, et al. 2020. Comparison between a job-exposure matrix (JEM) score and self-reported exposures for carrying heavy loads over the working lifetime in the CONSTANCES cohort. Ann. Work Expos. Health 64(4):455–460.
- Reeb-Whitaker CK, Seixas NS, Sheppard L, Neitzel R. 2004. Accuracy of task recall for epidemiological exposure assessment to construction noise. Occup. Environ. Med. 61(2):135–142.
- Sauvé JF, Locke SJ, Josse PR, Stapleton EM, Metwali N, Altmaier RW, et al. 2020. Characterization of inhalable endotoxin, glucan, and dust exposures in Iowa farmers. Int. J. Hyg. Environ. Health 228:113525.
- Teschke K, Olshan AF, Daniels JL, De Roos AJ, Parks CG, Schulz M, et al. 2002. Occupational exposure assessment in case-control studies: Opportunities for improvement. Occup. Environ. Med. 59(9):575–593.
- Timmerman JG, Zilaout H, Heederik D, Spee T, Smit LAM. 2014. Validation of a questionnaire on hand hygiene in the construction industry. Ann. Occup. Hyg. 58(8):1046–1056.
- Van der Beek AJ, Braam IT, Douwes M, Bongers PM, Frings-Dresen MH, Verbeek JH, et al. 1994. Validity of a diary estimating exposure to tasks, activities, and postures of the trunk. Int. Arch. Occup. Environ. Health 66(3):173–178.
- Welk GJ, Kim Y, Stanfill B, Osthus DA, Calabro MA, Nusser SM, et al. 2014. Validity of 24-h physical activity recall: Physical activity measurement survey. Med. Sci. Sports Exerc. 46(10):2014–2024.
- Wiktorin C, Karlqvist L, Winkel J. 1993. Validity of self-reported exposures to work postures and manual materials handling. Stockholm MUSIC I Study Group. Scand. J. Work Environ. Health 19(3):208–214.
 
