Author manuscript; available in PMC 2010 Sep 1.
Published in final edited form as: Ethics Behav. 2009 Sep 1;19(5):379–402. doi: 10.1080/10508420903035380

A Meta-Analysis of Ethics Instruction Effectiveness in the Sciences

Alison L Antes 1, Stephen T Murphy 2, Ethan P Waples 3, Michael D Mumford 4, Ryan P Brown 4, Shane Connelly 4, Lynn D Devenport 4
PMCID: PMC2762211  NIHMSID: NIHMS133081  PMID: 19838311

Abstract

Scholars have proposed a number of courses and programs intended to improve the ethical behavior of scientists in an attempt to maintain the integrity of the scientific enterprise. In the present study, we conducted a quantitative meta-analysis based on 26 previous ethics program evaluation efforts, and the results showed that the overall effectiveness of ethics instruction was modest. The effects of ethics instruction, however, were related to a number of instructional program factors, such as course content and delivery methods, in addition to factors of the evaluation study itself, such as the field of investigator and criterion measure utilized. An examination of the characteristics contributing to the relative effectiveness of instructional programs revealed that more successful programs were conducted as seminars separate from the standard curricula rather than being embedded in existing courses. Furthermore, more successful programs were case-based, interactive and allowed participants to learn and practice the application of real-world ethical decision-making skills. The implications of these findings for future course development and evaluation are discussed.

Keywords: ethics, scientific ethics, ethics instruction, ethics training, meta-analysis


Cases of scientific misconduct range from extremely serious events, such as blatant fabrication of study findings and harm to research participants (Resnick, 2003), to less serious, yet more prevalent, instances of misbehavior, such as inappropriate assignment of authorship and withholding details of methodology or results in publications (Martinson, Anderson, & De Vries, 2005). Regrettably, instances of misconduct undermine progress in science and, moreover, create a sense of distrust for science among the public and breed distrust within the scientific community (Abbott, 1999; Friedman, 2002; Kalichman, 2007). As the nature of science continues to become increasingly competitive, interdisciplinary, and global, not only do new ethical considerations enter the field, but the implications of scientific misconduct become even more significant. Thus, it is not surprising that the scientific community is paying a great deal of attention to understanding unethical behavior in scientific work and what might be done to manage it.

Among the most commonly suggested remedies for addressing this growing concern is to provide ethics education to scientific researchers and practitioners. In fact, some institutions have implemented mandatory ethics instruction in an attempt to manage scientific misconduct (e.g., Barnes, Friedman, Rosenberg, Russell, Beedle, & Levine, 2006). Moreover, funding agencies, such as the National Institutes of Health, now mandate that scientists complete an instructional program in the responsible conduct of research in order to be eligible for funding under their sponsorship (Dalton, 2000). Given the widespread application of instruction in ethics as a potential solution for misbehavior in the sciences, not to mention the substantial time and resources required for the development and implementation of instructional programs, a critical question arises: Are such programs effective?

Although addressing this question has been of primary concern for researchers in the field of ethics, the approaches to designing and assessing instructional programs in ethics have been quite varied. Consequently, although there appears to be a general consensus about the importance of ethics education for researchers and scientists, there is little agreement about the most effective approach to instruction, or even the most appropriate goals for these programs (Kalichman, 2007; Steneck & Bulger, 2007). Moreover, evaluation studies have reported mixed findings regarding the effectiveness of instruction. Some ethics courses have been shown to produce the desired effects, whereas evaluations of others indicate little or no effect of ethics instruction on learning outcomes (Kalichman & Plemmons, 2007).

The intent of the present study was to provide a comprehensive examination of ethics instruction effectiveness. Meta-analytic procedures were used to quantitatively assess prior program evaluation efforts. In addition to addressing the general effectiveness of ethics instruction, key characteristics of instructional programs and evaluation efforts that may be associated with degrees of effectiveness were identified. Before turning to the specifics of the present study, we first consider some key issues with respect to approaches to ethics instruction.

Ethics Instruction in the Sciences

As noted above, researchers have taken several approaches to the design of ethics instruction. Distinctions between instructional approaches are important because they point to a fundamental issue with respect to ethics education. Specifically, alternative approaches reflect differences in the frameworks being applied for understanding ethical behavior. These frameworks lead to different assumptions about how ethical behavior might be improved and ultimately lead to differences in the goals and design of instructional courses (Tannenbaum & Yukl, 1992).

A number of ethics instructional courses in the sciences rely on Kohlberg’s (1969) and Rest’s (1986) models of moral development and moral reasoning. Although these models are commonly referenced for constructing and conducting instructional programs in ethics, the implementation of programs conducted under these frameworks varies rather widely. These differences seem to arise as a function of how the models are interpreted and what aspects are specifically emphasized. For instance, one approach to ethics instruction has been to emphasize ethical sensitivity. Rest (1986) asserted that ethical sensitivity is an awareness of the ethical implications of a situation and involves empathetic understanding of how others might be affected by the situation. Instructional programs based on ethical sensitivity assume that improving ethical behavior rests in enhancing scientists’ ability to recognize the presence of an ethical problem, as this is the first step in real-world ethical decision-making (Clarkeburn, 2002; Clarkeburn, Downie, & Matthew, 2002; Myyry & Helkama, 2002).

Another common approach to ethics instruction has emphasized the developmental nature of ethical behavior. In this approach, students may be merely exposed to the regular curriculum with the expectation that general education in health and medicine, for instance, might implicitly advance students’ level of moral development (e.g., Bebeau & Thoma, 1994; Duckett, Rowan, Ryden, Krichbaum, Miller, Wainwright, & Savik, 1997; Self, Schrader, Baldwin, & Wolinsky, 1993). Other programs operating under this developmental framework have emphasized the abstract, philosophical nature of moral dilemmas (e.g., Goldman & Arbuthnot, 1979; Penn, 1990). The intent of such courses is to promote students’ progress to an advanced stage of moral development by shifting one’s thinking about abstract moral dilemmas to a more sophisticated level. In turn, it is believed that a higher level of moral development will translate into improved moral reasoning and ethical behavior.

Other ethics instructional programs more directly emphasize the cognitive nature of moral reasoning. This approach focuses less on the philosophical nature of moral dilemmas and on ascending to a higher level of moral development, and places greater emphasis on the need to think through and analyze complex ethical problems before responding (e.g., Frisch, 1987; Gaul, 1987). The underlying assumption of this approach is that moral reasoning is a function of how one thinks through an ethical problem. Thus, ethical behavior improves as ethical problem-solving and decision-making skills are enhanced (Gawthrop & Uhlemann, 1992).

In line with this assertion, some researchers have emphasized the importance of understanding the cognitive nature of ethical decision-making (e.g., the specific processes underlying it) along with individual, situational, and organizational influences on these processes (Antes, Brown, Murphy, Waples, Mumford, Connelly, & Devenport, 2007; Treviño, 1986; Jones, 1991; O’Fallon & Butterfield, 2005; Treviño, Weaver, & Reynolds, 2006). Recently, scholars have argued that these rational approaches may be made even more complete by focusing not only on cognitive processes of ethical decision-making but also on social-psychological processes and the emotional nature of ethical problem-solving (Haidt, 2001; Sonenshein, 2007).

It is important to note at this juncture that, although many programs follow the common themes noted above, ethics courses do not always fit neatly into clear categories. Rather, some courses employ a mixture of themes for ethics education (e.g., Ryden & Duckett, 1991). Given that differing frameworks for developing ethics instruction have led to a number of differences in instructional programs, we examined a host of characteristics of instructional design and delivery that might impact the effectiveness of ethics courses. Moreover, aspects of an evaluation study itself that might impact the observed effectiveness of instruction were also examined. In the following section, we outline several plausible moderators of instructional effectiveness.

Potential Moderators of Instructional Effectiveness

Although an understanding of instructional effectiveness in general is of value, identification of moderating variables linked to the effectiveness of instruction provides practical guidance for the design and delivery of instruction. Therefore, based on the above-mentioned distinctions in ethics instruction, in addition to recommendations offered by experts in instructional design and evaluation (cf. Goldstein & Ford, 2002; Wexley & Latham, 2002), seven categories of factors that might account for differences in the effectiveness of ethics instruction were examined. These categories of factors likely to influence the success of ethics instruction included: 1) criterion type, 2) study design characteristics, 3) participant characteristics, 4) quality ratings, 5) instructional content, 6) general instructional characteristics, and 7) characteristics of instructional methods.

Criterion type

A distinction between types of criteria used to examine the effects of ethics instruction was included in the analysis. The criterion measure selected to assess instruction should reflect the intended outcome of the instructional program (Kraiger & Jung, 1996). For example, if the program is intended to enhance ethical sensitivity, then ethical sensitivity is the most appropriate criterion. Given the widespread application of Kohlberg’s (1969) and Rest’s (1986) models of cognitive-moral development, their measures of moral reasoning, or moral judgment, have been the most commonly applied instruments for assessing instructional effectiveness. More specifically, Kohlberg (1976) developed a measure of moral judgment called the Moral Judgment Scale (MJS). Rest (1974; 1976; 1988) constructed a measure of moral development (the Defining Issues Test; DIT) intended to address the shortcomings of the complex coding procedure used for Kohlberg’s MJS. The DIT requires an individual to select responses to six moral problems. Examining different criterion types, such as ethical sensitivity and moral reasoning, makes it possible to examine the relative differences in the impact of ethics instruction on these outcomes.

Study design characteristics

Characteristics of the design of the evaluation study may impact the size of the effects observed for instruction (Goldstein & Ford, 2002). For example, the type of design employed in the study (e.g., pre-post, pre-post with control, post only, or longitudinal) and the sample size can systematically influence the observed effectiveness of ethics programs (Kirk, 1995). Moreover, factors related to the design of the study have implications for the internal validity of the study and thus the value of any conclusions drawn from it. For example, whether the author of the study was involved in instructing the course might introduce bias or demand characteristics (Cook & Campbell, 1979) that ultimately influence observed effects. Along related lines, another variable that might be important for the validity of conclusions drawn from these studies is whether or not the study was externally funded. In fact, externally funded studies are generally more likely to produce larger effects (Conn, Valentine, Cooper, & Rantz, 2003).

Participant characteristics

Characteristics of the participants may play a role in the effectiveness of instruction and, therefore, may have implications for the generalizability of findings regarding ethics instruction effectiveness. For instance, participants’ career stage and field of study may impact ethical reasoning and responses to ethics instruction (Mumford, Connelly, Murphy, Devenport, Antes, Brown, Hill, & Waples, in press; Weeks, Moore, McKinney, & Longenecker, 1999). In addition, several studies suggest that gender and age may influence ethical attitudes and behavior (Borkowski & Ugras, 1998; Ruegger & King, 1992; Weeks, et al., 1999). Moreover, whether or not participants received an incentive to complete the ethics course might be associated with differences in motivation for completing the course and thus reactions to, and outcomes of, ethics instruction (Colquitt & Simmering, 1998).

Quality ratings

The impact of general quality variables on instructional effectiveness was examined (Scott, Leritz, & Mumford, 2004a). Differences across studies in the overall quality of the instructional program, the overall quality of the study design, and the overall quality of the criteria utilized were captured via three subjective quality ratings assigned by expert raters. A more detailed description of these ratings is provided in the Method section.

Instructional content

As noted previously, the approach taken to instruction creates rather significant variations in instructional content. For instance, courses may or may not cover domains of ethical practice (e.g., objectivity, conflicts of interest) and ethical standards (e.g., avoiding harm, maintaining confidentiality). Additionally, the focus of skills to be learned may be primarily cognitive (e.g., moral reasoning or ethical decision-making) or social-interactional (e.g., ethical sensitivity). Moreover, the domain-specificity of the skills taught may differ (Perkins & Salomon, 1989; Smith, 2002). For instance, skills may be taught in a global manner, focusing on skills that apply generally to ethics across domains, or they may be covered in a domain-specific manner. Domain-specific skills are limited to ethical considerations in a specific domain, such as nursing or psychology. In addition, covering reasoning errors that hinder ethical decision-making and reasoning strategies that help people work through the complexities common to real ethical problems may improve ethical decision-making (Kligyte, Marcy, Waples, Sevier, Godfrey, Mumford, & Hougen, 2008; Mumford, Connelly, Brown, Murphy, Hill, Antes, Waples, & Devenport, in press). Thus, we examined whether there was an association between training effectiveness and the coverage of reasoning errors and/or strategies.

General instructional characteristics

In addition to instructional content, general characteristics of the instructional environment might moderate the effectiveness of ethics instruction. For example, organizational support for the program might impact instructional effectiveness (Baldwin & Ford, 1988; Hung & Wong, 2007), thus whether the instructional program was supported by the organization was examined. In addition, we examined whether instruction conducted in a traditional classroom setting differed from instruction administered as a separate activity in a seminar or workshop setting. Additionally, the general purpose of the instructional program (e.g., standard education versus experimental investigation) might be associated with effectiveness.

Characteristics of instructional methods

The delivery approach for learning experiences (e.g., opportunities for application, interaction, and involvement) is a critical influence on instructional effectiveness (Fink, 2003). Thus, characteristics of learning and practice activities were examined, for instance, whether practice included a single type of activity or multiple types. In addition, participant interaction during instruction might be limited, for instance in courses structured primarily around lecture, or extensive, such as in courses utilizing role-play activities and group discussion. The level of participant interaction is likely to influence engagement and thus learning outcomes (Slavin, 1996). We now turn to the meta-analytic procedure applied in the present study to address two overarching research questions: 1) How effective is ethics instruction in the sciences? and 2) What characteristics are associated with the effectiveness of ethics instruction in the sciences?

Method

Literature Search

To identify potential studies for inclusion in the meta-analysis, an extensive literature search was conducted. First, we identified any major review articles pertaining to ethics and ethics education in the sciences. Second, journals pertaining to ethics in research and the sciences were searched. For instance, these journals included the following: Accountability in Research, Ethics and Behavior, Science and Engineering Ethics, Journal of Moral Education, Journal of Medical Ethics, and Nursing Ethics. In addition, we identified journals associated with higher education and education in the sciences to search for additional articles pertaining to training and instruction in ethics. Some of these journals included the following: Teaching in Higher Education, Academic Medicine, Studies in Higher Education, Journal of Further and Higher Education, Teaching of Psychology, Medical Education, and Journal of Nursing Education.

Following this search of specific journals, we explored major databases, such as PsycINFO, ERIC, Academic Search Elite, Blackwell-Synergy, Chronicle of Higher Education, EBSCO Collection, Health Source: Nursing/Academic Edition, LexisNexis Academic, MEDLINE, and Professional Development Collection, using targeted search terms including, but not limited to the following: “ethics training”, “responsible conduct of research training”, “moral development training”, “ethics education”, “ethics instruction”, and “training and professional ethics”.

After obtaining the studies identified in these searches, their reference sections were searched for additional ethics training and instruction articles that might be included in this study. In order to address the file drawer problem (Rosenthal, 1979; Hunter & Schmidt, 2004), we also searched Dissertation Abstracts International, a database of unpublished dissertations. In addition, we posted an announcement on the online message boards of organizations committed to ethics and responsible conduct of research training and education (e.g., Responsible Conduct of Research Education Committee). This announcement asked for individuals with available instructional evaluation data who were willing to participate in a meta-analytic study of ethics instruction effectiveness to contact us via email with their data and materials describing their instructional course or program. This initial search for instructional evaluation articles and unpublished studies resulted in 140 studies that were candidates for potential inclusion in the meta-analysis.

Inclusion Criteria

Several criteria were applied to determine which studies would be included in the meta-analysis. First, each article was required to include an empirical investigation of the effectiveness of some type of ethics education effort for scientists or researchers. Ethics education was defined as any instructional program or course, including single courses in ethics, multiple courses in a sequence covering ethics, or an entire curriculum, spread over time, that addressed scientific, research, or medical ethics. It is of note that about half of the initially identified studies were not included because they did not meet this first inclusion criterion. Although these studies discussed ethics instruction or training in some fashion, they were not empirical investigations of an evaluation effort. Thus, approximately 70 remaining studies were subjected to the remaining two inclusion criteria.

Second, if the study discussed an evaluation effort, the researchers must have included, at a minimal level, descriptions of both the general instructional approach and an ethics-related outcome measure. Third, and most importantly, the article had to report appropriate descriptive (e.g., M, SD) and/or inferential (e.g., F, t, χ²) statistics in order to calculate the effect size, or d statistic. We utilized the effect size formulas recommended by Arthur, Bennett, and Huffcutt (2001) in order to calculate d statistics.
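
For readers unfamiliar with these computations, the standardized mean difference and its standard conversions from t and F statistics (with one numerator degree of freedom) take the general form shown below. These expressions are illustrative of the kind of formulas recommended by Arthur et al. (2001), not a verbatim restatement of that source.

\[
d = \frac{M_1 - M_2}{SD_{pooled}}, \qquad SD_{pooled} = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2}}
\]
\[
d = t\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \qquad d = \sqrt{F}\,\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \quad (F \text{ with one numerator } df)
\]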

Before calculating the d statistics, the independence (or non-independence) of data points was considered. Here, we first determined if the effect size to be computed from the reported statistics would be distinct (independent) from other effect sizes produced from the same dataset. For instance, if an article produced an effect for ethical sensitivity and ethical decision-making, these effects were considered independent. Second, we determined if the effect sizes from an article represented one construct or multiple constructs. For example, if an article reported multiple effects for moral reasoning (e.g., one effect for the MJS and one effect for the DIT), these effects were combined to avoid problems caused by data dependency.

In addition to determining the dependency of the data, we corrected, where possible, each effect size for measurement error. For example, where the Defining Issues Test was used, we used a reliability coefficient of 0.76 (Rest, 1979) in order to correct the computed d statistic. As suggested by Arthur et al. (2001) and Hunter and Schmidt (2004), the formula used to correct for unreliability specified that the effect size should be divided by the square root of the criterion reliability.
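
As a worked illustration of this correction, applying the DIT reliability estimate noted above gives the following; the algebra is simply the attenuation correction described in the text.

\[
d_{corrected} = \frac{d_{observed}}{\sqrt{r_{yy}}}, \qquad \text{e.g., } r_{yy} = .76 \;\Rightarrow\; d_{corrected} = \frac{d_{observed}}{\sqrt{.76}} \approx 1.15\, d_{observed}
\]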

Following the application of these remaining inclusion criteria and the calculation of d statistics, we were left with 26 independent effect sizes drawn from 20 empirical studies involving 3,041 individuals. As may be seen in the results tables, however, the total number of effect sizes (k) and the sample size (N) for the subsequent moderator analyses was typically less than that of the overall effect size estimates. This reduction in k and N size across the moderator analyses can be accounted for by the fact that a number of moderator variables were uncodeable based on the information provided in the articles. Thus, all effect size estimates were included in the overall analysis of instructional effectiveness, but only those with codeable, or non-missing data for a given moderator, were included in the subsequent moderator analyses.

Content Coding Procedure

To examine the impact of relevant instructional characteristics and study characteristics on instructional effectiveness, all of the articles were content analyzed. Three industrial and organizational psychologists, familiar with the ethics literature and the training and instructional design literature, coded the articles for the meta-analysis. Each coder received approximately 30 hours of training in the coding process and the variable set to be coded. The coders utilized a detailed glossary containing definitions of all variables to be coded for reference throughout the coding process. For all variables, coders provided a rating only if the material was explicitly discussed in the article or could be reasonably inferred based on information provided. Otherwise, coders provided a missing data code.

After this initial introduction to the coding process, the coders made initial ratings for a set of 10 articles. Next, coders met to discuss any discrepancies in their ratings. Then, after demonstrating proficiency in the coding process, the judges coded the remaining articles independently. To ensure the accuracy of the data, the three coders then held consensus meetings to resolve any remaining discrepancies in their ratings. Prior to these meetings, the average inter-rater agreement across the seven broad coding dimensions was approximately 75%. Following these meetings, each data point entered in the analysis reflected almost complete agreement (i.e., inter-rater agreement of 95%). The specific variables coded in the content analysis consisted of the criterion used in the study, which yielded the obtained effect size (i.e., d statistic), and the sets of potential moderators of instructional effect sizes mentioned in the introduction. We describe these variables in more detail below.

Coding Criteria

Previous researchers have used different criteria to evaluate the effectiveness of ethics instructional programs. Thus, in coding the effect size estimates for each independent data point, we recorded the criterion applied to assess the effects of instruction. Due to the limited number of effect sizes available for analysis, we collapsed the criteria into two broad types. Specifically, the two types were “moral development” criteria (e.g., the Defining Issues Test and Kohlberg’s Moral Judgment Scale) and “ethical analysis” criteria (e.g., ethical decision-making and ethical sensitivity). We also report the results specifically for the DIT and MJS, along with an aggregate of the ethical decision-making measures and an aggregate of the ethical sensitivity measures. Moreover, all criterion types were aggregated into one overall effect size estimate for instructional effectiveness. Alongside these considerations in our coding of criteria, we also coded whether the article reported the reliability of the criterion measure, which allowed us to adjust the effect sizes for unreliability.

Coding Moderators

Although the overall effect sizes arising from the meta-analysis are of interest, an examination of potential moderators that might influence these effect sizes is of critical importance. Given the limited number of effect size estimates available for analysis, analysis at the individual criterion level was not feasible. Thus, the relationship of these moderators to the observed effect sizes was computed with respect to the overall effect size estimate.

Study design characteristics

The primary purpose of coding characteristics of the study and design as moderators of instructional effectiveness was to examine to what extent aspects of internal validity of the studies included in the meta-analysis might, in part, account for the observed effect sizes. Specifically, first we coded for the type of design used to examine instructional effectiveness (e.g., pre-post, pre-post with control, post with control, or longitudinal). Next, we also identified the size of the sample utilized for the study and whether or not the author of the study served as the instructor. Additionally, we coded the field of the investigator (i.e., health/medicine, philosophy, psychology, and other), the funding status of the study (funded or not funded), the publication area of the article (ethics, health, medicine, social science, or other), and whether or not the publication was peer reviewed.

Participant characteristics

This set of moderator variables included characteristics associated with the individuals who participated in the instructional program. The purpose of these variables was to provide some evidence for whether or not the overall observed effect size for instructional effectiveness might be externally valid. More specifically, these variables provide evidence for whether or not the effects might generalize across different populations of people. Therefore, first we coded for the audience of the instructional program (i.e., undergraduate students, graduate/medical students, and residents/interns) and the field of study of the participants (i.e., health, medicine, psychology/counseling, or other). Here, we also coded for the participants’ gender majority (male, female, or mixed) and age majority (under 35 years old, 35 and older, or mixed ages). Finally, we coded for whether or not the participants received an incentive (e.g., course credit) to complete the instructional program.

Quality ratings

Several ratings of general quality were made by the three trained content coders. The coders judged, and then rated on a 5-point Likert scale, the overall quality of the instructional program, the quality of the study design implemented to test the effectiveness of the course, and the quality of the criterion used to evaluate instructional effectiveness. The quality rating of the instructional program was judged based on an overall assessment of the quality of the content covered, use of delivery media, and the practice and application exercises utilized. The quality rating of the study design was based on an assessment of the adequacy of the design (e.g., sample size and inclusion of a control group). Finally, we rated the quality of the criterion measure utilized. This rating was intended to capture whether the criterion measure matched the instructional course and its intended outcomes. Thus, if a course purported to teach ethical decision-making skills but assessed outcomes via the DIT, a measure more appropriate for assessing moral development, that course received a low criterion quality rating. The inter-rater reliability of these quality ratings was assessed using an intraclass correlation coefficient and showed fairly high consistency (average ICC = .82).

Instructional content

The instructional content moderators included characteristics of course content capturing how courses in ethics might differ. Thus, these moderators provide evidence for which characteristics might lead to more or less effective instruction. First, we coded the overarching instructional objective of the program — specifically, whether the instruction focused on enhancing decision-making/problem-solving, moral development, or ethical sensitivity. After this general rating of instructional objective, we coded for the overall pedagogical approach. Specifically, we coded whether the instruction was primarily cognitive in nature (i.e., focusing on thinking about and solving ethical problems), or social-interactional in nature (i.e., focusing on the social and interpersonal aspects of ethical problems such as how others might react to the problem or how one’s behavior might affect others). Moreover, we coded whether the types of skills learned via the course were global skills that translate to real-world ethical problems across domains and settings or specific skills that are limited to the domain (e.g., nursing) at hand.

In addition to these broader elements of instructional content, we coded a number of specific elements of the instructional content. Our review of the literature uncovered a number of ethical domains, behaviors, and standards that might be discussed in ethics instruction. Thus, we coded for whether the courses included these elements. For example, ethical domains included responsibility, objectivity and fairness, mentor-mentee relationships, conflicts of interest, and peer review and publication. The ethical behaviors taxonomy included, for example, coverage of appropriate data management, informed consent, treatment of human and animal subjects, protection of intellectual property, protection of public welfare and environment, fair treatment of staff and collaborators, and appropriate use of physical resources (Helton-Fauth, Gaddis, Scott, Mumford, Devenport, Connelly, & Brown, 2003). Coverage of ethical standards included coding for whether courses included material on ethical values considered central to ethics as a researcher or scientist (e.g., avoiding harm, maintaining confidentiality, avoiding personal gain, and confronting ethical issues).

Additionally, recent research has stressed the importance of covering common reasoning errors that might be encountered in ethical decision-making and strategies for dealing with these errors and the social-cognitive complexities of ethical problems (Kligyte et al., 2008; Mumford et al., in press). Therefore, we coded whether typical reasoning errors (e.g., personal biases, thinking in simplistic terms) and strategies (e.g., perspective taking, emotion regulation, self-skepticism) were covered in the course. Because each individual content variable was covered only intermittently across courses, after coding for specific characteristics, we had to collapse the coding according to whether or not any elements of these instructional content areas were covered in the ethics course. Thus, we used a present/not present approach to coding these instructional content variables.

General instructional characteristics

This set of moderating variables consisted of more general characteristics associated with ethics courses. These variables included the setting for instruction (e.g., academic classroom or workshop/seminar), whether or not the organization actively advocated the program by providing resources and encouraging participation, whether or not the instructional program was mandatory, and the purpose of the program (standard education, professional development, or experimental investigation). Finally, we also coded for whether the course was integrated into the curriculum (e.g., an ethics section in a regular course) or whether it was a stand-alone course (e.g., an ethics course taken by medical students separately from regular coursework).

Characteristics of instructional method

In the final set of moderators, specific instructional methods, such as learning activities and instructional media, were examined. Specifically, this coding dimension included the length of instruction (less than 9 hours, or equal to or greater than 9 hours), the primary delivery method utilized (e.g., traditional classroom approach or case-based approach), the type of learning methods employed (variable or constant), and the type of practice sessions utilized (massed or distributed). We also coded the different types of learning activities used, for example, case-based exercises, essay or diary entries, face-to-face discussion, lecture, textbook readings, and role-plays. Because we could not analyze each type of activity separately due to the lack of consistency in activity use, we collapsed this coding dimension into the number of types of learning activities utilized (less than or equal to 3, or equal to or greater than 4). We coded practice exercises for use of a single type, multiple types, or none. Finally, the level of participant interaction during learning (low, moderate, or high) was coded.

Analysis Plan

Using the procedures recommended by Arthur et al. (2001), we used a SAS PROC MEANS program to conduct the analyses based on the meta-analytic approach recommended by Hunter and Schmidt (1990). This approach allowed sample-weighted means to be computed, which were corrected for sampling error.

In deciding to test moderators of instructional effectiveness, we employed the 75% rule-of-thumb suggested by Hunter and Schmidt (2004). Thus, if the overall meta-analysis resulted in less than 75% of the variance in studies being accounted for by sampling error (i.e., if correcting for statistical artifacts did not account for nearly all of the observed variation in effect sizes across studies), then there was reason to suspect that the effect size estimates were dependent on moderators. When conducting moderator analyses, most scholars (Arthur et al., 2001; Hunter & Schmidt, 2004) suggest that for best results, analyses should be limited to situations in which large samples of studies are available (i.e., k ≥ 10). However, based on the already limited k, we examined moderators if there were at least two cases available; this approach is consistent with Arthur and colleagues’ (2001) recommendations. Nonetheless, any interpretation of analyses with such limited k should be made with caution.
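
To make the sample weighting and the 75% decision rule concrete, the sketch below implements a bare-bones aggregation of this kind in Python for a set of hypothetical studies. The study values and the particular sampling-error approximation shown are illustrative assumptions, not the article's SAS program or data.

```python
import numpy as np

# Hypothetical study-level inputs (NOT the studies in this meta-analysis):
# observed d values and total sample sizes.
d = np.array([0.10, 0.35, 0.60, 0.45, 0.80])
n = np.array([120, 60, 45, 200, 30])

# Sample-weighted mean effect size.
d_bar = np.sum(n * d) / np.sum(n)

# Sample-weighted observed variance of effect sizes across studies.
var_obs = np.sum(n * (d - d_bar) ** 2) / np.sum(n)

# Expected sampling-error variance per study, using a common
# Hunter-Schmidt-style approximation for d: (4 / N) * (1 + d_bar^2 / 8).
var_err = np.sum(n * (4.0 / n) * (1 + d_bar ** 2 / 8.0)) / np.sum(n)

pct_sampling_error = 100 * var_err / var_obs
print(f"Sample-weighted mean d = {d_bar:.2f}")
print(f"Variance accounted for by sampling error = {pct_sampling_error:.0f}%")

# The 75% rule: if sampling error accounts for less than 75% of the
# observed variance, moderator analyses are warranted.
if pct_sampling_error < 75:
    print("Moderator analyses warranted")
```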

In our analysis, we calculated 95% confidence intervals for the sample-weighted mean effect sizes (Md’s). The confidence interval provides an indication of the accuracy of the estimate of the mean effect size by representing the extent to which sampling error may remain in the sample-weighted Md. More specifically, the confidence interval provides a range of values that the mean effect size is likely to take if other studies from the population were to be used in the meta-analysis. Furthermore, fail-safe N statistics were calculated to provide an estimate of the number of null effect sizes required to reduce a particular Md to below .20 (Orwin, 1983). In examining the results of the meta-analysis, it is of note that the analysis occasionally yielded confidence intervals with zero range (e.g., .11 to .11). This finding reveals that all of the observed variance in the Md for that moderator analysis was due to sampling error; thus, after correcting for sampling error, no additional variance in the effect size estimates remains to be moderated.
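
The interval and fail-safe N described here can be reconstructed from the tabled values. The short sketch below does so for the overall analysis in Table 1; the Md ± 1.96 × SD form of the interval and the rounding are our assumptions, chosen because they approximately reproduce the reported figures.

```python
import math

# Overall analysis values reported in Table 1.
k, Md, SD = 26, 0.42, 0.27

# 95% interval around the sample-weighted mean effect size, reconstructed
# as Md +/- 1.96 * SD (assumption; yields roughly -.11 to .95, matching the
# reported -.10 to .95 within rounding).
lower, upper = Md - 1.96 * SD, Md + 1.96 * SD

# Orwin's (1983) fail-safe N: number of null (d = 0) effect sizes needed to
# pull Md below the .20 criterion.
d_criterion = 0.20
n_fs = k * (Md - d_criterion) / d_criterion  # = 28.6, i.e., about 29

print(f"95% CI: {lower:.2f} to {upper:.2f}")
print(f"Orwin's fail-safe N: {math.ceil(n_fs)}")
```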

Results

Overall Effectiveness

The results of the overall meta-analysis are presented in Table 1. We applied Cohen’s (1969; 1992) recommendations for the interpretation of effect size magnitude. More specifically, when d = .20, this is considered a small effect; when d = .50, this is considered a medium effect; and when d = .80, this is considered a large effect. As may be seen in Table 1, the overall instructional effectiveness of ethics courses was moderate, d = .42 (SD = .27). However, the percentage of variance accounted for by sampling error was small (33%); thus, we investigated the presence of moderators.

Table 1.

Overall Meta-Analysis and Criterion Type

| | k | N | Md | SD | Variance due to Sampling Error (%) | 95% CI Lower | 95% CI Upper | χ² | Nfs |
|---|---|---|---|---|---|---|---|---|---|
| Ethics Instruction Effectiveness | | | | | | | | | |
| Overall Meta-Analysis | 26 | 3041 | .42 | .27 | 33 | −.10 | .95 | 78.41 | 29 |
| General Criterion Type | | | | | | | | | |
| Moral Development | 17 | 2229 | .36 | .26 | 31 | −.16 | .88 | 55.20 | 14 |
| Ethical Analysis^a | 9 | 812 | .61 | .16 | 65 | −.29 | .93 | 13.97 | 18 |
| Specific Criterion Measures | | | | | | | | | |
| MJS | 4 | 106 | −.14 | .27 | 67 | −.67 | .38 | 5.74 | -- |
| DIT | 13 | 2123 | .38 | .24 | 31 | −.09 | .85 | 42.52 | 12 |
| Ethical Sensitivity^b | 6 | 701 | .58 | .20 | 48 | .19 | .97 | 12.44 | 11 |
| Ethical Decision-Making^c | 3 | 111 | .77 | .00 | 100 | .77 | .77 | .80 | 9 |
| Reliability Corrected | | | | | | | | | |
| No | 3 | 346 | .37 | .00 | 100 | .37 | .37 | .68 | 3 |
| Yes | 23 | 2695 | .43 | .29 | 30 | −.14 | 1.00 | 77.43 | 26 |

Note. ^a Ethical decision-making and ethical sensitivity measures combined. ^b All ethical sensitivity measures combined due to limited sample size. ^c All ethical decision-making measures combined due to limited sample size.

k = number of effect sizes; N = total sample size; Md = sample-weighted mean effect size (d) corrected for measurement error; SD = standard deviation of the mean effect size; CI = confidence interval; Nfs = Orwin’s (1983) fail-safe N (number of null effects required to reduce Md below .20); -- = effect size already below .20.

Before turning to the moderator results, of note are the findings with respect to the criterion type. Specifically, moral development (d = .36, SD = .26) and ethical analysis criteria (d = .61, SD = .16) revealed differences in effect sizes. We also found that when correction for reliability was possible in the calculation of the d statistic, the effect size was only slightly larger (d = .43, SD = .29) than when it was not possible (d = .37, SD = .00). Thus, this difference is of limited concern given that it is small and that only three effect sizes were not able to be corrected for the reliability of the measure.

Effects of Moderating Variables

Study Design Characteristics

The results with respect to study and design characteristics are presented in Table 2. We found that effect size estimates differed based on the type of design used in the study. A post-test with control design yielded the largest effects (d = .68, SD = .13) followed by a pre-test, post-test design (d = .52, SD = .00), longitudinal design (d = .39, SD = .21), and pre-test, post-test with control (d = .35, SD = .31). This finding highlights the point that conclusions derived from studies of instructional effectiveness are, at least in part, contingent on the type of study design utilized. Of note is the finding that, if the author conducted the ethics course, effect sizes were larger (d = .61, SD = .12) compared to when the instructor was not the author (d = .29, SD = .38). The investigator’s field was also associated with the obtained effect sizes. The largest effects were observed for investigators in the field of psychology (d = .80, SD = .00), followed by philosophy (d = .54, SD = .21), and the health and medical fields (d = .38, SD = .17). Moreover, the publication outlet was also associated with effects, such that studies published in the social sciences (d = .78, SD = .00) had the largest effects, followed by ethics (d = .58, SD = .25), health (d = .44, SD = .00), and medicine (d = .00, SD = .23).

Table 2.

Study Design Characteristics

| | k | N | Md | SD | Variance due to Sampling Error (%) | 95% CI Lower | 95% CI Upper | χ² | Nfs |
|---|---|---|---|---|---|---|---|---|---|
| Design Type | | | | | | | | | |
| Pre-Post w/ Control | 12 | 1380 | .35 | .31 | 27 | −.27 | .96 | 44.69 | 9 |
| Longitudinal | 5 | 1027 | .39 | .21 | 33 | −.01 | .79 | 15.50 | 5 |
| Pre-Post | 3 | 160 | .52 | .00 | 100 | .52 | .52 | .12 | 5 |
| Post w/ Control | 6 | 474 | .68 | .13 | 76 | .43 | .94 | 7.90 | 14 |
| Sample Size | | | | | | | | | |
| Less than 50 | 7 | 153 | .23 | .00 | 100 | .23 | .23 | 5.14 | 1 |
| 50–100 | 8 | 526 | .52 | .46 | 23 | −.38 | 1.42 | 34.12 | 13 |
| 101–150 | 5 | 642 | .29 | .30 | 26 | −.31 | .88 | 19.40 | 2 |
| 151+ | 6 | 1720 | .46 | .13 | 47 | .21 | .71 | 12.63 | 8 |
| Author Served as Instructor | | | | | | | | | |
| No | 13 | 1122 | .29 | .38 | 25 | −.46 | 1.04 | 52.67 | 6 |
| Yes | 10 | 884 | .61 | .12 | 77 | .37 | .85 | 13.07 | 21 |
| Investigator Field | | | | | | | | | |
| Health/Medicine | 9 | 1313 | .38 | .17 | 49 | .05 | .72 | 18.42 | 8 |
| Philosophy | 4 | 648 | .54 | .21 | 38 | .13 | .94 | 10.52 | 7 |
| Psychology | 6 | 441 | .80 | .00 | 100 | .80 | .80 | 4.95 | 18 |
| Other | 7 | 639 | .13 | .21 | 50 | −.29 | .55 | 14.08 | -- |
| Study Funded | | | | | | | | | |
| No | 14 | 1680 | .46 | .20 | 47 | .07 | .85 | 29.79 | 18 |
| Yes | 10 | 1167 | .42 | .27 | 33 | −.10 | .94 | 29.89 | 11 |
| Publication Outlet | | | | | | | | | |
| Medicine | 5 | 170 | .00 | .23 | 70 | −.46 | .45 | 7.16 | -- |
| Health | 5 | 1155 | .44 | .00 | 100 | .44 | .44 | 4.01 | 6 |
| Ethics | 6 | 680 | .58 | .25 | 38 | .10 | 1.06 | 15.72 | 11 |
| Social Science | 4 | 373 | .78 | .00 | 100 | .78 | .78 | 3.92 | 11 |
| Other | 6 | 663 | .14 | .20 | 49 | −.24 | .53 | 12.25 | -- |
| Publication Type | | | | | | | | | |
| Non-Published | 5 | 396 | −.02 | .09 | 87 | −.19 | .15 | 5.74 | -- |
| Peer Reviewed | 20 | 2512 | .49 | .23 | 39 | .04 | .93 | 50.99 | 29 |

Note. k = number of effect sizes; N = total sample size; Md = sample-weighted mean effect size (d) corrected for measurement error; SD = standard deviation of the mean effect size; CI = confidence interval; Nfs = Orwin’s (1983) fail-safe N (number of null effects required to reduce Md below .20); -- = effect size already below .20.

Participant characteristics

The moderating effects of participant characteristics are presented in Table 3. It was found that when residents and interns were the audience for instruction in ethics, the largest effects (d = .66, SD = .18) were produced, followed by undergraduate students (d = .40, SD = .22) and graduate/medical students (d = .33, SD = .42). Thus, the audience of the instructional program seems to influence the effectiveness of the course. Perhaps greater experience in one’s field, as is the case for residents and interns, promotes the effectiveness of ethics instruction (Ericsson & Charness, 1994). Along related lines, when the participants consisted of mixed ages (i.e., participants both above and below 35 years of age) the effect was larger (d = .45, SD = .46) than when the majority of participants were younger than 35 (d = .22, SD = .40). However, the variance in these effects was quite large, as indicated by the standard deviations, and somewhat unstable, as reflected in their respective confidence intervals. Also of note is the finding with respect to the participants’ field of study. Similar to the findings for the field of investigator, when the participants’ field of study was the social sciences (i.e., psychology or counseling), the effect size was largest (d = .66, SD = .29). When trainees were in health fields, instruction showed greater effects (d = .44, SD = .00) than when participants were in medicine (d = .05, SD = .25) or other fields (d = .24, SD = .24). These findings also revealed that, when the participants were a female majority, the effects were largest (d = .40, SD = .35) compared to a majority of males (d = .02, SD = .13) or mixed gender (d = .27, SD = .00). However, this finding should be interpreted with caution given the limited number of effect sizes that could be included in this moderator analysis because of inadequate reporting of participant gender across studies.

Table 3.

Participant Characteristics

| | k | N | Md | SD | Variance due to Sampling Error (%) | 95% CI Lower | 95% CI Upper | χ² | Nfs |
|---|---|---|---|---|---|---|---|---|---|
| Audience | | | | | | | | | |
| Graduate/Medical Students | 9 | 399 | .33 | .42 | 36 | −.49 | 1.15 | 25.27 | 6 |
| Undergraduate Students | 13 | 2264 | .40 | .22 | 33 | −.03 | .83 | 39.66 | 13 |
| Residents/Interns | 4 | 378 | .66 | .18 | 30 | .30 | 1.01 | 6.87 | 9 |
| Field of Study | | | | | | | | | |
| Medicine | 6 | 185 | .05 | .25 | 70 | −.42 | .53 | 8.61 | -- |
| Health | 5 | 1155 | .44 | .00 | 100 | .44 | .44 | 4.01 | 6 |
| Psychology/Counseling | 8 | 535 | .66 | .29 | 43 | .09 | 1.23 | 18.48 | 18 |
| Other | 4 | 685 | .24 | .24 | 29 | −.23 | .71 | 13.78 | 1 |
| Participant Gender | | | | | | | | | |
| 70% Males | 3 | 347 | .02 | .13 | 70 | −.23 | .26 | 4.34 | -- |
| Mixed Gender | 3 | 104 | .27 | .00 | 100 | .27 | .27 | .12 | 1 |
| 70% Females | 9 | 787 | .40 | .35 | 28 | −.29 | 1.09 | 32.64 | 9 |
| Participant Age | | | | | | | | | |
| 70% less than 35 | 7 | 274 | .22 | .40 | 41 | −.55 | 1.00 | 17.16 | 1 |
| Mixed Ages | 4 | 142 | .45 | .46 | 37 | −.45 | 1.34 | 10.89 | 5 |
| Participants had Incentive | | | | | | | | | |
| No | 7 | 1182 | .41 | .00 | 100 | .41 | .41 | 6.36 | 7 |
| Yes | 16 | 1545 | .36 | .34 | 27 | −.30 | 1.03 | 59.35 | 13 |

Note. k = number of effect sizes; N = total sample size; Md = sample-weighted mean effect size (d) corrected for measurement error; SD = standard deviation of the mean effect size; CI = confidence interval; Nfs = Orwin’s (1983) fail-safe N (number of null effects required to reduce Md below .20); -- = effect size already below .20.

Quality Ratings

The findings with respect to the quality ratings are provided in Table 4. Not surprisingly, as quality ratings increased, the magnitude of the observed effect sizes increased. For quality of the instructional program, above average instructional programs demonstrated much larger effect sizes (d = .72, SD = .15) than average (d = .39, SD = .02) or below average (d = .18, SD = .25) programs. Similarly, when the quality of the study was below average (d = .16, SD = .23), effect sizes were much smaller than when the study quality was average (d = .48, SD = .20) or above average (d = .65, SD = .20). Finally, when the quality of the criterion was average (d = .51, SD = .08) or above average (d = .57, SD = .20), effect sizes were larger than when the criterion used for assessing instructional effectiveness was below average (d = .29, SD = .34). Thus, as expected, the effectiveness of ethics instruction is contingent on the quality of the instruction itself, the quality of the study designed to examine the effects of instruction, and the quality of the criterion applied to evaluate the course.

Table 4.

Quality Ratings

| | k | N | Md | SD | Variance due to Sampling Error (%) | 95% CI Lower | 95% CI Upper | χ² | Nfs |
|---|---|---|---|---|---|---|---|---|---|
| Quality Rating of Instructional Program | | | | | | | | | |
| Below Average | 5 | 674 | .18 | .25 | 32 | −.32 | .68 | 15.67 | -- |
| Average | 8 | 1301 | .39 | .02 | 99 | .35 | .42 | 8.11 | 8 |
| Above Average | 6 | 445 | .72 | .15 | 72 | .42 | 1.01 | 8.30 | 16 |
| Quality Rating of Study | | | | | | | | | |
| Below Average | 10 | 817 | .16 | .23 | 50 | −.29 | .60 | 20.13 | -- |
| Average | 12 | 1688 | .48 | .20 | 44 | .10 | .86 | 27.41 | 17 |
| Above Average | 4 | 536 | .65 | .20 | 45 | .26 | 1.03 | 8.83 | 9 |
| Quality Rating of Criterion | | | | | | | | | |
| Below Average | 14 | 1284 | .29 | .34 | 28 | −.38 | .95 | 50.22 | 6 |
| Average | 9 | 1375 | .51 | .08 | 80 | .35 | .68 | 11.28 | 14 |
| Above Average | 3 | 382 | .57 | .20 | 45 | .17 | .96 | 6.71 | 6 |

Note. k = number of effect sizes; N = total sample size; Md = sample-weighted mean effect size (d) corrected for measurement error; SD = standard deviation of the mean effect size; CI = confidence interval; Nfs = Orwin’s (1983) fail-safe N (number of null effects required to reduce Md below .20); -- = effect size already below .20.

Instructional Content

The results obtained when instructional content was examined as a moderator of instructional effectiveness are provided in Table 5. An examination of the overarching instructional objective revealed that an ethical decision-making/problem-solving (d = .52, SD = .15) approach was most effective, followed by ethical sensitivity (d = .42, SD = .11) and moral development (d = .17, SD = .28). In addition, courses that focused on skills applicable to ethics in a global sense (i.e., focusing on ethical problems encountered in a number of real-world settings that span across domains and fields) were more effective (d = .64, SD = .11) than courses that focused on limited skills that are only specific to a particular field (d = .35, SD = .27). In addition, when the general approach to instruction was cognitive in nature, the effect size was slightly larger (d = .44, SD = .22) than social-interactional approaches (d = .37, SD = .22). This finding suggests that cognitive approaches may be most effective but that social-interactional approaches may also be of value. Moreover, these findings are consistent with those obtained for overarching instructional objective, such that decision-making, a cognitive approach, and sensitivity, a social-interactional approach, were both fairly effective.

Table 5.

Instructional Content

| | k | N | Md | SD | Variance due to Sampling Error (%) | 95% CI Lower | 95% CI Upper | χ² | Nfs |
|---|---|---|---|---|---|---|---|---|---|
| Overarching Instructional Objective | | | | | | | | | |
| Moral Development | 4 | 619 | .17 | .28 | 25 | −.38 | .73 | 16.18 | -- |
| Ethical Sensitivity | 7 | 780 | .42 | .11 | 74 | .19 | .64 | 9.48 | 8 |
| Decision-Making/Problem-Solving | 9 | 1234 | .52 | .15 | 59 | .23 | .81 | 15.31 | 14 |
| Overarching Instructional Approach | | | | | | | | | |
| Social-Interactional | 6 | 938 | .37 | .22 | 35 | −.06 | .80 | 16.92 | 5 |
| Cognitive | 12 | 1366 | .44 | .22 | 43 | .01 | .87 | 28.06 | 14 |
| Type of Skills Instructed | | | | | | | | | |
| Domain-Specific | 15 | 2257 | .35 | .27 | 27 | −.18 | .89 | 55.97 | 11 |
| Global | 9 | 744 | .64 | .11 | 81 | .43 | .86 | 11.12 | 20 |
| Ethical Domains Coverage | | | | | | | | | |
| No | 4 | 460 | −.11 | .14 | 66 | −.38 | .15 | 6.06 | -- |
| Yes | 16 | 2014 | .48 | .14 | 62 | .20 | .76 | 26.00 | 22 |
| Ethical Behaviors Taxonomy | | | | | | | | | |
| No | 6 | 579 | .03 | .00 | 100 | .03 | .03 | 5.07 | -- |
| Yes | 12 | 1829 | .51 | .15 | 55 | .21 | .80 | 21.75 | 19 |
| Ethical Standards Coverage | | | | | | | | | |
| No | 6 | 1192 | .20 | .25 | 24 | −.29 | .71 | 24.98 | -- |
| Yes | 14 | 1282 | .52 | .19 | 57 | .16 | .88 | 24.38 | 22 |
| Problems in EDM Coverage | | | | | | | | | |
| No | 10 | 1845 | .33 | .17 | 43 | .00 | .67 | 23.33 | 7 |
| Yes | 9 | 575 | .57 | .30 | 43 | −.01 | 1.16 | 20.91 | 17 |
| Strategies for EDM Coverage | | | | | | | | | |
| No | 7 | 1229 | .22 | .25 | 27 | −.27 | .71 | 25.93 | 1 |
| Yes | 13 | 1245 | .52 | .20 | 53 | .13 | .90 | 24.20 | 21 |

Note. k = number of effect sizes; N = total sample size; Md = sample-weighted mean effect size (d) corrected for measurement error; SD = standard deviation of the mean effect size; CI = confidence interval; Nfs = Orwin’s (1983) fail-safe N (number of null effects required to reduce Md below .20); EDM = ethical decision-making; -- = effect size already below .20.

With respect to ethical domains (e.g., mentor-mentee relationships, authorship and publication), coverage of these key domains was associated with much higher effectiveness (d = .48, SD = .14) compared to no inclusion of these domains in the instruction (d = -.11, SD = .14). Similar findings emerged for coverage of ethical behaviors (e.g., maintaining confidentiality, protection of intellectual property) and ethical standards for conducting research and science (e.g., avoiding harm and avoidance of personal gain). Courses covering behaviors and standards revealed larger effect sizes (d = .52, SD = .17) than those that did not include this material (d = .12, SD = .13).

Finally, with regard to inclusion of possible reasoning errors in ethical decision-making (e.g., thinking in black and white terms, making hasty decisions, failing to weigh future consequences), courses that covered this material showed larger effects (d = .57, SD = .30) compared to courses that did not include this information (d = .33, SD = .17). Furthermore, courses that included strategies that can be used by scientists to assist them in working through ethical problems (e.g., asking for help from someone with an outside perspective, considering the perspectives of others, and managing one’s own emotions) revealed larger effects (d = .52, SD = .20) relative to those that did not include such instruction on strategies (d = .22, SD = .25). Thus, it appears that including content that focuses on how one might address ethical problems and work through decisions may improve ethics instruction effectiveness.

General Instructional Characteristics

The findings with respect to general instructional characteristics as moderators of instructional effectiveness are presented in Table 6. Most notably, we found that instructional courses conducted in a separate workshop or seminar format (d = .52, SD = .02), as compared to being held in a typical academic (i.e., classroom) setting (d = .38, SD = .26), were more effective. Similarly, courses held in a stand-alone fashion focusing solely on ethics (d = .51, SD = .30), instead of being embedded in existing courses or curricula (d = .37, SD = .22), showed greater effectiveness. In addition, courses that were conducted for the purpose of professional development (d = .73, SD = .29) showed large effects compared to the modest effects of instruction conducted solely for educational purposes (d = .36, SD = .26). Experimental courses conducted solely for the purpose of research showed the largest effects (d = .85, SD = .00); however, only two studies were included in this analysis. It was also found that instruction that was not mandatory (d = .53, SD = .25) was more effective than instruction that was required (d = .32, SD = .23).

Table 6.

General Instructional Characteristics

| | k | N | Md | SD | Variance due to Sampling Error (%) | 95% CI Lower | 95% CI Upper | χ² | Nfs |
|---|---|---|---|---|---|---|---|---|---|
| Setting of Instruction | | | | | | | | | |
| Academic Setting | 19 | 2532 | .38 | .26 | 31 | −.14 | .89 | 61.31 | 17 |
| Workshop/Seminar | 4 | 195 | .52 | .02 | 99 | .48 | .56 | 4.02 | 6 |
| Organization Advocates Program | | | | | | | | | |
| No | 11 | 874 | .42 | .28 | 41 | −.13 | .96 | 26.87 | 12 |
| Yes | 11 | 1838 | .37 | .25 | 28 | −.12 | .86 | 38.68 | 9 |
| Instructional Program Mandatory | | | | | | | | | |
| No | 10 | 909 | .53 | .25 | 43 | .04 | 1.02 | 23.51 | 17 |
| Yes | 13 | 1818 | .32 | .23 | 36 | −.13 | .76 | 35.97 | 8 |
| Primary Purpose of Program | | | | | | | | | |
| Education | 19 | 2577 | .36 | .26 | 31 | −.14 | .89 | 61.33 | 15 |
| Professional Development | 3 | 175 | .73 | .29 | 48 | .17 | 1.30 | 6.25 | 8 |
| Basic Experimentation | 2 | 71 | .85 | .00 | 100 | .85 | .85 | .57 | 7 |
| Type of Instructional Program | | | | | | | | | |
| Integrated | 11 | 1832 | .37 | .22 | 33 | −.07 | .80 | 33.08 | 9 |
| Stand-alone | 15 | 1209 | .51 | .30 | 36 | −.08 | 1.11 | 41.30 | 23 |

Note. k = number of effect sizes; N = total sample size; Md = sample-weighted mean effect size (d) corrected for measurement error; SD = standard deviation of the mean effect size; CI = confidence interval; Nfs = Orwin’s (1983) fail-safe N (number of null effects required to reduce Md below .20); -- = effect size already below .20.

Characteristics of Instructional Methods

The last set of moderators concerned characteristics of instructional methods. These results are shown in Table 7. A case-based approach to instruction yielded larger effects (d = .53, SD = .14) than a standard lecture approach (d = .36, SD = .25). Furthermore, the application of a variable learning method (d = .52, SD = .09), where learning activities (e.g., discussion, cases, journaling, lecture) vary throughout instruction, yielded larger effects than constant learning methods (d = .18, SD = .24), where a single learning activity is utilized throughout instruction. Along similar lines, using four or more learning activities (d = .48, SD = .14) compared to three or fewer (d = .12, SD = .34) revealed greater instructional effectiveness. Furthermore, those courses that applied practice techniques in a distributed approach throughout the instructional course (d = .47, SD = .18), as opposed to practicing in one massed session (d = .18, SD = .33) were more effective. In a related vein, multiple types of practice activities (d = .52, SD = .12) were more effective than a single type of practice activity (d = .18, SD = .29). Finally, courses that allowed for greater trainee interaction during learning and practice activities were more effective (d = .63, SD = .09) than courses with moderate (d = .37, SD = .16) or low levels of trainee interaction (d = .05, SD = .10).

Table 7.

Characteristics of Instructional Methods

                                     k     N    Md   SD  %Var(SE)  95% CI          χ²    Nfs

Length of Instruction
 Less than 9 hours                   9   867   .35  .25     41    [–.13, .84]    21.84     7
 Equal to or greater than 9 hours   13  1867   .47  .24     33    [.00, .94]     38.86    18
Primary Delivery Method
 Classroom-Based                     8  1091   .36  .25     33    [–.12, .85]    24.44     6
 Case-Based                          9  1214   .53  .14     60    [.24, .81]     14.98    15
 Other                               3   328   .11  .00    100    [.11, .11]      1.52    --
Learning Method
 Constant                            9   926   .18  .24     41    [–.29, .64]    21.77    --
 Variable                           10  1494   .52  .09     76    [.35, .70]     12.89    16
Learning Activity Usage
 Less than or equal to 3             8   692   .12  .34     29    [–.55, .78]    27.63    --
 Equal to or greater than 4         13  1995   .48  .14     59    [.21, .75]     22.17    18
Practice
 Massed                              6   382   .18  .33     38    [–.46, .82]    15.77    --
 Distributed                        11  1655   .47  .18     47    [.12, .82]     23.37    15
Practice Activities
 Single Type                         8   543   .18  .29     42    [–.40, .75]    19.20    --
 Multiple Types                      8  1442   .52  .12     62    [.29, .75]     12.83    13
 None                                3   435   .23  .11     71    [.02, .44]      4.24     0
Level of Participant Interaction
 Low                                 4   411   .05  .10     81    [–.14, .24]     4.91    --
 Moderate                            6  1198   .37  .16     43    [.05, .70]     13.80     5
 High                                7   722   .63  .09     84    [.45, .81]      8.32    15

Note. k = number of effect sizes; N = total sample; Md = sample-weighted mean effect size d corrected for measurement error; SD = standard deviation of mean effect size; %Var(SE) = percentage of observed variance due to sampling error; 95% CI = lower and upper bounds of the 95% confidence interval; Nfs = Orwin’s (1983) fail-safe N (number of null effects needed to reduce Md below .20); -- = effect size already below .20.

Discussion

Before turning to the conclusions and implications arising from these findings, several limitations of this study must first be noted. To begin, these meta-analytic findings should be interpreted with some caution given the limited number of studies included in the meta-analysis. Although a large number of studies discussing ethics instruction in general were identified, few studies explicitly evaluated ethics instruction in the sciences. Furthermore, after applying the inclusion criteria for evaluation efforts, several studies could not be included because they provided insufficient descriptive detail or because basic statistics, such as standard deviations, necessary for calculating the d statistic were not reported. This is unfortunate, given that such basic statistics are considered essential for reporting the results of empirical research, particularly research involving human subjects (Wilkinson et al., 1999). Taken together, these observations suggest a great deal of interest in ethics instruction, but limited systematic, rigorous evaluation of it.
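To make the dependence on reported standard deviations concrete, consider the standardized mean difference on which the analyses rest. As an illustrative sketch of the standard form of this statistic (notation ours; not a reproduction of the exact computations used in this study), a posttest comparison of a trained group (T) and a comparison group (C) gives

\[
d = \frac{\bar{X}_{T} - \bar{X}_{C}}{SD_{pooled}}, \qquad
SD_{pooled} = \sqrt{\frac{(n_{T} - 1)s_{T}^{2} + (n_{C} - 1)s_{C}^{2}}{n_{T} + n_{C} - 2}},
\]

so a report providing only means, percentages, or p values, without group standard deviations or an equivalent test statistic, does not permit calculation of d.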

In addition, moderator data could not be coded for every effect size included in the analysis. Instead, coding of moderators was limited by the descriptive detail provided within the studies, which was often less than ideal. As a result, conclusions arising from the analyses of moderators of instructional effectiveness were often based on a smaller sample of studies than the overall effect size analysis.

Along similar lines, due to the limited sample size, it was not possible to examine moderators at the level of each individual variable within a particular dimension. Instead, moderator analyses were typically dichotomized, although three to four categories could be created for some variables. Unfortunately, this approach of collapsing into broader dimensions limits the specificity of our conclusions. For example, we cannot conclude which specific types of learning activities (e.g., lecture, group discussion, or journaling) are most effective or how they compare to one another. Moreover, it was not possible to code every moderator that might influence instructional effectiveness. Instead, focus was placed on identifying key factors of instructional programs likely to account for differences in effectiveness.

Even taking these limitations into consideration, the findings obtained in the present study suggest some noteworthy conclusions regarding the effectiveness of ethics instruction in the sciences and point to issues to be considered in the design and evaluation of ethics courses. To begin, we return to our first question: How effective is ethics instruction in the sciences? The answer appears to be that ethics instruction, as currently conducted, is at best moderately effective. Not surprisingly, however, the findings also suggest that effectiveness is greater when instructional program quality is high. Thus, well-designed instructional programs have the potential to be fairly effective. This brings us to our second question: What characteristics are associated with the effectiveness of ethics instruction in the sciences? An answer emerges from an examination of the pattern of findings with respect to the characteristics associated with larger effects.

First and foremost, it appears that a cognitive decision-making approach to instruction is most effective, followed by an ethical sensitivity approach, which focuses on the social-interactional nature of ethical problems. In fact, a relevant question for future research is whether these approaches would complement one another in combination, producing even higher levels of instructional effectiveness (Sonenshein, 2007). Along these lines, covering potential reasoning errors (e.g., thinking in black-and-white terms, making hasty decisions, and overlooking key causes) that can hinder thinking through ethical situations was found to be especially valuable. Providing cognitive strategies (e.g., considering others’ perspectives, considering personal motivations, anticipating consequences) for thinking through the likely outcomes and social implications of a problem was also especially beneficial for instructional effectiveness. Indeed, past studies have shown that strategy-based instructional interventions are particularly effective for improving people’s problem solving on complex, ambiguous problems (Scott, Leritz, & Mumford, 2004a, 2004b).

Additionally, providing specific content, such as ethical domains, standards, and behaviors, appears to be important for constructing effective ethics instruction. This material may provide a basis for framing what constitutes an ethical problem, and thus for applying newly learned strategies for working through such problems (Gick & Holyoak, 1983). Consistent with this point, older participants appeared to benefit the most from ethics instruction, perhaps because they already possess the requisite knowledge to serve as a foundation for strategy-based training (Clapham, 1997; Kolodner, 1997; Önkal et al., 2003). Thus, the success of ethics instruction may, in part, be attributed to developing an understanding of, and providing guidance concerning, the application of requisite strategies for confronting real-world ethical problems.

In addition to the foundational knowledge provided by covering ethical standards and behaviors, cases also provide knowledge, in the form of contextualized exemplars (Hammond, 1990; Kolodner, 1993, 1997; Patalano & Seifert, 1997), that can be applied when addressing ethical problems. Moreover, case examples provide a learning tool for practicing the application of relevant knowledge and strategies to problem scenarios (Erickson & Kruschke, 1998; Jonassen & Hernandez-Serrano, 2002). In line with these observations, case-based instruction produced larger effects than classroom-based, lecture-style instruction. Moreover, student engagement, by means of highly interactive courses and a variety of learning and practice activities, also promoted instructional effectiveness. In fact, because case-based models are often acquired through social experiences, the effectiveness of case-based approaches to instruction is likely to be enhanced by interactive, cooperative learning (Aronson & Patnoe, 1997; Slavin, 1991). These observations regarding the importance of interaction among course participants are particularly significant given the current trend toward online formats for ethics instruction (Barnes et al., 2006; Braunschweiger & Goodman, 2007; Kiser, 1999). Although online courses have their advantages, they typically involve little or no social interaction, limiting training to individual-level application of rules to relatively simple, context-free cases. This all-too-common limitation must be considered seriously as institutions weigh the best ways to implement effective ethics instruction, whether delivered online or face-to-face. Moreover, given the implicit aim of ethics instruction, namely fostering a community of social responsibility, it seems reasonable to expect that a learning environment involving social interaction would facilitate this goal better than one that does not. Whether this expectation holds, of course, remains an open empirical question.

Before turning to our final conclusions, it is important to reiterate the critical importance of careful, thorough evaluation of ethics instruction. The design of the evaluation study must be as complete and systematic as possible, and the criterion measure must match the intended outcomes of the program (Alliger, Tannenbaum, Bennett, Traver, & Shotland, 1997; Kraiger & Jung, 1996). Indeed, in order to reach a coherent understanding of what constitutes effective ethics instruction, a fundamental consideration is the most appropriate criterion measure to be used in evaluation. In the present study, the DIT (Rest, 1979) was the most commonly applied criterion measure in studies of ethics instruction. The present findings suggest, however, that less effective programs rely on a moral development framework, whereas more effective programs provide instruction in the process of ethical decision making, specifically, learning about the social-cognitive elements of ethical problems and applying strategies for working through those problems. If so, it may be necessary to consider whether a different measure of training effectiveness would more completely assess whether these instructional goals have been accomplished. The DIT, as a measure of moral development, may be limited in its ability to capture all of the potential, and desired, outcomes of instruction.

In conclusion, although this study points to several important considerations for the design and delivery of ethics instruction, these recommendations are certainly not suggested as a panacea for ethics instruction. The present study merely skims the surface of a rather extensive issue still requiring a great deal of research. Fortunately, we did find evidence that ethics instruction in the sciences, if carefully designed and evaluated, has the potential to be fairly effective. Hopefully, the present study will provide some practical guidance for future course development and evaluation. Moreover, we hope that the present effort might provide direction for researchers generally concerned with studying ethics and ethics instruction.

Acknowledgements

We would like to thank Jason Hill and Jared Caughron for their contributions to this research. This research was supported by grant #5R01-NS049535-02 from the National Institutes of Health and the Office of Research Integrity, Michael D. Mumford, Principal Investigator.

References

*Articles included in the meta-analysis

  1. Antes AL, Brown RP, Murphy ST, Waples EP, Mumford MD, Connelly S, Devenport LD. Personality and ethical decision-making in research: The role of perceptions of self and others. Journal of Empirical Research on Human Research Ethics. 2007;2:15–34. doi: 10.1525/jer.2007.2.4.15. [DOI] [PubMed] [Google Scholar]
  2. Abbott A. Science comes to terms with the lessons of fraud. Nature. 1999 March;398:13–17. doi: 10.1038/17883. [DOI] [PubMed] [Google Scholar]
  3. Alliger GM, Tannenbaum SI, Bennett W, Jr., Traver H, Shotland A. A meta-analysis of the relations among training criteria. Personnel Psychology. 1997;50:341–358. [Google Scholar]
  4. Arthur W, Jr., Bennett W, Jr., Huffcutt AI. Conducting meta-analysis using SAS. Erlbaum; Mahwah, NJ: 2001. [Google Scholar]
  5. Aronson E, Patnoe S. The jigsaw classroom: Building cooperation in the classroom. 2nd ed Longman; New York: 1997. [Google Scholar]
  6. Baldick TL. Ethical discrimination ability of intern psychologists: A function of training in ethics. Professional Psychology. 1980;11:276–282. [Google Scholar]
  7. Baldwin TT, Ford JK. Transfer of training: A review and directions for future research. Personnel Psychology. 1988;41:63–105. [Google Scholar]
  8. Barnes BE, Friedman CP, Rosenberg JL, Russell J, Beedle A, Levine AS. Creating an infrastructure for training in the responsible conduct of research: The University of Pittsburgh’s Experience. Academic Medicine. 2006;81:119–127. doi: 10.1097/00001888-200602000-00004. [DOI] [PubMed] [Google Scholar]
  9. Bebeau MJ, Thoma SJ. The impact of a dental ethics curriculum on moral reasoning. Journal of Dental Education. 1994;58:684–692. [PubMed] [Google Scholar]
  10. Borkowski SC, Ugras YJ. Business students and ethics: A meta-analysis. Journal of Business Ethics. 1998;17:1117–1127. [Google Scholar]
  11. Braunschweiger P, Goodman KW. The CITI program: An international online resource for education in human subjects protection and the responsible conduct of research. Academic Medicine. 2007;82:861–864. doi: 10.1097/ACM.0b013e31812f7770. [DOI] [PubMed] [Google Scholar]
  12. Chase NM. A cognitive-development approach to professional ethics training for counselor education students. Dissertation Abstracts International. 1999;59(08):2865. (UMI No. 9903261) [Google Scholar]
  13. Clapham MM. Ideational skills training: A key element in creativity training programs. Creativity Research Journal. 1997;10:33–44. [Google Scholar]
  14. Clarkeburn H. A test for ethical sensitivity in science. Journal of Moral Education. 2002;31:439–453. [Google Scholar]
  15. Clarkeburn H, Downie JR, Matthew B. Impact of an ethics programme in a life sciences curriculum. Teaching in Higher Education. 2002;7:65–79. [Google Scholar]
  16. Cohen J. Statistical power analysis for the behavioral sciences. Academic Press; New York: 1969. [Google Scholar]
  17. Cohen J. A power primer. Psychological Bulletin. 1992;112:155–159. doi: 10.1037//0033-2909.112.1.155. [DOI] [PubMed] [Google Scholar]
  18. Colquitt JA, Simmering MJ. Conscientiousness, goal orientation, and motivation to learn during the learning process: A longitudinal study. Journal of Applied Psychology. 1998;83:654–665. [Google Scholar]
  19. Conn VS, Valentine JC, Cooper HM, Rantz MJ. Grey literature in meta-analyses. Nursing Research. 2003;52:256–261. doi: 10.1097/00006199-200307000-00008. [DOI] [PubMed] [Google Scholar]
  20. Cook TD, Campbell DT. Quasi-Experimentation: Design and Analysis for Field Settings. Rand McNally; Chicago: 1979. [Google Scholar]
  21. Dalton R. NIH cash tied to compulsory training in good behaviour. Nature. 2000;408:629. doi: 10.1038/35047242. [DOI] [PubMed] [Google Scholar]
  22. Drake MJ, Griffin PM, Kirkman R, Swann JL. Engineering ethical curricula: Assessment and comparison of two approaches. Journal of Engineering Education. 2005;94:223–232. [Google Scholar]
  23. Duckett L, Rowan M, Ryden M, Krichbaum K, Miller M, Wainwright H, Savik K. Progress in the moral reasoning of baccalaureate nursing students between program entry and exit. Nursing Research. 1997;46:222–229. doi: 10.1097/00006199-199707000-00007. [DOI] [PubMed] [Google Scholar]
  24. Ericsson KA, Charness N. Expert performance: Its structure and acquisition. American Psychologist. 1994;49:725–747. [Google Scholar]
  25. Erickson MA, Kruschke JK. Rules and exemplars in category learning. Journal of Experimental Psychology: General. 1998;127:107–140. doi: 10.1037//0096-3445.127.2.107. [DOI] [PubMed] [Google Scholar]
  26. Fink LD. Creating significant learning experiences: An integrated approach to designing college courses. Jossey-Bass; San Francisco: 2003. [Google Scholar]
  27. Friedman PJ. The impact of conflict of interest on trust in science. Science and Engineering Ethics. 2002;8:413–420. doi: 10.1007/s11948-002-0063-9. [DOI] [PubMed] [Google Scholar]
  28. Frisch NC. Value analysis: A method for teaching nursing ethics and promoting the moral development of students. Journal of Nursing Education. 1987;26:328–332. doi: 10.3928/0148-4834-19871001-07. [DOI] [PubMed] [Google Scholar]
  29. Gaul AL. The effect of a course in nursing ethics on the relationship between ethical choice and ethical action in baccalaureate nursing students. Journal of Nursing Education. 1987;26:113–117. doi: 10.3928/0148-4834-19870301-08. [DOI] [PubMed] [Google Scholar]
  30. Gawthrop JC, Uhlemann MR. Effects of the problem-solving approach in ethics training. Professional Psychology: Research & Practice. 1992;23:38–42. [Google Scholar]
  31. Gick ML, Holyoak KJ. Schema induction and analogical transfer. Cognitive Psychology. 1983;15:1–38. [Google Scholar]
  32. Goldman SA, Arbuthnot J. Teaching medical ethics: The cognitive-developmental approach. Journal of Medical Ethics. 1979;5:170–181. doi: 10.1136/jme.5.4.170. [DOI] [PMC free article] [PubMed] [Google Scholar]
  33. Goldstein IL, Ford JK. Training in organizations. Wadsworth; Belmont, CA: 2002. [Google Scholar]
  34. Haidt J. The emotional dog and its rational tail: A social intuitionist approach to moral judgment. Psychological Review. 2001;108:814–834. doi: 10.1037/0033-295x.108.4.814. [DOI] [PubMed] [Google Scholar]
  35. Hammond KJ. Case-based planning: A framework for planning from experience. Cognitive Science. 1990;14:385–443. [Google Scholar]
  36. Helton-Fauth WB, Gaddis B, Scott G, Mumford MD, Devenport LD, Connelly S, Brown RP. A new approach to assessing ethical conduct in scientific work. Accountability in Research. 2003;10:205–228. doi: 10.1080/714906104. [DOI] [PubMed] [Google Scholar]
  37. Hung H, Wong YH. The relationship between employer endorsement of continuing education and training and work and study performance: a Hong Kong case study. International Journal of Training & Development. 2007;11:295–313. [Google Scholar]
  38. Hunter JE, Schmidt FL. Methods of meta-analysis: Correcting error and bias in research findings. 1st ed Sage; Newbury Park, CA: 1990. [Google Scholar]
  39. Hunter JE, Schmidt FL. Methods of meta-analysis: Correcting error and bias in research findings. 2nd ed Sage; Newbury Park, CA: 2004. [Google Scholar]
  40. Jonassen DH, Hernandez-Serrano J. Case-based reasoning and instructional design: Using stories to support problem solving. Educational Technology Research and Development. 2002;50:65–77. [Google Scholar]
  41. Jones TM. Ethical decision making by individuals in organizations: An issue-contingent model. Academy of Management Review. 1991;16:366–395. [Google Scholar]
  42. Kalichman MW. Responding to challenges in educating for the responsible conduct of research. Academic Medicine. 2007;82:870–875. doi: 10.1097/ACM.0b013e31812f77fe. [DOI] [PubMed] [Google Scholar]
  43. Kalichman MW, Plemmons DK. Reported goals for responsible conduct of research courses. Academic Medicine. 2007;82:846–852. doi: 10.1097/ACM.0b013e31812f78bf. [DOI] [PubMed] [Google Scholar]
  44. Kiser K. 10 things we know so far about online training. Training. 1999;36:66–68. [Google Scholar]
  45. Kligyte V, Marcy RT, Waples EP, Sevier ST, Godfrey ES, Mumford MD, Hougen DF. Application of a sensemaking approach to ethics training for physical sciences and engineering. Science and Engineering Ethics. 2008;14:251–278. [DOI] [PubMed] [Google Scholar]
  46. Kohlberg L. Stage and sequence: The cognitive development approach to socialization. In: Goslin DA, editor. Handbook of Socialization Theory. Rand McNally; Chicago: 1969. pp. 347–480. [Google Scholar]
  47. Kohlberg L. Moral stages and moralization: The cognitive-developmental approach. In: Lickona T, editor. Moral development and behavior: Theory, research, and social issues. Holt, Rinehart & Winston; New York: 1976. [Google Scholar]
  48. Kolodner JL. Case Based Reasoning. Morgan Kaufmann Publishers; San Mateo, CA: 1993. [Google Scholar]
  49. Kolodner JL. Educational implications of analogy. American Psychologist. 1997;52:57–67. doi: 10.1037//0003-066x.52.1.57. [DOI] [PubMed] [Google Scholar]
  50. Kraiger K, Jung KM. Linking training objectives to evaluation criteria. In: Quiñones MA, Ehrenstein A, editors. Training for a Rapidly Changing Workplace: Application of Psychological Research. American Psychological Association; Washington, DC: 1996. pp. 151–175. [Google Scholar]
  51. Kirk RE. Experimental design: Procedures for behavioral sciences. 3rd ed Brooks/Cole; Pacific Grove, CA: 1995. [Google Scholar]
  52. Major-Kincade TL, Tyson JE, Kennedy KA. Training pediatric house staff in evidence-based ethics: An exploratory controlled trial. Journal of Perinatology. 2001;21:161–166. doi: 10.1038/sj.jp.7200570. [DOI] [PubMed] [Google Scholar]
  53. Martinson BC, Anderson MS, de Vries R. Scientists behaving badly. Nature. 2005;435:737–738. doi: 10.1038/435737a. [DOI] [PubMed] [Google Scholar]
  54. McKellar KA. Ethical decision-making: Does practice make a difference? Dissertation Abstracts International. 1999;60(02):574. (UMI No. 9920900) [Google Scholar]
  55. Mumford MD, Connelly S, Brown RP, Murphy ST, Hill JH, Antes AL, Waples EP, Devenport LD. A sensemaking approach to ethics training for scientists: Preliminary evidence of training effectiveness. Ethics and Behavior. doi: 10.1080/10508420802487815. in press. [DOI] [PMC free article] [PubMed] [Google Scholar]
  56. Mumford MD, Connelly S, Murphy ST, Devenport LD, Antes AL, Brown RP, Hill JH, Waples EP. Field and experience influences on ethical decision making in the sciences. Ethics and Behavior. doi: 10.1080/10508420903035257. in press. [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Myyry L, Helkama K. The role of value priorities and professional ethics training in moral sensitivity. Journal of Moral Education. 2002;31:35–50. [Google Scholar]
  58. O’Fallon MJ, Butterfield KD. A review of the empirical ethical decision-making literature: 1996-2003. Journal of Business Ethics. 2005;59:375–413. [Google Scholar]
  59. Önkal D, Yates JF, Simga-Mugan C, Öztin Ş. Professional vs. amateur judgment accuracy: The case of foreign exchange rates. Organizational Behavior & Human Decision Processes. 2003;91:169–186. [Google Scholar]
  60. Orwin RG. A fail-safe N for effect size in meta-analysis. Journal of Educational Statistics. 1983;8:157–159. [Google Scholar]
  61. Patalano AL, Seifert CM. Opportunistic planning: Being reminded of pending goals. Cognitive Psychology. 1997;34:1–36. doi: 10.1006/cogp.1997.0655. [DOI] [PubMed] [Google Scholar]
  62. Patenaude J, Niyonsenga T, Fafard D. Changes in students’ moral development during medical school: A cohort study. Canadian Medical Association Journal. 2003;168:840–844. [PMC free article] [PubMed] [Google Scholar]
  63. Penn WY. Teaching ethics: A direct approach. Journal of Moral Education. 1990;19:124–139. [Google Scholar]
  64. Perkins DN, Salomon G. Are cognitive skills context-bound? Educational Researcher. 1989;18:16–25. [Google Scholar]
  65. Powell ST, Allison MA, Kalichman MW. Effectiveness of a responsible conduct of research course: A preliminary study. Science and Engineering Ethics. 2007;13:249–264. doi: 10.1007/s11948-007-9012-y. [DOI] [PubMed] [Google Scholar]
  66. Resnick DL. From Baltimore to Bell Labs: Reflections on two decades of debate about scientific misconduct. Accountability in Research. 2003;10:123–135. doi: 10.1080/08989620300508. [DOI] [PubMed] [Google Scholar]
  67. Rest JR. Development in judging moral issues. University of Minnesota Press; Minneapolis, MN: 1979. [Google Scholar]
  68. Rest J, Cooper R, Coder R, Masanz J, Anderson D. Judging the important issues on moral dilemmas. Developmental Psychology. 1974;10:491–501. [Google Scholar]
  69. Rest J. New approaches in the assessment of moral judgment. In: Lickona T, editor. Moral development and behavior: Theory, research, and social issues. Holt, Rinehart & Winston; New York: 1976. [Google Scholar]
  70. Rest JR. An overview of the psychology of morality. In: Rest JR, editor. Moral Development and Behavior: Theory, research, and social issues. Praeger; New York: 1986. pp. 133–175. [Google Scholar]
  71. Rest JR. DIT manual: Manual for the defining issues test. 3rd ed University of Minnesota Center for the Study of Ethical Development; St. Paul, MN: 1988. [Google Scholar]
  72. Rosenthal R. The “file drawer problem” and tolerance for null results. Psychological Bulletin. 1979;85:638–641. [Google Scholar]
  73. Ruegger D, King EW. A study of the effect of age and gender upon student business ethics. Journal of Business Ethics. 1992;11:179–186. [Google Scholar]
  74. Ryden MB, Duckett L. Technical report for the Improvement of Post Secondary Education grant. U.S. Department of Education; Washington, DC: 1991. Ethics education for baccalaureate nursing. [Google Scholar]
  75. Scott GM, Leritz LE, Mumford MD. The effectiveness of creativity training: A meta-analysis. Creativity Research Journal. 2004a;16:361–388. [Google Scholar]
  76. Scott GM, Leritz LE, Mumford MD. Types of creativity: Approaches and their effectiveness. The Journal of Creative Behavior. 2004b;38:149–179. [Google Scholar]
  77. Self DJ, Schrader DE, Baldwin DC, Wolinsky FD. The moral development of medical students: A pilot study of the possible influence of medical education. Medical Education. 1993;27:26–34. doi: 10.1111/j.1365-2923.1993.tb00225.x. [DOI] [PubMed] [Google Scholar]
  78. Self DJ, Schrader DE, Baldwin DC, Root SK, Wolinsky FD, Shadduck JA. Study of the influence of veterinary medical education on the moral development of veterinary students. Journal of the American Veterinary Medical Association. 1991;198:782–787. [PubMed] [Google Scholar]
  79. Slavin RE. Synthesis of research on cooperative learning. Educational Leadership. 1991;48:71–81. [Google Scholar]
  80. Slavin RE. Research on cooperative learning and achievement: What we know, what we need to know. Contemporary Educational Psychology. 1996;21:43–69. [Google Scholar]
  81. Smith G. Are there domain-specific thinking skills? Journal of Philosophy of Education. 2002;36:207–227. [Google Scholar]
  82. Sonenshein S. The role of construction, intuition, and justification in responding to ethical issues at work: The sensemaking-intuition model. The Academy of Management Review. 2007;32:1022–1040. [Google Scholar]
  83. Steneck NH, Bulger RE. The history, purpose, and future of instruction in the responsible conduct of research. Academic Medicine. 2007;82:829–834. doi: 10.1097/ACM.0b013e31812f7d4d. [DOI] [PubMed] [Google Scholar]
  84. Tannenbaum SI, Yukl G. Training and development in work organizations. Annual Review of Psychology. 1992;43:399–441. [Google Scholar]
  85. Treviño LK. Ethical decision making in organizations: A person-situation interactionist model. The Academy of Management Review. 1986;11:601–617. [Google Scholar]
  86. Treviño LK, Weaver GR, Reynolds SJ. Behavioral ethics in organizations: A review. Journal of Management. 2006;32:951–990. [Google Scholar]
  87. Weeks WA, Moore CW, McKinney JA, Longenecker JG. The effects of gender and career stage on ethical judgment. Journal of Business Ethics. 1999;20:301–313. [Google Scholar]
  88. Wexley KN, Latham GP. Developing and training human resources in organizations. 3rd ed Prentice Hall; Upper Saddle River, NJ: 2002. [Google Scholar]
  89. Wilkinson L, the Task Force on Statistical Inference. Statistical methods in psychology journals: Guidelines and explanations. American Psychologist. 1999;54:594–604. [Google Scholar]
