Author manuscript; available in PMC: 2022 Oct 1.
Published in final edited form as: Account Res. 2020 Dec 8;28(7):428–455. doi: 10.1080/08989621.2020.1855149

Standards of Evidence for Institutional Review Board Decision-Making

David B Resnik 1
PMCID: PMC8184880  NIHMSID: NIHMS1657667  PMID: 33231115

Abstract

A standard of evidence is a rule or norm pertaining to the type or amount of evidence that is required to prove or support a conclusion. Standards of evidence play an important role in institutional review board (IRB) decision-making, but they are not mentioned in the federal research regulations. In this article, I examine IRB standards of evidence from a normative, epistemological perspective and argue that IRBs should rely on empirical evidence for making decisions, but that other sources of evidence, such as intuition, emotion, and rational reflection, can also play an important role in decision-making, because IRB decisions involve an ethical component which is not reducible to science. I also argue that an IRB should approve a study only if it has clear and convincing evidence that the study meets all the approval criteria and other relevant ethical considerations; and that for studies which expose healthy volunteers to significant risks, an IRB should require that evidence be more than clear and convincing as a condition for approval. Additional empirical research is needed on how IRBs use evidence to make decisions and how standards of evidence influence IRB decision-making at the individual and group level.

Keywords: institutional review boards, decision-making, standards of evidence, empirical evidence, intuition, consistency, ethics, science

1. Introduction

The Institutional Review Board (IRB) is the focal point of oversight of research with human subjects in the US.1 Federal regulations require IRB oversight for research with human subjects that is federally funded, is being submitted to the Food and Drug Administration (FDA) to support an application for a regulated product (such as an investigational drug or medical device), or is being submitted to the Environmental Protection Agency (EPA) for consideration in its decision-making concerning pesticide use (Resnik 2018).2 The regulations define IRB membership, functions, and operations; describe record-keeping and reporting responsibilities; distinguish between full board, expedited, and limited review; and specify criteria for IRB approval of research. The IRB has the authority to approve, disapprove, audit, monitor, oversee, suspend, or terminate research with human subjects (Food and Drug Administration 2013; Environmental Protection Agency 2013; Department of Health and Human Services 2017). Most academic institutions in the US have at least one IRB or rely on an outside IRB for review of research involving human subjects (Klitzman 2015).

While the federal regulations clearly delineate the role of the IRB in human research oversight, they grant considerable discretion to IRBs concerning the approval of studies. Since the US does not have a national IRB system3, local IRBs must decide, by majority vote, whether studies meet the approval criteria, which address key ethical issues, such as minimization of risks, reasonableness of risks, equitable selection of subjects, and informed consent (Department of Health and Human Services 2017).4 When reviewing a study, individual IRB members must consider the information contained in the application and decide whether it meets the approval criteria.

Notably absent from the federal research regulations is any mention of standards of evidence that IRB members should use in deciding whether a study meets the approval criteria.5 A standard of evidence for an IRB would be a rule or guideline concerning the type or amount of evidence that is needed to make an approval decision.6 The regulations leave it to IRBs to decide whether they have sufficient evidence to approve a study.7

The omission of standards of evidence for IRB determinations from the federal regulations is important, because the standard of evidence used in making a decision can dramatically affect its outcome. For example, under US criminal law, guilt must be proven beyond reasonable doubt. If the standard of evidence in criminal cases were weaker than beyond reasonable doubt, millions of people who would otherwise have been acquitted would be convicted of crimes.

Standards of evidence can also affect the outcome of IRB deliberations, since the same research proposal might be approved by one IRB but not approved by another if they use different standards of evidence. Numerous studies have documented inconsistency and variability among IRB decisions (Goldman and Katz 1982; Hirshon et al. 2002; McWilliams et al. 2003; Shah et al. 2004; Dziak et al. 2005; Green et al. 2006; Peterson et al. 2012). For example, Green et al. (2006) submitted the same observational health services study to 43 different IRBs as part of a collaborative research project. Ten IRBs reviewed the study on an expedited basis8; 31 gave the study full board review; one decided that the study was exempt9 from IRB review; and one refused to approve the study because it judged it to be too risky (Green et al. 2006).

There are some plausible explanations of the inconsistency and variability of IRB review. Chief among these is that IRBs make different decisions about the same study because they interpret and apply key terms in the regulations, such as ‘risk,’ ‘benefit,’ and ‘consent,’ differently (Edwards et al. 2004; Shah et al. 2004; Klitzman 2011a). Other factors that may lead to inconsistent IRB review include differences related to moral and cultural values; scientific knowledge and expertise; and education and training related to human research regulations and oversight (Silberman and Kahn 2011; Resnik 2014; Friesen et al. 2019). However, another explanation, which cannot be ruled out at present, is that inconsistency and variability occur, in part, because IRBs use different standards of evidence for making decisions.

While inconsistency in IRB decision-making is not inherently unethical and may simply reflect the differences of opinion and interpretation that inevitably arise when human beings deal with complex issues at the intersection of science, law, and ethics (Edwards et al. 2004; Resnik 2014; Friesen et al. 2019), it may nevertheless be an indicator of more significant problems with IRB decision-making, such as under-protection or overprotection of human research subjects (Resnik 2014). Under-protection is an ethical and compliance10 problem because it can threaten the rights and welfare of research participants, and overprotection is an ethical problem because it can prevent socially valuable research from moving forward (Resnik 2014; Schneider 2015).11

A second reason for exploring the topic of standards of evidence in IRB decision-making in greater depth is that one might argue that IRB decisions related to risks and benefits should be based largely on empirical evidence12 rather than on subjective feelings, emotions, or intuitions.13 In 1979, the authors of the Belmont Report argued that oversight committees should use empirical data to assess risks and benefits systematically (National Commission for the Protection of Human Subjects in Biomedical and Behavioral Research 1979). Numerous other commentators have echoed this view (Meslin 1990; Weijer 2000; London 2006; Wendler et al. 2005; Rid et al 2010; Resnik 2018). However, several studies have shown that IRB members often make risk/benefit decisions based on feelings, emotions, intuitions, or personal experiences (Van Luijn et al. 2002; Stark 2012; Klitzman 2011, 2015). While it is neither possible nor desirable to eliminate subjective elements from IRB decision-making, one might argue that IRB decision-making should be more empirically-based than it currently is (Pritchard 2011; Anderson and DuBois 2012; Resnik 2018).14

A third reason for exploring the topic of standards of evidence in IRB decision-making in greater depth is that one might claim that different standards should apply to different decisions. Numerous commentators have argued that it would be ethical to conduct controlled human infection experiments (or challenge studies) on young, healthy adult volunteers to generate data to help accelerate the development of vaccines for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the pathogen responsible for the COVID-19 pandemic (Eyal et al. 2020; Schaefer et al. 2020; World Health Organization 2020). While these experiments could potentially save tens of thousands of lives by reducing the time for vaccine development (Eyal et al. 2020), they have been ethically controversial because they would involve intentionally exposing human subjects to SARS-CoV-2 to test vaccine efficacy.15 While the mortality and morbidity risks associated with COVID-19 are low for the proposed study population, they are not trivial. According to Salje et al. (2020), for healthy adults, age 20–29, the risk of dying after becoming infected with SARS-CoV-2 is about 0.007% (or 7 deaths per 100,000 infections). Shah et al. (2020) argue that COVID-19 challenge studies should be conducted only if they have demonstrated high social value, such that the knowledge they produce would likely be used by vaccine developers to move vaccines more quickly into Phase III clinical trials. While most people share the moral intuition that evidence for the expected social value of studies like these should be compelling, this raises the question of whether evidence for expected social benefits should be compelling for all studies involving human subjects. Should IRBs demand more evidence for approving a study that exposes human subjects to significant risks without offering them any direct benefits than they would for other types of research, such as minimal risk surveys or sample collection protocols?

For the three reasons elaborated above, I believe that standards of evidence for IRB decision-making is an important topic in human research ethics and oversight that merits further investigation. While there is a growing and informative literature on IRB decision-making,16 no published articles, to my knowledge, directly address the topic of standards of evidence used by an IRB or its members.

Investigations into IRB standards of evidence can be divided into descriptive and normative queries:17

  • Descriptive (psychological, behavioral) Questions: What standards of evidence do IRBs use? What standards of evidence do IRB members endorse or believe they ought to use? How often do IRBs make decisions based on empirical evidence? How often do IRBs make decisions based on feelings, emotions, or intuitions? How confident are IRB members in the decisions they make? How do IRB members respond to lack of evidence? Are there variations in standards of evidence? What factors are associated with variations? Do IRBs use different standards for different types of research? Do members on the same IRB use different standards? What is the relationship between standards of evidence and group dynamics related to decision-making?

  • Normative (epistemological) Questions: What standards of evidence should IRBs use? Should IRB decision-making be empirically based? Should IRBs rely on emotion, feeling, or intuition in decision-making? How much evidence should IRBs require to make decisions? Should IRBs use different standards of evidence for different types of research?

In this article, I will focus on normative18 questions concerning standards of evidence for IRB review. Hopefully, my normative analysis will provide some impetus and insight for investigating the descriptive19 questions empirically. I will defend two main points: 1) IRBs should, for the most part, rely on empirical evidence when approving studies, but other sources of evidence, such as intuition, emotion, or rational reflection, can also play an important role in these determinations; 2) IRBs should use the clear and convincing evidence standard for making approval decisions, except when the study under review will expose healthy volunteers to significant risks, in which case they should approve the study only if evidence is more than clear and convincing.

2. Evidence and IRB Decision-Making

While an IRB is not a court of law, the legal analogy may be useful for thinking about evidence and IRB decision-making.20 Burden of proof is an essential concept in litigation and jurisprudence. In US criminal law, for example, the prosecution must produce evidence that shows the defendant committed the crime(s) he or she is charged with, and the defense only needs to raise reasonable doubts about the prosecution’s case. The trier of fact, in most cases a jury, must decide whether the prosecution has proven its case beyond a reasonable doubt. In the IRB setting, the investigator has the burden of proof because he or she must produce (or submit) evidence to convince the IRB to approve his or her study. Although the investigator does not have an adversary in the IRB room, IRB members may ask the investigator questions about his or her study and request the investigator to provide additional information if his or her application is incomplete or not adequately documented.

Most of the evidence that IRBs consider consists of information directly related to regulatory and ethical issues (Stark 2012). In the US, IRBs make approval decisions based on criteria stated in the federal regulations. The criteria stipulate necessary conditions for approval; that is, an IRB should not approve21 a study unless it determines that it meets all the criteria. The criteria require that: 1) risks to subjects are minimized; 2) risks to subjects are reasonable in relation to anticipated benefits to subjects or society; 3) selection of subjects is equitable; 4) informed consent will be sought from prospective subjects or appropriately waived; 5) informed consent will be appropriately documented or appropriately waived; 6) data will be monitored to ensure the safety of subjects (when appropriate); 7) privacy and confidentiality will be protected (when appropriate); and 8) additional safeguards are in place to protect vulnerable subjects (Department of Health and Human Services 2017).22

Since the regulations do not cover every issue that may impact the rights or welfare of human subjects, most IRBs go beyond their regulatory mandate23 and apply ethical guidelines, such as the Belmont Report’s principles24, to their decisions (Emanuel et al. 2000; Klitzman 2015; Resnik 2018). For example, many IRBs require researchers to protect third parties, such as family members or communities, from research risks, even though the regulations do not address risks to third parties (Resnik and Sharp 2006; Shah et al. 2018).

Investigators typically submit several documents to IRBs, including: the research protocol, consent form(s), investigators’ curriculum vitae, advertisements, survey instruments (for survey research), investigator’s brochure (for drug or device research), and approvals from other institutional committees, such as scientific review, radiation safety review, and conflict of interest review.

Most of the information needed to decide whether to approve a study will be contained in the protocol and consent form. A protocol25 usually includes information about previous research on the topic; the rationale for the current study; the research design, methods, procedures, tests, interventions, and outcome measures or endpoints; study timelines; statistical considerations, such as sample size and statistical tests; inclusion and exclusion criteria for participating in the research; planned enrollment size; risks of the research and safety measures; benefits of the research; processes for recruitment, enrollment, and consent; protections for privacy and confidentiality; considerations related to vulnerable subjects, such as children, fetuses26, and mentally disabled adults; withdrawal of participants; and payments to participants.

Consent forms27 summarize information about the study contained in the protocol and include additional information that a reasonable person would want to know, including information about the research subject’s legal rights. The US federal regulations specify nine types of information that consent documents must contain and nine types of information that they should contain, if appropriate (Department of Health and Human Services 2017). Consent forms should be written in language that is understandable to the subject and should not include any statements that require participants to waive their legal rights (Department of Health and Human Services 2017).

Before the meeting, IRB members read the documents submitted by the investigator. During the meeting, IRB members review and discuss the research. Most IRBs use a primary reviewer system. The primary reviewer summarizes the study for other IRB members, identifies regulatory or ethical problems or concerns, and makes recommendations. The IRB may also ask the investigator some questions about his or her study, if the investigator has been invited to the meeting (Klitzman 2015).

In addition to the information provided by the investigator, an IRB may consider other types of information relevant to its decision, such as approvals from other committees, including the radiation safety committee, scientific review committee, and conflict of interest committee. An IRB may also consider information pertaining to local laws, values, and traditions, and the needs, interests, and concerns of the population or community being studied. Often, IRBs rely on community members on the board to provide them with information about the local population or community. IRBs may rely on scientific members to provide the board with information pertaining to the scientific merits of a study, and on physician, nurse, or pharmacist members to provide the board with information concerning medical risks and benefits. If an IRB requires additional information, it may consult with an expert in the subject matter (Stark 2012).28

IRB members must consider and process all the types of information mentioned above when deciding whether to approve a proposed study. If the IRB does not approve a study, it may require the investigator to revise the protocol, consent form, or other documents as a condition for obtaining approval.

3. How Do IRBs Make Decisions?

As mentioned earlier, there is a growing literature that attempts to peer inside the “black box” of IRB decision-making.29 Robert Klitzman, for example, has published valuable studies of how IRBs make decisions about consent forms, undue influence, social risk, and scientific design (Klitzman 2012a,b; 2013a,b,c,d; 2014; 2015). Klitzman has also published studies of the social dynamics involved in IRB decision-making, such as interactions between the chair and IRB members, contributions of community members, and how IRBs deal with conflicts of interest (Klitzman 2011a,b; 2012b; 2015). Other noteworthy research includes Pritchard’s (2011) philosophical reflections and informed observations30 on IRB decision-making; Tait et al.’s study of how IRB members make judgments concerning the decision-making capacity of prospective research participants; Stark’s (2012) ethnography of how IRB members interact with each other and the larger society; Clapp et al.’s (2017) analysis of IRB decision letters to investigators; and Grinnell et al.’s (2017) study of IRB members’ confidence pertaining to risk assessments.

While these and other empirical studies have shed a great deal of light on IRB decision-making, they have not focused on how standards of evidence affect IRB judgment and decision-making. To be sure, several studies (e.g. Van Luijn et al. 2002; Stark 2012; Klitzman 2015) have addressed the use of empirical evidence in IRB decision-making, but these studies have not focused on how reliance on empirical evidence functions as a normative standard in IRB decision-making or impacts the thinking of IRB members. How often do IRB members use empirical evidence in making decisions? Do they believe that they should use empirical evidence? How do IRB members incorporate non-empirical evidence, such as feelings or intuitions, into their decision-making? How much evidence do IRB members require to make approval decisions? These are the types of psychological and sociological questions that an empirically-oriented investigation of standards of evidence should address.

As stated earlier, the questions I seek to address in this article are normative and epistemological, not descriptive and psychological or sociological. I am interested in what sorts of standards of evidence IRB members ought to use. To make some headway in answering these questions, it will be useful to say a few words about standards of evidence.

4. What Are Standards of Evidence?

A standard of evidence is a rule (or norm) pertaining to the type (or quality) or amount (or quantity) of evidence that is required to prove a conclusion (Chisholm 1989).31 Logically speaking, we can view each item of evidence as a premise in an inductive or deductive argument. Evidence in an inductive argument can show that a conclusion is highly probable, but it cannot show that a conclusion is certain; certainty can only be achieved by means of a valid, deductive argument (Copi et al. 2013).32 For example, suppose I flip a coin 10 times and it comes up heads 10 times. I might inductively conclude, based on this evidence, that the coin is biased in favor of heads. I cannot prove this conclusion with certainty, however, because the coin could come up tails 10 times if I flip it another 10 times.33 A standard of evidence could tell me how many times I must flip the coin to be justified in concluding that it is (or is not) biased (Goldman 1986; Giere et al. 2005). A standard of evidence could also tell me what type of evidence I may consider in deciding whether the coin is biased. For example, I could ask a metallurgist to examine the coin to determine whether it is heavier on one side. A standard of evidence could tell me whether I should rely on the metallurgist’s expert opinion to decide whether the coin is biased.
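To make the coin example concrete, here is a minimal sketch (assuming Python with SciPy available; the numbers are purely illustrative) of how a quantitative standard of evidence, in this case a significance threshold, determines whether 10 heads in 10 flips warrants the conclusion that the coin is biased:

```python
from scipy.stats import binomtest

# Evidence: 10 heads in 10 flips of a coin presumed fair (p = 0.5).
result = binomtest(k=10, n=10, p=0.5, alternative="two-sided")
print(f"p-value: {result.pvalue:.5f}")  # about 0.00195

# The standard of evidence supplies the threshold; the data alone do not.
alpha = 0.05  # a chosen norm, not an empirical fact
if result.pvalue <= alpha:
    print("Evidence meets the standard: conclude the coin is biased.")
else:
    print("Evidence falls short of the standard: withhold the conclusion.")
```

Lowering the threshold to 0.001 would demand more flips (or a more lopsided outcome) before the same conclusion could be drawn, which is exactly the sense in which a standard of evidence regulates how much evidence is enough.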

4.1. Legal Standards of Evidence

There are well-developed standards of evidence in the law, science, and medicine. In the law, there are many different standards of evidence that apply to different legal decisions (Loevinger 1992). Three commonly used standards pertaining to the amount of evidence needed to accept a conclusion are (in increasing order of stringency): preponderance of evidence, clear and convincing evidence, and beyond reasonable doubt (Rothstein et al. 2011). The preponderance of evidence standard means that the evidence shows that the conclusion (e.g. a defendant is at fault for negligence) is more likely true than not. Preponderance of evidence is used in civil lawsuits, such as contracts and torts, in which only money is at stake. In civil litigation that may involve restrictions of liberties (such as cases involving restraining orders, child custody, or involuntary commitment), courts use a stronger standard of evidence known as clear and convincing evidence. Evidence is clear and convincing if it shows that the conclusion is substantially more likely true than not. The beyond reasonable doubt standard is used in criminal law. For evidence to meet this standard, it must be so convincing that a reasonable person would have no reasonable doubt that the defendant is guilty (Rothstein et al. 2011).

One could, in theory, formulate these standards quantitatively in terms of probabilities. For example, preponderance of evidence would be probability > 50%; clear and convincing evidence, probability ≥ 75%; and beyond reasonable doubt, probability ≥ 95% (Weinstein and Dewsbury 2006). However, viewing legal standards of evidence as quantitative is misleading because these standards are used to guide laypeople (e.g. judges and jurors) in decision-making, and laypeople may not have a firm grasp of the meaning of probability as it applies to legal cases. Accordingly, these standards of evidence should be understood as qualitative, not quantitative.

The law also includes standards of evidence concerning the type of evidence that may be used to prove a conclusion. These standards, known as rules of evidence, come into play when judges decide whether to admit evidence into court. Some of these rules address the admissibility of hearsay evidence, expert testimony, character evidence, confessions, testimony by spouses, and evidence that may be unfairly prejudicial or misleading (Loevinger 1992). The law also has rules concerning the burden of proof, but I will not characterize these rules as standards of evidence because they address the obligations of parties to produce evidence, not the standards that evidence must meet.34

Although legal evidence is not nearly as objective or rigorous as scientific evidence, it is, for the most part, empirical, because it is based on the testimony of witnesses who have observed events in the world or experts who have special knowledge that is derived from scientific methods (Loevinger 1992). Business, medical, law enforcement, or other documents that record observations are also frequently used as sources of evidence (Rothstein et al. 2011).

4.2. Scientific Standards of Evidence

Science has many different standards of evidence, depending on the type of discipline (e.g. biology, chemistry, physics, psychology) and the nature of methodology used (e.g. experimental, descriptive, quantitative, qualitative). One thing these standards all have in common is a commitment to empiricism, i.e. the idea that knowledge should be based on observation, testing, and experiment, not on subjective feelings, emotions, intuitions, or impressions (Kitcher 1993; Haack 2003; Giere et al. 2005). Thus, a general standard of evidence common to all sciences is that evidence used to support hypotheses or theories should be empirical.35

Scientists use a variety of quantitative methods to evaluate empirical evidence (Giere et al. 2005). Frequentist statistical testing involves sampling data from populations and using a statistical test to decide whether to reject the null hypothesis, i.e. the hypothesis that there is no difference between sampled populations with respect to a variable of interest. The p-value (or significance value) for a statistical test is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis is true. A commonly used standard for rejecting the null hypothesis is a p-value of 0.05 or less. For example, suppose that an investigator conducts a randomized, controlled trial (RCT) that compares a new drug to a placebo and uses a Chi-square test to analyze the data. If the Chi-square test shows that the new drug is three times more effective than the placebo, with p = 0.05, the investigator could reject the null hypothesis and conclude that the drug is more effective than the placebo (Weiss 2011).
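As a sketch of the frequentist procedure just described (the trial counts below are invented for illustration, not drawn from any study cited here), one could run a Chi-square test on a 2×2 table of outcomes:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical RCT data: rows are study arms, columns are outcomes
# (improved, not improved). 30% improved on the drug vs. 10% on placebo,
# i.e., the drug appears three times more effective.
observed = np.array([
    [30, 70],   # new drug
    [10, 90],   # placebo
])

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")

# The standard of evidence: reject the null hypothesis (no difference
# between arms) only if p is at or below the chosen threshold.
if p <= 0.05:
    print("Reject the null: the drug outperforms the placebo.")
```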

Despite the widespread popularity of frequentist statistical testing, many scientists use other methods of data analysis. The reasons for this are twofold. First, in many areas of science frequentist statistical testing is neither practical nor helpful, due to the nature of the phenomena under investigation, the goals of research, or other factors. The hypothesis that the SARS-CoV-2 virus originated in horseshoe bats, for example, cannot be proven or disproven using frequentist statistical testing methods, because this is a single, historical event that is not likely to be repeated anytime soon. Scientists have obtained evidence relevant to this hypothesis by comparing the genome of SARS-CoV-2 to genomes of coronaviruses found in horseshoe bats (Lau et al. 2020).

Second, many statisticians and scientists are critical of the use of p ≤ 0.05 as a criterion for hypothesis acceptance, because 1) 0.05 is an arbitrary number that may not accurately reflect the total weight of evidence for or against a hypothesis; 2) with a large enough dataset it is almost always possible to find statistically significant but scientifically meaningless associations; and 3) using 0.05 as a standard for publication may discourage researchers from publishing interesting or important results where p > 0.05 and may encourage researchers to manipulate data analysis to obtain desirable p-values (a.k.a. “p-hacking”), which can lead to the publication of irreproducible results (Amrhein et al. 2019).36
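Point (2) is easy to demonstrate with a small simulation (a sketch, assuming Python with NumPy and SciPy): among enough variables of pure noise, a p ≤ 0.05 threshold will reliably flag some pairwise associations as “significant” even though none is real:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(seed=0)
data = rng.normal(size=(200, 50))  # 50 noise variables, 200 observations each

false_positives, n_tests = 0, 0
for i in range(50):
    for j in range(i + 1, 50):
        _, p = pearsonr(data[:, i], data[:, j])
        n_tests += 1
        if p <= 0.05:
            false_positives += 1

# With 1,225 tests at the 0.05 level, roughly 5% (about 60) spurious
# "significant" correlations are expected by chance alone.
print(f"{false_positives} of {n_tests} tests significant at p <= 0.05")
```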

In the last few decades, the Bayesian approach to evaluating evidence has emerged as an alternative to frequentist statistical testing. The Bayesian approach is based on Bayes’ theorem, which is a formula for calculating conditional probabilities based on information about other probabilities. For Bayesians, the probability of a hypothesis, given the evidence, is a function of the probability of the evidence, given the hypothesis; the prior probability of the hypothesis; and the prior probability of the evidence (Howson and Urbach 1989).37 A hypothesis (H1) is compared not to the null hypothesis but to an alternate hypothesis (H2), so evidence will affect the probability of H1 relative to H2. For example, suppose that H1 is “SARS-CoV-2 originated in horseshoe bats” and that H2 is “SARS-CoV-2 originated in human beings.” Suppose the evidence is that the SARS-CoV-2 genome has a 96% homology to coronavirus genomes found in the horseshoe bat and 75% homology to human coronavirus genomes (Lau et al. 2020). One could use Bayes’ formula to calculate the probability of these alternative hypotheses, given the evidence and prior probabilities. Bayesians hold that scientists should update probabilities as they acquire new evidence, so that the probability of a hypothesis may change in response to the data (Howson and Urbach 1989).
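In symbols, Bayes’ theorem says P(H|E) = P(E|H) × P(H) / P(E). A minimal sketch of the comparison just described (the likelihoods and priors below are illustrative assumptions, not values derived from Lau et al. 2020):

```python
def posterior(prior_h1, like_e_h1, like_e_h2):
    """Posterior probability of H1 given evidence E, where H2 = not-H1."""
    prior_h2 = 1.0 - prior_h1
    p_e = like_e_h1 * prior_h1 + like_e_h2 * prior_h2  # total probability of E
    return like_e_h1 * prior_h1 / p_e

# H1: SARS-CoV-2 originated in horseshoe bats; H2: it originated in humans.
# Assume (for illustration) the homology evidence is far more probable
# under H1 than under H2.
print(posterior(prior_h1=0.5, like_e_h1=0.9, like_e_h2=0.1))  # 0.9

# Bayesian updating: yesterday's posterior becomes today's prior as new,
# independent evidence of similar strength accumulates.
p = 0.5
for _ in range(3):
    p = posterior(p, 0.9, 0.1)
print(round(p, 4))  # approaches 1.0
```

Note that with a skeptical prior of 0.10 instead of 0.50, the same evidence yields a posterior of only 0.50 in this sketch, which is the prior-sensitivity worry discussed next.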

Although many scientists have adopted the Bayesian approach to hypothesis testing, a well-known shortcoming of this method is that it relies on estimates of prior probabilities, which may not be known (Earman 1992). Many Bayesians hold that one can make subjective estimates (or educated guesses) of unknown prior probabilities, but subjective estimates may be biased, so final results could also be biased. For example, if one estimated that the prior probability of H1 was only 10%, the evidence might not increase its probability to more than 50%. The Bayesian reply to this objection is that repeated testing will eliminate biased estimates so that, in the long run, probabilities will reflect the evidence, not biases (Howson and Urbach 1989).

An entirely different approach to evaluating evidence is to supplement quantitative methods of data assessment with qualitative, epistemic criteria when deciding whether to accept or reject a hypothesis or theory (Kuhn 1977; Thagard 1988; Kitcher 1993; Haack 2004; Giere et al. 2005). One reason for using this approach is that science often involves the development of complex causal theories and models, which cannot be tested using frequentist or Bayesian methods (Giere et al. 2005). For example, the hypothesis that childhood exposure to allergens and pathogens can protect against asthma, allergies, and autoimmune diseases by teaching the immune system to distinguish between self and non-self, also known as the hygiene hypothesis, is a general, causal theory that cannot be proven or disproven using frequentist or Bayesian testing methods. To evaluate this theory, scientists must weigh and consider evidence from a variety of disciplines, including epidemiology, immunology, microbiology, pulmonology, and rheumatology (Stiemsma et al. 2015). Some of the epistemic criteria that scientists use to evaluate theories and models include: testability (is the theory testable?), empirical adequacy (does the theory fit the data well?), simplicity (is the theory simple or parsimonious?), precision (does the theory make precise predictions?), generality (is the theory widely applicable?), robustness (is the theory supported by independent sources of evidence?), fruitfulness (does the theory open up new areas of inquiry?), and explanatory power (does the theory unify disparate phenomena?) (Kuhn 1977; Thagard 1988; Kitcher 1993). Scientists could use these criteria when deciding whether to accept, reject, or modify the hygiene hypothesis.

4.3. Medical Standards of Evidence

Finally, we should consider medical standards of evidence. Since the time of Hippocrates (460–370 BCE), medicine has aspired to be an empirical and scientific discipline. Although Hippocratic medicine was more scientific than the superstitious practices it replaced, it was for the most part observational, not experimental. During the Scientific Revolution, physicians realized that medicine did not have the degree of rigor found in the “hard” sciences such as physics, chemistry, and astronomy. Responding to the need to set medicine on a scientific footing, William Harvey (1578–1657) and Claude Bernard (1813–1878) pioneered the use of experimental and quantitative methods in medicine (Porter 1998). However, physicians have not always lived up to this high standard. Treatment recommendations have often been based on traditional ways of practicing medicine learned during medical education and training, clinical observations gained during medical practice, and folk wisdom, not on scientific evidence derived from well-designed experiments (Sackett et al. 2000).

In the 1990s, physicians who were concerned that treatment recommendations in medicine were often not based on good evidence developed an approach to medical decision-making known as evidence-based medicine (EBM). According to EBM, medical decisions should be based on the best available evidence (Sackett 1989; Sackett et al. 2000; Burns et al. 2011). One of the pioneers of EBM, David Sackett (1989), distinguished between five levels of evidence (listed from best to worst): evidence from large RCTs with clear results; smaller RCTs with unclear results; cohort and case-control studies; historical cohort or case-control studies; and case series or studies with no controls (Sackett 1989). Others have modified Sackett’s approach to levels of evidence. According to the Oxford Centre for Evidence-Based Medicine, the highest level of evidence comes from systematic reviews of RCTs, followed by: RCTs or observational studies with dramatic effects; non-randomized controlled cohort studies; case series, case control, or historically controlled studies; and mechanism-based reasoning (Oxford Centre for Evidence-Based Medicine, Levels of Evidence Working Group 2011). Although EBM has become highly influential since it was conceived in the 1990s, physicians have been slow to adopt evidence-based practices for political, social, and economic reasons (Patashnik et al. 2017).

5. IRB Standards of Evidence

5.1. Type of Evidence

We now turn to the question that is the central focus of this paper: what standards of evidence should IRBs use? Let’s begin with the question concerning standards related to the type of evidence. As noted earlier, several commentators have argued that IRB decisions should be based on empirical evidence, rather than intuition, feeling, or emotion (Meslin 1990; Rid et al. 2010; Pritchard 2011; Anderson and DuBois 2012). There are several arguments for this position. First, many IRB decisions are grounded in descriptive claims relating to soundness of research design, feasibility of studies, minimization of risks, expected risks and benefits, adequacy of consent, and the effectiveness of the consent process and confidentiality protections. Since these are descriptive, factual claims, they should be supported by empirical evidence. For example, whether an experimental drug is likely to produce anemia as a side-effect is a descriptive question best answered by empirical evidence, as is the question of whether iron supplements can minimize this risk. Whether a consent form would be understandable to an adult with a 7th grade reading level is a question that can be answered empirically, as is the question of the mental capacity needed to consent to study participation.

Second, relying on empirical evidence in decision-making can help reduce inconsistency and variability by making IRB determinations more objective and rational (Anderson and DuBois 2012). For example, if IRBs use empirical evidence to estimate the risks of skin biopsies less than 3cm in diameter, they are less likely to disagree about classifying these procedures as minimal risk or more than minimal risk.

Third, relying on empirical evidence can enhance the accountability and transparency of IRB decisions by allowing IRBs to cite clear, objective reasons for their decisions in communications with investigators and oversight agencies. Various commentators have faulted IRBs for their lack of accountability and transparency (see Schrag 2010 and Schneider 2015, for example), and making IRB decisions more empirically-based could help dispel this criticism.

While I find these arguments to be compelling, they have important limitations because IRB decision-making synthesizes scientific and ethical considerations. IRB decision-making includes a descriptive, factual component as well as a normative, ethical component (Resnik 2017). While empirical evidence can be extremely useful in addressing the scientific aspects of IRB decision-making, it cannot settle the ethical dilemmas related to those decisions because ethics is not reducible to empirical science. To be sure, empirical research can inform ethical decision-making by helping us understand the consequences and feasibility of different choices and the social practices that generate and reinforce value commitments, but it does not provide us with a hegemonic source of moral value (Kon 2009). To make ethical decisions, we must not only consider the facts and circumstances related to our choices, but also our moral values, principles, or frameworks, which may be derived, at least in part, from intuition, emotion, culture, religion, or rational reflection (Audi 2005; Beauchamp and Childress 2012; Greene 2013; Resnik 2017).38

Consider, for example, IRB decisions related to risks and prospective benefits. Claims about risks and prospective benefits are predictions about what is likely to happen, based on empirical evidence. Risk is a function of the probability and magnitude (or severity) of harm, and a prospective benefit is a function of the probability and magnitude (or worth) of the benefit (Levine 1988; Wendler et al. 2005; Rid et al. 2010; Resnik 2018). Science can provide us with evidence concerning probabilities, but it cannot tell us how to compare benefits and harms. Concerning the COVID-19 vaccine challenge study mentioned above, empirical evidence can tell us how likely it is that various adverse (or bad) outcomes (such as illness, hospitalization, or death) and desirable (or good) outcomes (such as shortening the length of time for vaccine development) will occur, but it cannot tell us how to decide whether the prospective benefits are worth the risks, i.e. are reasonable or acceptable, given the risks. Risk/benefit decisions have an inherently normative (or value) component that cannot be reduced to descriptive facts (Shrader-Frechette 1991; Hannson 2003).39

Decisions related to consent also involve value judgments. The regulations require that informed consent take place under conditions that minimize the potential for coercion and undue influence (Department of Health and Human Services 2017). IRB members are often concerned that offering prospective subjects too much money for research participation could be coercive or unduly influential (Largent et al. 2012). While empirical research can provide IRB members with evidence of how money affects judgments about research risks and the decision to enroll in research (for review of the evidence, see Largent and Fernandez-Lynch 2017), it cannot tell them whether (or when) financial influences on judgment and decision-making constitute coercion or undue influence, because this is a normative/philosophical issue, not an empirical one (Wertheimer and Miller 2008; Millum and Garnett 2019). Deciding whether an offer of money constitutes coercion or undue influence depends, in part, on our moral intuitions pertaining to these concepts (Resnik 2019).

I will also assert, but not argue in this paper, that IRB decisions concerning risk minimization, equitable subject selection, privacy and confidentiality protections, and safeguards for vulnerable subjects are similar to judgments about risk/benefit and consent because they involve a value component. In sum, while there are strong arguments that IRBs should rely on empirical evidence to make decisions, other sources of evidence, such as intuition, emotion, culture, or rational reflection, can also play an important role in decision-making because IRB decisions are not purely descriptive or factual.

5.2. Amount of Evidence

Concerning the amount of evidence that IRBs need to make a decision, three arguments support using legal standards, as opposed to scientific or medical ones. First, the evidence that the IRB considers in making its determinations may not be completely quantifiable because, as discussed above, it is likely to include value judgments that are not reducible to scientific data or facts. As a result, quantitative methods of assessing evidence, such as those used in science and medicine, may not apply to these determinations. For example, evidence for a statement that “The risks of study X are reasonable in relation to the benefits to subjects or the importance of the knowledge expected to be gained” cannot be assessed using frequentist or Bayesian methods because it involves making a value judgment concerning the comparison of risks and benefits.40 Likewise, evidence for the statement that “Subject selection for study X is equitable” cannot be assessed using quantitative methods because it involves value judgments concerning fairness or justice.

Second, even when evidence is quantifiable, the IRB may lack sufficient information to use quantitative methods to assess it, due to lack of published (or publicly available) research on the topic in question or insufficient time or resources to conduct a thorough search of the literature. For example, consider risk/benefit assessment for a Phase I study of a new drug. While preclinical studies on laboratory animals can provide investigators with some evidence concerning the probability that the drug will have adverse effects (such as liver toxicity) on human subjects at a particular dose, the evidence is often not strong enough to support reliable estimates of the probabilities of these effects using frequentist or Bayesian methods. Indeed, the main reason for conducting a Phase I study is to learn more about the risks of the drug in human beings. Animal data can provide some evidence of risk, but it often falls short of the mark, due to differences between animals and humans and other scientific issues (Kimmelman 2009). Furthermore, probability estimates concerning potential benefits of new drugs are usually highly speculative, due to lack of evidence. While Phase I safety trials of new drugs are widely regarded as socially beneficial because they are an essential step in drug testing and development, only about 10–12% of new drugs that enter Phase I testing are ultimately approved by the Food and Drug Administration (Seiffert 2016). Moreover, many of these drugs have marginal benefits because they are intended to treat conditions for which there is already an effective medication (Goozner 2004; Spector 2005).

Third, non-scientist members on the IRB may not be trained in the quantitative methods of evaluating evidence used in the sciences or medicine. Although scientific members of the IRB may be familiar with these methods and use them in forming their own judgments concerning the approval criteria and conveying their opinions to the board, non-scientist members may not be able to understand these methods or use them appropriately. If we assume that all members of the board should use the same standards for assessing evidence when making group decisions, to avoid irresolvable conflicts and misunderstandings,41 it follows that the IRB should use legal standards for assessing the amount of evidence, because both scientists and non-scientists can understand and use these standards. Legal standards provide a qualitative, easily understandable way of assessing empirical evidence.

If IRBs should use legal standards for determining the amount of evidence needed to make a decision, which standards should they use: preponderance of evidence, clear and convincing evidence, or beyond reasonable doubt?

Preponderance of evidence would set too low a standard for IRB decision-making, since it would allow an IRB to approve a study if it is convinced that the evidence shows that it is more likely than not that the study meets the criteria for approval. As noted above, the law uses standards more demanding than preponderance of evidence, such as clear and convincing or beyond reasonable doubt, when decisions have significant implications for human rights. Since most of the determinations that an IRB makes have significant implications for the rights of research subjects, such as the right to consent to research, the right to privacy, and the right not to be exploited (Katz 1993), an IRB should, following the legal analogy, use a standard of evidence more demanding than preponderance of evidence. An IRB should not approve a study unless it determines that it is highly likely that risks will be minimized, that consent will be sought or appropriately waived, and that confidentiality and privacy will be protected.

While preponderance of evidence would set too low a standard, beyond reasonable doubt would set too high a standard, since one can often raise reasonable doubts about IRB determinations, especially those related to the reasonableness of risks (Grinnell et al. 2017). Investigators often do not know, with a high degree of confidence, that their research is likely to benefit participants or contribute significantly to the advancement of knowledge or otherwise benefit society. Most consent documents used in clinical trials include a phrase stating that “you may not benefit,” or some wording to that effect (Joffe et al. 2001).42 Expected benefits to society often fail to materialize because a study may not achieve its aims and objectives, due to difficulties with recruitment, implementation, or other problems; or because the research results are not applied to practical problems in medicine, public health, or other disciplines. Also, as noted above, physicians have been slow to implement EBM in medical practice, which diminishes the social benefits of medical research.

Since preponderance of evidence sets too low a bar and beyond reasonable doubt sets too high a bar, it follows that the clear and convincing evidence standard is the best standard for IRB decision-making, since it is neither too low nor too high. As discussed previously, the clear and convincing evidence standard applies to civil litigation that involves significant human rights issues, which is certainly the case in IRB decisions. One could argue, therefore, that to adequately protect the rights of human research participants an IRB should approve a study only when it has clear and convincing evidence that the study meets all of the regulatory criteria for approval, as well as other relevant ethical guidelines, such as the Belmont Report’s principles.

6. Variations in Standards of Evidence

In this section, I will return to a question I posed earlier: should an IRB require more evidence for some types of research?43 Recall the proposed COVID-19 vaccine challenge study described earlier. Should an IRB approve this study only if it has more than clear and convincing evidence that the study meets the regulatory approval criteria and other ethical guidelines? One could argue that for a study like this one, an IRB should use a higher standard of evidence than it uses for other studies because so much is at stake for the human participants and the research enterprise. The participants are healthy volunteers who face significant risks, including the risk of serious illness, prolonged hospitalization, or even death. Moreover, they are not expected to derive direct, medical benefits from their participation.44 The research enterprise also has a great deal at stake because if healthy volunteers die as a result of their participation in the experiment, the public’s trust in the research enterprise in general, and vaccine research in particular, could be significantly eroded (Resnik 2012). While a study like this one could produce significant social benefits by accelerating vaccine development, one might argue that the evidence that the risks of research are reasonable in relation to its expected benefits should be more than clear and convincing.45 Likewise, evidence for other approval criteria, such as risk minimization, consent, and equitable subject selection, should also be more than clear and convincing.

The idea that an IRB should require more evidence to approve some types of studies may sound unorthodox to some, but it has analogs in the law and in the philosophy of science. As noted above, there are different legal standards of evidence depending on what is at stake: preponderance of evidence is used in civil litigation when only money is at stake, but clear and convincing evidence is used when human rights are impacted. Beyond reasonable doubt is used in criminal law, because a conviction can lead to imprisonment or even death (in some jurisdictions). Since the 1950s, philosophers of science have argued that standards of evidence in research should vary depending on what is at stake (Douglas 2000; Elliott 2017). The originator of this idea was Richard Rudner (1953), who argued that the degree or amount of evidence that is needed to accept a hypothesis should depend on the consequences of mistakenly accepting the hypothesis. We should require more evidence to accept a hypothesis concerning the safety of a new drug, for example, than to accept a hypothesis concerning the mating behavior of North American mountain goats, because the consequences of mistakenly accepting the former are potentially worse than the consequences of mistakenly accepting the latter.

7. Objections and Replies

Before concluding this article, I would like to respond to two objections to my view. The first objection is that my view is too idealistic and may be out of touch with the realities of IRB decision-making. It may be the case that IRBs seldom rely on empirical evidence or that their decisions often fall well short of the clear and convincing standard of evidence. Before proposing a normative standard, we should have a better understanding of how IRBs actually make decisions.

While I agree that a normative framework for making IRB decisions should be informed by descriptive data concerning how IRBs make decisions, it does not follow that one cannot develop normative recommendations independently of descriptive research studies. Normative and descriptive inquiries can proceed concurrently and collaboratively. Indeed, one reason why I have written this paper is to stimulate interest in descriptive, empirical investigations of IRB decision-making. That being said, the primary aim of normative epistemology is to establish ideal standards for forming beliefs and making judgments and decisions. While these standards should provide us with guidance that can improve our belief formation, judgment, and decision-making, they should not be so far removed from actual practice that they are impossible to fulfill (Goldman 1986, 1999).

A second objection to my view is that claiming that IRBs should require more evidence to approve some types of studies than others could lead to excessive variation in IRB decision-making and confusion about standards of evidence. I agree that this could be a problem, but I think it can be managed if the higher standard of evidence is rarely used and regulatory agencies and academic institutions provide clear guidance for using it.

8. Conclusion

In this article, I have examined IRB standards of evidence from a normative, epistemological perspective. I have argued that IRBs should, for the most part, rely on empirical evidence for making decisions, but that other sources of evidence, such as intuition, emotion, or rational reflection, can also play an important role in decision-making, because IRB decisions involve an ethical (or value) component which is not reducible to science. I have also argued that an IRB should approve a study only if it has clear and convincing evidence that the study meets all the approval criteria and other relevant ethical guidelines; and that for studies which expose healthy volunteers to significant risks, an IRB should require that evidence be more than clear and convincing as a condition for approval. I believe that my article is the first to explore the topic of IRB standards of evidence from a normative perspective, and I encourage others to investigate this topic, especially from an empirical perspective. Additional research is needed on how IRBs make decisions based upon evidence and how standards of evidence influence IRB decision-making at the level of the individual and the group. Since the federal regulations do not address standards of evidence used in IRB decision-making, oversight agencies, such as OHRP and FDA, may consider providing additional guidance for investigators and IRBs.

Acknowledgments

This research was supported by the Intramural Program of the National Institute of Environmental Health Sciences (NIEHS), National Institutes of Health (NIH). It does not represent the views of the NIEHS, NIH, or US government. I am grateful to Emily Anderson, Ramin Karbasi, Elise Smith and David Wendler for helpful discussions. On November 9, 2020, I presented a version of this paper to the Consortium to Advance Effective Research Ethics Oversight. I am also grateful to members of this audience for helpful comments and discussions.

Footnotes

1

Other countries have institutional structures which are similar to IRBs but go by different names, such as research ethics boards (REBs) or research ethics committees (RECs).

2

In this article, I will cite regulations found in the Common Rule (45 CFR 46), which is accepted by 17 federal agencies. Regulations pertaining to IRBs adopted by the FDA and EPA are virtually identical to the Common Rule with respect to the IRB’s structure, function, and operations.

3

Over half of the nations in Africa have national IRBs or REBs (Klitzman 2012a).

4

An advantage of local IRB review is the local IRBs are likely to have better knowledge of the qualifications of investigators, the institutional research environment, and local values and cultural practices than national IRBs (Moon 2009).

5

As far as I know, the federal regulations are not unique in this regard. That is, I know of no major ethical regulation of guideline that discusses standards of evidence for approving human subjects research.

6

It is worth noting that the federal policy on research misconduct includes a standard of evidence. Research misconduct must be proven by preponderance of evidence. The federal policy applies to research funded by federal agencies or conducted to support application to the FDA (Office of Science and Technology Policy 2000).

7

In theory, oversight agencies, such as the Office of Human Research Protections (OHRP) and the FDA, could provide standards of evidence for IRBs by issuing interpretative guidance for applying approval criteria to proposed research. However, these agencies have not done so.

8

The federal regulations allow IRBs to make some decisions on an expedited basis. New studies can be reviewed on an expedited basis if they determined to be no more than minimal risk. Expedited reviews are conducted by the IRB chair or a designated IRB member (Department of Health and Human Services 2017).

9

The federal regulations state that some types of research are exempt from IRB review. Some common exemptions include research involving surveys, interviews, or focus groups, and research on de-identified data or samples (Department of Health and Human Services 2017).

10. The Office of Human Research Protections (OHRP), which oversees research funded by the Department of Health and Human Services, enforces the federal regulations by issuing determination letters to institutions for non-compliance. Institutions must take appropriate steps to comply with the regulations, or OHRP may temporarily halt their funding. The FDA oversees research conducted on FDA-regulated products and enforces its regulations by issuing determination letters to IRBs. The FDA can enforce its regulations by withdrawing approval of IRBs that have registered with the agency, or by informing the sponsor that its research cannot be used in FDA decision-making because it is non-compliant (Resnik 2018).

11. In the study by Green et al. (2006) discussed above, it is likely that the IRB that did not approve the health services research protocol was being overprotective.

12. By empirical evidence, I mean evidence from observation, testing, or experimentation (Goldman 1986; Chisholm 1989).

13. An intuition is a judgment or belief that is formed without conscious awareness of any reasoning process at work (Resnik 2017).

14. I use the term “empirically-based” rather than the more popular term “evidence-based” because evidence could, in theory, come from non-empirical sources, such as intuition, logical proofs, or rational arguments. Evidence in mathematics, for example, comes from logical proofs and mathematical intuitions (Resnik 2000).

15. In October 2020, the UK approved a COVID-19 vaccine human challenge trial to be conducted by Open Orphan, a commercial research organization. The UK government will also fund the study, which is scheduled to begin in January 2021 (Callaway 2020).

16. I discuss some of this literature below.

17. For more on the relationship between descriptive and normative statements in ethics and epistemology, see Kon (2009) and Goldman (1986, 1999).

18. By “normative” I also mean “evaluative” or “prescriptive.” A normative claim says what ought to be the case or what we ought to do.

19. By “descriptive” I also mean “factual,” “predictive,” or “explanatory.”

20. Schneider (2015) applies legal concepts to IRB decision-making and argues that IRBs do not adhere to important legal requirements, such as due process and accountability.

21. An IRB must apply the same regulatory criteria to approve new studies, to reapprove (or renew) previously approved studies, and to approve proposed changes (or amendments) to studies (Department of Health and Human Services 2017).

22. I am paraphrasing the regulations. For the exact wording, see Department of Health and Human Services (2017). I have omitted the approval criteria for limited IRB review and broad consent.

23. Whether an IRB should exceed its regulatory mandate is ethically controversial. Some commentators argue that, to avoid “mission creep,” IRBs should stay within their regulatory mandate and make decisions based only on the approval criteria (Gunsalus et al. 2006; Schrag 2010). Others argue that, to protect human subjects (and others) from research risks, IRBs sometimes need to make decisions based on rules or guidelines not explicitly stated in the federal regulations (Emanuel et al. 2000; Resnik 2018).

24. The Belmont Report’s principles are respect for persons, beneficence, and justice (National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research 1979).

25. Many academic institutions have developed protocol templates. See Council for International Organizations of Medical Sciences (2016) for a detailed list of information to include in a protocol.

26. The federal research regulations include special protections for pregnant women and fetuses in research but do not classify pregnant women as vulnerable subjects (Department of Health and Human Services 2017).

27. Many academic institutions have consent templates. See, for example, Northwestern University Institutional Review Board (2020).

28. Interesting epistemological and ethical questions arise here concerning the IRB’s reliance on expert testimony, but I will not address them in this paper. See Selinger and Crease (2006) for further discussion of expertise.

29. Anderson and DuBois (2012) argue that IRB decision-making is like a black box: for many years it has been shrouded in secrecy, since IRB meetings are confidential and many IRBs are reluctant to allow outside researchers to study their deliberations.

30. Pritchard has been a senior advisor to OHRP since 2004.

31. We could also say that we are justified in believing the conclusion, given the evidence (Goldman 1986). A conclusion could be a statement or belief that supports a decision. For example, if an IRB concludes that a study meets all the approval criteria, it could decide to approve the study.

32. Deductive arguments are common in the formal sciences, such as mathematics, logic, statistics, and decision theory, but not very common in the natural sciences, medicine, and engineering, because these disciplines obtain knowledge from observation and experiment rather than logical argumentation and proof (Resnik 2000). Since deductive arguments probably play only a minimal role in IRB decision-making, I will focus on inductive arguments in this paper.

33. We could construct a valid, deductive argument using the axioms of probability theory to estimate the probability of flipping a fair coin and getting heads ten times in a row. We assume that the probability of getting heads on any single flip is 0.5, and that each coin flip is independent of the others. We also know, from probability theory, that the probability that two independent events both occur is the product of their individual probabilities (Weiss 2011). Given these premises, we can conclude, deductively, that the probability of flipping a fair coin and getting heads ten times in a row is (0.5)^10 ≈ 0.00098.
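To make the deductive structure of this argument explicit, the calculation can be restated compactly in LaTeX notation (this restates the footnote’s own arithmetic, with H_i denoting heads on the i-th flip):

$$P(H_1 \cap H_2 \cap \cdots \cap H_{10}) = \prod_{i=1}^{10} P(H_i) = (0.5)^{10} \approx 0.00098$$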

34. In US criminal cases, the prosecution has the burden of proof. In civil cases, the person bringing the matter before the court (i.e., the plaintiff) has the burden of proof (Loevinger 1992).

35. While there is little dispute that science is an empirical enterprise, non-empirical methods, such as mathematical modeling, logical and mathematical proof, conceptual arguments, and thought experiments, have played and continue to play an important role in scientific theorizing and discovery (Kuhn 1977; Haack 2003). For example, in 1846 Urbain Le Verrier (1811–77) used information about the perturbations in the orbit of Uranus and mathematical equations from celestial mechanics to predict the position of a new planet, later named Neptune. Johann Galle (1812–1910) pointed his telescope at the coordinates provided by Le Verrier and discovered the planet (Pannekoek 2011).

36. These same criticisms would apply to using p-values for statistical significance.

37. The formula is: p(H|E) = [p(E|H) × p(H)] / p(E).
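As a purely illustrative application of this formula, with hypothetical numbers not drawn from the article: suppose the prior probability of a hypothesis H is p(H) = 0.01, the probability of observing evidence E if H is true is p(E|H) = 0.9, and the probability of observing E if H is false is p(E|¬H) = 0.1. Expanding p(E) by the law of total probability gives:

$$p(H \mid E) = \frac{p(E \mid H)\,p(H)}{p(E \mid H)\,p(H) + p(E \mid \neg H)\,p(\neg H)} = \frac{0.9 \times 0.01}{0.9 \times 0.01 + 0.1 \times 0.99} \approx 0.083$$

The example shows why strongly supportive evidence cannot, by itself, establish a conclusion when its prior probability is low.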

38. I realize I am making assumptions here about the relationship between science and moral values which are beyond the scope of this paper. Some philosophers, known as ethical naturalists, hold that there are moral facts which can be reduced to facts about the natural world and that we can obtain knowledge about moral facts in the same way that we acquire knowledge about scientific or natural facts (see, for example, Brink 1989; Jackson 1998; Foot 2001). Other philosophers, known as ethical non-naturalists, hold that moral facts cannot be reduced to scientific facts, either because there are no moral facts or because moral facts are facts about properties or phenomena that supervene on or transcend the natural world. We cannot, therefore, obtain moral knowledge (if there is such a thing) in the same way that we would obtain scientific knowledge. Non-naturalists hold that morality is based on intuition (Moore 1903; Audi 2005), emotion (Gibbard 1990; Greene 2013), culture (Harman 1996), religion (Adams 1987), or rational reflection (Kant 1981). I would describe my approach as a limited form of non-naturalism that recognizes that some of our moral values, such as human life, freedom from pain, and health, have a solid basis in nature, but that others, such as justice and human rights, do not. For further discussion of naturalism and non-naturalism, see Lenman (2018) and Ridge (2019).

39. Wendler et al. (2005) attempt to quantify the concept of minimal risk for pediatric research in terms of the average daily life risks faced by US children. This attempt to quantify risk, even if successful, would not prove that risk/benefit determinations are quantifiable, because these determinations involve judging whether the risks are reasonable (or morally acceptable) in relation to the benefits, and saying that a risk is reasonable is a value judgment (Resnik 2017).

40. See note 39.

41. A legal analogy may be useful here: all members of a jury, regardless of their background or education, are obligated to use the same standard of evidence.

42. Human subjects often still fail to understand that they may not benefit from participation, however (Appelbaum et al. 1987; Joffe et al. 2001).

43. One might also ask whether the IRB should require less evidence for some types of research, such as research that imposes minimal risks on human subjects. While it makes some sense to say that evidence for the reasonableness of risks need not meet the clear and convincing standard, I would argue that evidence pertaining to other criteria, such as consent or equitable subject selection, would still need to meet that standard, because these criteria involve issues related to human rights.

44. The research may offer these subjects other types of benefits, such as immunity to COVID-19, priority access to COVID-19 therapeutics if they become ill, or the satisfaction that they are contributing to a worthy cause, but these benefits are usually not considered to be direct benefits of the interventions under investigation (King 2000; Friedman et al. 2012). Moreover, these other benefits do not outweigh the risks to subjects, so the primary justification for this research is that it is likely to offer significant benefits to society (Shah et al. 2020).

45. I am tempted to suggest that the evidence should meet the “beyond a reasonable doubt” standard, but that would probably be too restrictive.

References

  1. Adams RM. 1987. The Virtue of Faith and Other Essays in Philosophical Theology. New York, NY: Oxford University Press.
  2. Anderson EE, DuBois JM. 2012. IRB decision-making with imperfect knowledge: a framework for evidence-based research ethics review. Journal of Law, Medicine, and Ethics 40(4):951–969.
  3. Amrhein V, Greenland S, McShane B. 2019. Retire statistical significance. Nature 567:305–307.
  4. Audi R. 2005. The Good in the Right: A Theory of Intuition and Intrinsic Value. Princeton, NJ: Princeton University Press.
  5. Appelbaum PS, Roth LH, Lidz CW, Benson P, Winslade W. 1987. False hopes and best data: consent to research and the therapeutic misconception. Hastings Center Report 17(2):20–24.
  6. Beauchamp TL, Childress JF. 2012. Principles of Biomedical Ethics, 7th ed. New York, NY: Oxford University Press.
  7. Brink DO. 1989. Moral Realism and the Foundations of Ethics. Cambridge, UK: Cambridge University Press.
  8. Burns PB, Rohrich RJ, Chung KC. 2011. The levels of evidence and their role in evidence-based medicine. Plastic and Reconstructive Surgery 128(1):305–310.
  9. Callaway E. 2020. Dozens to be deliberately infected with coronavirus in UK ‘human challenge’ trials. Nature 586:651–652.
  10. Chisholm RM. 1989. Theory of Knowledge, 3rd ed. Englewood Cliffs, NJ: Prentice-Hall.
  11. Clapp JT, Gleason KA, Joffe S. 2017. Justification and authority in institutional review board decision letters. Social Science and Medicine 194:25–33.
  12. Copi I, Cohen C, McMahon K. 2013. Introduction to Logic, 14th ed. New York, NY: Routledge.
  13. Council for International Organizations of Medical Sciences. 2016. International Ethical Guidelines for Health-related Research Involving Humans. Available at: https://cioms.ch/wp-content/uploads/2017/01/WEB-CIOMS-EthicalGuidelines.pdf. Accessed: June 17, 2020.
  14. Department of Health and Human Services. 2017. Protection of Human Subjects. 45 CFR 46.
  15. Douglas H. 2000. Inductive risk and values in science. Philosophy of Science 67:559–579.
  16. Dziak K, Anderson R, Sevick MA, Weisman CS, Levine DW, Scholle SH. 2005. Variations among institutional review board reviews in a multisite health services research study. Health Services Research 40(1):279–290.
  17. Earman J. 1992. Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. Cambridge, MA: MIT Press.
  18. Edwards SJL, Ashcroft R, Kirchin S. 2004. Research ethics committees: differences and moral judgment. Bioethics 18(5):408–427.
  19. Elliott K. 2017. A Tapestry of Values: An Introduction to Values in Science. New York, NY: Oxford University Press.
  20. Emanuel EJ, Wendler D, Grady C. 2000. What makes clinical research ethical? Journal of the American Medical Association 283(20):2701–2711.
  21. Environmental Protection Agency. 2013. Protection of Human Subjects. 40 CFR 26.
  22. Eyal N, Lipsitch M, Smith PG. 2020. Human challenge studies to accelerate coronavirus vaccine licensure. Journal of Infectious Diseases 221(11):1752–1756.
  23. Food and Drug Administration. 2013. Institutional Review Boards. 21 CFR 56.
  24. Foot P. 2001. Natural Goodness. Oxford, UK: Clarendon Press.
  25. Friedman A, Robbins E, Wendler D. 2012. Which benefits of research participation count as ‘direct’? Bioethics 26(2):60–67.
  26. Friesen P, Yusof ANM, Sheehan M. 2019. Should the decisions of institutional review boards be consistent? Ethics & Human Research 41(4):2–14.
  27. Gibbard A. 1990. Wise Choices, Apt Feelings. Cambridge, MA: Harvard University Press.
  28. Giere RN, Bickle J, Mauldin R. 2005. Understanding Scientific Reasoning, 5th ed. Belmont, CA: Cengage Learning.
  29. Goldman A. 1986. Epistemology and Cognition. Cambridge, MA: Harvard University Press.
  30. Goldman A. 1999. Knowledge in a Social World. Oxford, UK: Clarendon Press.
  31. Goldman J, Katz MD. 1982. Inconsistency and institutional review boards. Journal of the American Medical Association 248(2):197–202.
  32. Goozner M. 2004. The $800 Million Pill. Berkeley, CA: University of California Press.
  33. Green LA, Lowery JC, Kowalski CP, Wyszewianski L. 2006. Impact of institutional review board practice variation on observational health services research. Health Services Research 41(1):214–230.
  34. Greene J. 2013. Moral Tribes: Emotion, Reason, and the Gap between Us and Them. New York, NY: Penguin.
  35. Grinnell F, Sadler JZ, McNamara V, Senetar K, Reisch J. 2017. Confidence of IRB/REC members in their assessments of human research risk: a study of IRB/REC decision making in action. Journal of Empirical Research on Human Research Ethics 12(3):140–149.
  36. Gunsalus CK, Bruner EM, Burbules NC, Dash L, Finkin M, Goldberg JP, Greenough WT, Miller GA, Pratt MG. 2006. Mission creep in the IRB world. Science 312(5779):1441.
  37. Haack S. 2003. Defending Science within Reason. New York, NY: Prometheus Books.
  38. Hansson SO. 2003. Ethical criteria of risk acceptance. Erkenntnis 59(3):291–309.
  39. Harman G. 1996. Moral relativism. In: Harman G, Thomson JJ (eds.), Moral Relativism and Moral Objectivity. Cambridge, MA: Blackwell Publishers, 3–64.
  40. Hirshon JM, Krugman SD, Witting MD, Furuno JP, Limcangco MR, Perisse AR, Rasch EK. 2002. Variability in institutional review board assessment of minimal-risk research. Academic Emergency Medicine 9(12):1417–1420.
  41. Holzer JK, Ellis L, Merritt MW. 2014. Why we need community engagement in medical research. Journal of Investigative Medicine 62:851–855.
  42. Howson C, Urbach P. 1989. Scientific Reasoning: A Bayesian Approach. New York, NY: Open Court.
  43. Jackson F. 1998. From Metaphysics to Ethics: A Defence of Conceptual Analysis. Oxford, UK: Clarendon Press.
  44. Joffe S, Cook EF, Cleary PD, Clark JW, Weeks JC. 2001. Quality of informed consent in cancer clinical trials: a cross-sectional survey. Lancet 358(9295):1772–1777.
  45. Journal of Empirical Research on Human Research Ethics. 2020. Journal description. Available at: https://journals.sagepub.com/description/JRE. Accessed: July 2, 2020.
  46. Kant I. 1981 [1785]. Groundwork for the Metaphysics of Morals. Ellington JW (transl.). Indianapolis, IN: Hackett.
  47. Katz J. 1993. Human experimentation and human rights. Saint Louis University Law Journal 38(7):7–54.
  48. Kimmelman J. 2009. Gene Transfer and the Ethics of First-in-Human Research: Lost in Translation. Cambridge, UK: Cambridge University Press.
  49. King N. 2000. Defining and describing benefit appropriately in clinical trials. Journal of Law, Medicine & Ethics 28:332–343.
  50. Kitcher P. 1993. The Advancement of Science. New York, NY: Oxford University Press.
  51. Klitzman RL. 2011a. The myth of community differences as the cause of variations among IRBs. American Journal of Bioethics Primary Research 2(2):24–33.
  52. Klitzman RL. 2011b. “Members of the same club”: challenges and decisions faced by US IRBs in identifying and managing conflicts of interest. PLoS One 6(7):e22796.
  53. Klitzman RL. 2012a. US IRBs confronting research in the developing world. Developing World Bioethics 12(2):63–73.
  54. Klitzman RL. 2012b. Institutional review board community members: who are they, what do they do, and whom do they represent? Academic Medicine 87(7):975–981.
  55. Klitzman RL. 2013a. How IRBs view and make decisions about coercion and undue influence. Journal of Medical Ethics 39(4):224–229.
  56. Klitzman RL. 2013b. How IRBs view and make decisions about consent forms. Journal of Empirical Research on Human Research Ethics 8(1):8–19.
  57. Klitzman RL. 2013c. How IRBs view and make decisions about social risks. Journal of Empirical Research on Human Research Ethics 8(3):58–65.
  58. Klitzman RL. 2013d. How good does the science have to be in proposals submitted to institutional review boards? An interview study of institutional review board personnel. Clinical Trials 10(5):761–766.
  59. Klitzman RL. 2014. How US institutional review boards decide when researchers need to translate studies. Journal of Medical Ethics 40(3):193–197.
  60. Klitzman RL. 2015. The Ethics Police? The Struggle to Make Human Research Safe. New York, NY: Oxford University Press.
  61. Kon AA. 2009. The role of empirical research in bioethics. American Journal of Bioethics 9(6–7):59–65.
  62. Kuhn TS. 1977. The Essential Tension: Selected Studies in Scientific Tradition and Change. Chicago, IL: University of Chicago Press.
  63. Largent EA, Grady C, Miller FG, Wertheimer A. 2012. Money, coercion, and undue inducement: attitudes about payments to research participants. IRB 34(1):1–8.
  64. Largent EA, Fernandez Lynch H. 2017. Paying research participants: the outsized influence of “undue influence.” IRB 39(4):1–9.
  65. Lau S, Luk H, Wong A, Li K, Zhu L, He Z, Fung J, Chan TT, Fung KS, Woo PC. 2020. Possible bat origin of severe acute respiratory syndrome coronavirus 2. Emerging Infectious Diseases 26(7):1542–1547.
  66. Lenman J. 2018. Moral naturalism. Stanford Encyclopedia of Philosophy. Available at: https://plato.stanford.edu/entries/naturalism-moral/#WhatMoraNatu. Accessed: July 1, 2020.
  67. Levine RJ. 1988. Ethics and Regulation of Clinical Research. New Haven, CT: Yale University Press.
  68. Loevinger L. 1992. Standards of proof in science and law. Jurimetrics 32(3):323–344.
  69. London AJ. 2006. Reasonable risks in clinical research: a critique and a proposal for the integrative approach. Statistics in Medicine 25(17):2869–2885.
  70. McWilliams R, Hoover-Fong J, Hamosh A, Beck S, Beaty T, Cutting G. 2003. Problematic variation in local institutional review of a multicenter genetic epidemiology study. Journal of the American Medical Association 290(3):360–366.
  71. Meslin EM. 1990. Protecting human subjects from harm through improved risk judgments. IRB 12(1):7–10.
  72. Millum J, Garnett M. 2019. How payment for research participation can be coercive. American Journal of Bioethics 19(9):21–31.
  73. Moon MR. 2009. The history and role of institutional review boards: a useful tension. Virtual Mentor 11(4):311–316.
  74. Moore GE. 1903. Principia Ethica. New York, NY: Cambridge University Press.
  75. National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. 1979. The Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research. Washington, DC: Department of Health, Education, and Welfare.
  76. Northwestern University Institutional Review Board. 2020. Templates and forms. Available at: https://www.irb.northwestern.edu/templates-forms-sops/. Accessed: June 17, 2020.
  77. Office of Science and Technology Policy. 2000. Federal research misconduct policy. Federal Register 65(235):76260–76264.
  78. Oxford Centre for Evidence-Based Medicine, Levels of Evidence Working Group. 2011. The 2011 levels of evidence. Available at: http://www.cebm.net/index.aspx?o=5653. Accessed: July 3, 2020.
  79. Patashnik EM, Gerber AS, Dowling CM. 2017. Unhealthy Politics: The Battle over Evidence-Based Medicine. Princeton, NJ: Princeton University Press.
  80. Petersen LA, Simpson K, Sorelle R, Urech T, Chitwood SS. 2012. How variability in the institutional review board review process affects minimal-risk multisite health services research. Annals of Internal Medicine 156:728–735.
  81. Porter R. 1998. The Greatest Benefit to Mankind: A Medical History of Humanity. New York, NY: WW Norton.
  82. Pritchard IA. 2011. How do IRB members make decisions? A review and research agenda. Journal of Empirical Research on Human Research Ethics 6(2):31–46.
  83. Resnik DB. 2012. Limits on risks for healthy volunteers in biomedical research. Theoretical Medicine and Bioethics 33(2):137–149.
  84. Resnik DB. 2014. Consistency in IRB review. Journal of Clinical Research Best Practices 10(12). Available at: http://www.firstclinical.com/journal/2014/1412_Consistency.pdf. Accessed: July 3, 2020.
  85. Resnik DB. 2017. The role of intuition in risk/benefit decision-making in human subjects research. Accountability in Research 24(1):1–29.
  86. Resnik DB. 2018. The Ethics of Research with Human Subjects: Protecting People, Advancing Science, Promoting Trust. Cham, Switzerland: Springer.
  87. Resnik DB. 2019. Coercion as subjection and the institutional review board. American Journal of Bioethics 19(9):56–58.
  88. Resnik DB, Sharp RR. 2006. Protecting third parties in research. IRB: Ethics & Human Research 28(4):1–7.
  89. Resnik MD. 2000. Mathematics as a Science of Patterns. New York, NY: Oxford University Press.
  90. Rid A, Emanuel EJ, Wendler D. 2010. Evaluating the risks of clinical research. Journal of the American Medical Association 304(13):1472–1479.
  91. Ridge M. 2019. Moral non-naturalism. Stanford Encyclopedia of Philosophy. Available at: https://plato.stanford.edu/entries/moral-non-naturalism/#Int. Accessed: July 2, 2020.
  92. Rothstein P, Rader M, Crump D. 2011. Evidence, 6th ed. St. Paul, MN: West Publishing.
  93. Rudner R. 1953. The scientist qua scientist makes value judgments. Philosophy of Science 20(1):1–6.
  94. Sackett DL. 1989. Rules of evidence and clinical recommendations on the use of antithrombotic agents. Chest 95(2 Supplement):2S–4S.
  95. Sackett DL, Richardson W, Rosenberg W, Haynes R. 2000. Evidence-Based Medicine: How to Practice and Teach EBM, 2nd ed. New York, NY: Churchill Livingstone.
  96. Salje H, Tran Kiem C, Lefrancq N, Courtejoie N, Bosetti P, Paireau J, Andronico A, Hozé N, Richet J, Dubost CL, Le Strat Y, Lessler J, Levy-Bruhl D, Fontanet A, Opatowski L, Boelle PY, Cauchemez S. 2020. Estimating the burden of SARS-CoV-2 in France. Science 369(6500):208–211.
  97. Schaefer GO, Tam CC, Savulescu J, Voo TC. 2020. COVID-19 vaccine development: time to consider SARS-CoV-2 challenge studies? Vaccine 38(33):5085–5088.
  98. Schrag ZM. 2010. Ethical Imperialism: Institutional Review Boards and the Social Sciences, 1965–2009. Baltimore, MD: Johns Hopkins University Press.
  99. Schneider CE. 2015. The Censor’s Hand: The Misregulation of Human-Subject Research. Cambridge, MA: MIT Press.
  100. Seiffert D. 2016. Report suggests drug-approval rate now just 1-in-10. Boston Business Journal, May 25, 2016. Available at: https://www.bizjournals.com/boston/blog/bioflash/2016/05/report-suggests-drug-approval-rate-now-just-1-in.html. Accessed: May 26, 2020.
  101. Selinger E, Crease RP (eds.). 2006. The Philosophy of Expertise. New York, NY: Columbia University Press.
  102. Shah SK, Kimmelman J, Lyerly AD, Lynch HF, Miller FG, Palacios R, Pardo CA, Zorrilla C. 2018. Bystander risk, social value, and ethics of human research. Science 360(6385):158–159.
  103. Shah SK, Miller FG, Darton TC, Duenas D, Emerson C, Lynch HF, Jamrozik E, Jecker NS, Kamuya D, Kapulu M, Kimmelman J, MacKay D, Memoli MJ, Murphy SC, Palacios R, Richie TL, Roestenberg M, Saxena A, Saylor K, Selgelid MJ, Vaswani V, Rid A. 2020. Ethics of controlled human infection to address COVID-19. Science 368(6493):832–834.
  104. Shah S, Whittle A, Wilfond B, Gensler G, Wendler D. 2004. How do institutional review boards apply the federal risk and benefit standards for pediatric research? Journal of the American Medical Association 291(4):476–482.
  105. Shrader-Frechette KS. 1991. Risk and Rationality: Philosophical Foundations for Populist Reforms. Berkeley, CA: University of California Press.
  106. Silberman G, Kahn KL. 2011. Burdens on research imposed by institutional review boards: the state of the evidence and its implications for regulatory reform. Milbank Quarterly 89(4):599–627.
  107. Spector R. 2005. Me too drugs. Stanford Medicine Magazine, Summer 2005. Available at: http://sm.stanford.edu/archive/stanmed/2005summer/drugs-metoo.html. Accessed: May 26, 2020.
  108. Stark L. 2012. Behind Closed Doors: IRBs and the Making of Ethical Research. Chicago, IL: University of Chicago Press.
  109. Stiemsma LT, Reynolds LA, Turvey SE, Finlay BB. 2015. The hygiene hypothesis: current perspectives and future therapies. ImmunoTargets and Therapy 4:143–157.
  110. Tait RC, Chibnall JT, Iltis A, Wall A, Deshields TL. 2011. Assessment of consent capability in psychiatric and medical studies. Journal of Empirical Research on Human Research Ethics 6(1):39–50.
  111. Thagard P. 1988. Computational Philosophy of Science. Cambridge, MA: MIT Press.
  112. Van Luijn HE, Musschenga AW, Keus RB, Robinson WM, Aaronson NK. 2002. Assessment of the risk/benefit ratio of phase II cancer clinical trials by institutional review board (IRB) members. Annals of Oncology 13(8):1307–1313.
  113. Weijer C. 2000. The ethical analysis of risk. Journal of Law, Medicine and Ethics 28(4):344–361.
  114. Weinstein JB, Dewsbury I. 2006. Comment on the meaning of ‘proof beyond a reasonable doubt’. Law, Probability and Risk 5:167–173.
  115. Weiss PA. 2011. Introductory Statistics, 9th ed. Upper Saddle River, NJ: Pearson.
  116. Wendler D, Belsky L, Thompson KM, Emanuel EJ. 2005. Quantifying the federal minimal risk standard: implications for pediatric research without a prospect of direct benefit. Journal of the American Medical Association 294(7):826–832.
  117. Wertheimer A, Miller FG. 2008. Payment for research participation: a coercive offer? Journal of Medical Ethics 34(5):389–392.
  118. World Health Organization. 2020. Key criteria for the ethical acceptability of COVID-19 human challenge studies. Available at: https://apps.who.int/iris/bitstream/handle/10665/331976/WHO-2019-nCoV-Ethics_criteria-2020.1-eng.pdf?ua=1. Accessed: May 27, 2020.
