Author manuscript; available in PMC: 2020 Aug 1.
Published in final edited form as: J Abnorm Psychol. 2019 Aug;128(6):528–540. doi: 10.1037/abn0000451

The Registration Continuum in Clinical Science: A Guide toward Transparent Practices

Stephen D Benning 1, Rachel L Bachrach 2, Edward A Smith 3, Andrew J Freeman 3, Aidan G C Wright 4
PMCID: PMC6677163  NIHMSID: NIHMS1035738  PMID: 31368732

Abstract

Clinical scientists can use a continuum of registration efforts that vary in their disclosure and timing relative to data collection and analysis. Broadly speaking, registration benefits investigators by offering stronger, more powerful tests of theory with particular methods in tandem with better control of long-run false positive error rates. Registration helps clinical researchers in thinking through tensions between bandwidth and fidelity that surround recruiting participants, defining clinical phenotypes, handling comorbidity, treating missing data, and analyzing rich and complex data. In particular, registration helps record and justify the reasons behind specific study design decisions, though it also provides the opportunity to register entire decision trees with specific endpoints. Creating ever more faithful registrations and standard operating procedures may offer alternative methods of judging a clinical investigator’s scientific skill and eminence because study registration increases the transparency of clinical researchers’ work.

Keywords: preregistration, coregistration, postregistration, transparency, error control, false positive, flexibility


General Scientific Summary: Study registration allows clinical scientists to make their work more credible and transparent to fellow researchers and the general public. We describe the dimensions of disclosure and timing relative to data collection and analysis on which researchers can register their study designs, how registration makes for better science, and the kinds of issues for which registration is particularly helpful in clinical research. We also show how registration permits researchers to flexibly register specific individual decisions or complex decision trees to deal with potential problems in a study.


Registration is the act of formally recording the components of a scientific study (e.g., aims, hypotheses, methods, data analytic strategy) in an official capacity (e.g., to a journal for peer review or a public repository). The goals of registration include increasing research’s transparency (Gernsbacher, 2018) and maintaining the diagnostic value and credibility of a study’s confirmatory analyses (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012). Registration has gained steam in the last half decade following concerns regarding research practices that encouraged blurring the lines between prediction and postdiction (Nosek, Ebersole, DeHaven, & Mellor, 2018). This scientific ecology has led to an unfortunate plethora of studies whose findings could not be replicated in psychology (Open Science Collaboration, 2015) and other clinical disciplines (e.g., Begley & Ellis, 2012). Registration poses unique challenges for clinical science (Tackett et al., 2017), a discipline that is uniquely poised to benefit people’s lives and ameliorate suffering directly through research.

Registration is applicable to a broad range of clinical science from bench to bedside to help produce replicable work. Furthermore, not all registrations of scientific work must – or can – be accomplished before the first participant is run. We describe a continuum of registration approaches clinical scientists can use, from preregistered plans filed in anticipation of commencing a study to postregistrations of analyses involving complex archival datasets that have accrued over decades. We also answer questions about what registration will do, how registration can benefit clinical researchers, what aspects of clinical study design might be registered, and what steps researchers can take to register their own studies. We focus on the benefits of registration to provide a stronger evidential foundation for the field rather than using registration as a cudgel to tamp down scholarly creativity (Fiedler, 2018).

What Dimensions Underpin Registrations of Clinical Studies?

Time of registration relative to data collection and analysis to increase control and credibility.

As diagrammed in Figure 1, study registrations can be plotted along two dimensions. The first bipolar dimension represents the time at which the registration is performed relative to data collection and data analysis. Registration can generally occur at three timepoints – prior to data collection, during data collection, and after data analysis begins. When registration occurs earlier in time, credibility is enhanced because the study’s design and analytical plan are most plausibly independent of the data (Srivastava, 2018). Furthermore, the diagnosticity of p values is preserved, as researchers’ analyses represent the focal confirmatory tests they proposed, unalloyed with alterations that convert them into exploratory analyses (Wagenmakers et al., 2012). Therefore, earlier registration is ideal, but any registration is better than no registration.

Figure 1.

Locations of example registrations in the time-disclosure continuum. Preregistrations occur before data are collected, coregistrations after data collection starts but before data analysis begins, and postregistrations after data analysis has begun. Describing more features of a study’s design in a registration would earn it a higher score on the disclosure axis.

Preregistration describes a registration that occurs before data collection commences. Preregistrations should be uploaded to a timestamping site the researcher does not directly control, such as https://clinicaltrials.gov, https://osf.io, https://github.com, or https://sliksvn.com. Preregistration gives researchers the greatest control over a study’s design, as researchers are free to preregister any aspect of a study to increase transparency. A preregistration should minimally consist of study aims/hypotheses, data collection methods, and an analytic plan. Preregistrations require the most effort upfront; however, they also have the maximum credibility in making explicit what design considerations were made before a single observation was conducted. This transparency allows revisions to a study’s procedures to be made prior to data collection while disclosing the study’s scientific process. For example, if pilot testing reveals that a specific procedure is unworkable, the preregistration can be updated to a new version detailing the nature of and rationale for the changes.

Researchers who conduct clinical trials are familiar with some facets of preregistration. Although initially intended to improve patient access to clinical trials, clinical trial registration has evolved into a mechanism to prevent publication bias, loss of knowledge from unpublished clinical trials, and outcome switching (Dickersin & Rennie, 2003). Clinical trial preregistration consists of methodological registration (e.g., number of participants, primary and secondary outcomes, eligibility criteria, and trial methodology). It is required for some types of clinical science and must be followed to comply with regulations and publication requirements (National Library of Medicine, 2018). However, clinical trial registration is one special case of a broader rubric of preregistration that can be applied to clinical science studies. Preregistration encourages thorough and explicit recording of details in study descriptions (e.g., analytic plan, sharing of materials) beyond that available in clinical trial registrations.

The field has not yet advanced terminology for registrations that occur after data collection has begun. We propose the term coregistration to describe a registration that occurs contemporaneously with data collection but before the data have been analyzed. Coregistration of analytic plans has the advantage of keeping decisions independent of a study’s data (Srivastava, 2018), though knowledge of incidents that occur in the data collection process may influence researchers’ decisions. Coregistration may be preferred when time is of the essence and data must be collected as researchers crystallize their analytic plan. As an example, the first author’s study of the psychological effects of the Route 91 shooting in Las Vegas (coregistered at https://osf.io/va27m/) developed its data analytic plan while initial data were collected. In this case, preregistration was impractical, as data had to be collected immediately to assess the transition from acute to post-traumatic stress. Coregistration may also be useful for student projects that must be accomplished in a truncated timeframe to meet program deadlines or when data may only be collected at specific timepoints, such as during the academic year.

Coregistration also occurs when the protocol for a preregistered study is altered during the data collection process. Changing a preregistered study’s protocol is admissible and is likely common in large clinical studies (e.g., adding a new measure to the protocol). Coregistration requires the same information as preregistration, along with disclosures that maintain transparency. Any changes made to the study during data collection but prior to the registration being posted should be described, along with the rationale for those changes, in a new timestamped version. Some changes might occur due to method error, such as a software program not exporting variables of interest or not delivering stimuli as expected in ways pilot testing did not detect. Other changes might occur with the introduction of new measures during data collection to test other aims or augment a specific aim. However, coregistration implies that changes to the protocol or data analytic plan occurred without peeking at the aggregate data. Investigators should verify the integrity of data collection by inspecting individual raw data files (e.g., double-entry of data; cleaning psychophysiological signals), and the number and type of files examined should be coregistered to clarify the extent of peeking. Only analysis of the aggregate data should be avoided before a coregistration is uploaded to a repository.

Postregistration denotes a registration that occurs when data have been analyzed previously for other focal research questions, which is common in clinical research programs. As clinical science adopts open science practices, datasets that take years to collect will be posted online and freely available for secondary analysis. In fact, many already are. Large public data resources – like the National Epidemiological Survey on Alcohol and Related Conditions (pubs.niaaa.nih.gov/publications/arh29-2/74-78.htm), the Pittsburgh Life Studies (www.lifehistorystudies.pitt.edu), or Midlife in the United States (midus.wisc.edu) – provide indispensable data for hundreds of researchers to ask complex clinically relevant questions extending beyond what a single lab could attempt.1 Postregistration provides a mechanism by which confirmatory secondary data analysis of large clinical studies could occur.

Postregistration requires acknowledgement and special tools (see Weston, Ritchie, Rohrer, & Przybylski, 2018, and https://osf.io/djgvw/ for a working template). A considerable challenge is that the idiosyncrasies of an existing dataset are known to researchers in a way that could bias future hypotheses and analyses, whereas yet-to-be-collected data do not suffer this same problem (Nosek et al., 2018). The concern is that, by virtue of knowledge of what has already emerged from a dataset, a researcher can make an educated guess informed not just by theory and prior work but by the peculiarities of the data themselves. Thus, these kinds of registrations have the strongest likelihood of including data-dependent decisions that might bias a study’s outcomes (Srivastava, 2018). For instance, if a researcher knew that a dataset contained a paradoxical finding, such that men had higher depression scores than women, the researcher might predict that men will also have higher generalized anxiety than women. Confirming that prediction would show little more than that the same idiosyncratic dataset was used. Existing data may already have had explorers trample through them; they are not the pristine snow a new investigation would set out upon, but rather snow with clearly discernible tracks leading down various paths.

None of this necessarily invalidates postregistration of existing datasets. These concerns simply require additional attention to what serves as the basis for hypothesis generation, and whether subsequent work is fully confirmatory or involves a lesser degree of discovery. A published example of a postregistered study (at https://osf.io/8s8c5/) used five existing datasets to study the structure of narcissistic personality disorder criteria (Aslinger, Manuck, Pilkonis, Simms, & Wright, 2018). One of the samples was treated as the exploratory sample and used to develop a best-fitting model from a number of plausible alternatives. The findings from this first sample determined which models would be estimated and predicted to have the best fit in the other four datasets. The authors clearly reported that the first sample was exploratory and then postregistered the follow-up confirmatory analyses for subsequent samples.

Disclosure of a study’s design to increase transparency.

As seen in Figure 1, the second dimension indexes a unipolar continuum of disclosure that reflects how much of the study’s plan is registered. Table 1 details features that might be disclosed across the three main registration epochs. Among the features that could be registered are the sample size and sampling plan, specific measures to be used to characterize participants, either experimental stimuli and tasks or therapeutic interventions and fidelity checks, study procedures, statistical plans to address the research questions, and analytic code that details precisely how the analyses will be conducted. The more of these features that are disclosed in a registration, the greater the study’s transparency.

Table 1.

Study Design Elements to Register for Each Registration Type

Domain: Sample
  Preregistration:
    • Catchment area
    • Sampling method
    • Targeted sizes of each group (with power or sensitivity analyses)
    • Inclusion criteria
    • Exclusion criteria
    • Plans for participant replacement
  Coregistration: as in preregistration, and
    • Tweaks to sample catchment, size, or method based on feasibility
  Postregistration:
    • Methods and sources for obtaining data
    • Extent of database to use
    • Rules defining specific cases to use in existing dataset

Domain: Measures
  Preregistration:
    • Description of all measures in study
    • Frequency of assessment
    • Identification of primary and secondary outcomes
    • Identification of covariates
  Coregistration: as in preregistration, and
    • Measures dropped due to infeasibility
    • Measures added to address concerns during data collection
  Postregistration:
    • Codebooks with variable summaries
    • Focal measures to use
    • Psychometrics of measures

Domain: Manipulation or intervention
  Preregistration:
    • Experiment stimuli
    • Tasks to use
    • Nature of interventions
    • Nature and allotment of participants to assignment arms
  Coregistration: as in preregistration, and
    • Stimuli or tasks requiring replacement
    • Alterations to interventions or arm assignment
  Postregistration:
    • Rules for using only certain observations
    • Definitions of sufficient fidelity to manipulations or interventions

Domain: Study procedures
  Preregistration:
    • Participant allocation
    • Masking of conditions
    • Task/measure order
    • Data collection and reduction parameters
    • Debriefing scripts and procedures
  Coregistration: as in preregistration, and
    • Number of raw data files inspected to verify data integrity
    • Alterations to data collection and reduction
    • Exclusions based on failures of masking
  Postregistration:
    • Whether to use only data collected under final procedures or use phase of collection as factor in analyses
    • Methods for dealing with missing data

Domain: Data cleaning
  Preregistration:
    • Minimum data quality standard descriptions
    • Recoding steps
    • Laboratory standard operating procedures
  Coregistration: as in preregistration, and
    • Exclusion rules for subsets of data based on study progress (e.g., program failure, fidelity problems during study)
  Postregistration:
    • Pre-existing data cleaning scripts
    • Differences between previous cleaning steps and those for this study
    • Laboratory standard operating procedures

Domain: Statistical analyses
  Preregistration:
    • Verbal descriptions
    • Methods of dealing with attrition and other missing data
    • Multiple comparison adjustments
    • Pseudo-code
    • Simulated analyses
  Coregistration: as in preregistration, and
    • Analytic code based on variable names
    • Citations to methods papers updating details of analyses
  Postregistration:
    • Details of previous analyses informing study
    • Results of analyses performed on similar data or subsets of current data
    • A priori specifics of statistical models

The disclosure dimension provides checkpoints for investigators to let them know when they are sticking to the plan and when they have deviated from it. For other researchers, greater disclosure provides an avenue for direct replication of a study. For the sake of transparency, it is better to register as much of a study’s methodology as possible, recognizing that not all aspects of a study design or analysis are knowable at the outset. Partial registration is better than no registration. Any signposts that focus the researcher and the field toward a consistent plan represent an improvement over the status quo.

The timing-disclosure continuum.

Figure 1 diagrams various kinds of registrations in this continuum. Registrations occurring further to the left are more credibly data independent, and those higher in the figure disclose more design details transparently. Moving clinical science toward the upper left quadrant permits the strongest confirmatory tests of theories. Registered reports entail submitting a study’s entire introduction and method (including proposed analyses) for peer review and in-principle acceptance before data collection begins (Chambers, 2013). Thus, they are preregistrations that have enough detail to permit peer reviewers to judge whether studies are “methodologically valid, detailed, replicable, and which address an important scientific question” (Chambers, Dienes, McIntosh, Rotshtein, & Willmes, 2015; p. A1). Students’ thesis and dissertation proposals involving original data collection could likewise be uploaded as preregistrations akin to registered reports, especially after an examination committee’s recommendations are incorporated. Publicly uploaded grant proposals could form the basis of a preregistration, though these proposals usually have less methodological detail than other kinds of preregistrations. Even item selection criteria, factor structures, and patterns of nomological net associations can be preregistered as a psychometric instrument is validated (for examples, see the first author’s preregistration of one wave of a measure’s development using an OSF template at https://osf.io/9emsb/register/565fb3678c5e4a66b5582f67 and a freeform registration of another measure’s development detailed in Tackett, Brandes, & Reardon, 2019).

When circumstances militate against detailed preregistration, researcher degrees of freedom (Simmons, Nelson, & Simonsohn, 2011) can still be restricted by preregistering a set of standard operating procedures for a lab detailing how data are to be collected and analyzed. Once developed, these procedures can be referenced for the details of data analysis in any more specific preregistration. As with all other preregistrations, they can be treated flexibly and departures should be allowed, but any changes would need to be disclosed as deviations from the protocol.

Figure 1 shows how other kinds of registrations might fall across this continuum. Thesis and dissertation proposals on existing data may be coregistered or postregistered with the same level of detail as those involving original data. A comprehensive psychometric development effort to refine a scale’s items and correlate scale scores with criterion variables could coregister its sampling plan, item selection strategy, and criterion analytic framework if data were in the process of being collected but had not been inspected. Likewise, an ongoing psychophysiological study could coregister its rationale, program design, data reduction parameters, and scoring windows. In this case, the psychophysiological study would have a higher score on the disclosure axis, as it details more elements of the study. To maximize its credibility, a postregistered study may need to have even higher levels of disclosure than preregistered or coregistered studies. Providing detailed, version-controlled data cleaning and analytic scripts along with specific descriptions of the data sources, focal variables from a publicly available codebook, inclusion and exclusion rules for cases to be considered, and the previous analyses performed on these data may reassure skeptical readers that the postregistration represents the investigator’s original plan. At the bottom right of Figure 1 is the typical publication in clinical science, which represents a relatively weak test of its theory. Little to no information is disclosed outside of the publication, making it difficult to reproduce; the timing of the data analysis relative to the stated hypotheses is also often unknown, making it unclear whether the hypotheses were predicted or postdicted.

Potential difficulties in registration.

Verification of the details of a registration (outside of registered reports) during peer review requires more effort than for typical studies. For clinical scientists, registration of a clinical trial does not necessarily prevent misreporting of the clinical trial (Goldacre et al., 2019). Automated text matching tools might help reviewers ascertain the similarities between a registration and a manuscript. However, reviewers may still need to take an extra 15–30 minutes checking the correspondence between a registration and a manuscript to ensure that confirmatory and exploratory work is reported appropriately. Clinical researchers may be concerned about other researchers abusing the transparency that registration offers by taking their registered ideas. In all cases, researchers may choose to make their registrations private (either through an embargo or by granting access only to reviewers) to alleviate these concerns. Additionally, the investigator with the earliest timestamped registration could always claim primacy of the idea, which would serve as a deterrent for others taking credit (Klein et al., 2018).

How Can Registration Benefit Clinical Science (and Scientists)?

Strong, powerful, and efficient tests of theory with particular methods.

Registering a study’s design allows researchers to specify – transparently and in advance – the theoretical bases of their studies and their plans to test their theories. Registration permits researchers to specify the smallest effect sizes of interest they want to detect to make null results more informative (Harms & Lakens, 2018), like a clinically meaningful difference between groups or therapeutic response. With such plans in hand, researchers guard against hypothesizing after results are known (HARKing; Kerr, 1998). Registration also allows researchers to formalize the auxiliary hypotheses that give rise to data exclusion, transformation, and analytic strategies (Meehl, 1978). This procedure ensures that researchers adhere to their plans as closely as possible so that psychodiagnostic variables, conditions, or moderators that “didn’t work” are transparently disclosed instead of opaquely dropped or altered (Simmons, Nelson, & Simonsohn, 2012). Registration encourages researchers to approach their science carefully and comprehensively.
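For instance, a registered smallest effect size of interest can be tested directly with an equivalence test. Below is a minimal sketch in base R, assuming a hypothetical two-group design and an illustrative smallest effect size of interest of 3 raw-scale points; none of these values come from a study described here.

```r
# Two one-sided tests (TOST) against a registered smallest effect size of
# interest (SESOI); the data, group sizes, and 3-point SESOI are placeholders.
set.seed(1)
treatment <- rnorm(60, mean = 20, sd = 8)   # hypothetical symptom scores
control   <- rnorm(60, mean = 21, sd = 8)
sesoi <- 3                                  # registered SESOI in raw-scale units

lower <- t.test(treatment, control, mu = -sesoi, alternative = "greater")
upper <- t.test(treatment, control, mu =  sesoi, alternative = "less")

# The group difference is declared smaller than the SESOI only if both
# one-sided tests are significant.
lower$p.value < .05 && upper$p.value < .05
```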

Furthermore, registration establishes upfront what moderators of a study’s key effects, covariates, and other analytic features investigators view as plausible to protect investigators from reviewers criticizing a study after results are known (CARKing; Nosek & Lakens, 2014). Analyses that reviewers request can instead be meaningfully separated out as additional exploratory analyses that likely should be accorded less weight. Rather than making themselves and reviewers exert substantial repetitive effort to assess salami-sliced least registerable units, thoroughly registered study plans allow researchers to propose rich, meaningful stories. Though this process might reduce publication counts, it could also incentivize conducting studies that cover all the reasonable bases for a line of inquiry (Vazire, 2018).

Registration by itself cannot counteract poor experimental, measurement, or analytic techniques. However, the process of registration enhances the likelihood that these issues will be detected before a study commences. For instance, researchers may recognize that a proposed study has insufficient power given typical effect sizes in the clinical literature or that the psychodiagnostic measures they intend to use exhibit poor typical internal consistency or factorial validity. Registration may also reveal experimental designs that confound task difficulty and clinical severity, or show that the assumptions of proposed statistical techniques are unlikely to be met given the typical distribution of clinical measures. In a reversal of longstanding problems in the field (Ferguson & Heene, 2012; Rosenthal, 1979), fixing such problems prior to a study makes null results more publishable, as they are less likely to result from an uninterpretable failure to detect meaningful effects. Therefore, registration serves as a mechanism that could improve the quality of an individual study as well as a mechanism for providing transparency.

Better control of long-run false positive and false negative error rates.

Registration ensures that reported results were planned in advance and represent confirmatory statistical tests of the hypotheses that motivated the study. Thus, the long-run error rate is controlled at α instead of inflated to an unknown degree, which represents a methodological advance over the false positive-rich status quo. If multiple focal comparisons are performed – even if they are planned and unrelated (Frane, 2015) – the statistical threshold should be adjusted if investigators use typical frequentist statistics. Tests that were not registered may still be reported as long as they are clearly set apart from the registered results, though they likely require more stringent error control, as they represent a larger family of exploratory results than the confirmatory registered findings. As a result, registered analyses have more power to detect effects (and thereby generate fewer false negative errors) than non-registered analyses.2
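As a sketch of how registered and non-registered families of tests might be corrected separately in base R (the p values and family sizes below are placeholders, not results from any study):

```r
# Registered focal comparisons form a small, prespecified family; exploratory
# tests form a larger family and therefore receive harsher per-test thresholds.
registered_p  <- c(primary = .012, secondary1 = .034, secondary2 = .004)
exploratory_p <- c(.021, .048, .002, .067, .015, .031)

p.adjust(registered_p,  method = "holm")  # adjusted within the small registered family
p.adjust(exploratory_p, method = "holm")  # same method, larger family, stricter thresholds
```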

Easier manuscript writing.

Registrations can seed the introduction and method sections of journal articles (akin to how theses or dissertations seed entire journal articles), so the upfront work of registering a study pays dividends when writing manuscripts (Tackett et al., in press). Registrations can be written vigorously, with compelling scientific narratives framing the study. If multiple versions of a registration exist, authors can use the most recent one as the basis for the manuscript. Even authors who advocate reconceptualizing a study’s framework and method after data have been explored thoroughly (Bem, 2004) recognize writing from registrations saves time. Therefore, clinical scientists who register studies may increase their productivity.

What Kinds of Clinical Complexities Pose Challenges in Registration?

The logistics of clinical science require tradeoffs between study bandwidth and fidelity. Researchers must navigate constant tensions between recruiting comprehensively versus feasibly, between capturing the totality of a clinical condition versus the variance within it most relevant to a study’s hypotheses, and between assessing broadly representative versus narrowly considered constructs. They must also balance testing the bandwidth of plausible models in a dataset with maintaining fidelity to the subset of results most relevant for addressing a clinical research question. We detail how registration can aid researchers in resolving these tensions.

Clinical recruitment.

Clinical research participants can be scarce and challenging to access, which creates barriers to powering studies alongside multiple threats to a study’s sampling strategies (Tackett et al., 2017). For instance, finding participants with particular traits, symptoms, diagnosis, or phase of illness may not be feasible in an investigator’s catchment area. Successfully identified individuals may be unwilling or socioeconomically unable to participate. Even those who participate may have cognitive or educational impairments that preclude them from understanding the experimental tasks or survey instruments. Consequently, investigators are likely to face the nearly ubiquitous hurdle of recruitment delays or shortfalls (Huang et al., 2018). These will affect their power to detect an effect, even with the most compelling power curves or sensitivity analyses.
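A sensitivity analysis can quantify what such a shortfall costs. Here is a brief sketch with base R’s power.t.test(), using illustrative sample sizes rather than any study’s actual targets:

```r
# Minimum detectable standardized difference at 80% power if recruitment falls
# short: planned n = 80 per group versus an obtained n = 55 per group.
planned  <- power.t.test(n = 80, power = .80, sig.level = .05)
obtained <- power.t.test(n = 55, power = .80, sig.level = .05)

planned$delta   # smallest detectable effect (in SD units, since sd defaults to 1)
obtained$delta  # a larger effect is now required to retain 80% power
```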

Registration gives researchers the opportunity to consider in advance how to surmount these problems to preserve the fidelity of their design and their statistical power. For example, it may spur researchers to contact and mentor community partners in recruitment efforts (Johnson et al., 2018), which strengthens ties between bench and bedside. Registration also allows the ingenuity of a research team’s ideas about how to obtain additional participants to be recorded during the excitement of study generation for later use rather than assembled hastily during a fevered scramble to deal with an immediate recruitment crisis. However, despite researchers’ best registration efforts, they may encounter unforeseen barriers that require protocol modifications during recruitment so that enough participants are recruited. As long as these deviations are reported transparently and honestly, researchers still stand on solid ground.

Furthermore, many clinical studies conduct extensive diagnostic, cognitive, or physiological prescreenings to characterize participants. These costly measures often impel investigators to recruit only as many participants as are necessary. Some researchers determine stopping points by statistical significance (i.e., recruiting until results are significant). However, this approach does not control error rates. Sequential analyses provide a framework for researchers to conduct principled peeks at their data to determine whether to continue collecting data (Lakens, 2014). In sequential analysis, researchers register specific points, based on the number of observations, at which results are examined. At each point, they specify the smallest alpha level or largest effect size at which data collection will stop because a result either passes the statistical threshold or is too small to be worth studying, respectively. However, the alpha at each step is lower than that of the entire sample, so researchers pay a power penalty for each peek. Unless the effects fall between the thresholds, the study can be stopped in a defensible way before the full sample is collected. Adaptive study design – often used in drug trials – goes a step further and allows clinical investigators to adjust features of the study design at prespecified points (Pallmann et al., 2018). For instance, an ineffective treatment arm may be discontinued so that participants can be assigned to other arms with potential benefits. Though these techniques are not optimal for precisely estimating parameters and effect sizes, they give clinical scientists tools to employ scarce participants wisely and ethically.
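A registered sequential plan can be encoded as simply as the sketch below (base R). It assumes two looks at a two-group comparison and the conventional Pocock-style per-look alpha of .0294 for an overall alpha of .05; the sample sizes and simulated data are purely illustrative.

```r
# Registered two-look sequential plan: peek at n = 50 and n = 100 per group,
# stopping early only if p falls below the corrected per-look threshold.
look_ns    <- c(50, 100)
look_alpha <- 0.0294

set.seed(2)
group  <- rep(c("tx", "ctrl"), each = 100)   # placeholder assignments
scores <- rnorm(200)                         # placeholder outcome

for (n in look_ns) {
  keep <- c(seq_len(n), 100 + seq_len(n))    # first n collected in each group
  p <- t.test(scores[keep] ~ group[keep])$p.value
  cat("n per group =", n, " p =", round(p, 4), "\n")
  if (p < look_alpha) { cat("Stop: registered threshold crossed\n"); break }
}
```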

Clinical phenotypes and comorbidity.

Clinical scientists make a wealth of decisions about how to measure any particular condition. Many psychological measures purporting to assess the same condition can emphasize different underlying constructs within a condition. For example, the seven most popular depression measures together capture 52 unique symptoms of depression, but only 12% of these symptoms are present in all the measures, with 40% appearing in only one of the scales (Fried, 2017). Which symptoms are most important, and which are likely to remain important to the field as new or updated rubrics for assessing depression unfold? The answers to these questions depend on a researcher’s understanding of the literature, theoretical stances, and goals for a study. Using a theoretically superior measure or a brief version of a construct might be preferred when space is limited in a battery. In contrast, administering multiple measures permits creating a latent variable to assess common construct variance in addition to variance unique to each measure. In any case, a registration will document an investigator’s positions on important aspects of a clinical phenotype prior to a study being conducted or reviewed. Additionally, registration prevents outcome switching from occurring. Registration crystallizes otherwise implicit beliefs regarding the phenotype, demonstrating to reviewers that these issues in study design were thoughtfully approached beforehand instead of thrown together to support a “just so” story in a particular dataset.
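Where multiple measures are administered, the registered latent-variable approach can be specified in advance. The sketch below assumes the lavaan package and hypothetical scale names (bdi, phq9, cesd), with simulated scores standing in for real data.

```r
library(lavaan)

# Simulate three correlated depression indicators as stand-ins for real scales
set.seed(3)
common <- rnorm(300)
study_data <- data.frame(bdi  = common + rnorm(300, sd = .7),
                         phq9 = common + rnorm(300, sd = .7),
                         cesd = common + rnorm(300, sd = .7))

# Registered measurement model: one latent factor capturing shared variance
model <- 'depression =~ bdi + phq9 + cesd'
fit <- cfa(model, data = study_data)
summary(fit, fit.measures = TRUE, standardized = TRUE)
```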

Clinical scientists must also consider whether to conceptualize their constructs as categorical diagnoses or continuous dimensions. Traditionally, psychopathology research has used diagnostic categories (often as embodied in the DSM or ICD systems), which are also the types of data most commonly available from clinical partners in the community. Furthermore, advertisements targeted toward community members may be more easily understood when framed around diagnoses. However, dimensional approaches to psychopathology typically have greater reliability and validity coefficients than categorical approaches (Markon, Chmielewski, & Miller, 2011). Consequently, dimensional systems of conceptualizing psychopathology like the Research Domain Criteria (RDoC, which uses observable behaviors, neurobiological measures, and other general constructs to classify mental disorders; Cuthbert, 2014) may afford greater statistical power by reducing measurement error and increasing effect sizes. For a given study, which approach should be adopted? Registration helps researchers defend adopting one framework over another. In either one, registering whether a study will analyze particular diagnostic subtypes of a disorder or particular facets within a given dimension preserves researchers’ power and provides additional theoretical heft to their work.

Both categorical and dimensional approaches must contend with comorbidity. In the categorical framework, nearly half of individuals meeting criteria for a DSM diagnosis also meet criteria for another (Kessler, Chiu, Demler, & Walters, 2005). To avoid the confounding influence of other disorders, individuals with multiple diagnoses are commonly excluded from clinical samples. Though doing so results in samples of high-fidelity “pure” cases (e.g., only depression or schizophrenia) that increase the internal validity of a study, this sort of sample sterilization limits the external validity and the bandwidth of research findings’ generalizability. Which diagnostic neighbors are sufficiently distinct from each other as to warrant a participant’s exclusion from a study? Registering the logic behind researchers’ answers may suppress subsequent reviewer requests for alternative methods of dealing with diagnostic comorbidity. In a similar vein, the hierarchical organization of psychopathology (Kotov et al., 2017) implies that scores on one dimension share variance with others. As an example, negative affect is associated with many psychopathologies. Is this shared variance a critical part of the dimension of interest, or does it represent noise that should be parsed from the relevant dimension in a study? Having a registered answer to this question allows researchers to stick with a particular method for analyzing data that maintains fidelity with their theoretical expectations.

Psychosocial correlates.

Beyond issues of comorbidity, clinical researchers must confront how the correlations between clinical conditions and other psychosocial variables of interest impact their inferences. For example, there is a moderately negative relationship between schizophrenia and premorbid IQ, as well as an observed decline in IQ once psychosis appears (Woodberry, Giuliano, & Seidman, 2008). Additionally, adverse experiences (such as abuse) during childhood predict depression in adulthood, but they do not account for all depression cases, nor do they even predict a particular subtype of depression (Chapman et al., 2004).

In a registration, a researcher can describe how these possible confounds will be addressed in their analyses – or justify why they will not be addressed. To the extent that researchers view them as nuisance variables, confounds may be treated as covariates in a model to filter out extraneous variability in scores (Rohrer, 2018). However, to the extent those confounding variables reflect meaningful components of a clinical condition, it would be theoretically consistent not to control for them to maintain fidelity to the construct (G. A. Miller & Chapman, 2001). Registering the methods for addressing psychosocial correlates of any focus of clinical attention ensures both that investigators’ theoretical rationales for doing so can be respected and that power is maintained while false positive rates are reduced.

Extensive data.

Many clinical studies result in large amounts of data. Even in single studies, data structures become complex due to extensive measurement batteries, multi-factorial experimental paradigms, or ambulatory assessment. It may seem as if preregistration of analysis plans in these situations might create barriers to finding the correct model or create undue burden by requiring a priori consideration of many potential scenarios. We argue that as data become more complex and modeling decisions multiply, registration is not only more necessary (to avoid testing many different conditions that inflate false positive errors) but may also offer its clearest benefits.

Consider an experiment some of the authors are conducting, which involves five within-subject factors (picture content, reward amount, reward probability, injury amount, injury probability) and one between-subject factor (delay) – before involving any clinically relevant data. The authors coregistered (at https://osf.io/bd5p6/) that they would only examine main effects and two-way interactions in that study to assess only theoretically interpretable effects. Without registration, the number of analytic forks threatens to multiply so extensively as to virtually guarantee false positive results. Alternatively, a recent study by the last author involved ambulatory assessment of interpersonal situations over 21 days (Wright et al., 2017). The data were analyzed using multilevel structural equation modeling, and some of the planned models could have included up to five random intercepts, three random slopes, two between-person covariates, and up to 10 cross-level moderation effects. Few of these modeling combinations would be of theoretical interest. However, if the initial model yielded null or unexpected effects, the investigator likely would have been tempted to try alternatives or “tweaks” without the constraints afforded through registration.
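In R’s model syntax, a restriction like the coregistered one above can be written directly into the analysis code, making the registered scope of the model hard to drift away from. This sketch assumes the lme4 package; the factor, outcome, and participant names are hypothetical rather than the authors’ actual variables.

```r
library(lme4)

# Simulated stand-in for a long-format, trial-level data set
set.seed(4)
trial_data <- expand.grid(
  participant = factor(1:40),
  content     = factor(c("neutral", "aversive")),
  reward_amt  = factor(c("low", "high")),
  reward_prob = factor(c("low", "high")),
  injury_amt  = factor(c("low", "high")),
  injury_prob = factor(c("low", "high"))
)
trial_data$delay  <- factor(ifelse(as.integer(trial_data$participant) %% 2 == 0, "short", "long"))
trial_data$rating <- rnorm(nrow(trial_data))

# "^2" expands to all main effects and two-way interactions -- and nothing higher
fit <- lmer(rating ~ (content + reward_amt + reward_prob + injury_amt + injury_prob)^2
            + delay + (1 | participant), data = trial_data)
```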

These scenarios demonstrate both that it is easy to attempt modeling until a significant result is found and that doing so can be an exhausting and time-consuming endeavor. By registering the subset of interesting models, the investigator can run these and be done, with strong disincentives for further analyses. However, additional analyses remain possible; they would be reported in an exploratory section of results that should likely be pursued with sensitivity analyses (Thabane et al., 2013) or multiverse analyses (Steegen, Tuerlinckx, Gelman, & Vanpaemel, 2016) to examine the robustness of results against post hoc tweaks to the analytic strategy. To illustrate, compare two analyses of large-scale datasets. One report described in alarmist terms relationships between symptoms of depression and social media use in adolescent girls (Twenge, Joiner, Rogers, & Martin, 2018). Subsequent specification curve analyses on those datasets indicated that these relationships range widely depending on the models tested, with a median effect size accounting for less than 0.1% of the variance shared among those measures (Orben & Przybylski, 2019).

Attrition.

Clinical science must contend with attrition in both intervention and longitudinal assessment studies. Attrition may bias effect estimates in assessment studies and may compromise fundamental aspects of randomization in clinical trials. In either type of study, registration allows researchers to disclose transparently which method of addressing attrition is most suited to their data. Methods that simply drop incomplete observations (e.g., listwise or pairwise deletion) or perform single imputations (e.g., last observation carried forward, mean or median imputation) likely bias parameter estimates, though they may be justified in limited situations (Schafer & Graham, 2002). More advanced techniques, including multiple imputation or maximum likelihood estimation, are typically more appropriate for addressing clinical attrition. However, the technical details of how maximum likelihood (Jakobsen, Gluud, Wetterslev, & Winkel, 2017) and multiple imputation (McNeish, 2017) will be carried out should be registered. Furthermore, some studies might opt to use refreshment samples to address dropout with additional data representative of the study’s population (Deng, Hillygus, Reiter, Si, & Zheng, 2013). In these cases, the refreshment strategy should be detailed along with the methods used to combine refreshment and core data. Refreshment samples are more plausible to use in assessment studies, as intervention studies often require all participants to have specific doses or frequencies of intervention over prescribed time periods.
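A registered multiple-imputation plan can specify the imputation engine, number of imputations, and pooling rule up front. The following sketch assumes the mice package, with a simulated data set and illustrative settings:

```r
library(mice)

# Simulated data with 25% attrition on the outcome, standing in for real study data
set.seed(5)
df <- data.frame(outcome  = rnorm(100),
                 baseline = rnorm(100),
                 age      = rnorm(100, mean = 40, sd = 10))
df$outcome[sample(100, 25)] <- NA

imp  <- mice(df, m = 20, method = "pmm", printFlag = FALSE)  # 20 imputations, predictive mean matching
fits <- with(imp, lm(outcome ~ baseline + age))              # registered substantive model
summary(pool(fits))                                          # pooled via Rubin's rules
```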

How Can Registration Help Manage the Complexities of Clinical Research?

Registering one decision out of many possibilities.

Researchers may differ on how to manage a variety of decisions in clinical research. Registering these decisions allows researchers to avoid the “garden of forking paths” (Gelman & Loken, 2014) by constraining the multifaceted malleability of study design, implementation, and analysis (Wicherts et al., 2016). Table 2 provides examples of issues that are often encountered in clinical research and that may have multiple acceptable resolutions, any one of which can be registered. By so doing, researchers make explicit the theoretical frameworks influencing their choices regarding participants, measures, paradigms, and data analysis.

Table 2.

Examples of Specific Decision Alternatives That Can be Registered in Clinical Research

Issue: A participant’s data is missing in some (but not all) assessments.
  • Alternative 1: Exclude that participant from only the analyses including the missing data (pairwise deletion).
  • Alternative 2: Impute a specific value to missing data (e.g., last observation carried forward; mean or median imputation to prorate scores).
  • Alternative 3: Use a specific technique to impute missing data (e.g., maximum likelihood, multiple imputation) if the data are at least missing at random.

Issue: A participant’s score on a recruitment measure changes substantially at test.
  • Alternative 1: Exclude that participant from all analyses.
  • Alternative 2: Use the change from recruitment to test scores as a covariate in all analyses.
  • Alternative 3: Analyze only scores at test; do not include recruitment scores in any analyses.

Issue: Groups differ on theoretically important characteristics.
  • Alternative 1: Do not include any of these characteristics as covariates, as the differences may reflect theoretically meaningful differences among the groups.
  • Alternative 2: Use the characteristics that differ as covariates in all analyses.
  • Alternative 3: Create subgroups of participants matched on the differing characteristics.

Issue: Many measures exist to assess a particular construct of interest.
  • Alternative 1: Employ only the most compelling of these measures based on a prespecified theoretical justification for that measure’s superiority.
  • Alternative 2: Administer several distinct prespecified exemplar measures of the construct and analyze results separately for each measure.
  • Alternative 3: Create a latent variable from scores on several prespecified measures of a construct and use that variable in all analyses.

Issue: Multiple comparisons are conducted in a study.
  • Alternative 1: Use a multiple comparison correction to maintain a prespecified acceptable family-wise error rate (e.g., Benjamini-Hochberg method).
  • Alternative 2: Register sets of tests comprising conceptual families of research questions; correct for multiple comparisons within each family (e.g., Holm-Bonferroni method).
  • Alternative 3: Choose a statistical epistemology that is not concerned with the frequency of comparisons being conducted (e.g., Bayesian statistics).

Issue: Outliers may be present in data.
  • Alternative 1: Exclude all data that pass beyond a prespecified cutpoint (e.g., > 2.5 SDs more than a participant’s mean score).
  • Alternative 2: Fence in (Winsorize) all data that pass beyond a prespecified cutpoint (e.g., > |3| SDs beyond a participant’s mean score).
  • Alternative 3: Model the data with a prespecified distribution that is congruent with the mechanism that generates the data (including ostensibly outlying values).

Issue: Data appear non-normally distributed.
  • Alternative 1: Use a prespecified rule of thumb (e.g., |skewness| > 1, kurtosis > 1) to determine when to apply a specific transformation (e.g., natural log).
  • Alternative 2: Test non-normality statistically (e.g., with a Shapiro-Wilk test); employ a prespecified transformation if results exceed a prespecified threshold.
  • Alternative 3: Use a prespecified analysis that models non-normal distributions (e.g., generalized linear models) or makes no assumptions about the distribution’s shape.

Issue: Data violate other assumptions of a registered statistical test.
  • Alternative 1: Use the registered statistical test if it is relatively robust to violations of its assumptions (e.g., Welch’s t test).
  • Alternative 2: Employ a correction to account for the violation (e.g., the Huynh-Feldt correction in repeated-measures ANOVA).
  • Alternative 3: Use a prespecified test that models assumption violations (e.g., multilevel modeling) or does not make the assumption (e.g., the nonparametric Friedman test).

For instance, what should happen if participants’ scores change from screening to test or preselected groups end up differing on important characteristics? These situations often occur in clinical research, and there are many defensible methods for handling them. Registering each relevant decision (along with the theoretical or empirical literature behind that decision) commits researchers to a specific path in the garden to reduce the rates of false positive findings. It also provides the strongest inferential test of a single research question, in which power is not compromised through adjustments for multiple comparisons. However, in situations in which multiple focal tests of hypotheses are performed, investigators’ registrations of their preferred multiple comparison correction highlight the kinds of errors they are keen to control. As a result, they are free to justify their balance of rigor in detecting effects and reducing false positive findings (Lakens et al., 2018). Likewise, when researchers register each method they would use to handle common statistical problems inherent in clinical research, they protect the integrity of their analytical pipeline, preserve their power against a multiplicity of comparisons, and prevent p-hacking their way to erroneous discoveries.
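To make such a registered rule concrete, the Winsorizing alternative from Table 2 could be committed to code at registration time. Below is a base R sketch with an illustrative 3-SD cutpoint and hypothetical variable names:

```r
# Registered rule: fence in values more than 3 SDs from a participant's own mean
winsorize_within <- function(x, sd_cut = 3) {
  m <- mean(x, na.rm = TRUE)
  s <- sd(x, na.rm = TRUE)
  pmin(pmax(x, m - sd_cut * s), m + sd_cut * s)
}

# Apply the rule within participants on a simulated long-format data set
set.seed(6)
ratings <- data.frame(id = rep(1:5, each = 40), rt = rnorm(200, mean = 500, sd = 80))
ratings$rt_winsorized <- ave(ratings$rt, ratings$id, FUN = winsorize_within)
```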

Registering decision trees.

One misconception about registration is that researchers must commit to a single plan with no room for deviations or alterations (Brainerd & Reyna, 2018). However, researchers can use registered flexibility (a general case of adaptive preregistration; Srivastava, 2018) to plan for contingencies that may arise over the course of a study. For example, a team may not want to commit to a blanket rule of excluding a participant’s data from a set of analyses. Rather, they may believe that the appropriate course of action depends on how many data are missing. If there are “enough” data to use (e.g., 50%), then the participant would be retained; with more missing data than that threshold, the participant would be excluded. Table 3 displays examples of registered flexibility points that may be encountered in clinical research. Some points of registered flexibility may have more links in the chain, some fewer, and others still may be arranged as parallel branches of a decision tree rather than as a series of chain links.
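The missing-data branch described above can itself be registered as code, so that the threshold is fixed before any data are inspected. Here is a base R sketch using the 50% example from the text, with simulated item responses:

```r
# Registered branch: retain a participant only if at least 50% of items are complete
retain_participant <- function(responses, min_complete = 0.50) {
  mean(!is.na(responses)) >= min_complete
}

set.seed(7)
items <- matrix(rnorm(20 * 10), nrow = 20)   # 20 participants x 10 items
items[sample(length(items), 60)] <- NA       # inject missingness
keep <- apply(items, 1, retain_participant)
sum(keep)                                    # participants meeting the registered threshold
```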

Table 3.

Examples of Decision Chains Employing Registered Flexibility in Clinical Research

PARTICIPANT-FOCUSED DECISIONS

Action point: Recruitment goals for clinical groups are difficult to meet.
  • First branch: Allot 3 years for the research team to meet recruitment targets.
  • Second branch: If 1/3 of the sample is not recruited within 1 year, relax a prespecified recruitment criterion (e.g., no comorbidity).
  • Third branch: If 2/3 of the sample is not recruited in 2 years, relax other prespecified recruitment criteria (e.g., age ranges, recruitment sites and methods).

Action point: Participants report strong levels of discomfort during an assessment or treatment session.
  • First branch: Assess whether discomfort is too strong to continue the current session.
  • Second branch: If so, offer methods to reduce participant discomfort that are unlikely to compromise the integrity of the assessment or treatment (e.g., creating a 5-minute break between tasks).
  • Third branch: If these methods do not reduce participant discomfort, discontinue participation or use methods that might compromise the integrity of the assessment or treatment (e.g., bringing in a supportive other person).

Action point: A participant discontinues or provides unusable data in a cross-sectional study.
  • First branch: Remove the participant from analyses after having planned for a particular rate of attrition.
  • Second branch: If the number of participants removed is greater than planned, recruit an additional participant to match the attrited participant.
  • Third branch: If a matching participant cannot be obtained, recruit a non-matching participant after updating the registration about how to deal with non-matching characteristics.

Action point: A participant does not complete at least one wave of longitudinal data collection.
  • First branch: Use multiple contact methods (e.g., phone, email, text) to contact the participant at a prespecified frequency.
  • Second branch: If the participant can be contacted and completes a certain prespecified number of data collection waves, use a prespecified strategy to account for missing data.
  • Third branch: If the participant cannot be contacted or does not complete a certain prespecified number of data collection waves, exclude the participant from analyses.

STUDY QUALITY DECISIONS

Action point: The effectiveness of an experimental manipulation must be verified.
  • First branch: Ensure that scores on each item in a target condition exceed those in a control condition.
  • Second branch: If one item fails this check, drop it and analyze the remaining data as expected.
  • Third branch: If more than one item fails, use adaptive study design to replace target items with similar content.

Action point: Physiological data quality must be assessed.
  • First branch: Measure the maximum variation in a prespecified window before an event.
  • Second branch: If that variation exceeds a prespecified amount (e.g., 100 μV), mark the trial as excessively noisy and discard it.
  • Third branch: If that variation is below the prespecified amount but the peak response within a relevant window is too low (e.g., < 1 μV), set the peak response to 0 and use the trial.

Action point: Clinicians’ fidelity to a treatment protocol must be ascertained.
  • First branch: Examine whether all clinicians complete at least 90% of steps on a checklist using an external rater of video-recorded sessions.
  • Second branch: If a clinician does not achieve that adherence rate, have the clinician undergo remedial training involving roleplays.
  • Third branch: If the clinician still cannot achieve that adherence rate, replace the clinician and recruit more participants to replace those under that clinician’s care.

ANALYTIC DECISIONS

Action point: A measure’s factor structure must be confirmed.
  • First branch: Perform a confirmatory factor analysis (CFA) using a prespecified correlation matrix (e.g., polychoric) and estimator (e.g., diagonally weighted least squares).
  • Second branch: If CFA results do not meet specified levels of fit (e.g., TLI > .95 or RMSEA < .05), allow theoretically sound item residuals to correlate.
  • Third branch: If CFA results still do not meet specified levels of fit, perform an exploratory factor analysis with a specified rotation (e.g., promax) and loading cutoffs (e.g., > |.40|) to create new factors.

Action point: The structure among repeated measures must be modeled.
  • First branch: Assume compound symmetry (e.g., repeated-measures ANOVA).
  • Second branch: If a prespecified statistical criterion indicates that compound symmetry is not met (e.g., a value of epsilon < .8), use an autoregressive covariance matrix in generalized estimating equations.
  • Third branch: If a prespecified statistical criterion indicates that an unstructured covariance matrix fits better (e.g., by a difference of 10 in QIC), use an unstructured instead of an autoregressive matrix.

In this way, investigators can lay out in advance reasonable steps to take if recruitment does not proceed as expected at different stages of a study or experimental procedures cause unexpected discomfort to participants. Likewise, they can establish tentative plans to deal with scores on study measures exhibiting floor, ceiling, or other effects that were not revealed in pilot testing. Data processing contingencies may be detailed, and steps to ensure treatment fidelity can be made plain to the broader scientific community in addition to the research team. The dependencies of various analytic decisions on model diagnostics can even be illustrated to avoid the power penalties required when exploring alternative models without planning (unless the researcher uses Bayesian statistics; Rubin, 2017). In any report of a study, authors can point to the registered decision chains and trees to bolster confidence in the validity of their findings.
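The CFA decision chain from Table 3 illustrates how such dependencies can be scripted in advance. This sketch assumes the lavaan package and the fit thresholds named in that table; the model syntax and fallback steps are placeholders for a study’s registered specifics.

```r
library(lavaan)

# Registered chain: retain the confirmatory model if fit is adequate; otherwise
# proceed to the registered fallbacks (correlated residuals, then EFA).
evaluate_registered_cfa <- function(model, data, tli_cut = .95, rmsea_cut = .05) {
  fit <- cfa(model, data = data)
  idx <- fitMeasures(fit, c("tli", "rmsea"))
  if (idx["tli"] > tli_cut || idx["rmsea"] < rmsea_cut) {
    return(list(step = "confirmatory model retained", fit = fit))
  }
  list(step = "proceed to registered fallbacks", fit = fit)
}
```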

Demonstrating research competence through expanding contingency management.

The lessons learned from one registered study can improve the registrations of future studies. Novice researchers often underestimate the number of decisions that must be made before collecting data (Clark & Mulligan, 2011), so a researcher conducting a first study will likely need to make substantial amendments to a registration as a study progresses. Perhaps 30%−50% of the planned sampling or analytic strategies will end up being feasible. The resultant deviations will require substantial adjustments in the method and extensive detailing of non-registered analyses in the results. Furthermore, the justifications that undergirded a registration may shift with theoretical developments or the introduction of new data analytic techniques to the literature. In any of these circumstances, researchers should be free to use new or non-registered techniques with appropriately cited justifications of the changes to report their work transparently.

As the researcher gains expertise with a particular population, method, or analysis, new registered studies will justify updated and cited decisions and contingencies. A second related study might feature a majority of purely confirmatory methods and results. With increasing experience, an investigator may follow 80%−90% of a registration, though continuing evolution from the time a study is registered to the time it is written up may preclude perfect adherence to any registration. Researchers can crystallize these annotated procedures into laboratory documents, supplementing a lab’s procedural memory (Cohen & Bacdayan, 1994) with collective declarative memories about how to conduct science. Contrary to those who posit that registration reduces the depth of processing of scientific procedures (Brainerd & Reyna, 2018), registration offloads a substantial working memory burden from investigators. The resultant cognitive space may free them to conduct their science more fluently and talk with their teams about a study’s big picture – just as using surgical checklists both reduces procedural error rates and increases intra-team communication (Fudickar, Hörle, Wiltfang, & Bein, 2012). Furthermore, codifying lab procedures can hasten the development of competence in other labs that use similar techniques (Zander & Kogut, 1995), thereby improving the field’s procedures. If a lab’s procedures prove especially useful, they may be incorporated into professional guidelines, such as those for collecting and analyzing electroencephalographic data (Keil et al., 2014), interpreting model fit (Hu & Bentler, 1999), and ensuring treatment fidelity (Gearing et al., 2011).

Researchers who successfully register a variety of contingencies demonstrate competence in the research enterprise. They transform a garden of forking paths into a map of well-trodden ground with fewer and fewer spaces of methodological terra incognita. Their ability to do so could be considered another index of scientific excellence alongside citation counts, grant funding, students graduated, and other metrics. This ability could be demonstrated through an ever-growing ratio of registered to non-registered material in the method and results sections of a series of related papers. Consequently, researchers’ scientific skill could be judged at a bird’s-eye level by peers who may not be experts in particular topics or methods. Furthermore, to the extent that a researcher’s standard operating procedures for dealing with the complexities of clinical research are widely used in the field, the ability to register flexibility successfully may represent a marker of scientific eminence (Simonton, 2016).

What Can Researchers Do Now?

1. Report registered versions of methods and analyses alongside any modifications.

For journals that do not accept registered reports, maintain transparency by reporting registered analyses first when discussing your results, and place your justification for trying another approach in a separate “subsequent,” “exploratory,” or “non-registered” results section. Doing so helps dispel accusations that hypotheses were altered after the fact to coincide with the outcomes of the data analysis. Furthermore, document deviations from your registered methods so that readers can understand how the study changed from its conceptualization to its realization.

2. Consider alternative sampling strategies, designs, and analyses before conducting a study.

Even if you do not use the registered report format, ask yourself: after adequately justifying your proposed methods, have you also explained why you did not employ another method that might be appropriate in other circumstances? Have you anticipated a potential reviewer’s comment of “Why didn’t you just do X?” These questions are especially pertinent when using a technique that is not commonly employed or widely accepted in a particular body of research.

3. Publicize and teach standard operating procedures.

Generate a base set of procedures that all persons involved with data collection in each of your studies follow. Include these instructions as part of your registration. For those in positions to teach the next generation of clinical researchers, these documents can even be used in research methods courses to bring students up to speed on current best practices in study design (and the reasons behind those design choices).

4. Start registering studies.

Take advantage of templates provided by the Open Science Framework or https://aspredicted.org to jumpstart publicizing your intended analyses and initial hypotheses for new studies. Create a set of predictions for a study you are currently conducting before you inspect the data. Upload thesis, dissertation, grant, or institutional review board proposals verbatim. Any of these steps will help clinical scientists dip their toes in the preregistration pool; the water’s fine!

Acknowledgments

Aidan Wright’s effort on this project was supported by the National Institute of Mental Health (L30 MH101760). The opinions expressed are solely those of the authors and not those of the funding source.

Footnotes

1. Clinical datasets are often densely packed with information requiring systematic description for secondary analyses. A codebook that details what the variables in a dataset signify, displays what possible (and missing) value codes exist for each variable, and gives variable-level summary statistics eases new researchers into an existing dataset (Vardigan, Heus, & Thomas, 2008). With this level of detail, a researcher can register analytic questions before looking at the observations within a dataset. Automated tools permit researchers to create codebooks that facilitate postregistrations. For example, running the view(dfSummary(nameOfDataframe)) command from the summarytools package (Comtois, 2019) on an R data frame generates an indexed, labeled HTML codebook with descriptive statistics and histograms for each variable. Armed with these summaries, researchers can register a variety of exclusionary rules, data transformations, and other analytic decisions. However, analytic plans will be influenced by knowledge of even summary statistics (e.g., knowing that data have a bimodal distribution might prompt investigators to seek groups explaining that bimodality), entailing a tradeoff between rigorously data-independent decision making and making reasonable use of the pre-existing information necessary to test models properly.
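As a minimal sketch of that workflow (assuming the summarytools package is installed; the built-in iris data frame and the output file name are placeholders for a real clinical dataset):

    library(summarytools)   # Comtois, 2019

    # Build variable-level summaries (value codes, missingness, descriptive
    # statistics, and histograms) without inspecting individual observations.
    codebook <- dfSummary(iris)              # substitute your own data frame

    # Render the summaries as an indexed, labeled HTML codebook that can
    # accompany a postregistration.
    view(codebook, file = "codebook.html")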

2. Bayesian statistics do not require adjustments for multiple comparisons because Bayesian epistemology focuses on updating the probability of beliefs rather than on long-run error control. Nevertheless, the evidence for registered analyses can be compared with that for alternative exploratory analyses via a Bayes factor to examine whether the exploratory tweaks appreciably change the plausibility of the results.
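For example, using the BayesFactor package in R (a hedged sketch; the simulated data and variable names below are hypothetical stand-ins for a real study):

    library(BayesFactor)

    # Simulated stand-in for real study data.
    set.seed(1)
    dat <- data.frame(x1 = rnorm(120), x2 = rnorm(120))
    dat$y <- 0.4 * dat$x1 + rnorm(120)

    # Registered model versus an exploratory model that adds a covariate.
    bf_registered  <- lmBF(y ~ x1,      data = dat)
    bf_exploratory <- lmBF(y ~ x1 + x2, data = dat)

    # Bayes factor for the exploratory model relative to the registered one;
    # values near 1 indicate the tweak barely changes the results' plausibility.
    bf_exploratory / bf_registered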

Contributor Information

Stephen D. Benning, University of Nevada, Las Vegas

Rachel L. Bachrach, VA Pittsburgh Healthcare System

Aidan G. C. Wright, University of Pittsburgh

References

  1. Aslinger EN, Manuck SB, Pilkonis PA, Simms LJ, & Wright AGC (2018). Narcissist or narcissistic? Evaluation of the latent structure of narcissistic personality disorder. Journal of Abnormal Psychology, 127(5), 496–502. 10.1037/abn0000363
  2. Begley CG, & Ellis LM (2012). Drug development: Raise standards for preclinical cancer research. Nature, 483, 531–533. 10.1038/483531a
  3. Bem DJ (2004). Writing the empirical journal article. In Darley JM, Zanna MP, & Roediger HL III (Eds.), The Compleat Academic: A Career Guide (2nd ed., pp. 185–219). Washington, DC: American Psychological Association.
  4. Brainerd CJ, & Reyna VF (2018). Replication, registration, and scientific creativity. Perspectives on Psychological Science, 13(4), 428–432. 10.1177/1745691617739421
  5. Chapman DP, Whitfield CL, Felitti VJ, Dube SR, Edwards VJ, & Anda RF (2004). Adverse childhood experiences and the risk of depressive disorders in adulthood. Journal of Affective Disorders, 82(2), 217–225. 10.1016/j.jad.2003.12.013
  6. Clark GT, & Mulligan R (2011). Fifteen common mistakes encountered in clinical research. Journal of Prosthodontic Research, 55(1), 1–6. 10.1016/j.jpor.2010.09.002
  7. Cohen MD, & Bacdayan P (1994). Organizational routines are stored as procedural memory: Evidence from a laboratory study. Organization Science, 5(4), 554–568.
  8. Cuthbert BN (2014). The RDoC framework: Facilitating transition from ICD/DSM to dimensional approaches that integrate neuroscience and psychopathology. World Psychiatry, 13(1), 28–35. 10.1002/wps.20087
  9. Dickersin K, & Rennie D (2003). Registering clinical trials. JAMA, 290(4), 516–523. 10.1001/jama.290.4.516
  10. Ferguson CJ, & Heene M (2012). A vast graveyard of undead theories: Publication bias and psychological science’s aversion to the null. Perspectives on Psychological Science, 7(6), 555–561. 10.1177/1745691612459059
  11. Fiedler K (2018). The creative cycle and the growth of psychological science. Perspectives on Psychological Science, 13(4), 433–438. 10.1177/1745691617745651
  12. Frane AV (2015). Planned hypothesis tests are not necessarily exempt from multiplicity adjustment. Journal of Research Practice, 11(1), 2.
  13. Fried EI (2017). The 52 symptoms of major depression: Lack of content overlap among seven common depression scales. Journal of Affective Disorders, 208, 191–197. 10.1016/j.jad.2016.10.019
  14. Fudickar A, Hörle K, Wiltfang J, & Bein B (2012). The effect of the WHO Surgical Safety Checklist on complication rate and communication. Deutsches Ärzteblatt International, 109(42), 695–701. 10.3238/arztebl.2012.0695
  15. Gearing RE, El-Bassel N, Ghesquiere A, Baldwin S, Gillies J, & Ngeow E (2011). Major ingredients of fidelity: A review and scientific guide to improving quality of intervention research implementation. Clinical Psychology Review, 31(1), 79–88. 10.1016/j.cpr.2010.09.007
  16. Gelman A, & Loken E (2014). The statistical crisis in science. American Scientist, 102(6), 460–465. 10.1511/2014.111.460
  17. Gernsbacher MA (2018). Writing empirical articles: Transparency, reproducibility, clarity, and memorability. Advances in Methods and Practices in Psychological Science, 2515245918754485. 10.1177/2515245918754485
  18. Goldacre B, Drysdale H, Dale A, Milosevic I, Slade E, Hartley P, … Mahtani KR (2019). COMPare: A prospective cohort study correcting and monitoring 58 misreported trials in real time. Trials, 20(1), 118. 10.1186/s13063-019-3173-2
  19. Harms C, & Lakens D (2018). Making “null effects” informative: Statistical techniques and inferential frameworks. Journal of Clinical and Translational Research, 3(S2), 7. 10.18053/jctres.03.2017S2.007
  20. Hu L, & Bentler PM (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1–55. 10.1080/10705519909540118
  21. Huang GD, Bull J, Johnston McKee K, Mahon E, Harper B, & Roberts JN (2018). Clinical trials recruitment planning: A proposed framework from the Clinical Trials Transformation Initiative. Contemporary Clinical Trials, 66, 74–79. 10.1016/j.cct.2018.01.003
  22. Johnson MR, Kenworthy-Heinige T, Beck DJ, Asghar A, Broussard EB, Bratcher K, … Planeta BM (2018). Research site mentoring: A novel approach to improving study recruitment. Contemporary Clinical Trials Communications, 9, 172–177. 10.1016/j.conctc.2018.01.011
  23. Keil A, Debener S, Gratton G, Junghöfer M, Kappenman ES, Luck SJ, … Yee CM (2014). Committee report: Publication guidelines and recommendations for studies using electroencephalography and magnetoencephalography. Psychophysiology, 51(1), 1–21. 10.1111/psyp.12147
  24. Kessler RC, Chiu WT, Demler O, & Walters EE (2005). Prevalence, severity, and comorbidity of 12-month DSM-IV disorders in the National Comorbidity Survey Replication. Archives of General Psychiatry, 62(6), 617–627. 10.1001/archpsyc.62.6.617
  25. Kotov R, Krueger RF, Watson D, Achenbach TM, Althoff RR, Bagby RM, … Zimmerman M (2017). The Hierarchical Taxonomy of Psychopathology (HiTOP): A dimensional alternative to traditional nosologies. Journal of Abnormal Psychology, 126(4), 454–477. 10.1037/abn0000258
  26. Lakens D (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(7), 701–710. 10.1002/ejsp.2023
  27. Lakens D, Adolfi FG, Albers CJ, Anvari F, Apps MAJ, Argamon SE, … Zwaan RA (2018). Justify your alpha. Nature Human Behaviour, 2(3), 168–171. 10.1038/s41562-018-0311-x
  28. Markon KE, Chmielewski M, & Miller CJ (2011). The reliability and validity of discrete and continuous measures of psychopathology: A quantitative review. Psychological Bulletin, 137(5), 856–879. 10.1037/a0023678
  29. Meehl PE (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834.
  30. Miller GA, & Chapman JP (2001). Misunderstanding analysis of covariance. Journal of Abnormal Psychology, 110(1), 40–48. 10.1037/0021-843X.110.1.40
  31. Nosek BA, Ebersole CR, DeHaven AC, & Mellor DT (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. 10.1073/pnas.1708274114
  32. Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. 10.1126/science.aac4716
  33. Orben A, & Przybylski AK (2019). The association between adolescent well-being and digital technology use. Nature Human Behaviour, 3(2), 173–182. 10.1038/s41562-018-0506-1
  34. Pallmann P, Bedding AW, Choodari-Oskooei B, Dimairo M, Flight L, Hampson LV, … Jaki T (2018). Adaptive designs in clinical trials: Why use them, and how to run and report them. BMC Medicine, 16(1), 29. 10.1186/s12916-018-1017-7
  35. Rohrer JM (2018). Thinking clearly about correlations and causation: Graphical causal models for observational data. Advances in Methods and Practices in Psychological Science, 1(1), 27–42. 10.1177/2515245917745629
  36. Rosenthal R (1979). The “file drawer problem” and tolerance for null results. Psychological Bulletin, 86(3), 638–641.
  37. Rubin M (2017). An evaluation of four solutions to the forking paths problem: Adjusted alpha, preregistration, sensitivity analyses, and abandoning the Neyman-Pearson approach. Review of General Psychology, 21(4), 321–329. 10.1037/gpr0000135
  38. Simmons JP, Nelson LD, & Simonsohn U (2012). A 21-word solution. Dialogue, 26(2), 4–7. 10.2139/ssrn.2160588
  39. Simonton DK (2016). Giving credit where credit’s due: Why it’s so hard to do in psychological science. Perspectives on Psychological Science, 11(6), 888–892. 10.1177/1745691616660155
  40. Srivastava S (2018). Sound inference in complicated research: A multi-strategy approach. 10.31234/osf.io/bwr48
  41. Steegen S, Tuerlinckx F, Gelman A, & Vanpaemel W (2016). Increasing transparency through a multiverse analysis. Perspectives on Psychological Science, 11(5), 702–712. 10.1177/1745691616658637
  42. Tackett JL, Brandes CM, & Reardon KW (in press). Leveraging the Open Science Framework in clinical psychological assessment research. Psychological Assessment. 10.1037/pas0000583
  43. Tackett JL, Lilienfeld SO, Patrick CJ, Johnson SL, Krueger RF, Miller JD, … Shrout PE (2017). It’s time to broaden the replicability conversation: Thoughts for and from clinical psychological science. Perspectives on Psychological Science, 12(5), 742–756. 10.1177/1745691617690042
  44. Thabane L, Mbuagbaw L, Zhang S, Samaan Z, Marcucci M, Ye C, … Goldsmith CH (2013). A tutorial on sensitivity analyses in clinical trials: The what, why, when and how. BMC Medical Research Methodology, 13(1), 92. 10.1186/1471-2288-13-92
  45. Twenge JM, Joiner TE, Rogers ML, & Martin GN (2018). Increases in depressive symptoms, suicide-related outcomes, and suicide rates among U.S. adolescents after 2010 and links to increased new media screen time. Clinical Psychological Science, 6(1), 3–17. 10.1177/2167702617723376
  46. Vazire S (2018). Implications of the credibility revolution for productivity, creativity, and progress. Perspectives on Psychological Science, 13(4), 411–417. 10.1177/1745691617751884
  47. Wagenmakers E-J, Wetzels R, Borsboom D, van der Maas HLJ, & Kievit RA (2012). An agenda for purely confirmatory research. Perspectives on Psychological Science, 7(6), 632–638. 10.1177/1745691612463078
  48. Weston SJ, Ritchie SJ, Rohrer JM, & Przybylski AK (2018). Recommendations for increasing the transparency of analysis of pre-existing datasets. 10.31234/osf.io/zmt3q
  49. Wicherts JM, Veldkamp CLS, Augusteijn HEM, Bakker M, van Aert RCM, & van Assen MALM (2016). Degrees of freedom in planning, running, analyzing, and reporting psychological studies: A checklist to avoid p-hacking. Frontiers in Psychology, 7, 1832. 10.3389/fpsyg.2016.01832
  50. Woodberry KA, Giuliano AJ, & Seidman LJ (2008). Premorbid IQ in schizophrenia: A meta-analytic review. American Journal of Psychiatry, 165(5), 579–587. 10.1176/appi.ajp.2008.07081242
  51. Zander U, & Kogut B (1995). Knowledge and the speed of the transfer and imitation of organizational capabilities: An empirical test. Organization Science, 6(1), 76–92.
