Author manuscript; available in PMC: 2022 Nov 22.
Published in final edited form as: Pers Relatsh. 2020 Nov 13;27(4):907–938. doi: 10.1111/pere.12343

Reflections on a registered report replicating a body of dyadic cross-sectional research

Zachary G Baker 1, Ersie-Anastasia Gentzis 1, Emily M Watlington 1, Sabrina Castejon 1, Whitney E Petit 1, Maggie Britton 1, Sana Haddad 1, Angelo M DiBello 2,3, Lindsey M Rodriguez 4, Jaye L Derrick 1, C Raymond Knee 1
PMCID: PMC9681012  NIHMSID: NIHMS1825901  PMID: 36419735

Abstract

This article reflects on a new kind of registered report (RR) that replicated the work of an early career researcher. The research items targeted in this RR were peer-reviewed, cross-sectional, dyadic studies to which the first author of this RR had contributed. The findings being replicated are not noteworthy for their prestige or representativeness of the wider field. Instead, this method of replication may have several benefits and less desirable qualities for the researcher and research team whose work is being replicated, for science more broadly, and for relationship science specifically, reviewed herein. The authors hope that this reflection inspires researchers to improve upon their methodology by incorporating replication of their work early and often into their own research process.

Keywords: dyadic data analysis, other, surveys

1 |. INTRODUCTION

Many large-scale registered replication reports have been conducted in the fields of social psychology, cognitive psychology, and the social sciences more generally (Camerer et al., 2018; Ebersole et al., 2016; Klein et al., 2014, 2015; Open Science Collaboration, 2015). Indeed, there are journals devoted to this methodology with nearly 100 published registered reports (Hardwicke & Ioannidis, 2018). Although methodology for smaller-scale registered reports may be more varied, the methodology for choosing studies to include in registered replication reports often seems to target, systematically, items from particular journals (Camerer et al., 2018; Open Science Collaboration, 2015); items of which researchers are skeptical (The Religious Replication Project, n.d.); items that others have failed to replicate in the past (Baumeister & Vohs, 2016); or items that have had a disproportionate impact on the field (Hagger & Chatzisarantis, 2016b). Although “registered reports should not be seen as a one-shot cure for reproducibility problems in science” (Chambers, Dienes, McIntosh, Rotshtein, & Willmes, 2015, p. A2), there is widespread agreement that they are a positive step forward in the production of rigorous science (Lindsay, 2015; Nosek, Ebersole, DeHaven, & Mellor, 2018) and may be fundamental to the philosophy of science (Popper, 2005). The present registered report (RR) pursues a different means of selecting studies for replication. Specifically, we aim to replicate all studies (a) that have passed some form of peer review (i.e., acceptance for conference proceedings, receipt of a request for revision and resubmission, or acceptance for publication); (b) on which the first author of this RR has served as an author; and (c) that were reports of dyadic, cross-sectional samples. In doing so, we propose a method of replication that encourages deep self-inspection of one’s own research through an example using a type of data that is ubiquitous in relationship research but that has rarely been featured in replication studies.

To our knowledge, the present RR is the first to attempt to replicate an author’s entire body of findings using a specific methodology (in this case, all dyadic, cross-sectional findings that have passed some form of peer review). However, this work follows a positive tradition of researchers attempting to replicate their own work (Baumeister & Vohs, 2016; Open Science Collaboration, 2015; Schweinsberg et al., 2016), efforts that have been positively received in the past (Hagger & Chatzisarantis, 2016a). In designing and conducting the present RR, several benefits and costs of the process were encountered, fitting into three primary domains: (a) implications for the researcher and research team whose work is being replicated, (b) implications for science more broadly, and (c) implications for relationship science specifically. The primary contribution of this manuscript is a reflection on a new type of RR; however, to fully explore the value of this methodology, we also provide full preregistration of our methodology and data analytical strategy, as well as brief descriptions of all effects we attempted to replicate. We hope that our efforts aid future researchers in furthering the open science movement at large and in benefitting their own work more specifically.

1.1 |. Implications for the researcher and research team whose work is being replicated: A meaningful and safe way for early career researchers to engage in open science

Among the key strengths of the present RR method is that it is particularly well-suited for replicating the work of early career researchers (ECRs). More established researchers may have more findings to replicate than are practical to test in a single RR. Conversely, the earlier in their career a researcher is, the fewer findings they will likely have to replicate, making it more feasible for them to conduct a replication of a body of work, as is proposed here. Moreover, if an ECR makes the effort to replicate a large body of work and continues to replicate new research findings every few years, they could conceivably replicate their entire body of work throughout their career. In addition to this methodology being particularly practical for ECRs, it provides benefits to them that may be particularly useful: the opportunity to build their career on strong foundations without the potential for offending established researchers.1

Although replication is undoubtedly good for science (Lindsay, 2015), and may indeed be fundamental to the philosophy of science (Popper, 2005), it is also good for those building a career in science. One challenge for an ECR is to define their program of research coherently. Although they may explore several ideas and avenues throughout graduate school and postdoctoral training, successful ECRs often continue a single line of research established during their training (Zacks & Roediger III, 2004). Often, this continuation will mean giving up one or more promising lines of research to concentrate more deeply on another. An ECR who has an RR of their own work to inform that decision is more likely to choose a productive path. Rather than attempting to build on hard-to-replicate work or work without practically meaningful effect sizes, the ECR can choose to build on more robust and replicable work with effect sizes that are more consequential, a choice that will almost surely be more fruitful. In addition, by including research across the pipeline (e.g., conference posters and symposia, as-yet unpublished manuscripts),2 some work may fail to replicate, and thus potentially spurious findings may be prevented from entering the literature.

In addition to the decision of which line of research to pursue, ECRs may be at a unique crossroad for embracing the open science movement. By embracing open science and conducting an RR of their own work, ECRs may be more strongly inclined to follow open science practices compared to a similar researcher who has not conducted an RR. This inclination might be driven by dissonance induced by not continuing to follow open science practices (Tavris & Aronson, 2007) or self-perception effects (Bem, 1972). More simply, the associated review of the open science literature that comes with preparing a manuscript following the present methodology might help ECRs to better understand the benefits open science practices offer. Although several authors of the present RR were reasonably informed consumers of open science research prior to conducting the present work, preparing this manuscript markedly increased our understanding of the movement and knowledge of what has been done in the field of open science. Indeed, as academics, it is often difficult to find time to pursue tasks not directly related to obtaining tangible outcomes that will benefit our careers (e.g., grants and papers), but the present format appears to align these tangible outcomes directly with conducting better science.

The principles of sound science can also serve future career goals, such as obtaining jobs and being awarded grants. Between 2018 and 2019, a list of more than 20 job offers that require or suggest an open science statement from applicants was compiled (Schönbrodt, Mellor, & Bergmann, 2018). This list is likely to grow given current trends toward greater adoption of open science practices (Hardwicke & Ioannidis, 2018). Likewise, the National Institutes of Health (NIH) now requires rigorous design for robust and unbiased results. One of NIH’s example funded applications for good rigor stated that “Their results showed good reproducibility when replicated” (Enhancing Reproducibility through Rigor and Transparency | grants.nih.gov, n.d.).

The benefits we have discussed so far for building one’s career through this RR format have focused on positive reinforcements (i.e., the benefits this method can bring), rather than negative reinforcements (i.e., the removal of costs). One major cost that might exist for other replication attempts is the possibility of offending more established researchers by trying to replicate their work. Replicating one’s own work may reduce the potential professional dangers of engaging in traditional replications for ECRs. For example, advanced researchers might have the opportunity to retaliate against ECRs who publish failed replications of their work (e.g., as editors, reviewers of papers, grant reviewers, etc.). One particularly compelling element of this project from its inception was that the first author’s name was the one at most risk of being sullied.3 It is still possible that more senior researchers might be displeased by failures to replicate extensions of their work. However, when replicating one’s own work, this becomes a considerably smaller risk because the senior researcher’s name is much less directly tied to the work. Although there have been some anecdotal examples of senior scientists retaliating against ECRs, we were unable to find compelling frequency information. Not all senior scientists do or would engage in these behaviors, and it is entirely possible that these phenomena are exceedingly rare. Moreover, thoughtful critiques of replication studies are essential to good science and should not be confused with uncouth retaliatory behavior.

In addition to career benefits, this RR format has the potential for costs. For instance, the process of conducting replications takes time away from other, more novel research. Although we feel this is a reasonable tradeoff given the benefits discussed above, our field does, as a whole, still tend to favor novel findings despite efforts to change that incentive structure (Chambers et al., 2015; Nosek, Spies, & Motyl, 2012). If one is able to obtain a publication in the process, perhaps through journals that specifically target registered reports (for recent reviews, see Hardwicke & Ioannidis, 2018; Mehlenbacher, 2019) or a perspective piece, then this work does provide tangible benefits in line with conducting more novel work (i.e., a line on one’s CV). Still, our experience demonstrates that coauthors, reviewers, and editors had trouble envisioning what a final report of an RR in this format would look like. Although there was near-universal agreement that the idea of replicating one’s own line of research was interesting and beneficial for ECRs, it was difficult to find the coherent thread necessary for a strong manuscript, given the variety of research questions being replicated.

1.2 |. Implications for science

When examining the broader scientific implications of ECR RRs, it is worth considering the context of the ECR’s training. Specifically, the present ECR RR is that of a researcher who was academically trained during the replication debate. Indeed, the student’s first undergraduate psychology course coincided with the year of acceptance of Daryl Bem’s (in)famous article on the topic of Psi (Bem, 2011). Although this article was certainly not the origin of the replication debate, it has been credited as the single moment that “set off the ‘replication crisis’ in psychology,” arousing considerably greater attention than replication had previously received (“Daryl Bem Proved ESP Is Real. Which Means Science Is Broken.,” 2017). In that same year, one of the first large-scale protocols was created for a registered replication report to estimate the reproducibility of psychological science (Open Science Collaboration, 2015). Likewise, the first year of this author’s graduate study was the same year that the first three scientific journals began to offer a registered report format (Chambers & Sala, 2013), including the first two journals in psychology (Hardwicke & Ioannidis, 2018; “Registered Reports and Replications in Attention, Perception, & Psychophysics,” 2013; cf. Al-Marzouki, Roberts, Evans, & Marshall, 2008). An interesting question that may be answered through large-scale adoption of this methodology is whether and to what extent these major field events altered the practices of individual researchers. If these field events, or responses to them, did significantly alter research practices, this might suggest optimism for the replicability of future psychological science.

A notable bias of several replication attempts is that they have, at times, favored feasibility over systematic replication.4 Several recent replication reports have aimed explicitly to replicate studies that did not take long to administer (Ebersole et al., 2016; Klein et al., 2014), were of fairly simple designs (Camerer et al., 2018; Klein et al., 2014), and were predominantly laboratory-based (Hardwicke & Ioannidis, 2018). We do not suggest that these are fatal flaws, nor do we argue that our design is without a set of biases of its own. Instead, we suggest that complementary sets of biases may be helpful in creating a more revealing picture of the state of our science. If the results of the present RR converge with estimates of reproducibility in science from other replication attempts that have different biases, it may suggest that the biases uniquely present in our RR do not systematically influence replicability. For instance, if our dyadic, online results align with those registered replication reports that used individual, laboratory-based designs, it would provide some evidence that a finding’s replicability may not depend on whether it is laboratory-based or online, nor whether it is individual versus dyadic. On the other hand, if our results diverge from estimates of reproducibility in science from replication attempts that have different biases than the present work, it would suggest that the biases uniquely present in our RR are important. For instance, if our dyadic online results deviate from those replication attempts that used individual, laboratory-based designs, it would provide some evidence that a finding’s replicability may depend on whether it is laboratory-based versus online, whether it is individual versus dyadic, or both.5 If other ECRs begin conducting RRs of their work that uses other methodologies, we might further discern many important methodological elements that alter rates of replicability.

Another focus of some replication attempts has been the replication of findings that are published in high-impact factor journals (Camerer et al., 2018; Open Science Collaboration, 2015). The present methodology includes all works by an author that use a given methodology, including findings published in journals that have lower impact factors and findings that were presented at conferences but never (or not yet) published. Again, if replication rates and effect sizes in systematic ECR RRs diverge from other replication attempts, it suggests that the prestige and outlet in which findings are presented is a bias that is worth exploring further.

Due to a need to focus on the feasibility of assessing a large number of findings, some replication attempts have been forced to use measures that had particularly poor reliabilities (Ebersole et al., 2016), limiting the possibility of finding true associations between constructs of interest. One of the noteworthy benefits of having fewer findings to attempt to replicate is that replications can make use of longer, more reliable instruments. If the use of more reliable instruments is associated with greater replicability, it would lead to adjustments in estimates of replicability across the field and reinforce the need for strong measurement in future research.

In addition to the scientific opportunities we have discussed, there are several potential scientific drawbacks associated with the impact of the work being replicated. Because this method specifically targets ECRs, the work being replicated will be, by definition, new to the field. Past registered replication reports have pointed out that identifying findings for replication that are of a certain age is important so that indicators of citation impact can be interpreted meaningfully (Open Science Collaboration, 2015). In the present example, the oldest research item was published just 3 years ago and has altogether been cited fewer than 50 times (at the time of initial submission of this RR). Three years is hardly enough time to adequately assess the impact of the work on the field. Indeed, following the original work with an RR so quickly may substantially affect its impact (positively or negatively).

Likewise, the value of any given RR from an ECR may be limited. Perhaps the most notable weakness of RRs of ECR work is a lack of representativeness. As we noted above, although one ECR RR may hint at wider trends, it is at best only convincingly generalizable to the research of the author whose work is being replicated.6 This is a valid criticism that should not be taken lightly. The value of the findings that are being replicated in a given RR for the sake of theory or even meta-science is questionable given this issue of generalizability. Instead, it is the widespread adoption of RRs by ECRs that will provide the most valuable data and have the greatest benefit for science because this widespread adoption will give better indicators of wider trends.

Another drawback of this methodology is that allowing individuals to replicate their own work may enable conscious or unconscious biases that a third-party replicator might not possess. One way we attempted to resolve this bias was to complete a preregistration form and to write all the syntax to analyze the RR prior to data collection, which we hope will serve to remove as many researcher degrees of freedom as possible. Still, it is unlikely that we will be able to resolve all the motivated reasoning that accompanies attempts to replicate our own work. Perhaps future collaborations might exist in which ECRs replicate each other’s findings, rather than their own. This could increase evidentiary value and may not increase burden if the volume of work is matched between collaborators.

1.3 |. Implications for relationship science

The present RR focuses on dyadic cross-sectional data, which are widely featured in the field of close relationships but have rarely been featured in RRs. Many of the broad implications of conducting an RR of ECRs’ work for science also apply to relationship science. Still, several considerations arose in the design of the present RR because it focuses on dyadic cross-sectional findings, which may be more unusual to see in scientific domains outside of relationship science. We therefore turn to an exploration of these considerations as they affect relationship science specifically.

Perhaps most poignantly, we think the collection of the data for the present RR will contribute to burgeoning efforts in relationship science to answer big questions through large bodies of data (A Scientific Approach to Living in Love, n.d.). In this vein, we intend to share the data collected through this RR once they have been deidentified. There are a number of considerations in sharing dyadic data that are pertinent (see Joel, Eastwick, & Finkel, 2018 for a discussion), but we have determined that the benefits of sharing these data outweigh the potential costs.

In addition to data sharing, sharing our materials, including syntax for data cleaning, replication analyses, and power analyses, may be useful to future researchers learning to handle these kinds of data. Although examples of syntax for performing each of these tasks already exist in individual laboratories and tutorial articles, the diverse array of findings examined here may help future researchers to understand contexts that may apply to their own designs. Researchers without access to specialized laboratories (e.g., those collecting and analyzing dyadic data) might use our syntax as a learning tool before collecting and analyzing their own data. The syntax from this RR is likely more varied than what could be shared in tutorial manuscripts due to journal space concerns and thus might provide a useful supplement to those tutorials.

Learning about power in dyadic studies, what affects it, and how to estimate it were also useful exercises for us as a research team and may be useful to future ECRs using this methodology. Power for simpler designs such as those reported in many registered replication reports is easier to calculate than for the present dyadic design. For example, a careful reading of a well-known article on power (Cohen, 1992) and some freely available software (Faul, Erdfelder, Lang, & Buchner, 2007) will likely be sufficient to estimate power for a two-condition experiment. On the other hand, estimating dyadic power is relatively more complex (Campbell, Loving, & LeBel, 2014). Free, user-friendly tools are available (Ackerman & Kenny, 2016), but these tools can only handle the simplest cases of dyadic analysis with a single predictor and a single outcome for both actors and partners. In more complex cases, Monte Carlo simulations are required to accurately estimate power, for which there is a somewhat steeper learning curve (Lane & Hennes, 2018). Perhaps the most notable takeaway from this exercise for the author team was just how underpowered most of our own dyadic studies are. Although this limitation has been reported elsewhere (Ackerman, Ledermann, & Kenny, 2015), it became particularly noticeable when examining our own data and effects of interest. As such, we suspect that the proposed sample size is considerably larger than most dyadic sample sizes in the literature.7

We also think that collecting dyadic data for a systematic RR of an ECR’s work carries with it some inherent difficulties. For instance, when conducting an RR with dyadic data, dyad members in some original studies may be empirically distinguishable by sex or gender, whereas in others they may not be. Should the replications of each study follow original reporting? Perhaps they should err on the side of distinguishability. Perhaps they should err on the side of nondistinguishability. Moreover, if the data are to be treated as distinguishable, considerably more dyads are needed for appropriate power (Ackerman et al., 2015). What if the dyads were treated as distinguishable in the original results, but a test of distinguishability in the replication revealed no evidence of distinguishability? These issues will have to be navigated carefully in both the planning and review processes. We recommend preregistering these decisions, or even the processes for making these decisions, when planning the RR.

Another consideration for RRs in relationship science is how samples were collected. For instance, all our samples were collected from dyads in which at least one member was a university student, but our RR will not recruit university students. As a result, the current RR will not be a direct replication. We deemed this deviation to be a good thing because our findings were, and are expected to be, generalizable to romantic relationships more broadly. Still, this expectation may not be the case for all researchers. It seems likely that some researchers will want direct replication of their original samples (e.g., married dyads, couples experiencing major relationship events). Given that costs are already high for collecting a sample as unrestrictive as our own, researchers should decide whether they have or can obtain resources devoted to collecting a new sample for their RR efforts.

1.4 |. This study

Although the primary contribution of the present work is to reflect upon the method of replicating a researcher’s body of work, we provide our method and results from this RR below by way of illustration and so that future researchers may use or improve upon our reports if they choose to employ this method.

2 |. THE METHODOLOGY EMPLOYED FOR OUR RR

2.1 |. Study selection procedure

Notably, there is not a single coherent theoretical thread that ties these studies together. Instead, studies were selected for the present RR based on the following selection criteria: (a) passed at least one round of peer review (i.e., received a revise and resubmit; had been accepted to be presented at a conference; was published in a journal); (b) included the present manuscript’s first author as a first or coauthor; and (c) included only cross-sectional, dyadic findings. In instances where a finding passed peer review in multiple locations (e.g., as a poster at a conference and as a manuscript), manuscripts took precedence over paper presentations, which took precedence over posters, with the expectation that these parameters reflected greater rigor of peer review.
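The selection rule above can be illustrated with a short sketch. It is only an illustration under stated assumptions: the record structure (`ResearchItem`), its field names, and the outlet labels are hypothetical and do not come from our preregistered materials.

```python
from dataclasses import dataclass

# Hypothetical precedence ranks (lower wins): manuscripts outrank
# conference papers, which outrank posters, as described in the text.
OUTLET_PRECEDENCE = {"manuscript": 0, "conference_paper": 1, "poster": 2}


@dataclass(frozen=True)
class ResearchItem:
    finding_id: str               # hypothetical label shared by overlapping items
    outlet: str                   # one of the keys of OUTLET_PRECEDENCE
    peer_reviewed: bool           # criterion (a)
    author_involved: bool         # criterion (b)
    dyadic_cross_sectional: bool  # criterion (c)


def is_eligible(item):
    """An item is eligible only if it meets all three selection criteria."""
    return (item.peer_reviewed
            and item.author_involved
            and item.dyadic_cross_sectional)


def select_unique(items):
    """Keep one eligible item per finding, preferring the higher-precedence outlet."""
    best = {}
    for item in items:
        if not is_eligible(item):
            continue
        current = best.get(item.finding_id)
        if (current is None
                or OUTLET_PRECEDENCE[item.outlet] < OUTLET_PRECEDENCE[current.outlet]):
            best[item.finding_id] = item
    return list(best.values())
```

Applied to a finding that appears both as a poster and as a manuscript, `select_unique` would retain only the manuscript version.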

2.2 |. Selected studies

Of a total of 47 research items (i.e., published manuscripts, manuscripts that had received revise and resubmits, conference papers, and conference posters), 15 research items included findings that fit all three study selection criteria. Of these 15 eligible research items, 7 items were unique (i.e., the other 8 were overlapping: for instance, a finding that was both presented as a poster and published in a manuscript). The seven research items, in turn, contributed 116 dyadic cross-sectional estimates from 12 separately reported studies (Baker, Chopik, & Nguyen, under review; Baker, Nguyen, Knee, & Petit, 2018; Haddad et al., 2016; Hadden, Baker, & Knee, 2018; Hadden, Rodriguez, Knee, DiBello, & Baker,; Nguyen, Baker, & Knee, 2018; Rodriguez et al., 2019).

2.3 |. Power

Power analyses for each finding were conducted in line with procedures presented by Lane and Hennes (2018) using Monte Carlo simulations in Mplus Version 8.1. Simulations were conducted with observed predictor means/variances/covariances, residual variances, and estimates of outcomes regressed onto predictors. For each simulation, 1,000 replications were requested. When data were missing, the number of observations (i.e., individuals) and “csizes” (i.e., dyads and number of individuals within a dyad) were specified as the number of complete dyads from the sample. Sensitivity analyses were then conducted to determine the minimum sample size to detect significant regression estimates with power = 0.80 based upon observed effects. All simulations are presented in the online supplementary materials (https://osf.io/4x3bc/).
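The general logic of a simulation-based power and sensitivity analysis can be sketched as follows. This is a deliberately simplified illustration, not our Mplus syntax: it assumes a single standardized predictor, ignores the nonindependence of dyad members, and tests significance with a Fisher z normal approximation. The function names and the grid of candidate sample sizes are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Pearson correlation computed from centered sums."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_power(beta, n, reps=1000, alpha=0.05, seed=0):
    """Share of simulated samples in which the effect is significant.

    Assumption: x and y are standardized, so beta is also the population
    correlation. This is a single-level simplification, not a dyadic model.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    resid_sd = math.sqrt(1 - beta ** 2)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [beta * x + resid_sd * rng.gauss(0, 1) for x in xs]
        # Fisher z test of the sample correlation (normal approximation).
        z = math.atanh(pearson_r(xs, ys)) * math.sqrt(n - 3)
        if abs(z) > z_crit:
            hits += 1
    return hits / reps


def minimum_n(beta, target=0.80, candidates=range(20, 401, 20), **kwargs):
    """Sensitivity analysis: smallest candidate n whose simulated power meets the target."""
    for n in candidates:
        if simulate_power(beta, n, **kwargs) >= target:
            return n
    return None
```

A call such as `minimum_n(0.3)` scans the candidate sample sizes and returns the first one that reaches the target power. A dyadic analysis would additionally need to simulate both partners and the covariance between them, which is why the required sample sizes reported here are larger than this simplified sketch would suggest.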

2.4 |. Participants

This study was approved by the University of Houston Institutional Review Board, and data were collected in January 2020. Both members of a romantic dyad were recruited for the study. We aimed to collect 335 dyads (670 participants) to allow sufficient power (0.80) to detect our smallest, originally significant effects of interest. This sample size provided far greater than conventional levels of power for many of our other effects, which was prudent given that replications consistently report effect sizes 33–77% smaller than those of original reports (Camerer et al., 2018). All participants were residents of the United States. In line with similar online recruitment methods, we estimated that the average participant would be approximately 32 years of age (Buhrmester, Kwang, & Gosling, 2011). All samples from the original studies included at least one university student (Rodriguez et al., 2019; Study 1 also required that dyads be married and at least one member of the dyad be an alcohol consumer). As a result, the current RR was not a direct replication because it used a slightly different sample. Still, none of the research items sought to examine findings only in university samples. Instead, all of the findings were and are expected to be generalizable to romantic relationships more broadly.

2.5 |. Procedure

Participant responses were collected through Qualtrics Panels, and participants were compensated $15 for their time. The survey was estimated to take 31.8 min per partner based upon Qualtrics’ ExpertReview feature. Both members of each romantic dyad responded to a series of questionnaires online in the location of their choosing, at a time of their choosing. Questionnaires and items within questionnaires were randomly ordered, with every participant receiving all questionnaires.

2.6 |. Data sharing plan

Sharing dyadic data has different implications than sharing other kinds of data (Finkel, Eastwick, & Reis, 2015; Joel et al., 2018). Because we do not believe that our variables include particularly sensitive questions (e.g., intimate partner violence), we have shared the data from this RR on the Open Science Framework (https://osf.io/4x3bc/). We specifically shared aggregated variables only and identifiers necessary to analyses (e.g., couple identifiers). We did not include personal identifiers, individual item responses, or demographic variables given concerns that these may be used by people trying to identify their partners’ responses. In addition to the present purpose, these data may allow for additional analyses by researchers who have an interest in any of the diverse constructs examined here.

3 |. RESULTS

3.1 |. Analyses

3.1.1 |. Preregistered analyses

We matched our analyses as closely as possible to those of the original studies. To do so, we consulted the original code underlying the peer-reviewed findings. In some cases, the original results did not include measures of effect size. In those instances, we calculated the effect sizes. The preregistration for our analyses may be found on the Open Science Framework (https://osf.io/4x3bc/). A successful replication was preregistered to be one in which the finding of interest (e.g., autonomy regressed onto partner attachment avoidance) is statistically significant at p < .05 and in the same direction as the original finding.8 We note here that, because of the levels that we have set for α (.05) and β (.20), it should be expected that some of our findings might be nonsignificant even if they are tests of true effects, and some findings could be significant even if they are tests that do not represent true effects. Thus, the present RR should not be considered a definitive indication of which of these findings are true effects and which are not but might be one of several indicators of true effects. All models were estimated using multilevel modeling in SAS Proc Mixed. For the two studies that include the estimation of indirect effects, the constituent estimates and standard errors of those effects were entered into RMediation (Tofighi & MacKinnon, 2011). All R and SAS syntax was written prior to data collection and was preregistered (https://osf.io/4x3bc/). There were 28 distinct multilevel models that were estimated in SAS and 16 indirect effects that were estimated in R. In addition to our preregistered syntax, we uploaded our final syntax for the results presented here along with any deviations from the preregistered syntax highlighted in annotations.9
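The replication criterion and the logic of an indirect-effect interval can be sketched briefly. This is an illustration under stated assumptions rather than our preregistered syntax: the function names are hypothetical, the interval uses a Monte Carlo approximation as a stand-in (not necessarily the method selected in RMediation), and the two constituent paths are assumed to be uncorrelated.

```python
import random


def is_successful_replication(original_estimate, replication_estimate,
                              p_value, alpha=0.05):
    """Criterion from the text: significant at alpha and same sign as the original."""
    same_direction = original_estimate * replication_estimate > 0
    return p_value < alpha and same_direction


def monte_carlo_indirect_ci(a, se_a, b, se_b, draws=20000, level=0.95, seed=0):
    """Percentile interval for the indirect effect a*b.

    Assumption: a and b are drawn from independent normal distributions
    centered on the estimates, with the standard errors as their SDs.
    """
    rng = random.Random(seed)
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b)
                      for _ in range(draws))
    tail = (1 - level) / 2
    lower = products[int(tail * draws)]
    upper = products[int((1 - tail) * draws) - 1]
    return lower, upper
```

An interval that excludes zero would be read as evidence of an indirect effect. The direction check in `is_successful_replication` means that a significant effect with the opposite sign is not counted as a replication.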

3.2 |. Sample

Participants included 350 romantic dyads (total n = 700). In line with preregistered criteria, those who did not accurately answer at least three-fourths of the quality control questions were excluded from analyses (10 dyads failed to meet these criteria), leaving a final sample of 340 romantic dyads (total n = 680). No remaining participants reported relationship lengths under 3 months; therefore, no participants were excluded based on this criterion. Participants ranged in age from 19 to 81 years (M = 44.35, SD = 15.06). The sample was 76.47% White/Caucasian, 11.32% Black/African American, 5.59% Asian, 3.82% Multiethnic or Other, 2.35% Native American/American Indian, and 0.44% Native Hawaiian/Pacific Islander. Hispanic/Latino ethnicity was reported by 8.86% of participants. Average relationship duration was 14.34 years (SD = 13.39). The entire sample reported being heterosexual. Participants were predominantly married (58.68%), exclusively dating (16.76%), or nearly engaged (12.79%), with the remainder reporting being engaged (8.82%) or casually dating (2.94%).

3.3 |. Misreporting and deviations from preregistered protocol

Errors and deviations from the preregistered protocol fell into three categories: (a) errors in preregistered analysis syntax or survey programming, (b) errors in extraction from original studies, and (c) changes to the design due to Qualtrics Panels data collection. A short description of each of these issues and how we dealt with them is presented here, and further details can be found in the “Summary of Pre-Registration Changes” document, in our syntax where we provided annotations directly alongside any amended code, and in Tables 1 and 2. Syntax and the changes document may be found on the Open Science Framework (https://osf.io/4x3bc/).

TABLE 1.

Description of studies being replicated

Citation Study/sample number Sample size Effect size Significance Hypothesis/research question Variable - scale (items; citation) Other notes
Baker et al. (under review) 1 375 dyads (750 individuals) H1a: β = .55
H1b: β = −.26
H1c: β = .10
RQa: β = .004
RQb: β = −.08
RQc: β = .05
H1a: p < .001
H1b: p < .001
H1c: p < .001
RQa: p = .899
RQb: p = .002
RQc: p = .076
H1: Actor (a) satisfaction, (b) quality of alternatives, and (c) investments will positively (a and c) and negatively (b) predict commitment.
RQ1: Explore the roles of partners’ (a) satisfaction, (b) quality of alternatives, and (c) investments as predictors of actor commitment.
Satisfaction, Quality of Alternatives, Investment, and Commitment - Investment Model Scale (5, 5, 5, and 7 items, respectively; Rusbult, Martz, & Agnew, 1998)
Baker et al. (2018) 1 375 dyads (750 individuals) H1: β = .61
H2a: β = .31
H2b: β = .08
RQ1: β = .10
RQ2a: β = −.04
RQ2b: β = .14
RQ3a.1: β = .19
RQ3b.1: β = −.004
RQ3c.1: β = −.02
RQ3d.1: β = .03
RQ3a.2: β = .05
RQ3b.2: β = .01
RQ3c.2: β = .08
RQ3d.2: β = .01
H1: p < .001
H2a: p < .001
H2b: p = .048
RQ1: p = .001
RQ2a: p = .313
RQ2b: p = .001
RQ3a.1: [0.14, 0.24]
RQ3b.1: [−0.01, 0.004]
RQ3c.1: [−0.07, 0.02]
RQ3d.1: [0.01, 0.05]
RQ3a.2: [0.001, 0.10]
RQ3b.2: [0.004, 0.04]
RQ3c.2: [0.03, 0.13]
RQ3d.2: [<0.001, 0.002]
H1: Actor self-determination will predict actor empathy.
H2: Actor empathy will predict actor affection (a) given and (b) received.
RQ1: Partner self-determination will predict actor empathy.
RQ2: Partner empathy will predict actor affection (a) given and (b) received.
RQ3: Test the (a) actor–actor, (b) partner–partner, (c) actor–partner, and (d) partner–actor indirect effects for (.1) affection given and (.2) affection received
Self-Determined Motivation - Couples Motivation Questionnaire (18 items; Blais, Sabourin, Boucher, & Vallerand, 1990)
Empathy - Basic Empathy Scale (20 items; Jolliffe & Farrington, 2006)
Affection - Trait Affection Given and Received (16 items, Floyd, 2002)
95% confidence intervals for indirect effects are given because RMediation was used to estimate indirect effects
Haddad et al. (2016) 1 132 dyads (264 individuals) H1: β = .12
H2: β = .18
H1: p = .062
H2: p = .006
H1: Actor fear of being single will predict actor drinking to cope.
H2: Partner fear of being single will predict actor drinking to cope.
Fear of Being Single - Fear of Being Single Scale (6 items; Spielmann et al., 2013)
Drinking to Cope - Drinking Motivations Questionnaire, Coping Subscale (5 items; Cooper, 1994)
These analyses were conducted controlling for drinks per week.
Hadden et al. (2018) 3 200 dyads (400 individuals) H1a: β = .18
H1b: β = .30
H2a: β = .24
H2b: β = .21
H1a: p = .008
H1b: p < .001
H2a: p < .001
H2b: p = .001
H1: Actor relationship autonomy will positively predict actor (a) forgiveness, (b) accommodation, (c) willingness to forgive, and (d) forgiveness for a specific transgression.
H2: Partner relationship autonomy will positively predict actor (a) forgiveness, (b) accommodation, (c) willingness to forgive, and (d) forgiveness for a specific transgression.
H3: Partner relationship autonomy will predict actor perceptions of their partners’ (a) willingness to forgive and (b) forgiveness for a specific transgression.
Self-Determined Motivation - Couples Motivation Questionnaire (18 items; Blais et al., 1990)
Forgiveness - EVLN Forgiveness Scale (16 items; Rusbult, 2000)
Accommodation - EVLN Accommodation Scale (16 items; Rusbult, Verette, Whitney, Slovik, & Lipkus, 1991)
The subscales of the forgiveness and accommodation measures were also tested in the manuscript but are not presented here. We also controlled for commitment in all analyses.
4 238 dyads (476 individuals) H1c: β = .23
H1d: β = .15
H2c: β = .21
H2d: β = .12
H3a: β = .15
H3b: β = .04
H1c: p < .001
H1d: p = .010
H2c: p < .001
H2d: p = .036
H3a: p = .014
H3b: p = .548
Self-Determined Motivation - Couples Motivation Questionnaire (18 items; Blais et al., 1990)
Willingness to Forgive - Tendency to Forgive Scale (15 items; Rusbult, 2000)
Specific Transgression Forgiveness - Transgression Related Interpersonal Motivations Scale (12 items; McCullough et al., 1998)
Perception of Partner’s Willingness to Forgive - Tendency to Forgive Scale (15 items; Rusbult, 2000)
Perception of Partner’s Specific Transgression Forgiveness - Transgression Related Interpersonal Motivations Scale (12 items; McCullough et al., 1998)
Hadden, Rodriguez, Knee, DiBello, and Baker (2016) 1 78 dyads (156 individuals) H1a: β = .20
H1b: β = −.14
H1c: β = −.13
H2: β = −.20
H3a.1: β = .02
H3a.2: β = −.27
H3b.1: β = −.03
H3b.2: β = −.02
H3c.1: β = −.18
H3c.2: β = .07
H1a: p < .001
H1b: p = .030
H1c: p = .049
H2: p < .001
H3a.1: p = .707
H3a.2: p < .001
H3b.1: p = .608
H3b.2: p = .764
H3c.1: p = .006
H3c.2: p = .319
H1: Partner anxious attachment will (a) positively predict actor relatedness, (b) negatively predict actor autonomy, and (c) negatively predict actor competence.
H2: Partner avoidant attachment will negatively predict actor relatedness.
H3: Actor (.1) anxious and (.2) avoidant attachment will negatively predict actor (a) relatedness, (b) autonomy, and (c) competence
Attachment - Experiences in Close Relationships Scale - Short Form (12 items; Wei, Russell, Mallinckrodt, & Vogel, 2007)
Need Fulfillment - Basic Psychological Needs Scale (9 items; La Guardia, Ryan, Couchman, & Deci, 2000)
We controlled for the other needs in all analyses. For example, when examining autonomy as the outcome, we controlled for both competence and relatedness.
2 132 dyads (264 individuals) H1a: β = .08
H1b: β = −.09
H1c: β = .002
H2: β = −.14
H3a.1: β = −.12
H3a.2: β = −.31
H3b.1: β = −.01
H3b.2: β = −.31
H3c.1: β = .03
H3c.2: β = .02
H1a: p = .074
H1b: p = .072
H1c: p = .998
H2: p = .017
H3a.1: p = .010
H3a.2: p < .001
H3b.1: p = .802
H3b.2: p < .001
H3c.1: p = .530
H3c.2: p = .803
Nguyen et al. (2018) 1 204 dyads (408 individuals) H1a: β = .58
H1b: β = .13
H2a: β = .52
H2b: β = .09
RQ1a: β = .30
RQ1b: β = .01
RQ1c: β = .05
RQ1d: β = .07
H1a: p < .001
H1b: p = .001
H2a: p < .001
H2b: p = .036
RQ1a: p < .001
RQ1b: p = .077
RQ1c: p = .038
RQ1d: p = .002
H1: (a) actor and (b) partner internal motivation will predict greater need fulfillment.
H2: (a) actor and (b) partner need fulfillment will predict greater trust.
RQ1: Test the (a) actor–actor, (b) partner–partner, (c) actor–partner, and (d) partner–actor indirect effects.
Self-Determined Motivation - Couples Motivation Questionnaire (18 items; Blais et al., 1990)
Need Fulfillment - Basic Psychological Needs Scale (9 items; La Guardia, Ryan, Couchman, & Deci, 2000)
Trust - Trust in Partners Scale (17 items; Rempel, Holmes, & Zanna, 1985)
2 376 dyads (757 individuals) H1a: β = .71
H1b: β = .16
H2a: β = .56
H2b: β = .09
RQ1a: β = .40
RQ1b: β = .01
RQ1c: β = .06
RQ1d: β = .09
H1a: p < .001
H1b: p < .001
H2a: p < .001
H2b: p = .066
RQ1a: p < .001
RQ1b: p = .077
RQ1c: p = .066
RQ1d: p < .001
Self-Determined Motivation - Couples Motivation Questionnaire (18 items; Blais et al., 1990)
Need Fulfillment - Basic Psychological Need Satisfaction and Frustration Scale (24 items; Chen et al., 2014)
Trust - Faith in Partner’s Love and Closeness (16 items; Murray et al., 2009)
Rodriguez et al. (2019) 1 123 dyads (246 individuals) H1a.1: β = −.07
H1a.2: β = −.47
H2a.1: β = .01
H2a.2: β = −.25
H3a.1: β = −.15
H3a.2: β = −.28
H4a.1: β = .11
H4a.2: β = .26
H1a.1: p = .245
H1a.2: p < .001
H2a.1: p = .832
H2a.2: p < .001
H3a.1: p = .019
H3a.2: p < .001
H4a.1: p = .085
H4a.2: p < .001
H1: Actor attachment (.1) anxiety and (.2) avoidance will predict perceiving partners as less (a) satisfied, (b) committed, and (c) responsive.
H2: Partner attachment (.1) anxiety and (.2) avoidance will predict perceiving partners as less (a) satisfied, (b) committed, and (c) responsive.
H3: Actor attachment (.1) anxiety and (.2) avoidance will predict perceiving partners as less (a) satisfied, (b) committed, and (c) responsive than they report being.
H4: Partner attachment (.1) anxiety and (.2) avoidance will predict perceiving partners as more (a) satisfied and (b) committed than they report being.
H4c: Partner attachment (.1) anxiety will predict perceiving partners as less responsive than they report being. Partner attachment avoidance (.2) will predict perceiving partners as more responsive than they report being.
Attachment - Experiences in Close Relationships Scale - Short Form (12 items; Wei et al., 2007)
Satisfaction - Relationship Assessment Scale (7 items; Hendrick, 1988)
All outcome variables assessed in this study were used in their traditional forms for H1 and H2. For H3 and H4, analog measures assessing perceptions of the partner were also administered. From these two measures, standardized discrepancy scores were created. Specifically, all scale scores for individual and partner perception measures were converted to z-scores; then, standardized scores of partner reports were subtracted from standardized scores of actor perception (e.g., Z_diff = Z_(actor perception of partner satisfaction)-Z_(partner-reported satisfaction)).
2 78 dyads (156 individuals) H1a.1: β = −.12
H1a.2: β = −.38
H1b.1: β = −.14
H1b.2: β = −.39
H2a.1: β = .16
H2a.2: β = −.38
H2b.1: β = .16
H2b.2: β = −.28
H3a.1: β = −.21
H3a.2: β = −.04
H3b.1: β = −.26
H3b.2: β = −.19
H4a.1: β = .16
H4a.2: β = .13
H4b.1: β = .11
H4b.2: β = .44
H1a.1: p = .132
H1a.2: p < .001
H1b.1: p = .080
H1b.2: p < .001
H2a.1: p = .045
H2a.2: p < .001
H2b.1: p = .057
H2b.2: p = .001
H3a.1: p = .025
H3a.2: p = .666
H3b.1: p = .004
H3b.2: p = .040
H4a.1: p = .083
H4a.2: p = .189
H4b.1: p = .209
H4b.2: p < .001
Attachment - Experiences in Close Relationships Scale - Short Form (12 items; Wei et al., 2007)
Satisfaction - Quality of Marriage Index (7 items; Norton, 1983)
Commitment - Investment Model Scale (7 items; Rusbult, Martz, & Agnew, 1998)
3 132 dyads (264 individuals) H1a.1: β = .03
H1a.2: β = −.44
H1b.1: β = .05
H1b.2: β = −.50
H1c.1: β = .03
H1c.2: β = −.54
H2a.1: β = −.04
H2a.2: β = −.17
H2b.1: β = .08
H2b.2: β = −.27
H2c.1: β = −.04
H2c.2: β = −.06
H3a.1: β = .01
H3a.2: β = −.20
H3b.1: β = −.06
H3b.2: β = −.20
H3c.1: β = .11
H3c.2: β = −.39
H4a.1: β = −.04
H4a.2: β = .27
H4b.1: β = .10
H4b.2: β = .20
H4c.1: β = −.18
H4c.2: β = .40
H1a.1: p = .600
H1a.2: p < .001
H1b.1: p = .350
H1b.2: p < .001
H1c.1: p = .594
H1c.2: p < .001
H2a.1: p = .459
H2a.2: p = .008
H2b.1: p = .169
H2b.2: p < .001
H2c.1: p = .472
H2c.2: p = .308
H3a.1: p = .945
H3a.2: p = .017
H3b.1: p = .411
H3b.2: p = .024
H3c.1: p = .128
H3c.2: p < .001
H4a.1: p = .574
H4a.2: p = .002
H4b.1: p = .221
H4b.2: p = .018
H4c.1: p = .012
H4c.2: p < .001
Attachment - Experiences in Close Relationships Scale - Short Form (12 items; Wei et al., 2007)
Satisfaction - Investment Model Scale (5 items; Rusbult, Martz, & Agnew, 1998)
Commitment - Investment Model Scale (7 items; Rusbult, Martz, & Agnew, 1998)
Responsiveness - Responsiveness Given Scale (6 items; Canevello & Crocker, 2010)

Note: Deviations from what was reported in the registered report are highlighted in italics. These include the following: (a) in Rodriguez et al. (2019), there were hypotheses for the effects of partner attachment on responsiveness discrepancies that were not extracted or preregistered for analysis; (b) changing reporting from partially standardized estimates for the discrepancy outcomes in the original data extracted from Rodriguez et al. (2019) to fully standardized estimates; and (c) adjusting estimates and p-values that were erroneously extracted from several original studies. None of these changed the sign or significance of estimates, with the exception of H2b from Study 2 of Nguyen et al. (2018), which was originally reported as statistically significant. These deviations (along with all study deviations) are also reported in the “Summary of Pre-Registration Changes” document and the main text of the manuscript.
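The standardized discrepancy scores described in the notes for Rodriguez et al. (2019), Z_diff = Z_(actor perception of partner satisfaction) - Z_(partner-reported satisfaction), can be sketched as follows. This is an illustrative Python example rather than the study's SAS syntax; the data are hypothetical, and standardizing with the sample standard deviation (n - 1 denominator) is an assumption.

```python
from statistics import mean, stdev


def z_scores(values):
    """Standardize a list of scale scores (sample SD; an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def discrepancy_scores(actor_perceptions, partner_reports):
    """Z_diff = Z(actor perception of partner) - Z(partner self-report).

    Both lists must be aligned so that position i refers to the same dyad.
    """
    z_perceived = z_scores(actor_perceptions)
    z_reported = z_scores(partner_reports)
    return [zp - zr for zp, zr in zip(z_perceived, z_reported)]


# Hypothetical scale scores for five dyads (not study data).
perceived_satisfaction = [5.0, 6.0, 4.0, 7.0, 3.0]
partner_reported_satisfaction = [5.5, 5.0, 4.5, 6.5, 3.5]

z_diff = discrepancy_scores(perceived_satisfaction, partner_reported_satisfaction)
for dyad, d in enumerate(z_diff, start=1):
    print(f"dyad {dyad}: Z_diff = {d:+.2f}")
```

Positive values indicate that the actor perceives the partner as higher on the construct than the partner reports, relative to the sample; negative values indicate the reverse.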

TABLE 2.

Results of registered report

Citation Study/sample number Original effect size Original significance RR effect size RR significance Replication rates: Replicating estimates/significant estimates from original reports (%) Replication rates: Replicating estimates/not statistically significant estimates from original reports (%) Hypothesis/research question Other notes
Baker et al. (under review) 1 H1a: β = .55
H1b: β = −.26
H1c: β = .10
RQa: β = .004
RQb: β = −.08
RQc: β = .05
H1a: p < .001
H1b: p < .001
H1c: p < .001
RQa: p = .899
RQb: p = .002
RQc: p = .076
H1a: β = .54
H1b: β = −.31
H1c: β = .18
RQa: β = −.01
RQb: β = −.09
RQc: β = −.04
H1a: p < .001
H1b: p < .001
H1c: p < .001
RQa: p = .707
RQb: p < .001
RQc: p = .220
4/4 (100%) 2/2 (100%) H1: Actor (a) satisfaction, (b) quality of alternatives, and (c) investments will positively (a and c) and negatively (b) predict commitment.
RQ1: Explore the roles of partners’ (a) satisfaction, (b) quality of alternatives, and (c) investments as predictors of actor commitment.
Baker et al. (2018) 1 H1: β = .61
H2a: β = .31
H2b: β = .08
RQ1: β = .10
RQ2a: β = −.04
RQ2b: β = .14
RQ3a.1: β = .19
RQ3b.1: β = −.004
RQ3c.1: β = −.02
RQ3d.1: β = .03
RQ3a.2: β = .05
RQ3b.2: β = .01
RQ3c.2: β = .08
RQ3d.2: β = .01
H1: p < .001
H2a: p < .001
H2b: p = .048
RQ1: p = .001
RQ2a: p = .313
RQ2b: p = .001
RQ3a.1: [0.14, 0.24]
RQ3b.1: [−0.01, 0.004]
RQ3c.1: [−0.07, 0.02]
RQ3d.1: [0.01, 0.05]
RQ3a.2: [0.001, 0.10]
RQ3b.2: [0.004, 0.04]
RQ3c.2: [0.03, 0.13]
RQ3d.2: [<0.001, 0.002]
H1: β = .45
H2a: β = .34
H2b: β = .03
RQ1: β = .25
RQ2a: β = −.02
RQ2b: β = .24
RQ3a.1: β = .15
RQ3b.1: β = −.01
RQ3c.1: β = −.01
RQ3d.1: β = .08
RQ3a.2: β = .01
RQ3b.2: β = .06
RQ3c.2: β = .11
RQ3d.2: β = −.01
H1: p < .001
H2a: p < .001
H2b: p = .493
RQ1: p < .001
RQ2a: p = .583
RQ2b: p < .001
RQ3a.1: [0.11, 0.20]
RQ3b.1: [−0.03, 0.01]
RQ3c.1: [−0.05, 0.03]
RQ3d.1: [0.05, 0.12]
RQ3a.2: [−0.02, 0.05]
RQ3b.2: [0.04, 0.09]
RQ3c.2: [0.07, 0.15]
RQ3d.2: [−0.05, 0.03]
8/11 (72.73%) 3/3 (100%) H1: Actor self-determination will predict actor empathy.
H2: Actor empathy will predict actor affection (a) given and (b) received.
RQ1: Partner self-determination will predict actor empathy.
RQ2: Partner empathy will predict actor affection (a) given and (b) received.
RQ3: Test the (a) actor-actor, (b) partner-partner, (c) actor-partner, and (d) partner-actor indirect effects for (.1) affection given and (.2) affection received
95% confidence intervals for indirect effects are given because RMediation was used to estimate indirect effects
Haddad et al. (2016) 1 H1: β = .12
H2: β = .18
H1: p = .062
H2: p = .006
H1: β = .24
H2: β = .07
H1: p < .001
H2: p = .054
0/1 (0%) 0/1 (0%) H1: Actor fear of being single will predict actor drinking to cope.
H2: Partner fear of being single will predict actor drinking to cope.
These analyses were conducted controlling for drinks per week.
Hadden et al. (2018) 3 H1a: β = .18
H1b: β = .30
H2a: β = .24
H2b: β = .21
H1a: p = .008
H1b: p < .001
H2a: p < .001
H2b: p = .001
H1a: β = .29
H1b: β = .34
H2a: β = .17
H2b: β = .16
H1a: p < .001
H1b: p < .001
H2a: p = .002
H2b: p = .001
4/4 (100%) 0/0 (N/A) H1: Actor relationship autonomy will positively predict actor (a) forgiveness, (b) accommodation, (c) willingness to forgive, and (d) forgiveness for a specific transgression.
H2: Partner relationship autonomy will positively predict actor (a) forgiveness, (b) accommodation, (c) willingness to forgive, and (d) forgiveness for a specific transgression.
H3: Partner relationship autonomy will predict actor perceptions of their partners’ (a) willingness to forgive and (b) forgiveness for a specific transgression.
The subscales of the forgiveness and accommodation measures were also tested in the manuscript but are not presented here. We also controlled for commitment in all analyses.
4 H1c: β = .23
H1d: β = .15
H2c: β = .21
H2d: β = .12
H3a: β = .15
H3b: β = .04
H1c: p < .001
H1d: p = .010
H2c: p < .001
H2d: p = .036
H3a: p = .014
H3b: p = .548
H1c: β = .36
H1d: β = .21
H2c: β = .09
H2d: β = .21
H3a: β = .22
H3b: β = .20
H1c: p < .001
H1d: p < .001
H2c: p = .047
H2d: p < .001
H3a: p < .001
H3b: p < .001
5/5 (100%) 0/1 (0%)
Hadden et al. (2016) 1 H1a: β = .20
H1b: β = −.14
H1c: β = −.13
H2: β = −.20
H3a.1: β = .02
H3a.2: β = −.27
H3b.1: β = −.03
H3b.2: β = −.02
H3c.1: β = −.18
H3c.2: β = .07
H1a: p < .001
H1b: p = .030
H1c: p = .049
H2: p < .001
H3a.1: p = .707
H3a.2: p < .001
H3b.1: p = .608
H3b.2: p = .764
H3c.1: p = .006
H3c.2: p = .319
H1a: β = .05
H1b: β = −.09
H1c: β = .02
H2: β = −.11
H3a.1: β = −.03
H3a.2: β = −.14
H3b.1: β = −.07
H3b.2: β = −.12
H3c.1: β = −.04
H3c.2: β = −.09
H1a: p = .034
H1b: p = .002
H1c: p = .482
H2: p < .001
H3a.1: p = .316
H3a.2: p < .001
H3b.1: p = .017
H3b.2: p = .001
H3c.1: p = .122
H3c.2: p = .006
4/6 (66.67%) 1/4 (25%) H1: Partner anxious attachment will (a) positively predict actor relatedness, (b) negatively predict actor autonomy, and (c) negatively predict actor competence.
H2: Partner avoidant attachment will negatively predict actor relatedness.
H3: Actor (.1) anxious and (.2) avoidant attachment will negatively predict actor (a) relatedness, (b) autonomy, and (c) competence
We controlled for the other needs in all analyses. For example, when examining autonomy as the outcome, we controlled for both competence and relatedness.
2 H1a: β = .08
H1b: β = −.09
H1c: β = .002
H2: β = −.14
H3a.1: β = −.12
H3a.2: β = −.31
H3b.1: β = −.01
H3b.2: β = −.31
H3c.1: β = .03
H3c.2: β = .02
H1a: p = .074
H1b: p = .072
H1c: p = .998
H2: p = .017
H3a.1: p = .010
H3a.2: p < .001
H3b.1: p = .802
H3b.2: p < .001
H3c.1: p = .530
H3c.2: p = .803
See S1 3/4 (75%) 2/6 (33.33%)
Nguyen et al. (2018) 1 H1a: β = .58
H1b: β = .13
H2a: β = .52
H2b: β = .09
RQ1a: β = .30
RQ1b: β = .01
RQ1c: β = .05
RQ1d: β = .07
H1a: p < .001
H1b: p = .001
H2a: p < .001
H2b: p = .036
RQ1a: p < .001
RQ1b: p = .077
RQ1c: p = .038
RQ1d: p = .002
H1a: β = .54
H1b: β = .28
H2a: β = .51
H2b: β = .10
RQ1a: β = .27
RQ1b: β = .03
RQ1c: β = .05
RQ1d: β = .14
H1a: p < .001
H1b: p < .001
H2a: p < .001
H2b: p = .005
RQ1a: [0.22, 0.32]
RQ1b: [0.01, 0.05]
RQ1c: [0.02, 0.09]
RQ1d: [0.10, 0.18]
7/7 (100%) 0/1 (0%) H1: (a) actor and (b) partner internal motivation will predict greater need fulfillment.
H2: (a) actor and (b) partner need fulfillment will predict greater trust.
RQ1: Test the (a) actor-actor, (b) partner-partner, (c) actor-partner, and (d) partner-actor indirect effects.
2 H1a: β = .71
H1b: β = .16
H2a: β = .56
H2b: β = .09
RQ1a: β = .40
RQ1b: β = .01
RQ1c: β = .06
RQ1d: β = .09
H1a: p < .001
H1b: p < .001
H2a: p < .001
H2b: p = .066
RQ1a: p < .001
RQ1b: p = .077
RQ1c: p = .066
RQ1d: p < .001
H1a: β = .53
H1b: β = .31
H2a: β = .51
H2b: β = .11
RQ1a: β = .27
RQ1b: β = .03
RQ1c: β = .06
RQ1d: β = .16
H1a: p < .001
H1b: p < .001
H2a: p < .001
H2b: p = .016
RQ1a: [0.22, 0.33]
RQ1b: [0.01, 0.06]
RQ1c: [0.01, 0.11]
RQ1d: [0.12, 0.20]
5/5 (100%) 0/3 (0%)
Rodriguez et al. (2019) 1 H1a.1: β = −.07
H1a.2: β = −.47
H2a.1: β = .01
H2a.2: β = −.25
H3a.1: β = −.15
H3a.2: β = −.28
H4a.1: β = .11
H4a.2: β = .26
H1a.1: p = .245
H1a.2: p < .001
H2a.1: p = .832
H2a.2: p < .001
H3a.1: p = .019
H3a.2: p < .001
H4a.1: p = .085
H4a.2: p < .001
H1a.1: β = −.08
H1a.2: β = −.46
H2a.1: β = −.03
H2a.2: β = −.29
H3a.1: β = −.04
H3a.2: β = −.30
H4a.1: β = .003
H4a.2: β = .34
H1a.1: p = .011
H1a.2: p < .001
H2a.1: p = .291
H2a.2: p < .001
H3a.1: p = .437
H3a.2: p < .001
H4a.1: p = .957
H4a.2: p < .001
4/5 (80%) 2/3 (66.67%) H1: Actor attachment (.1) anxiety and (.2) avoidance will predict perceiving partners as less (a) satisfied, (b) committed, and (c) responsive.
H2: Partner attachment (.1) anxiety and (.2) avoidance will predict perceiving partners as less (a) satisfied, (b) committed, and (c) responsive.
H3: Actor attachment (.1) anxiety and (.2) avoidance will predict perceiving partners as less (a) satisfied, (b) committed, and (c) responsive than they report being.
H4: Partner attachment (.1) anxiety and (.2) avoidance will predict perceiving partners as more (a) satisfied and (b) committed than they report being.
H4c: Partner attachment (.1) anxiety will predict perceiving partners as less responsive than they report being. Partner attachment avoidance (.2) will predict perceiving partners as more responsive than they report being.
All outcome variables assessed in this study were used in their traditional forms for H1 and H2. For H3 and H4, analog measures assessing perceptions of the partner were also administered. From these two measures, standardized discrepancy scores were created. Specifically, all scale scores for individual and partner perception measures were converted to Z-scores; then, standardized scores of partner reports were subtracted from standardized scores of actor perception (e.g., Z_diff = Z_(actor perception of partner satisfaction)-Z_(partner-reported satisfaction)).
2 H1a.1: β = −.12
H1a.2: β = −.38
H1b.1: β = −.14
H1b.2: β = −.39
H2a.1: β = .16
H2a.2: β = −.38
H2b.1: β = .16
H2b.2: β = −.28
H3a.1: β = −.21
H3a.2: β = −.04
H3b.1: β = −.26
H3b.2: β = −.19
H4a.1: β = .16
H4a.2: β = .13
H4b.1: β = .11
H4b.2: β = .44
H1a.1: p = .132
H1a.2: p < .001
H1b.1: p = .080
H1b.2: p < .001
H2a.1: p = .045
H2a.2: p < .001
H2b.1: p = .057
H2b.2: p = .001
H3a.1: p = .025
H3a.2: p = .666
H3b.1: p = .004
H3b.2: p = .040
H4a.1: p = .083
H4a.2: p = .189
H4b.1: p = .209
H4b.2: p < .001
H1a.1: β = −.06
H1a.2: β = −.39
H1b.1: β = −.05
H1b.2: β = −.40
H2a.1: β = −.02
H2a.2: β = −.28
H2b.1: β = .09
H2b.2: β = −.45
H3a.1: β = −.10
H3a.2: β = −.19
H3b.1: β = −.07
H3b.2: β = −.24
H4a.1: β = .04
H4a.2: β = .24
H4b.1: β = .04
H4b.2: β = .21
H1a.1: p = .058
H1a.2: p < .001
H1b.1: p = .140
H1b.2: p < .001
H2a.1: p = .653
H2a.2: p < .001
H2b.1: p = .005
H2b.2: p < .001
H3a.1: p = .037
H3a.2: p = .001
H3b.1: p = .154
H3b.2: p < .001
H4a.1: p = .461
H4a.2: p < .001
H4b.1: p = .356
H4b.2: p < .001
7/9 (77.78%) 4/7 (57.14%)
3 H1a.1: β = .03
H1a.2: β = −.44
H1b.1: β = .05
H1b.2: β = −.50
H1c.1: β = .03
H1c.2: β = −.54
H2a.1: β = −.04
H2a.2: β = −.17
H2b.1: β = .08
H2b.2: β = −.27
H2c.1: β = −.04
H2c.2: β = −.06
H3a.1: β = .01
H3a.2: β = −.20
H3b.1: β = −.06
H3b.2: β = −.20
H3c.1: β = .11
H3c.2: β = −.39
H4a.1: β = −.04
H4a.2: β = .27
H4b.1: β = .10
H4b.2: β = .20
H4c.1: β = −.18
H4c.2: β = .40
H1a.1: p = .600
H1a.2: p < .001
H1b.1: p = .350
H1b.2: p < .001
H1c.1: p = .594
H1c.2: p < .001
H2a.1: p = .459
H2a.2: p = .008
H2b.1: p = .169
H2b.2: p < .001
H2c.1: p = .472
H2c.2: p = .308
H3a.1: p = .945
H3a.2: p = .017
H3b.1: p = .411
H3b.2: p = .024
H3c.1: p = .128
H3c.2: p < .001
H4a.1: p = .574
H4a.2: p = .002
H4b.1: p = .221
H4b.2: p = .018
H4c.1: p = .012
H4c.2: p < .001
H1a.1: β = −.04
H1a.2: β = −.38
H1b.1: β = see S2
H1b.2: β = see S2
H1c.1: β = −.10
H1c.2: β = −.34
H2a.1: β = −.01
H2a.2: β = −.35
H2b.1: β = see S2
H2b.2: β = see S2
H2c.1: β = .06
H2c.2: β = −.33
H3a.1: β = −.06
H3a.2: β = −.17
H3b.1: β = see S2
H3b.2: β = see S2
H3c.1: β = −.13
H3c.2: β = −.10
H4a.1: β = .03
H4a.2: β = .19
H4b.1: β = see S2
H4b.2: β = see S2
H4c.1: β = −.06
H4c.2: β = .23
H1a.1: p = .231
H1a.2: p < .001
H1b.1: p = see S2
H1b.2: p = see S2
H1c.1: p = .005
H1c.2: p < .001
H2a.1: p = .827
H2a.2: p < .001
H2b.1: p = see S2
H2b.2: p = see S2
H2c.1: p = .088
H2c.2: p < .001
H3a.1: p = .200
H3a.2: p = .001
H3b.1: p = see S2
H3b.2: p = see S2
H3c.1: p = .007
H3c.2: p = .054
H4a.1: p = .501
H4a.2: p = .001
H4b.1: p = see S2
H4b.2: p = see S2
H4c.1: p = .191
H4c.2: p < .001
10/12 (83.33%) 8/12 (66.67%)

Note: When multiple original studies reported using the same measures, those measures were assessed only once in the replication study. To make this clear, results are not presented more than once in the “RR Effect Size” and “RR Significance” columns. Instead, there are references to the place in which the results of the replication study were first reported. The first “Replication Rates” column reflects the proportion and percentage of originally statistically significant estimates that were statistically significant and in the same direction in the replication study. This was our preregistered means of assessing replication. The second “Replication Rates” column reflects the proportion and percentage of originally statistically null estimates that were statistically null in the replication study. This means of assessing replication was not preregistered. RR = Registered report. Deviations from what was reported in the registered report prior to data collection are highlighted in italics. These include the following: (a) in Rodriguez et al. (2019), there were hypotheses for the effects of partner attachment on responsiveness discrepancies that were not extracted or preregistered for analysis; (b) changing reporting from partially standardized estimates for the discrepancy outcomes in the original data extracted from Rodriguez et al. (2019) to fully standardized estimates; and (c) adjusting estimates and p-values that were erroneously extracted from several original studies. None of these changed the sign or significance of estimates, with the exception of H2b from Study 2 of Nguyen et al. (2018), which was originally reported as statistically significant. These deviations (along with all study deviations) are also reported in the “Summary of Pre-Registration Changes” document and the main text of the manuscript.

1. Errors in preregistered analysis syntax or survey programming

There were five errors with respect to preregistered analysis syntax or survey programming. First, 22 participants in the actor survey and all participants in the partner survey saw the response options for the first quality control question but not the text of the question, and therefore responded at random rather than as instructed. All participants who did not see the question text were given credit for answering that quality control question correctly. Second, we included exclusions based on check questions and relationship length in our preregistration document but not in our preregistered code. We applied these exclusions in line with our preregistration document. Third, the measure of perception of partner willingness to forgive had an extra response option because one option was included twice; we coded the extra option as equivalent to the option it duplicated. Fourth, item 11 of the Basic Empathy Scale should be reverse-coded but was not specified as such in the preregistered syntax. We reverse-coded that item for the present report. Fifth, the Quality of Marriage Index is meant to be scored as a simple sum, but we preregistered its scoring as the mean of standardized items. We created the simple sum score for the present report.
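The fourth and fifth corrections above (reverse-coding an item and scoring a scale as a simple sum) can be illustrated with a short sketch. This is hypothetical Python rather than the amended SAS syntax; the 1 to 5 response range, the item numbering, and the example responses are assumptions, and only item 11 is marked as reversed because it is the only item named in the text.

```python
def reverse_code(response: int, scale_min: int, scale_max: int) -> int:
    """Flip a response so that the lowest option becomes the highest."""
    return scale_min + scale_max - response


def apply_reverse_coding(responses, reversed_items, scale_min=1, scale_max=5):
    """responses maps item number -> raw response; returns recoded copy."""
    return {
        item: reverse_code(value, scale_min, scale_max)
        if item in reversed_items else value
        for item, value in responses.items()
    }


def sum_score(responses):
    """Simple sum scoring, as opposed to a mean of standardized items."""
    return sum(responses.values())


# Hypothetical raw responses to items 10-12 of a 1-5 scale (not study data).
raw = {10: 4, 11: 2, 12: 5}
recoded = apply_reverse_coding(raw, reversed_items={11})
print(recoded)             # {10: 4, 11: 4, 12: 5}
print(sum_score(recoded))  # 13
```

A full scoring key would list every reverse-keyed item for the scale in question; the set used here is deliberately incomplete.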

2. Errors in extraction from original studies

There were two errors with respect to extraction from the original studies. First, Rodriguez et al. (2019) included hypotheses for the effects of partner attachment on discrepancies in responsiveness scores, but these were not included in our preregistration document or syntax. Those hypotheses have been included in the present report. Second, p-values and standardized betas were misreported for some effects from original studies. We have systematically checked those studies and updated the values. One p-value change affected significance: H2b from Study 2 of Nguyen et al. (2018) was originally reported as statistically significant (p = .006) but should have been reported as null (p = .066). No changes altered whether estimates were positive or negative. All of these changes are highlighted in italics in Tables 1 and 2.

3. Changes in the design due to Qualtrics Panels data collection

There were two changes to the design due to Qualtrics Panels data collection. First, the number of dyads collected was larger than intended. To the best of our knowledge, this was intentional on the part of Qualtrics Panels, to account for people who did not meet inclusion criteria. Second, couple data were collected in a single Qualtrics survey with a separation in the middle, instead of in separate surveys for actors and partners as we had planned. This resulted in changes to the syntax for data cleaning and dataset transformation (i.e., data needed to be converted from dyad format to pairwise format instead of from individual format to dyad format as planned).
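To make the dataset transformation concrete, the following sketch converts dyad-format data (one row per couple, holding both members' scores) into pairwise format (one row per person, holding that person's own scores alongside the partner's). It is an illustrative Python example, not the SAS syntax used in this RR; the column names (couple_id, satisfaction_1, satisfaction_2) and values are hypothetical.

```python
def dyad_to_pairwise(dyad_rows, variables):
    """Turn each dyad row into two person rows with actor and partner scores."""
    pairwise = []
    for row in dyad_rows:
        # Each member takes a turn as "actor"; the other member is "partner".
        for member, other in ((1, 2), (2, 1)):
            record = {"couple_id": row["couple_id"], "member": member}
            for var in variables:
                record[f"actor_{var}"] = row[f"{var}_{member}"]
                record[f"partner_{var}"] = row[f"{var}_{other}"]
            pairwise.append(record)
    return pairwise


# Hypothetical dyad-format data for two couples (not study data).
dyads = [
    {"couple_id": 101, "satisfaction_1": 6.2, "satisfaction_2": 5.8},
    {"couple_id": 102, "satisfaction_1": 4.0, "satisfaction_2": 4.6},
]

pairwise_rows = dyad_to_pairwise(dyads, variables=["satisfaction"])
for r in pairwise_rows:
    print(r)
```

In pairwise format, actor and partner effects can be estimated in the same multilevel model because every row carries both the respondent's predictor and the partner's predictor, with the couple identifier marking the nesting.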

3.4 |. Preregistered replication summary

Some of the results replicated here included multiple tests in the original reports but only single tests in this replication attempt (Hadden, Rodriguez, Knee, DiBello, & Baker, 2016; Rodriguez et al., 2019). For instance, Hadden et al. (2016) reported two separate studies that tested identical associations between attachment and need satisfaction using the same measures. In the present replication, these measures were administered to participants only once but were used to estimate replication rates for both studies in Hadden et al. (2016). This report treats these as unique tests (i.e., the number of tests reported reflects the number of statistically significant effects from the original reports). Of 73 statistically significant results (i.e., p < .05) in original reports, 61 were statistically significant and in the same direction in the present replication (83.56%; see Table 2). All 12 results that did not replicate failed to reach statistical significance (i.e., were null results); none was statistically significant in the opposite direction of the original finding.
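As an arithmetic check, the preregistered replication rate can be re-derived by summing the per-sample counts in the first “Replication Rates” column of Table 2. The sketch below is illustrative Python, not part of the preregistered syntax; the labels and variable names are chosen for illustration only.

```python
# (replicated, originally significant) for each sample row of Table 2,
# copied from the first "Replication Rates" column.
significant_counts = {
    "Baker et al. (under review), S1": (4, 4),
    "Baker et al. (2018), S1": (8, 11),
    "Haddad et al. (2016), S1": (0, 1),
    "Hadden et al. (2018), S3": (4, 4),
    "Hadden et al. (2018), S4": (5, 5),
    "Hadden et al. (2016), S1": (4, 6),
    "Hadden et al. (2016), S2": (3, 4),
    "Nguyen et al. (2018), S1": (7, 7),
    "Nguyen et al. (2018), S2": (5, 5),
    "Rodriguez et al. (2019), S1": (4, 5),
    "Rodriguez et al. (2019), S2": (7, 9),
    "Rodriguez et al. (2019), S3": (10, 12),
}

n_replicated = sum(rep for rep, _ in significant_counts.values())
n_significant = sum(total for _, total in significant_counts.values())
rate = 100 * n_replicated / n_significant

print(f"{n_replicated}/{n_significant} = {rate:.2f}%")  # 61/73 = 83.56%
print(f"not replicated: {n_significant - n_replicated}")  # not replicated: 12
```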

3.5 |. Non-preregistered exploratory summary

3.5.1 |. Null result and combined replication rates

Of 43 statistically null results (i.e., p ≥ .05) in original reports, 22 were not statistically significant in the present replication (51.16%; see Table 2). When combining these two estimates, we see that, of 116 estimates from original reports (both statistically significant and not statistically significant), 83 either continued to be statistically significant (and in the same direction) or remained null (71.55%). Correlations between original report effect sizes and replication report effect sizes were high (r_Pearson = 0.90, r_Spearman = 0.89).10
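The null-result and combined rates follow from the counts in Table 2, and the effect-size correlations are ordinary Pearson and Spearman coefficients computed over pairs of original and replication estimates. The sketch below is illustrative Python rather than the analysis code for this RR. The correlation example uses only the four Hadden et al. (2018) Study 3 estimates from Table 2, so it is a demonstration of the computation and does not reproduce the coefficients reported above.

```python
from math import sqrt
from statistics import mean

# (remained null, originally null) per sample row, from the second
# "Replication Rates" column of Table 2.
null_counts = [(2, 2), (3, 3), (0, 1), (0, 0), (0, 1), (1, 4),
               (2, 6), (0, 1), (0, 3), (2, 3), (4, 7), (8, 12)]
null_kept = sum(k for k, _ in null_counts)
null_total = sum(t for _, t in null_counts)
null_rate = 100 * null_kept / null_total

# Combine with the preregistered tally reported in the text (61 of 73).
combined_rate = 100 * (61 + null_kept) / (73 + null_total)
print(f"null: {null_kept}/{null_total} = {null_rate:.2f}%")  # null: 22/43 = 51.16%
print(f"combined: {combined_rate:.2f}%")                     # combined: 71.55%


def pearson(x, y):
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hadden et al. (2018), Study 3 (H1a, H1b, H2a, H2b) from Table 2.
original = [0.18, 0.30, 0.24, 0.21]
replication = [0.29, 0.34, 0.17, 0.16]
print(pearson(original, replication), spearman(original, replication))
```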

4 |. DISCUSSION

The aim of the present project was to present and reflect on what is (to our knowledge) a new method of replication effort: replicating a large body of one’s own work that used a particular methodology in an RR format. The original results that were replicated here came from any formal research dissemination (posters, symposia, and manuscripts) of a cross-sectional dyadic sample that had passed any form of peer review and on which the present manuscript’s first author was part of the authorship team. Below, we highlight the experience of this replication method in terms of its implications for the research team performing the replication, for science broadly, and for relationship science specifically.

4.1 |. Research team

One advantage of the current approach (replicating our own work) is that most of the authors from the original work participated in the current replication effort. Thus, this collective effort minimized the potential for reprisals from researchers whose work is being replicated, a risk that accompanies other forms of replication. Thoughtful critiques and reprisals are not the same thing and should not be confused. Nonetheless, reprisals are possible, and replicating one’s own work is an intuitive way to reduce that risk.

Another reason for replicating our own work was to identify findings that we would like to build upon in the future. This is particularly salient for the more junior members of the authorship team, as ECRs may have more latitude to choose exactly what directions their career-spanning research will take. Although the tests in this replication effort were well powered (power of 0.8 for the smallest originally observed effects and much higher for many original estimates), this attempt should not be viewed as the final word on whether these effects exist. Instead, these findings should be interpreted along with the original studies and other relevant work. This is particularly true in cases where multiple original studies replicated the same effects. Still, this replication effort indicates that some findings may be more readily replicable than others, and the increased power of this attempt provides considerably more precise estimates of effect size.

Beyond deciding what specific effects to build on, we believe the present effort has encouraged an increased meta-theoretical focus on better scientific practices among the research team. To conduct research and publish any manuscript, the research team must carry out a great deal of writing, reading, discussing, and presenting at conferences. Given the focus of this manuscript, the research team, which ranges from undergraduate students through full professors, has committed at least a portion of this writing, reading, discussing, and presenting to good scientific practice, which we think has likely percolated into our work on other projects. As one example, the team has discussed needing to collect much larger dyadic samples than we typically did prior to this work. Furthermore, the process of conducting this replication, and the results associated with it, represent a model for researchers to follow. The members of this team were committed to this replication effort, regardless of the ultimate replication or nonreplication of results. This point reminds us that, as scientists, it is our job to be critical consumers not only of the work of others but also of our own.

Of course, the implications of this work are not purely positive for the research team. For instance, the project did take time away from other, more novel, theory-focused work. From the beginning of our first draft to presentation, the project took approximately 22 months. Although this may not be a particularly long time to go through the conceptualization, data collection, writing, and peer review process for a manuscript, it may be considered a rather long time to complete a single cross-sectional study.

4.2 |. Implications for science

One thing that makes this method interesting is that it selects studies in a way that is fundamentally different from the ways that many past replication efforts have selected studies. Our work differed from this past work in that it did not make use of laboratory experiments (Hardwicke & Ioannidis, 2018), was not particularly brief to administer (Ebersole et al., 2016; Klein et al., 2014), and did not make use of a particularly simple design (Camerer et al., 2018; Ebersole et al., 2016). Instead, it took place online, was nonexperimental, required nearly an hour, and included data from both members of a romantic dyad. It is difficult to generalize the findings of this registered report to the wider field, but the rates of replication (and the correlation between original and replicated effect sizes) did appear to be considerably higher than those of several large-scale replication efforts (Ebersole et al., 2016; Open Science Collaboration, 2015). At least one replication effort that included involvement of the original authors appeared to find rates of replication that were still smaller than, but closer to, the present rates (Camerer et al., 2018). However, other replication efforts that included the involvement of the original authors found poorer replication rates (Hagger & Chatzisarantis, 2016b; Klein et al., 2019). In all of these efforts, the involvement of the original authors was less extensive than in the present investigation; therefore, we cannot rule out extensive original author involvement as the driver of our high replication rates.

Other differences between rates of replication in our effort and the efforts of others may suggest more moderators of replicability worth future study. Among these factors are methodology (e.g., dyadic; outside the laboratory), field (relationship science), and context of training for the ECR whose work was replicated (i.e., becoming a psychology student as Bem, 2011 was published and the first large-scale replication effort in psychological science was created; starting graduate school as journals focused on registered reports were being launched). If the last factor contributed substantially to higher replication rates, it could hold tremendous positive implications for the replicability of future psychological science.

In addition to large replication efforts, researchers may be interested in comparing the results of this registered report to those of other smaller-scale registered reports. A recent review of the full population of psychology-focused registered reports found that 43.66% of results revealed full or partial support for hypotheses (Scheel, Schijen, & Lakens, 2020). Perhaps more central to the findings here, that review also found that when the evaluated results were replication studies, registered reports revealed full or partial hypothesis support 39.02% of the time. This rate was compared to standard publishing formats that reported full or partial hypothesis support 96.05% of the time (100% for the four replications they reviewed in the standard publishing format). This study observed a replication rate of 83.56% (by preregistered criteria; 71.55% when null effect replication rates were included). We are hopeful that others will venture to replicate large bodies of their own work because a large sample of such replications may offer insights into which elements of various designs led to higher or lower replication rates.

4.3 |. Implications for relationship science

Although the current work’s implications for science more broadly apply to relationship science specifically, there are also implications that are unique to relationship science. First, there have been several efforts to estimate replication rates in various areas of psychology. For instance, social psychology has been estimated to have a replication rate of 26.42% (Open Science Collaboration, 2015), cognitive psychology 52.63% (Open Science Collaboration, 2015), and personality psychology 87.2% (Soto, 2019). The studies we replicated include investigations that would typically be considered both social and personality psychology, although the replication rate of 83.56% (or 71.55% if both statistically significant and null results are included) seems to more closely approximate that of personality psychology. Unfortunately, our study design was not representative of relationship research and therefore cannot provide a reliable indication of the replication rate of relationship research. Still, this raises the question “what is the replication rate of relationship research?” and suggests that lumping it into larger categories like social or personality psychology may not be appropriate.

We hope that our use of this replication approach will be placed in the category of larger-scale relationship science collaborations that are being leveraged to answer important relationship questions. Among these efforts are the Templeton Foundation-funded love consortium (A Scientific Approach to Living in Love, n.d.). To this end, we have shared this dataset online so that future researchers may use it to answer their own research questions or even ask the research questions we did in different ways (e.g., using the small telescopes approach [Simonsohn, 2015] or using equivalence range/confidence interval approaches [Maxwell et al., 2015]). Other researchers may also use these data to test our or other claims under a different set of assumptions or use future statistical methods that may not yet be widely known.

A final implication for relationship science may come with examination of the replicability of actor versus partner effects. This examination was not preregistered and not considered prior to data analysis of this registered report and should be interpreted accordingly. We began considering this question as a result of other collaborative work that may question the significant, unique contributions of partner effects (Joel et al., 2020). The proportion of statistically significant actor effects (29; 54.72%) relative to statistically significant partner effects (24; 45.28%) in original reports was relatively even. Actor effects were replicated at a rate of 25 of 29 (86.21%), whereas partner effects were replicated at a rate of 21 of 24 (87.50%).11 Again, we cannot state whether this pattern will extend beyond our work, but we look forward to more replication attempts that help discern whether similar patterns are found across dyadic replication efforts more broadly.
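
The actor and partner percentages in this exploratory comparison follow directly from the counts in the paragraph above. A minimal sketch of that arithmetic is given below; it is not drawn from the syntax posted on OSF, and the dictionary layout and names are illustrative assumptions.

```python
# Counts taken from the text: originally significant effects and how many replicated.
effects = {
    "actor": {"original": 29, "replicated": 25},
    "partner": {"original": 24, "replicated": 21},
}

total_original = sum(e["original"] for e in effects.values())

summary = {}
for kind, counts in effects.items():
    summary[kind] = {
        # Share of all originally significant actor/partner effects of this kind.
        "share_of_original": round(100 * counts["original"] / total_original, 2),
        # Proportion of this kind of effect that replicated.
        "replication_rate": round(100 * counts["replicated"] / counts["original"], 2),
    }

for kind, stats in summary.items():
    print(kind, stats)
```

With only 29 and 24 effects in the two groups, a difference of roughly one percentage point between the replication rates corresponds to less than one effect, so the two rates are best read as essentially equal.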

4.4 |. Limitations

We have tried to make clear the limitations of our broader methodology of replication throughout this manuscript but highlight limitations that are more specific to the present replication here. First, there were several errors in data extraction and deviations from the preregistration. We have tried to be transparent by highlighting these errors and deviations, as well as how we addressed them, in a devoted section in the main text of the article, in the syntax, in a devoted summary document, and in the tables. We also took several steps to try to ensure that we caught these errors by having the author who originally conducted the steps of data extraction, writing syntax, and matching hypotheses repeat those steps. Several other members of the authorship team then independently replicated elements of data extraction, writing syntax, and matching hypotheses to attempt to catch further errors. Still, some of the errors (e.g., those that pertain to survey programming) cannot be undone. We also cannot rule out further errors that were missed but hope that sharing our data and syntax will help future researchers to locate them if they do exist.

Second, unlike in each of our original datasets, both partners responded in a single survey (i.e., one survey per dyad) rather than two separate surveys (i.e., one survey per partner). We employed two separate surveys in the past to discourage completion of both surveys by a single partner but combined them here on the advice of the Qualtrics Panels team. It is possible that this led more individuals to complete the survey for both members of the dyad than the alternative format would have. This is a difficult problem to quantify and rule out, but two pieces of evidence suggest that it may not be a substantial limitation of this study. First, anecdotally, one participant reached out to indicate that she would not be able to participate because she was no longer in a romantic relationship. Second, we included a single question at the end of the study that stated:

NOTE: This will NOT affect your eligibility for payment. You will be paid in full regardless of your answer to this question. We simply need to know if each part of the survey was filled out by the correct member of the couple for the sake of good science. Was the first half of the survey filled out by [Actor Name] and the second half of the survey filled out by [Partner Name]?

Analysis of this question revealed that 332 dyads (97.65%) indicated that the survey was completed as prescribed.12 Although this does not ensure that dishonest dyad members did not answer this question dishonestly, it may be a positive sign that (a) people were willing to answer this question honestly despite it being against initial instructions and (b) we lack evidence that acting out of line with instructions was widespread. Still, some evidence may suggest this is not an optimal way to assess improper responding (Chandler & Paolacci, 2017).

A third limitation of the present replication is that there was a substantial number of measures included in this study. Although the present survey is reflective of (and in some instances shorter than) the way we have typically conducted cross-sectional dyadic studies in the past, there are valid concerns that longer surveys can result in an increased burden for participants, which in turn can cause unreliable responding (Rolstad, Adler, & Rydén, 2011). Unfortunately, it is difficult for us to comment on how this compares to the length of surveys outside of our lab. Still, other than some unrealistically long response times (i.e., 17 hr), the average response time was 34.08 min, which is close to the prediction of 31.8 min per partner. Aligning with that prediction is important because surveys that take approximately as long as participants are told they will take may result in better-quality data (Galesic & Bosnjak, 2009). In addition, the order of measures was randomized (as was the order in several of the original studies); therefore, the more careless responses that may occur toward the end of the survey (Galesic & Bosnjak, 2009) should not systematically affect the present work. This limitation may prove particularly problematic for more established or more prolific researchers who have a much larger body of work that they are trying to replicate using the present methodology.

A fourth limitation of the present work is that, in some instances, a single test was used to examine the replicability of a research item that reported the use of the same measures in two different datasets (Hadden et al., 2016; Rodriguez et al., 2019). For instance, Rodriguez et al. (2019) reported two separate studies that tested identical associations between attachment and commitment using the same measures. In the present replication, these measures were only administered to participants once but were used to estimate replication rates for both studies in Rodriguez et al. (2019). This report treats these as unique tests (i.e., the number of tests reported reflects the number of statistically significant effects from the original reports), but they are not truly independent. This lack of independence makes interpretation of our replication rates more difficult because one association in the present replication factors into the replication rate multiple times. For example, if a hypothesis was supported in two different studies in the original work (2/2 times) but not in the current replication, it would have a reported replication rate of 0/2 in the current work, whereas the overall rate of support inclusive of the current work would be 2/3. This was not the case when the original report assessed constructs in two studies with different measures (Nguyen et al., 2018). Likewise, some measures figured into the replicated work disproportionately. For example, many of the studies investigated the role of attachment using the Experiences in Close Relationships Short Form (Wei et al., 2007) or the role of self-determined relationship motivation using the Couples Motivation Questionnaire (Blais et al., 1990); thus, unusual responding across our sample on either of these measures could substantially skew findings. We do not have evidence that findings were skewed, but this possibility should be considered given that it can have major implications for future replication efforts that are modeled on the present format. Perhaps this limitation is generalizable to other work, given that the nature of our field is carving out niche areas of research in which labs and researchers become experts.
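
The non-independence example in this limitation (a reported rate of 0/2 versus an overall rate of 2/3) can be made concrete with a short sketch. The scenario is the hypothetical one described in the text, not an actual result from this replication, and all names are illustrative assumptions.

```python
from fractions import Fraction

# Hypothetical scenario from the text: one hypothesis supported in two original
# studies, then tested once in the replication and not supported.
original_supported = [True, True]
replication_supported = False

# Reported replication rate: the single replication test is counted once for
# each original study it is compared against.
reported_hits = sum(replication_supported for _ in original_supported)
reported_tests = len(original_supported)

# Overall support: every independent test (two original, one replication)
# is counted exactly once.
all_tests = original_supported + [replication_supported]
overall_support = Fraction(sum(all_tests), len(all_tests))

print(f"Reported replication rate: {reported_hits}/{reported_tests}")
print(f"Overall support across independent tests: {overall_support}")
```

The two tallies answer different questions: the first asks how many original findings were reproduced, whereas the second asks how often the hypothesis was supported across all independent datasets.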

4.5 |. Conclusion

The present work reflected on a new means of replication that may encourage deep self-inspection. This novel method involved replicating a large body of a researcher’s own work. Specifically, we examined all peer-reviewed, cross-sectional, dyadic studies to which the first author of this RR had contributed. As a result, this RR differed from other replication efforts by not focusing on findings that are noteworthy for their prestige or representativeness of the wider field. In reflecting on this methodology, we aimed to present a balanced perspective on the costs and benefits that may be associated with this style of replication for (a) the research team conducting the replication, (b) science broadly, and (c) relationship science specifically. We hope other researchers will use this reflection to improve upon our methodology when incorporating replications of their own work early and often into their own research process.

ACKNOWLEDGMENTS

Research reported in this publication was supported by the National Institute on Alcohol Abuse and Alcoholism (NIAAA) of the National Institutes of Health under award number F31AA026195 and the Robert L. Kane Endowed Chair in Long-Term Care and Aging. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. Participant payments for the present registered report were generously funded by the editorial team.

Funding information

National Institute on Alcohol Abuse and Alcoholism, Grant/Award Number: F31AA026195

Footnotes

1

It is also worth noting that single-author publications are rare, and the same benefits and costs may apply to other authors on the original work that is being replicated using this methodology.

2

We want to point out that passing a round of peer review or being published do not necessarily indicate finality or approval of the manuscript. For instance, reporting may change from a poster to a manuscript. Likewise, a manuscript might receive a promising revise and resubmit only to be rejected later in the review process. Along these same lines, even when scholarly work passes all rounds of peer review and is published, that publication does not reflect an endorsement of the quality of a paper.

3

Other coauthors of the original work being replicated might face similar risks if their work is not found to be replicable. We were particularly lucky to have so many collaborators who were willing to engage in this sort of open science. Still, if others are not as fortunate, they should carefully weigh whether this method will present the possibility of offending collaborators whose work might not be replicable.

4

Although the present method of replication is also not systematic across the field, it does represent systematic replication of work within the proposed parameters (e.g., all of a researcher’s dyadic, cross-sectional findings that have passed some form of peer review).

5

It is also possible that the divergence may simply be the result of one author differing from the rest of the community, but this pattern of findings will likely suggest that this would be an area worthy of future study.

6

Although this work may be representative of the ECR, it is not necessarily so. In addition to discrepancies that might arise with other methodologies the ECR may have employed that are not being replicated, there may be heterogeneity in replications (Gelman, 2019; Simmons & Simonsohn, 2019). Therefore, to have a truly representative estimate of a body of work, many replications should be conducted both by the researcher and by independent research teams across the range of methodologies and findings of the ECR.

7

In our estimates of power, we did not consider work outside of our own, a decision that may lead to an incorrect estimate of effect size. On the other hand, the effects we would be most likely to detect would be those in the published literature, which are notorious for being much larger than effects in the true population (Camerer et al., 2018; Ebersole et al., 2016; Klein et al., 2015; Open Science Collaboration, 2015). Given that this article is primarily a reflection piece on replicating an early career researcher’s work (rather than aiming to discern the theoretical implications of the replication attempts), we decided that reporting the effect sizes of the early career researcher was the most appropriate. Still, the questions of greatest interest should drive this decision for those researchers who conduct similar RRs in the future (i.e., the relative importance of the replicability of an individual researcher’s work vs. the theoretical implications of replication/nonreplication).

8

Although one way to conceptualize successful or unsuccessful replication is by examining statistically significant findings in the same direction as the original report, there are also others. For instance, findings that are not statistically significant in original reports and are not statistically significant in a replication may constitute a replication. Similarly, effect sizes may be used to determine replicability. A replication effect size that is statistically significantly different from an original report might be considered a failure to replicate even if both effects are statistically different from zero and in the same direction. These and other (e.g., Maxwell, Lau, & Howard, 2015; Simonsohn, 2015) methods of conceptualizing successful and unsuccessful replications have merits and may even answer distinct questions and therefore should be carefully considered in future replication efforts using this methodology.

9

We failed to mention this in the manuscript prior to data collection, but pre-registered that we would treat our data as indistinguishable by gender or sex and conducted our analyses accordingly.

10

We thank the editor for the suggestion to add this statistic to help summarize the present RR following the precedent of other RRs (Open Science Collaboration, 2015). For an interesting discussion of this statistic and its interpretation, please see the main text of Lakens’s (2016) blog post and the associated comments at the bottom of the page. Calculation of these correlations can be found on OSF.

11

Some original effects were not included in these calculations because they constituted confounding between actor and partner effects (e.g., indirect effects made up of an actor a path and a partner b path; discrepancy outcomes that are made up of both actor and partner responses).

12

No results change substantially when only those 332 dyads are included in analyses.

DATA AVAILABILITY STATEMENT

As part of IARR’s encouragement of open research practices, the authors have provided the following information: This research was pre-registered. The aspects of the research that were pre-registered were: the hypotheses and data analytic plan. The registration was submitted with the study materials. Because this is an RR, we submitted our pre-registration to OSF following in-principle acceptance. The data used in the research are collected, and are posted on OSF. The materials used in the research are available. The materials have been posted on OSF along with the data and pre-registration.

REFERENCES

  1. A Scientific Approach to Living in Love: A Framework for the Future. (n.d.). John Templeton Foundation. Retrieved from https://www.templeton.org/grant/a-scientific-approach-to-living-in-love-a-framework-for-the-future
  2. Ackerman RA, & Kenny DA (2016). APIMPowerR: An interactive tool for Actor-Partner Interdependence Model power analysis. Retrieved from https://robert-ackerman.shinyapps.io/APIMPowerR/
  3. Ackerman RA, Ledermann T, & Kenny DA (2015). Power considerations for the Actor-Partner Interdependence Model.
  4. Al-Marzouki S, Roberts I, Evans S, & Marshall T (2008). Selective reporting in clinical trials: Analysis of trial protocols accepted by The Lancet. The Lancet, 372(9634), 201. 10.1016/S0140-6736(08)61060-0
  5. Baker ZG, Chopik WJ, & Nguyen TTT (under review). A dyadic investigation of romantic relationships and friendships within an Investment Model framework.
  6. Baker ZG, Nguyen TTT, Knee CR, & Petit WE (2018, July). Motivation and pro-relationship behaviors: Empathy as a mediator among friends and partners. Motivation and pro-relationship behaviors. Biennial Meeting of the International Association for Relationship Research, Fort Collins, CO.
  7. Baumeister RF, & Vohs KD (2016). Misguided effort with elusive implications. Perspectives on Psychological Science, 11(4), 574–575. 10.1177/1745691616652878
  8. Bem DJ (1972). Self-perception theory. In Berkowitz L (Ed.), Advances in Experimental Social Psychology (Vol. 6, pp. 1–62). Cambridge, Massachusetts: Academic Press. 10.1016/S0065-2601(08)60024-6
  9. Bem DJ (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407–425. 10.1037/a0021524
  10. Blais MR, Sabourin S, Boucher C, & Vallerand RJ (1990). Toward a motivational model of couple happiness. Journal of Personality and Social Psychology, 59(5), 1021–1031.
  11. Buhrmester M, Kwang T, & Gosling SD (2011). Amazon’s Mechanical Turk: A new source of inexpensive, yet high-quality, data? Perspectives on Psychological Science, 6(1), 3–5. 10.1177/1745691610393980
  12. Camerer CF, Dreber A, Holzmeister F, Ho T-H, Huber J, Johannesson M, … Wu H (2018). Evaluating the replicability of social science experiments in Nature and Science between 2010 and 2015. Nature Human Behaviour, 2(9), 637–644. 10.1038/s41562-018-0399-z
  13. Campbell L, Loving TJ, & LeBel EP (2014). Enhancing transparency of the research process to increase accuracy of findings: A guide for relationship researchers. Personal Relationships, 21(4), 531–545.
  14. Canevello A, & Crocker J (2010). Creating good relationships: Responsiveness, relationship quality, and interpersonal goals. Journal of Personality and Social Psychology, 99(1), 78–106. 10.1037/a0018186
  15. Chambers CD, Dienes Z, McIntosh RD, Rotshtein P, & Willmes K (2015). Registered reports: Realigning incentives in scientific publishing. Cortex, 66, A1–A2. 10.1016/j.cortex.2015.03.022
  16. Chambers CD, & Sala SD (2013). Spliting the review process into two stages. https://www.elsevier.com/editors-update/story/peer-review/spliting-the-review-process-into-two-stages
  17. Chandler JJ, & Paolacci G (2017). Lie for a dime: When most prescreening responses are honest but most study participants are impostors. Social Psychological and Personality Science, 8(5), 500–508. 10.1177/1948550617698203
  18. Chen B, Vansteenkiste M, Beyers W, Boone L, Deci EL, Van der Kaap-Deeder J, Duriez B, Lens W, Matos L, Mouratidis A, & others. (2014). Basic psychological need satisfaction, need frustration, and need strength across four cultures. Motivation and Emotion, 1–21.
  19. Cohen J (1992). A power primer. Psychological Bulletin, 112(1), 155–159.
  20. Cooper ML (1994). Motivations for alcohol use among adolescents: Development and validation of a four-factor model. Psychological Assessment, 6(2), 117–128. 10.1037/1040-3590.6.2.117
  21. Ebersole CR, Atherton OE, Belanger AL, Skulborstad HM, Allen JM, Banks JB, … Nosek BA (2016). Many labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology, 67, 68–82. 10.1016/j.jesp.2015.10.012
  22. Enhancing Reproducibility through Rigor and Transparency | grants.nih.gov. (n.d.). Retrieved from https://grants.nih.gov/reproducibility/index.htm
  23. Faul F, Erdfelder E, Lang A-G, & Buchner A (2007). G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences. Behavior Research Methods, 39(2), 175–191. 10.3758/BF03193146
  24. Floyd K (2002). Human affection exchange: V. Attributes of the highly affectionate. Communication Quarterly, 50(2), 135–152. 10.1080/01463370209385653
  25. Finkel EJ, Eastwick PW, & Reis HT (2015). Best research practices in psychology: Illustrating epistemological and pragmatic considerations with the case of relationship science. Journal of Personality and Social Psychology, 108(2), 275–297. 10.1037/pspi0000007
  26. Galesic M, & Bosnjak M (2009). Effects of questionnaire length on participation and indicators of response quality in a web survey. Public Opinion Quarterly, 73(2), 349–360. 10.1093/poq/nfp031
  27. Gelman A (2019). A debate about effect-size variation in psychology: Simmons and Simonsohn; McShane, Böckenholt, and Hansen; Judd and Kenny; and Stanley and Doucouliagos. In Statistical Modeling, Causal Inference, and Social Science. Retrieved from https://statmodeling.stat.columbia.edu/2019/04/30/a-debate-about-effect-size-variation-in-psychology-simmons-and-simonsohn-mcshane-bockenholt-and-hansen-and-judd-and-kenny/
  28. Haddad S, Tou RYW, Baker ZG, Britton M, DiBello AM, Hadden BW, & Derrick JL (2016, July). All by myself: Does the fear of being single predict drinking to cope? Biennial Meeting of the International Association for Relationship Research, Toronto, ON.
  29. Hadden BW, Baker ZG, & Knee CR (2018). Let it go: Relationship autonomy predicts pro-relationship responses to partner transgressions. Journal of Personality, 86(5), 868–887. 10.1111/jopy.12362
  30. Hadden BW, Rodriguez LM, Knee CR, DiBello AM, & Baker ZG (2016). An actor–partner interdependence model of attachment and need fulfillment in romantic dyads. Social Psychological and Personality Science, 7(4), 349–357. 10.1177/1948550615623844
  31. Hagger MS, & Chatzisarantis NL (2016a). Commentary: Misguided effort with elusive implications, and sifting signal from noise with replication science. Frontiers in Psychology, 7, 621. 10.3389/fpsyg.2016.00621
  32. Hagger MS, & Chatzisarantis NL (2016b). A multilab preregistered replication of the ego-depletion effect. Perspectives on Psychological Science, 11(4), 546–573. 10.1177/1745691616652873
  33. Hardwicke TE, & Ioannidis JPA (2018). Mapping the universe of registered reports. Nature Human Behaviour, 2, 793–796. 10.1038/s41562-018-0444-y
  34. Hendrick SS (1988). A Generic Measure of Relationship Satisfaction. Journal of Marriage and Family, 50(1), 93–98. 10.2307/352430
  35. Joel S, Eastwick PW, Allison CJ, Arriaga XB, Baker ZG, Bar-Kalifa E, … Wolf S (2020). Machine learning uncovers the most robust self-report predictors of relationship quality across 43 longitudinal couples studies. Proceedings of the National Academy of Sciences, 117(32), 19061–19071. 10.1073/pnas.1917036117
  36. Joel S, Eastwick PW, & Finkel EJ (2018). Open sharing of data on close relationships and other sensitive social psychological topics: Challenges, tools, and future directions. Advances in Methods and Practices in Psychological Science, 1(1), 86–94. 10.1177/2515245917744281
  37. Jolliffe D, & Farrington DP (2006). Development and validation of the Basic Empathy Scale. Journal of Adolescence, 29(4), 589–611.
  38. Klein RA, Cook CL, Ebersole CR, Vitiello CA, Nosek BA, Chartier CR, … Ratliff KA (2019). Many Labs 4: Failure to Replicate Mortality Salience Effect With and Without Original Author Involvement. PsyArXiv. 10.31234/osf.io/vef2c
  39. Klein RA, Ratliff KA, Vianello M, Adams RB, Bahník Š, Bernstein MJ, … Nosek BA (2014). Investigating variation in replicability. Social Psychology, 45(3), 142–152. 10.1027/1864-9335/a000178
  40. Klein RA, Vianello M, Hasselman F, Adams BG, Adams RB, Alper S, & Friedman M (2015). Many labs 2: Investigating variation in replicability across sample and setting. Manuscript in Preparation.
  41. Lakens D (2016, January 29). The 20% Statistician: The correlation between original and replication effect sizes might be spurious. The 20% Statistician. Retrieved from http://daniellakens.blogspot.com/2016/01/the-correlation-between-original-and.html
  42. La Guardia JG, Ryan RM, Couchman CE, & Deci EL (2000). Within-person variation in security of attachment: A self-determination theory perspective on attachment, need fulfillment, and well-being. Journal of Personality and Social Psychology, 79(3), 367–384. 10.1037//0022-3514.79.3.367
  43. Lane SP, & Hennes EP (2018). Power struggles: Estimating sample size for multilevel relationships research. Journal of Social and Personal Relationships, 35(1), 7–31. 10.1177/0265407517710342
  44. Lindsay DS (2015). Replication in psychological science. Psychological Science, 26(12), 1827–1832. 10.1177/0956797615616374 [DOI] [PubMed] [Google Scholar]
  45. Maxwell SE, Lau MY, & Howard GS (2015). Is psychology suffering from a replication crisis? What does “failure to replicate” really mean? American Psychologist, 70(6), 487–498. 10.1037/a0039400 [DOI] [PubMed] [Google Scholar]
  46. Mehlenbacher AR (2019). Registered reports: Genre evolution and the research article. Written Communication, 36(1), 38–67. 10.1177/0741088318804534 [DOI] [Google Scholar]
  47. McCullough ME, Rachal KC, Sandage SJ, Worthington EL Jr., Brown SW, & Hight TL (1998). Interpersonal forgiving in close relationships: II. Theoretical elaboration and measurement. Journal of Personality and Social Psychology, 75(6), 1586–1603. 10.1037/0022-3514.75.6.1586 [DOI] [PubMed] [Google Scholar]
  48. Murray SL, Leder S, MacGregor JCD, Holmes JG, Pinkus RT, & Harris B (2009). Becoming irreplaceable: How comparisons to the partner’s alternatives differentially affect low and high self-esteem people. Journal of Experimental Social Psychology, 45(6), 1180–1191. 10.1016/j.jesp.2009.07.001 [DOI] [PMC free article] [PubMed] [Google Scholar]
  49. Nguyen TTT, Baker ZG, & Knee CR (2018, July). Need fulfillment as a mediator in the relationship between motivation and trust. Biennial Meeting of the International Association for Relationship Research, Fort Collins, CO. [Google Scholar]
  50. Nosek BA, Ebersole CR, DeHaven A, & Mellor D (2018). The preregistration revolution. Proceedings of the National Academy of Sciences, 115(11), 2600–2606. 10.1073/pnas.1708274114 [DOI] [PMC free article] [PubMed] [Google Scholar]
  51. Nosek BA, Spies JR, & Motyl M (2012). Scientific utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science, 7(6), 615–631. [DOI] [PMC free article] [PubMed] [Google Scholar]
  52. Norton R (1983). Measuring Marital Quality: A Critical Look at the Dependent Variable. Journal of Marriage and Family, 45(1), 141–151. 10.2307/351302 [DOI] [Google Scholar]
  53. Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716. 10.1126/science.aac4716 [DOI] [PubMed] [Google Scholar]
  54. Popper K (2005). The logic of scientific discovery. New York, NY: Routledge. [Google Scholar]
  55. Rempel JK, Holmes JG, & Zanna MP (1985). Trust in close relationships. Journal of Personality and Social Psychology, 49(1), 95. [PubMed] [Google Scholar]
  56. Rodriguez LM, Fillo J, Hadden BW, Øverup CS, Baker ZG, & DiBello AM (2019). Do you see what I see? Actor and partner attachment shape biased perceptions of partners. Personality and Social Psychology Bulletin, 45(4), 587–602. 10.1177/0146167218791782 [DOI] [PMC free article] [PubMed] [Google Scholar]
  57. Rolstad S, Adler J, & Rydén A (2011). Response burden and questionnaire length: Is shorter better? A review and meta-analysis. Value in Health: The Journal of the International Society for Pharmacoeconomics and Outcomes Research, 14(8), 1101–1108. 10.1016/j.jval.2011.06.003 [DOI] [PubMed] [Google Scholar]
  58. Rusbult CE (2000). Measuring perpetrator amends and victim forgiveness in marital interactions [Unpublished instrument], University of North Carolina at Chapel Hill, Chapel Hill, NC. [Google Scholar]
  59. Rusbult CE, Martz JM, & Agnew CR (1998). The investment model scale: Measuring commitment level, satisfaction level, quality of alternatives, and investment size. Personal Relationships, 5(4), 357–387. [Google Scholar]
  60. Rusbult CE, Verette J, Whitney GA, Slovik LF, & Lipkus I (1991). Accommodation processes in close relationships: Theory and preliminary empirical evidence. Journal of Personality and Social Psychology, 60(1), 53–78. 10.1037/0022-3514.60.1.53 [DOI] [Google Scholar]
  61. Scheel AM, Schijen M, & Lakens D (2020). An excess of positive results: Comparing the standard Psychology literature with Registered Reports. Retrieved from 10.31234/osf.io/p6e9c [DOI] [Google Scholar]
  62. Schönbrodt F, Mellor D, & Bergmann C (2018). Academic job offers that mentioned open science.
  63. Schweinsberg M, Madan N, Vianello M, Sommer SA, Jordan J, Tierney W, … Uhlmann EL (2016). The pipeline project: Pre-publication independent replications of a single laboratory’s research pipeline. Journal of Experimental Social Psychology, 66, 55–67. 10.1016/j.jesp.2015.10.001 [DOI] [Google Scholar]
  64. Simmons JP, & Simonsohn U (2019, April 24). [76] Heterogeneity is replicable: Evidence from Maluma, MTurk, and Many Labs. Data Colada. Retrieved from http://datacolada.org/76 [Google Scholar]
  65. Simonsohn U (2015). Small telescopes: Detectability and the evaluation of replication results. Psychological Science, 26(5), 559–569. 10.1177/0956797614567341 [DOI] [PubMed] [Google Scholar]
  66. Soto CJ (2019). How replicable are links between personality traits and consequential life outcomes? The life outcomes of personality replication project. Psychological Science, 30(5), 711–727. 10.1177/0956797619831612 [DOI] [PubMed] [Google Scholar]
  67. Spielmann SS, MacDonald G, Maxwell JA, Joel S, Peragine D, Muise A, & Impett EA (2013). Settling for less out of fear of being single. Journal of Personality and Social Psychology, 105(6), 1049. [DOI] [PubMed] [Google Scholar]
  68. Tavris C, & Aronson E (2007). Self-justification in public and private spheres. The General Psychologist, 42, 4–7. [Google Scholar]
  69. The Religious Replication Project: Using Pre-registered Replications and Bayesian Statistics to Improve the Experimental Study of Religion. (n.d.). John Templeton Foundation. Retrieved from https://www.templeton.org/grant/the-religious-replication-project-using-pre-registered-replications-and-bayesian-statistics-to-improve-the-experimental-study-of-religion [Google Scholar]
  70. Tofighi D, & MacKinnon DP (2011). RMediation: An R package for mediation analysis confidence intervals. Behavior Research Methods, 43(3), 692–700. [DOI] [PMC free article] [PubMed] [Google Scholar]
  71. Wei M, Russell DW, Mallinckrodt B, & Vogel DL (2007). The Experiences in Close Relationship Scale (ECR)-short form: Reliability, validity, and factor structure. Journal of Personality Assessment, 88(2), 187–204. [DOI] [PubMed] [Google Scholar]
  72. Zacks JM, & Roediger HL III (2004). Setting up your lab and beginning a program of research. In The compleat academic: A career guide (pp. 135–152). Washington, DC: American Psychological Association. [Google Scholar]

Associated Data

Data Availability Statement

As part of IARR’s encouragement of open research practices, the authors have provided the following information: This research was pre-registered. The pre-registered aspects of the research were the hypotheses and the data analytic plan. The registration was submitted with the study materials. Because this is an RR, we submitted our pre-registration to OSF following in-principle acceptance. The data used in the research have been collected and are posted on OSF. The materials used in the research are available and have been posted on OSF along with the data and pre-registration.
