Abstract
Statistical analyses are a crucial component of the biomedical research process and are necessary to draw inferences from biomedical research data. The application of sound statistical methodology is a prerequisite for publication in the American Heart Association (AHA) journal portfolio. The objective of this document is to summarize key aspects of statistical reporting that might be most relevant to the authors, reviewers, and readership of AHA journals. The AHA Scientific Publication Committee convened a task force to inventory existing statistical standards for publication in biomedical journals and to identify approaches suitable for the AHA journal portfolio. The experts on the task force were selected by the AHA Scientific Publication Committee, who identified 12 key topics that serve as the section headers for this document. For each topic, the members of the writing group identified relevant references and evaluated them as resources for developing the standards summarized herein. Each section was independently reviewed by an expert reviewer who was not part of the task force. Expert reviewers were also permitted to comment on other sections if they chose. Differences of opinion were adjudicated by consensus. The standards presented in this report are intended to serve as a guide for high-quality reporting of statistical analysis methods and results.
Keywords: biostatistics
The American Heart Association (AHA) journal portfolio includes Arteriosclerosis, Thrombosis, and Vascular Biology; Circulation; Circulation: Arrhythmia and Electrophysiology; Circulation: Genomic and Precision Medicine; Circulation: Cardiovascular Imaging; Circulation: Cardiovascular Interventions; Circulation: Cardiovascular Quality and Outcomes; Circulation: Heart Failure; Circulation Research; Hypertension; Journal of the American Heart Association; and Stroke. Collectively, these journals receive >20 000 manuscript submissions per year. The statistical analysis methods used in these submitted manuscripts vary in terms of approaches and sophistication, ranging from simple descriptive statistics (eg, counts and frequencies) to complex modeling (eg, multivariable regression, propensity score adjustment, instrumental variables, survival analyses, development of risk prediction models) and meta-analyses. Most original research manuscripts submitted to the AHA journal portfolio are reviewed by at least 1 statistical reviewer or a statistical editor before publication. The AHA Scientific Publication Committee oversees the AHA journal portfolio.
TASK FORCE CHARGE AND APPROACH
To harmonize expectations for reporting statistical analyses across journals, the AHA Scientific Publication Committee charged the task force with the following:
Identify core study designs and methodological topics that may be used to organize a structured list of suggestions for quality reporting of statistical analyses.
Evaluate available resources and guidelines relevant to the selected topics.
Generate a curated compendium of resources that will delineate key design domains and methodological challenges of direct relevance to manuscripts submitted to AHA journals. This compendium should consider existing guidelines and identify those most useful and relevant to these topics for the purpose of publication in AHA journals.
The goals of the recommendations for statistical reporting in cardiovascular medicine developed through deliberations of the task force are (1) to guide statistical reviewers in determining whether the data have been analyzed appropriately, (2) to provide best practices to authors as they report statistical analyses in papers submitted to AHA journals, and (3) to enhance the reader experience by helping them to understand the study design, analyses, and results. As a general rule, statistical reporting should be sufficiently detailed that a qualified statistician would be able to recreate the results if given the manuscript (including supplemental material) and the study data set.
We must also acknowledge that no set of reporting guidelines can be entirely independent of study design considerations. The intent of this document is principally to serve as a reporting guideline rather than a statistical or design tutorial, but any reporting guideline necessarily has some standards that are related to design and analytical choices.
Task force members were selected by the AHA Scientific Publication Committee, including 9 editors from the AHA journal portfolio. Members of the task force suggested section topics that pertained to either key study designs or key methodological topics; a group conference was used to winnow the list (choosing the most important, combining similar or related topics) to a final selection of 12 sections. A lead author from the task force was designated for each of the 12 sections on the basis of individual expertise of the task force members, with 1 to 2 secondary authors assigned to each section as readers. Each section was independently reviewed by an expert in the field who was not part of the task force, as selected by the lead author for each section. When necessary, differences of opinion were adjudicated by discussion between the chair (V.L.R.) and vice chair (A.D.A.) of the task force or with the section lead for a particular topic. Independent reviewers are listed in the Acknowledgments.
STRUCTURE OF THE RECOMMENDATIONS FOR STATISTICAL REPORTING
Twelve sections are presented. Some sections (such as Missing Data) are applicable to nearly all research, whereas others are more narrowly focused. There is intentional redundancy across sections in that some constructs are mentioned in each of the sections to which they apply for ease of reference. In addition, >1 section may apply to a given manuscript. The 12 sections are as follows:
General Standards
Observational Studies: Diagnostic Tests and Validation
Observational Studies: Clinical Prediction Models
Statistical Genetics
Randomized Controlled Trials
Systematic Reviews and Meta-Analyses
Survival Analyses
Bayesian Statistical Approaches
Missing Data
Correlated Data
Covariable Adjustment and Propensity Scores
Power and Sample Size Considerations
ACRONYMS (LISTED ALPHABETICALLY) AND CORRESPONDING SECTION IN ARTICLE
AHA American Heart Association (Introduction)
CONSORT Consolidated Standards of Reporting Trials (Section 5)
DELTA-2 Difference Elicitation in Trials (Section 12)
PRISMA Preferred Reporting Items for Systematic Reviews and Meta-Analyses (Section 6)
STARD Standards for Reporting of Diagnostic Accuracy Studies (Section 2)
1. GENERAL STANDARDS
This section is relevant to all manuscripts submitted to the AHA journal portfolio.
1.1. Existing Guidelines
The AHA journals encourage authors to follow relevant existing guidelines when applicable (https://www.aha-journals.org/research-guidelines). These guidelines are presented in the Table. Throughout the present document, we direct authors to these guidelines, highlighting the points most relevant to authors as well as to the statistical reviewers and editors of the AHA journal portfolio. Authors are asked to adhere to the reporting recommendations by the Enhancing the Quality and Transparency of Health Research network.1 We acknowledge the existence of other similar documents and encourage authors, readers, and reviewers to refer to them as well.2–5
Table.
Reporting Guidelines by Study Type
| Study type | Reporting guideline | Related guidelines/extensions |
|---|---|---|
| Study protocols | SPIRIT | PRISMA-P |
| Observational studies | STROBE | Extensions |
| Diagnostic/prognostic studies | STARD | TRIPOD |
| Randomized trials | CONSORT | Extensions |
| Systematic reviews | PRISMA | Extensions |
For more information, see the EQUATOR network.1 CONSORT indicates Consolidated Standards of Reporting Trials; EQUATOR, Enhancing the Quality and Transparency of Health Research; PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses; PRISMA-P, Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols; SPIRIT, Standard Protocol Items: Recommendations for Interventional Trials; STARD, Standards for Reporting of Diagnostic Accuracy Studies; STROBE, Strengthening the Reporting of Observational Studies in Epidemiology; and TRIPOD, Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis.
1.2. Methodological Considerations
1.2.1. Study Population
The study population must be described in detail, including dates, sites of recruitment, and how the patients/participants were selected (convenience sample, consecutive series, geographically defined population), with complete reporting of inclusion and exclusion criteria. Too often, the description is overly vague (eg, “the study cohort comprised 500 patients undergoing percutaneous coronary intervention at our institution”), leaving the reader lacking important context. Exclusion criteria should be individually identified with the frequency with which they are met to specifically illustrate how the cohort was generated from the source population at study onset to allow the identification of the final group included in analyses. Flow diagrams are strongly recommended to describe details of participant inclusions and exclusions leading to the final study population and can often be included in the manuscript supplement.
1.2.2. Statistical Methods
Statistical methods should be detailed, including models, comparisons, or tests with sufficient information to identify which method is being used. Many papers include vague statements such as “Continuous variables were analyzed using t tests; Wilcoxon tests were used as appropriate” or “The treatment groups were compared using propensity score analysis.” Such phrases are not sufficiently specific for the reviewer to identify which test has been used for each of the relevant analyses. The task force acknowledges that it may be challenging to fit this entirely in the Methods section; footnotes beneath tables and figures are often excellent ways to clarify what analyses are being reported. For any given hypothesis test, it should be readily clear to the reviewer which test was used to obtain the corresponding P value. For all multivariable models, full descriptions of all covariates considered for inclusion in the model are required.
1.2.3. Visualization
Whenever possible, exact data points should be visualized rather than summarized (eg, comparing means by use of bar charts).6 Very different distributions of data can lead to the same bar chart, although the implications can be quite different.6 The article by Weissgerber et al7 provides excellent examples and is recommended for authors as they consider the most appropriate way to display their data for clarity and purpose.
1.2.4. Categorical Variables
Categorical/ordinal variables should not be analyzed as continuous variables. For example, the New York Heart Association classification, which is reported as class I, II, III, or IV, should be analyzed as a categorical variable on an ordinal scale.
1.2.5. Continuous Variables
Continuous variables should be scaled so that the estimates are interpretable for clinical use. It is often difficult to interpret an odds ratio or hazard ratio for a 1-year increase in age or for a 1-mm Hg increase in blood pressure; it makes more sense to express the change in risk per 10 years of age or per 10-mm Hg increase in blood pressure. All reports of continuous variables must include the scale used.
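The rescaling described above follows directly from the model coefficient: a ratio per k-unit change equals the per-unit ratio raised to the power k (because the log ratio is linear in the covariate). As an illustrative sketch (the helper name is hypothetical, not part of any cited guideline):

```python
def rescale_odds_ratio(or_per_unit: float, units: float) -> float:
    """Convert an odds (or hazard) ratio per 1-unit change in a covariate
    to the ratio per `units`-unit change: OR_k = exp(k * beta) = OR_1 ** k."""
    return or_per_unit ** units

# Example: an OR of 1.005 per 1 mm Hg is hard to interpret clinically;
# re-expressed per 10 mm Hg it becomes roughly 1.05.
or_per_10_mm_hg = rescale_odds_ratio(1.005, 10)
```

The same exponentiation applies to hazard ratios from a Cox model, because both are exponentiated linear coefficients.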
1.2.6. Associations
Avoid ranking variables by strength of association on the basis of the magnitude of their effect sizes when the variables are on different scales. If the odds ratio is 1.10 per 10 years of age but 1.15 for men versus women, that does not mean sex is a stronger predictor of the outcome than age. Situations that genuinely call for ranking variables to identify the strongest predictor of outcome are fairly rare, but if researchers truly wish to make such statements, a form of standardization must be used to put the variables on the same scale.
1.2.7. P Values
Report exact P values (eg, P=0.03) for statistical tests rather than using symbols to denote significance categories such as “P<0.10; P<0.05; P<0.01.” It is generally appropriate to report the P value with 2 significant digits unless the P value is <0.01, in which case 1 significant digit may be adequate (eg, reporting 0.002 rather than 0.0021). Precise P values with 2 significant digits must be presented when unaccompanied by estimates of effect size and CIs. More generally, it is always good practice to report an estimate of effect size and a measure of uncertainty rather than isolated P values.
Very low P values accompanied by point estimates of the effect and CIs may typically be reported as <0.001, unless a greater level of precision is called for because of the chosen significance level or corrections for multiple testing.
If the P value is very close to 1, report it as >0.99 rather than 1.00 unless the P value is actually equal to 1.
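The P value reporting conventions above can be collected into a single formatting rule. The following is an illustrative sketch of one way to implement them (the function name is hypothetical, and journals may require different precision in specific contexts, eg, after multiplicity corrections):

```python
def format_p(p: float) -> str:
    """Format a P value per the conventions described above:
    2 significant digits in general; 1 significant digit below 0.01;
    '<0.001' for very small values; '>0.99' near (but not equal to) 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P value must lie between 0 and 1")
    if p < 0.001:
        return "<0.001"
    if p == 1.0:
        return "1.00"
    if p > 0.99:
        return ">0.99"
    # 2 significant digits for p >= 0.01, 1 significant digit below that
    return f"{p:.2g}" if p >= 0.01 else f"{p:.1g}"
```

For example, `format_p(0.0312)` yields `0.031`, and `format_p(0.0021)` yields `0.002`.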
1.2.8. Multiple Comparisons
Correction for multiple comparisons remains controversial and is context dependent according to the research objective and inferences drawn. If a manuscript includes multiple hypothesis tests, authors must either explicitly state that no corrections for multiple comparisons have been performed or name the method used to account for multiple comparisons.
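When a correction is applied, naming the method suffices because standard procedures are well defined. As one illustration (the choice of method remains context dependent, as noted above), the Holm step-down adjustment can be sketched in pure Python:

```python
def holm_adjust(p_values):
    """Holm step-down adjusted P values: multiply the i-th smallest
    P value by (m - i + 1), enforce monotonicity, and cap at 1."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        adj = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, adj)  # keep adjusted values monotone
        adjusted[idx] = running_max
    return adjusted
```

For example, raw P values of 0.01, 0.04, and 0.03 adjust to 0.03, 0.06, and 0.06; established statistical packages implement this and related procedures directly.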
1.2.9. Statistical Inference
Avoid the “absence of evidence=evidence of absence” error. If a hypothesis test includes a P value above the chosen α level, it is not correct to infer that there was no effect or no difference between groups.8 Authors should use appropriate language that reflects the meaning of a frequentist significance test; some excellent examples are available on the data methods website.9
1.2.10. Preregistration
Preregistration of statistical analysis plans is encouraged when practical. This is commonplace in clinical trials, but there is increasing momentum for more broadly registering the statistical analysis plan for all research studies. The Center for Open Science provides resources for those interested in preregistering any research study.9a
1.2.11. Share Data or Code
It is not always practical to share study data, and a full discussion of the arguments for and against mandatory data sharing lies beyond the scope of this document. That said, it is generally practical for study teams to share the statistical code used for their analyses, and if they are willing to do so, this can greatly facilitate statistical review by making it perfectly clear to the reviewer what the authors have actually done.
1.3. Data Visualization
No matter how sound the statistical methods, work will be judged heavily by how data are presented, and a large part of this burden is borne by figures and tables. Data visualization has become a prominent field in and of itself. However, herein, the focus is not on strategies for generating dynamic and arresting displays; rather, we emphasize 3 attributes of good data presentation: completeness, informativeness, and truthfulness.7
1.3.1. Completeness
Figures should have titles, and axes should be labeled. Units of measurements should be provided.
If symbols or colors convey meaning, then a legend should be provided, preferably within the figure itself rather than in the accompanying text.
Measures of uncertainty or dispersion such as error bars or shaded regions for CIs should be displayed on the graph or included in the table. We generally recommend SD over SE for clinical data because the SD represents the variation within the study data, whereas the SE quantifies uncertainty in the estimate of the population mean. Whatever the measure of uncertainty, the figure legend should make clear what is being displayed.
1.3.2. Informativeness
Figures should maximize the amount of information presented about the underlying data distribution.
Figures illustrating summary measures (eg, bar graphs) may conceal potentially important information on multimodality, sample size, and presence of outliers. This is particularly problematic when sample size is small. Dramatically different data distributions can be consistent with the same bar graph.
Multiple graphing packages allow the ability to overlay individual data points onto a box-whisker or violin representation of the data, so that the reader can gauge how data are distributed.
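As an illustrative sketch of the overlay approach described above (a hypothetical helper using matplotlib; the jitter width, marker size, and axis label are arbitrary choices, not recommendations):

```python
import matplotlib
matplotlib.use("Agg")  # noninteractive backend so the sketch runs in scripts
import matplotlib.pyplot as plt
import numpy as np

def box_with_points(groups, labels):
    """Box-whisker plot with individual data points jittered on top,
    so readers can judge sample size, outliers, and multimodality."""
    fig, ax = plt.subplots()
    ax.boxplot(groups, showfliers=False)  # outliers appear as raw points instead
    rng = np.random.default_rng(0)  # fixed seed: reproducible jitter
    for i, g in enumerate(groups, start=1):
        jitter = rng.uniform(-0.08, 0.08, size=len(g))
        ax.scatter(np.full(len(g), i) + jitter, g, alpha=0.6, s=14)
    ax.set_xticks(range(1, len(groups) + 1))
    ax.set_xticklabels(labels)
    ax.set_ylabel("Measurement (units)")  # placeholder; state actual units
    return fig
```

Violin plots with overlaid points follow the same pattern, replacing `boxplot` with `violinplot`.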
1.3.3. Truthfulness
Many choices are available in deciding what data to show and how to represent them.
Some choices portray an exaggerated representation of underlying trends such as choosing a nonrepresentative slice of the data or truncating the axis to exaggerate the difference between groups/change in a linear trend. This practice should be avoided.
The narrative told by the figures should align with what is supported by an honest analysis of the complete data.
In the case of a statistical hypothesis test that is accompanied by a figure, note that it is insufficient to show only the P value and the graphical presentation; the appropriate summary statistics (group means and SDs, mean difference with CI, etc) should be presented as well. A side-by-side boxplot that shows a P value but no other summary statistics in the text omits key information.
2. OBSERVATIONAL STUDIES: DIAGNOSTIC TESTS AND VALIDATION
During recent decades, the number of diagnostic tests has rapidly increased. As for all new medical technologies, diagnostic tests should be thoroughly evaluated before their introduction into practice. We recommend that authors consult the STARD statement when planning a study of diagnostic accuracy.10
2.1. Specific Points of Emphasis for STARD
The analytical plan should clearly state the study hypothesis and goals and describe the study design.
The study design and methods section should include a clear description of the study cohort (inclusion and exclusion criteria) and any control group used.
The basis of the selection of the study population should be justified. The relation of the study population to the clinical population to which the test will be applied and whether the study population represents the spectrum of disease anticipated should be addressed.
A flow diagram presenting how the final study population was identified is advised.
Sample size calculations should document sufficient power for the goals of the study.
A careful description of the process of test performance and interpretation, the coding of results (including blinded versus unblinded interpretation, single versus multiple readers), and the definition of test abnormality is necessary. The prevalence of disease or test abnormality must be included.
The treatment of missing, uninterpretable, or equivocal data must be described.
The metrics used to define diagnostic accuracy usually include sensitivity, specificity, accuracy, positive and negative predictive values, and positive and negative likelihood ratios. Receiver-operating characteristic curves can be used, and partial receiver-operating characteristic curves should be considered when appropriate.11 These methods are suitable for case-control studies that sample on disease status.
For cohort studies, multivariable modeling is preferable to assess the incremental value of testing over pretest information.12
Potential sources of bias should be detailed, including those introduced by patient selection and inadequate sampling. The use of any bias correction method should be specified. Whether positive and negative results are verified by the same standard should be presented.
In the case of validation studies, the basis for and definition of the test or metric examined and the gold standard or criterion standard used must be clearly defined.
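The accuracy metrics listed above all derive from the 2×2 table of test results against the reference standard. A minimal sketch (hypothetical function name; note that predictive values depend on the sampled prevalence and do not transfer directly from case-control designs):

```python
def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Diagnostic accuracy metrics from a 2x2 table.
    tp/fp/fn/tn = true/false positive and negative counts."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)   # sensitivity: P(test+ | disease+)
    spec = tn / (tn + fp)   # specificity: P(test- | disease-)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),          # depends on prevalence
        "npv": tn / (tn + fn),          # depends on prevalence
        "accuracy": (tp + tn) / n,
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
        "prevalence": (tp + fn) / n,
    }
```

For example, a study with 90 true positives, 20 false positives, 10 false negatives, and 80 true negatives has sensitivity 0.90, specificity 0.80, and a positive likelihood ratio of 4.5.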
2.2. Interrater and Intrarater Variability
The selection of the patient sample for the determination of interrater and intrarater variability must be explained and supported by sample size calculations. The range of values and representation of normal values, as well as levels of abnormal values, should be described.
For imaging studies, the approach to handling variability in image quality should be mentioned.
Depending on the type of data used for the study metrics, options include κ statistics, intraclass correlation coefficients from a 1-way ANOVA, and SEMs.13 Bland-Altman analysis is usually helpful to show agreement between 2 methods or raters.14 The U statistic, introduced by Hoeffding and described in Chapter 16 of Biostatistics for Biomedical Research, is also an option.15
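The core Bland-Altman quantities referenced above are the mean of the paired differences (bias) and the 95% limits of agreement (bias ± 1.96 SD of the differences). A minimal numpy sketch (hypothetical function name; the full analysis also plots differences against pairwise means and checks that differences are approximately normal):

```python
import numpy as np

def bland_altman(a, b):
    """Bland-Altman agreement statistics for paired measurements a and b:
    returns (bias, lower limit of agreement, upper limit of agreement)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    diff = a - b
    bias = diff.mean()
    sd = diff.std(ddof=1)  # sample SD of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd
```

The limits of agreement should then be judged against a prespecified clinically acceptable difference, not against a P value.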
3. OBSERVATIONAL STUDIES: CLINICAL PREDICTION MODELS
Clinical prediction models may be used to guide clinical decisions, to inform patients about the likely course of a disease, or to stratify patients by disease severity in clinical trials. The availability of electronic medical records and registries and the technical ability to link different sources of information have provided a fertile ground for the proliferation of clinical prediction models and risk scores. A recent systematic review identified >350 models for estimating risk of cardiovascular disease.16 Herein, we focus on traditional clinical prediction models (also sometimes referred to as prognostic models) using logistic regression—typically for short-term outcomes—or Cox regression—typically for long-term outcomes when time to event or censoring is important. However, it is important to mention that machine learning methods are increasingly being used and may perform very well, especially when large amounts of data are available.17,18 We recommend the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis guideline for reporting the development and validation of prediction models.19 Regardless of the analytical approach used, specific elements of importance are given in the following sections.
3.1. Variable Selection Approach
Three main approaches can be taken to select the variables for the clinical prediction model: (1) Variables can be chosen by expert opinion by prespecifying the exact predictors to be considered; (2) a stepwise method that relies on statistical criteria can be used to select optimal covariates for the model; or (3) a regularization method such as lasso or elastic nets20 can be used to fit the model by performing variable selection and controlling overfitting simultaneously. The last approach has been shown to be superior to stepwise regression,21 but stepwise methods and their variants are still frequently used in practice. Regardless of the approach chosen, the selection procedure must be clearly described.22
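To illustrate how regularization performs variable selection, the lasso can be sketched via cyclic coordinate descent, in which soft-thresholding sets small coefficients exactly to zero. This is a minimal numpy sketch for linear regression, assuming standardized predictors and a centered outcome; real analyses should use an established package, which also chooses the penalty by cross-validation:

```python
import numpy as np

def lasso_fit(X, y, lam, n_iter=200):
    """Lasso for linear regression by cyclic coordinate descent.
    Assumes columns of X have mean 0 and variance 1 and y is centered."""
    n, p = X.shape
    beta = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            # partial residual excluding the contribution of feature j
            r = y - X @ beta + X[:, j] * beta[j]
            rho = X[:, j] @ r / n
            # soft-thresholding: coefficients below lam become exactly 0
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0)
    return beta
```

Predictors with weak associations are dropped from the model entirely, which is the variable-selection behavior described above.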
3.2. Linear Effects of Numeric Covariates
Flexible models should be considered for ordinal and continuous covariates that may have nonlinear effects; ideally, this should be prespecified rather than data driven. Whereas for ordinal variables with a small number of values (eg, number of previous pregnancies) it may be reasonable to consider a categorization of the variable (eg, 0 pregnancies, 1, >1), this should generally be avoided for continuous variables or ordinal data with a wide range of values. Two methodologies are useful for modeling nonlinear effects of a predictor: fractional polynomials23 and restricted cubic splines.24
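A restricted cubic spline with k knots contributes k−2 nonlinear terms to the design matrix and constrains the fitted curve to be linear beyond the outer knots, which stabilizes the tails. The following numpy sketch uses the standard truncated-power form (hypothetical function name; statistical packages provide equivalent, often rescaled, bases):

```python
import numpy as np

def rcs_basis(x, knots):
    """Restricted cubic spline basis (truncated-power form).
    k knots yield k-2 columns; the spline is linear beyond the outer knots."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    k = len(t)
    d = t[-1] - t[-2]

    def pos3(u):  # truncated cube: max(u, 0)^3
        return np.maximum(u, 0.0) ** 3

    cols = []
    for j in range(k - 2):
        col = (pos3(x - t[j])
               - pos3(x - t[-2]) * (t[-1] - t[j]) / d
               + pos3(x - t[-1]) * (t[-2] - t[j]) / d)
        cols.append(col)
    return np.column_stack(cols)
```

These columns are added to the regression alongside the linear term in x; the knot locations (often at fixed quantiles) should be prespecified and reported.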
3.3. Missing Data
Extending the existent approaches for variable selection to situations of missing data is an active area of research in statistics. Regardless of the approach selected, authors must address the amount of missing data and how missing data were handled in their model building procedures (see detailed guidance in Missing Data, Section 9).
3.4. Performance and Validation
It is necessary to assess the validity of the model and to assess its prediction accuracy. Evaluating the model in the same data used to build it will overestimate performance because the model has been optimized to those data. Thus, it is commonly recommended to evaluate it in a separate data set. Internal validation should be conducted together with the model development. A commonly used approach to this involves splitting the data set and using the first half to develop the model (also known as training data) and the other half to evaluate its performance (also known as testing data), although this procedure has been criticized. Alternative approaches include cross-validation or bootstrapping. External validation applies the model to a different population than that used to develop the model. This will allow exploration of the generalizability of the model.25 Model performance is assessed by discrimination, the ability of the model to separate positive from negative outcomes (often measured with a C statistic), and calibration, how similar predicted risk is to the observed risk (often represented with calibration plots; see references for detailed descriptions of how this may be done and reported).26,27 These concepts can be extended for time-to-event analyses.28 It should be emphasized that validation procedures are inherently complex. Complete cross-validation often requires an inner cross-validation step for variable selection within an outer cross-validation step for assessment of model performance.
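For a binary outcome, the C statistic referenced above is the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen noncase, with ties counted as one half. A direct pairwise sketch (hypothetical function name; this O(n²) form is for illustration, and rank-based implementations in standard packages scale better):

```python
import numpy as np

def c_statistic(risk, outcome):
    """C statistic (discrimination): fraction of case/noncase pairs in
    which the case has the higher predicted risk, counting ties as 1/2."""
    risk = np.asarray(risk, dtype=float)
    outcome = np.asarray(outcome, dtype=bool)
    cases, noncases = risk[outcome], risk[~outcome]
    # compare every case with every noncase
    greater = (cases[:, None] > noncases[None, :]).sum()
    ties = (cases[:, None] == noncases[None, :]).sum()
    return (greater + 0.5 * ties) / (len(cases) * len(noncases))
```

In a validation exercise, this would be computed on held-out (testing or external) data rather than the development data, for the reasons given above; calibration must be assessed separately because a model can discriminate well while systematically over- or underestimating risk.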
3.5. Decision Curve Analysis
It may be desirable to augment the traditional performance metrics (discrimination, calibration) with a decision curve analysis, as illustrated by Vickers and Elkin.29
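The central quantity of decision curve analysis is the net benefit at a given risk threshold, which weighs true positives against false positives using the odds of the threshold itself. A minimal sketch following the Vickers and Elkin formulation (hypothetical function name; a full decision curve evaluates this across a range of clinically plausible thresholds and compares it with treat-all and treat-none strategies):

```python
import numpy as np

def net_benefit(risk, outcome, threshold):
    """Net benefit at a risk threshold pt, treating those with risk >= pt:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    risk = np.asarray(risk, dtype=float)
    outcome = np.asarray(outcome, dtype=bool)
    n = len(risk)
    treat = risk >= threshold
    tp = np.sum(treat & outcome)   # treated patients with the outcome
    fp = np.sum(treat & ~outcome)  # treated patients without the outcome
    return tp / n - fp / n * threshold / (1 - threshold)
```

Plotting net benefit against the threshold for the model and for the default strategies yields the decision curve itself.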
3.6. Supplemental Material
Several reviewers of this manuscript noted that reporting all of these quantities may exceed the level of detail traditionally permitted by the space constraints of a standard research manuscript, and they asked what should be reported in the main body versus the supplement. Because there is not always a clear distinction, we do not wish to mandate the division between main body and supplemental content; rather, we encourage authors to use supplemental material to fully report items recommended or discussed here that cannot fit in the main text. In the era of online supplemental content, which allows virtually limitless space, it is not acceptable to respond to a reviewer comment that there was insufficient space to provide the requested explanations and content.
4. STATISTICAL GENETICS
Statistical genetics refers to the development and use of statistical methods for deriving inferences from genetic data and most frequently refers to analyses performed on human genetic data, including but not limited to population genetics, genetic epidemiology, and quantitative genetics. Reports of findings and methods related to statistical genetics must consider a wide array of potential statistical pitfalls and biases. Some issues are specific to genetic or phenotypic data type, platform, or scale; other concerns are common to all genomic data analysis and in some cases may even be relevant for analyses of other “omics” data.
Quality control is essential: reproducible, well-described quality control pipelines are crucial for generating unbiased results and ensuring the reproducibility of findings. Even small changes in the order in which thresholds are applied will alter the final data set and results. Many quality control assessments cannot be fully automated; often, semiautomation can be achieved with manual inspection interspersed. It is expected that all contributing authors will abide by National Institutes of Health guidelines for depositing data in public repositories and will be willing to make full results of studies available after publication.30
Phenotypic considerations, including violations of assumptions about phenotypic distributions, and covariates are also important.31 Heritable covariates must be thoughtfully applied in the context of genetic analysis; for example, collider bias (eg, a distortion that modifies an association between an exposure and outcome, caused by attempts to control for a common effect of the exposure and outcome) can be an issue in causal inference modeling.32,33 In addition to outcome variable distribution, different kinds of genetic analyses have different sensitivities to substructure and relatedness, and it is essential to select the appropriate method to account for the data set under study. The comments and references provided in the sections below offer a general discussion of issues related to quality control, batch effects, missing data, technical variability, and structural variation, as well as some specific information related to a given technology.
4.1. Statistical Considerations in Assessments of Data Quality and Imputation
Complete documentation should be included for quality control pipelines such that analyses are reproducible and should include information on how diverse ancestries and batch effects in genotyping and gene expression data analysis are taken into account.34–39
Information on how controls are defined should be given, including inclusion, exclusion, or matching criteria, which often should include phenotypic, sex, age, and ancestral considerations.40–43
As human genetic sample sizes grow large, there is often a dramatic increase in the complexity of population histories represented in the data. The residual effects of this substructure can be amplified in statistical analyses, and researchers presenting results of large data sets (as a rule of thumb, samples >100 000 individuals) should directly address this potential source of bias. Admixed or founder populations require specialized analyses and documentation of approach.44–47 Similarly, relatedness should be directly measured and statistically accommodated or eliminated.48
There are no standardized scientific definitions for the terms ancestry, diversity, ethnicity, and race, yet these terms are often used incorrectly.49,50 For example, it is erroneous to use race and ethnicity interchangeably because these terms have distinct definitions.51 For additional guidance on terminology and perspectives on the importance of diversity in genetic studies of cardiometabolic disease, see the work by Fernández-Rhodes et al.52
Studies involving epigenomics, including methylation data, ChIP-seq data, topologically associating domains, and chromatin accessibility, are especially vulnerable to potential bias attributable to genomic features such as guanine-cytosine content, distance to the nearest gene, and structural variation. Approaches used to control such bias should be explicitly addressed.53
Studies involving analysis of sequence data generated across different centers must directly address this potential source of bias.54
Detection and analysis of structural variation in the human genome55,56 have their own unique set of quality control and analytical issues that require documentation for reproducibility and interpretation of results.
The diversity and size of reference panels affect the quality of genetic imputation. It is important for studies that include imputed genotypic data to document the software (including relevant parameters), the reference panel, and the selection of the scaffold single nucleotide polymorphisms.57,58
4.2. Common Statistical Approaches to Data Analysis
Candidate gene association studies should include a compelling justification for the genes and variants studied and should address statistical power and replication.59
Authors should justify the methods they choose to apply in genome-wide association analyses,60–62 and if a standard software package is used, it should be appropriately cited with version details. Whether an established pipeline or a “home-brew” (eg, tests developed by laboratories primarily for in-house use) analysis approach is used, packages or containers describing the process should be made available to ensure reproducibility of analyses.
Relationships in family studies should be genetically verified, including detection of cryptic relatedness and verification of pedigrees.63–65 As noted, independent samples drawn from populations often contain undocumented related individuals. This relatedness should be measured and analyzed appropriately (eg, linear mixed-model approaches such as efficient mixed-model association expedited and the Efficient and Parallelizable Association Container Toolbox).66,67
Large data sets that are not formally of a cohort design often oversample controls relative to cases, which can introduce significant bias in linear mixed models.68 Furthermore, in maximum-likelihood estimation of the logistic model, accurate estimation of the parameters can be challenging when events are very rare.69 In such cases, bias-reduction methods such as the Firth bias-reduced logistic regression method70 or generalized mixed-model association tests that use a saddlepoint approximation to calibrate the distribution of score test statistics may be necessary to mitigate bias attributable to case-control imbalance.68 Bayesian approaches are another option that may be able to provide exact inference in the presence of case-control imbalance.
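To illustrate the mechanics of Firth's correction, the following is a minimal sketch (not the implementation used by any particular package): the score equations are modified by the diagonal of the hat matrix, which keeps coefficient estimates finite even under complete separation, where ordinary maximum likelihood diverges. The toy data below are hypothetical.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth bias-reduced logistic regression (Jeffreys-prior penalty).

    Solves the modified score equations U*(b) = X'(y - p + h*(0.5 - p)) = 0,
    where h holds the diagonal elements of the hat matrix. Unlike ordinary
    maximum likelihood, the estimates stay finite under separation."""
    n, k = X.shape
    beta = np.zeros(k)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        W = p * (1.0 - p)                       # Fisher weights
        XtWX_inv = np.linalg.inv(X.T @ (W[:, None] * X))
        # hat-matrix diagonal: h_i = w_i * x_i' (X'WX)^-1 x_i
        h = W * np.einsum("ij,jk,ik->i", X, XtWX_inv, X)
        score = X.T @ (y - p + h * (0.5 - p))   # Firth-modified score
        step = XtWX_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Completely separated toy data: ordinary MLE diverges, Firth does not
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0, 0, 0, 1, 1, 1])
X = np.column_stack([np.ones_like(x), x])
beta = firth_logistic(X, y)
print(beta)  # finite slope estimate despite complete separation
```

Production analyses would instead use an established implementation (eg, the R logistf package), but the sketch shows why the penalization addresses the rare-event and separation problems described above.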
Similarly, when household-based survey approaches or other complex survey sampling is used, appropriate statistical approaches must be used to minimize bias resulting from the confounding of genetic relatedness and environmental risk factors by household.71
Genome- and exome-wide sequencing is now a popular approach for genetic investigations. Such analyses require appropriate approaches for rare variant analysis, including kernel-based association tests and gene-based or pathway-based burden tests, accompanied by detailed descriptions of inclusion criteria, reference data, and relevant parameters.72,73 Often, the threshold for establishing genome-wide significance should be more stringent in analyses of whole-genome sequence data than in array-based genome-wide association studies in order to appropriately control for type I error. When sliding window approaches or grouped tests (such as gene-based tests) are used, the number of tests must be given to justify the threshold of multiple test correction.
Combining summary statistics across analyses via meta-analysis is a common approach for both rare and common variant analyses. Special considerations in genomic meta-analyses,74–76 and in transethnic meta-analyses77 in particular, include detailed examination of effect heterogeneity, linkage disequilibrium differences across studies, effect masking attributable to epistasis (particularly in populations of recent African descent), and conditional analyses to uncover novel variants in established risk-association regions of the genome. Special attention should also be given to the sampling framework and ascertainment strategies of contributing studies and the effects that these study designs may have on downstream combined and meta-analyses. Large-scale meta-analyses, especially in populations with refined patterns of linkage disequilibrium, enable fine-mapping approaches, which should be accompanied by descriptions of the identified credible sets of causal variants.
Another popular approach to homing in on causal variation is colocalization of functional elements with association signals such as expression quantitative trait loci signals to detect target genes.78–81 Any such applications should include discussions of the strengths and weaknesses of the leveraged data and methodologies.
Expression prediction and transcriptome-wide association studies are gene-based approaches that aggregate single nucleotide polymorphism–level effects for functionally oriented interpretation of effects by tissue or cell type.82 Linkage disequilibrium between single nucleotide polymorphisms that are predictive of expression of different local genes can complicate the interpretation of results. Combining methods such as colocalization analysis with transcriptome-wide association studies is one approach to improve inferences.
Once signals are established that implicate genes, it is often of interest to explore gene sets or pathways in which top genes are overrepresented.83–85 These can be challenging to interpret because of our incomplete knowledge of biological pathways and gene networks, clustering of functionally similar genes in the genome, and the range of statistical approaches used to estimate enrichments. As a result, such analyses should be described in detail to aid review and interpretation.
Visual representations of global data such as Manhattan plots, heat maps, violin plots, scree plots, and QQ plots (and the accompanying lambda values) can be useful in evaluating the statistical robustness of a genome-wide association study, as well as identifying potential sources of error that could be corrected via genomic control.86 Deviations from the expected distribution of P values may also indicate extreme polygenicity, which can be effectively used via linkage disequilibrium score regression to estimate heritability.87
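The lambda value that accompanies a QQ plot is the genomic inflation factor: the median of the observed 1-df association chi-square statistics divided by its expected value under the null (approximately 0.4549). A minimal sketch of its calculation, using simulated null P values:

```python
import numpy as np
from scipy import stats

def genomic_lambda(pvalues):
    """Genomic inflation factor (lambda_GC): median observed 1-df
    chi-square statistic divided by the null median (~0.4549)."""
    chi2_obs = stats.chi2.isf(pvalues, df=1)        # map p -> chi-square
    return np.median(chi2_obs) / stats.chi2.ppf(0.5, df=1)

# Under the null, P values are uniform and lambda should be close to 1
rng = np.random.default_rng(1)
p_null = rng.uniform(size=100_000)
print(genomic_lambda(p_null))  # close to 1.0 under the null
```

Values well above 1 suggest residual confounding (eg, population stratification) or, as noted above, polygenicity; linkage disequilibrium score regression can help distinguish the two.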
Mendelian randomization88,89 is a form of causal inference modeling that takes advantage of the singular directionality for genetic variation: DNA variants can affect phenotypic outcomes, but phenotypes do not alter germline DNA variants.90 A common use for mendelian randomization studies is to test whether a genetic variant with a clear effect on some intermediate phenotype (eg, high-density lipoprotein cholesterol levels) is also predictive of an outcome (eg, myocardial infarction). The primary challenge to interpretation of mendelian randomization is pleiotropy, the ability of a single variant to independently affect multiple outcomes. Thus, applications of mendelian randomization should address potential pleiotropy and how and whether pleiotropy can be ruled out as a contributing factor to observations.
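The most common estimator in two-sample mendelian randomization combines per-variant Wald ratios (outcome effect divided by exposure effect) with inverse-variance weights. The sketch below uses hypothetical summary statistics; real applications should additionally apply pleiotropy-robust sensitivity analyses (eg, MR-Egger, weighted median), as discussed above.

```python
import numpy as np

def mr_ivw(beta_exp, beta_out, se_out):
    """Inverse-variance-weighted mendelian randomization estimate.

    Each variant contributes a Wald ratio beta_out/beta_exp; the IVW
    estimate is their weighted average with weights (beta_exp/se_out)^2."""
    beta_exp, beta_out, se_out = map(np.asarray, (beta_exp, beta_out, se_out))
    ratios = beta_out / beta_exp               # per-variant causal estimates
    weights = (beta_exp / se_out) ** 2         # inverse-variance weights
    est = np.sum(weights * ratios) / np.sum(weights)
    se = np.sqrt(1.0 / np.sum(weights))
    return est, se

# Hypothetical summary statistics for 3 independent variants
beta_exp = np.array([0.10, 0.15, 0.08])     # SNP -> exposure effects
beta_out = np.array([0.020, 0.033, 0.015])  # SNP -> outcome effects
se_out = np.array([0.005, 0.006, 0.004])    # SEs of the outcome effects
est, se = mr_ivw(beta_exp, beta_out, se_out)
print(f"causal estimate {est:.3f} (SE {se:.3f})")
```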
Replication studies are frequently used to validate findings from genomic studies. However, in some cases, for example, for rare or difficult-to-obtain phenotypes or populations, replication may not be feasible. In these cases, strict significance thresholds should be applied and other sources of validation should be sought.
4.3. Basic Statistical Guidelines in RNA Sequence Data Analysis
Best practices will depend on the specific data set under consideration. For example, RNA sequencing is sensitive to specific issues related to missing data and technical variability,91 differential expression analysis,92–94 including transcript- and exon-level differential tests, and coexpression network analysis.95 In general, “omics” data analyses require detailed descriptions of the normalization procedures, quality control, and evaluation and handling of outliers.96–98
Just as in genomic discoveries, publications of “omics” analyses must include citations (with versions) of established pipelines and shared access to home-brew analysis pipelines, as well as specific information of the data used (eg, release or data freeze).
5. RANDOMIZED CONTROLLED TRIALS
To ensure that protocols fully address key study elements, the Standard Protocol Items: Recommendations for Interventional Trials were developed by experts representing diverse stakeholders involved in the design, funding, conduct, review, and publication of trial protocols. Relying on these recommendations will help with reporting of results.99
The CONSORT statement is the standard guideline for reporting of randomized controlled trials.99a CONSORT recommends that all manuscripts reporting primary randomized controlled trial results include a flow diagram with details of enrollment and retention and address a 25-point checklist of critical items for a high-quality trial report. The CONSORT guidelines should be followed in their entirety by authors who submit results of randomized controlled trials. It should also be noted that there are various extensions to CONSORT for specific trial designs of interest; these are not addressed in detail here, but if such a guideline exists, it should be followed. Specific points of emphasis from the CONSORT statement most pertinent to the present statement are detailed below.
5.1. CONSORT Item 2b: Specific Objectives/Hypotheses That the Trial Was Designed to Test
Include a clearly written statement such as “In the current trial, we hypothesized that a loading dose of atorvastatin delivered before percutaneous coronary intervention would reduce the risk of major adverse cardiovascular events in the 30 days after procedure.”
5.2. CONSORT Item 3a: Description of Trial Design
Specify design type (eg, parallel-group, factorial design, crossover trial)
Specify level of randomization (individual patient, provider, clinic, hospital)
Specify whether the trial is designed to test for superiority, equivalence, or noninferiority
5.3. CONSORT Item 6a: Description of the Study Outcomes
List and define all outcome measures (primary and secondary). The definition must provide sufficient detail for others to use the same outcome for analysis and to replicate the exact outcome capture in a different population.
5.4. CONSORT Item 7a: Justification of Sample Size
Describe how the sample size was determined, including an appropriate power calculation. Identify the outcome and statistical test on which the power calculation was based, all parameters used in the calculation, and the resulting target sample size. Many authors fail to provide sufficient detail to recreate their sample size calculation (eg, “Our sample size of 200 was determined to have 80% power at α=0.05” is missing several necessary components).
If a flexible sample size design was used with Bayesian analyses or provisions for sample size reestimation, the procedure should be described, possibly including code if the trial was planned by simulation.
If the study was planned around precision of a specific quantity rather than based on power to detect a specific treatment effect, the rationale for this approach must be justified, and again researchers must report all parameters and the statistical approach used in the calculations to determine that the sample size was sufficient to achieve the reported precision.
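A complete sample size report names the outcome, test, effect size, and error rates so that readers can recreate the number. As a minimal sketch (with hypothetical event rates), the standard normal-approximation formula for comparing two proportions:

```python
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided test comparing two
    proportions (normal approximation, 1:1 allocation)."""
    z_a = norm.ppf(1 - alpha / 2)      # critical value for two-sided alpha
    z_b = norm.ppf(power)              # value corresponding to target power
    pbar = (p1 + p2) / 2               # pooled proportion under the null
    num = (z_a * (2 * pbar * (1 - pbar)) ** 0.5
           + z_b * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
    return num / (p1 - p2) ** 2

# eg, control event rate 12% vs treated 8%, two-sided alpha=0.05, 80% power
print(round(n_per_group(0.12, 0.08)))  # ~882 per group before any inflation
```

A full report would also state the assumed allocation ratio and any inflation for expected dropout, and reference the software used.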
5.5. CONSORT Items 8 and 9: Description of Randomization Scheme/Allocation Ratio
Describe the procedure used to generate the randomization sequence. Any restrictions (stratification, blocking, or minimization) should be named and described. If blocking was used, the block sizes should be reported (eg, “randomly permuted blocks of size 2, 4, and 6”). If stratification was used, the covariates that determined the strata should be named; if a continuous covariate was used as a stratification variable, the threshold used to determine the strata must be reported (eg, stratified randomization by baseline left ventricular ejection fraction <35% versus ≥35%). If minimization was used, the algorithm should be described (see below for additional comment).
Report the allocation ratio (most trials use 1:1 randomization, but the allocation ratio should always be specified whether 1:1 or a different ratio is used).
If an adaptive randomization scheme was used, the details must be provided to the extent possible; this may be facilitated by including code and may require the use of extensive supplementary materials to describe the algorithm fully.
The word random is often used inappropriately to describe trials in which nonrandom allocation methods were used such as alternation, hospital numbers, or date of birth. When investigators use one of these methods, they should describe it precisely and should not use the term random or any variation of it.
This last point merits a brief expansion in terms of minimization. Trials that use a minimization algorithm for treatment allocation are largely nonrandom after the first few treatment assignments (although variants exist, some of which incorporate random components at all stages). Because this document is chiefly for reporting guidance rather than a prescription for how to perform trials or analysis, we will not take a firm position on whether minimization is a good or bad idea in clinical trials or whether a trial that uses minimization may call itself a randomized trial. However, it is important that trials that use minimization be explicit about the details of the minimization algorithm and how it was accounted for in the data analysis. We include references for those interested in more detailed discussion of minimization.100–105
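The permuted-block scheme described above (Items 8 and 9) can be sketched as follows; the block sizes, arm labels, and seed are illustrative, and a real trial would generate separate lists per stratum and conceal the sequence from investigators.

```python
import random

def blocked_sequence(n, block_sizes=(2, 4, 6), arms=("A", "B"), seed=42):
    """Generate a 1:1 allocation sequence using randomly permuted
    blocks whose sizes are drawn at random from block_sizes."""
    rng = random.Random(seed)
    seq = []
    while len(seq) < n:
        size = rng.choice(block_sizes)    # block sizes must be even for 1:1
        block = list(arms) * (size // 2)  # equal arm counts within the block
        rng.shuffle(block)                # randomly permute within the block
        seq.extend(block)
    return seq[:n]  # note: truncating the final block can leave slight imbalance

seq = blocked_sequence(20)
print(seq, seq.count("A"), seq.count("B"))
```

Mixing several block sizes, as here, makes the sequence harder to predict than a single fixed block size; whatever scheme is used, the report should state it at this level of detail.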
5.6. CONSORT Item 12: Statistical Methods Used to Compare Groups
Describe the statistical analyses in sufficient detail that a knowledgeable statistician with access to the original data and codebook could recreate the reported results. Specify which statistical procedure was used for all of the analyses reported in the manuscript, not just the primary analysis. Even if a protocol article has been published, authors still must include crucial details of the analyses in the principal outcomes report (supplemental material may be necessary for particularly complex designs). It should be made clear exactly who is included in the analytical population and how the population was analyzed (eg, intention to treat, modified intention to treat, per protocol) with sufficient details to appreciate how the cohort is defined. If the primary analysis is adjusted for any covariates, this should be specified, as well as which variables are adjusted for and whether they were prespecified. If interim analyses are performed, they should be described (the number and timing of interim analyses, whether they are planned or unplanned, and any prespecified decision rules related to stopping for efficacy or futility, as well as other adaptations based on the interim results); see CONSORT Item 14.
There should be no ambiguity in how the outcomes are analyzed; a common challenge is an outcome variable that is measured as a numeric quantity (eg, left ventricular ejection fraction) that is analyzed as a dichotomized quantity (eg, “myocardial recovery” defined as “follow-up left ventricular ejection fraction ≥35% with an increase of at least 10% from the baseline left ventricular ejection fraction”). The definition must be sufficiently clear that the reader can determine precisely how the outcome variable appears in the analysis. Although this is chiefly a reporting guideline, we feel compelled to point out that such dichotomization is statistically inefficient and loses power, so we encourage researchers to use the full power of their outcome when possible.
5.7. CONSORT Item 13: Participant Flow/Losses
Include a flow diagram that lists the numbers of patients who were screened, deemed eligible, consented, and randomized; actually initiated their assigned treatment; withdrew; and were lost to follow-up, as well as the number included in the final analysis.
Clarify whether the investigators included all participants who were randomized in the groups to which they were originally allocated (intention to treat); if not, describe the modified intention to treat or per-protocol approach that was used.
5.8. CONSORT Item 14: Stopping Reasons
If the trial was stopped early, include a description of why the trial was stopped. If interim analyses were used in this decision, it should be reported how many were done, when they were performed, whether there were statistical guidelines or stopping rules in place before the trial was initiated, and how they were used in the decision to stop the trial. The role of the Data and Safety Monitoring Board or Data Monitoring Committee in any such decisions should be described.
5.9. CONSORT Item 15: Baseline Data
Include a description of baseline characteristics of the participants who were included in the trial. It is commonly recommended (although not universally agreed on) to present the baseline characteristics stratified by assigned treatment.106 If baseline characteristics are presented by treatment assignment, note that statistical significance tests comparing baseline variables between treatment groups should not be performed, as explained by Altman,107 Altman and Doré,108 Begg,109 and Senn110,111 and advised against by CONSORT. The presence of significance tests comparing baseline characteristics often leads the reader to conclude that the randomization was flawed if they see ≥1 “significant” differences between the groups, although it is highly likely that this will occur in large trials, and readers may discount or misinterpret the trial results for this reason. See the reference above for an alternative perspective that advises against the presentation of baseline characteristics separated by treatment arm.
5.10. CONSORT Item 17: Outcomes and Estimation
For each primary and secondary outcome, report a summary of the outcome in each group (eg, the mean response in each group or the number of participants in each group who had the event), together with the contrast between groups, or effect size.
For continuous data, the effect size is usually presented as the difference in means. For binary outcomes, the effect size could be the risk ratio, odds ratio, or risk difference. For time-to-event data, the effect size is usually the hazard ratio or difference in median survival time (ideally analyzed and reported at a prespecified time horizon).
CIs (or at least some measure of variation) should be presented for the contrast between groups. P values may also be provided in addition to CIs, but results should not be reported solely as P values. When P values are reported, use the exact P value, not simply “P<0.05.”
Results should be reported for all planned primary and secondary outcomes, not just analyses that were statistically significant or interesting (outcome reporting bias). The primary outcomes should always be reported first in the Abstract and Results, followed by secondary outcomes. In addition, any adverse events that are not specified as primary or secondary outcomes should be reported.112
5.11. CONSORT Item 18: Ancillary Analyses
In addition to primary and secondary outcomes, trial publications may include exploratory or post hoc analyses. If subgroup analyses were undertaken, authors should report which subgroups were examined, why, if they were prespecified, and how many were prespecified. In the evaluation of subgroups, the question is not whether there is a statistically significant treatment effect within a subgroup but whether treatment effects in the subgroup are significantly different from treatment effects outside the subgroup. This is best assessed with tests for statistical interaction. Ideally, such tests should be model based with covariate adjustment.
5.12. CONSORT Item 23/24: Registration and Protocol
Authors should report the trial registration, existence of a protocol paper, and any changes in the protocol between the protocol paper and publication of the primary results publication.
6. SYSTEMATIC REVIEWS AND META-ANALYSES
A systematic review systematically identifies, selects, critically assesses, and synthesizes evidence relevant to a well-defined question about diagnosis, prognosis, or therapy. A systematic review that also includes a quantitative synthesis of primary study data is both a systematic review and meta-analysis. The most common reporting guidelines for systematic reviews and meta-analyses are the PRISMA statements and their extensions.113,114 Building on the PRISMA guidelines, recommendations were formulated to improve the transparency, accuracy, completeness, and frequency of documented systematic review and meta-analysis protocols.115 In addition, a recent AHA scientific statement116 provides an overview of methodological standards for meta-analyses and qualitative systematic reviews of cardiac prevention and treatment studies. Specific points of emphasis most pertinent to AHA statistical reviewers include the following.
6.1. PRISMA Item 1
Manuscripts reporting these types of review should be titled as such, eg, the phrase “A Systematic Review and Meta-Analysis” should be included as a subtitle for indexing and search purposes.
6.2. PRISMA Item 5
Specify whether a review protocol exists and where it may be accessed. Registration of the systematic review protocol is encouraged via PROSPERO.117
6.3. PRISMA Items 6 to 10
The characteristics of eligible studies, search strategy, and data collection process should be described with enough detail that independent investigators could recreate the methods. At a minimum, for each database searched, authors should report the database, platform, or provider (eg, Ovid, Dialog, PubMed) and the start and end dates for the search of each database. The systematic search should be updated to within 3 months of submission. A detailed search algorithm for at least 1 database must be included as supplementary data.
6.4. PRISMA Items 12 and 19
Describe methods used for assessing the risk of bias of individual studies. Assessing the risk of bias of individual studies in a systematic review and meta-analysis should be part of the conduct and reporting of any systematic review. Many risk-of-bias tools are available, depending largely on the study design of the primary studies, and most are designed to derive an estimate of low, moderate, or high risk of bias. The National Toxicology Program provides a searchable database of risk-of-bias tools for observational studies of exposures.117a For randomized and nonrandomized clinical trials, authors are advised to consider the risk-of-bias tools developed by the Cochrane Collaboration.117b
6.5. PRISMA Item 13
State the principal summary measures (eg, risk ratio, difference in means) extracted from the studies. The chosen summary effect measure may not be directly available from all of the included studies. If calculations are done to derive these measures or their variability, they need to be described in detail. Data not explicitly given in the text of primary reports may be able to be reconstructed from figures with digitizing software.
6.6. PRISMA Items 14 and 21
Describe methods of handling data and combining results of studies. The 2 most common methods of combining results of studies are the fixed-effect and random-effects models. These models have different underlying assumptions, and the choice of model should be made a priori. The fixed-effect model considers only within-study variability, whereas the random-effects model considers both within- and between-study variability. Different methods for performing both types of meta-analysis should be explicitly stated. Common fixed-effect approaches are Mantel-Haenszel and inverse variance, whereas random-effects analyses often use the DerSimonian and Laird approach, although other methods exist, including Bayesian meta-analysis. Authors should report how between-study variability (heterogeneity) was evaluated. The I2 statistic quantifies the amount of variation in results across studies beyond that expected by chance. When there are only a few studies, inferences about heterogeneity should be cautious.
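The relationship among the inverse-variance fixed-effect estimate, the DerSimonian and Laird random-effects estimate, and the Q and I2 heterogeneity statistics can be sketched as below; the study effects and standard errors are hypothetical, and dedicated packages (eg, R's metafor) would be used in practice.

```python
import numpy as np

def meta_analysis(effects, ses):
    """Inverse-variance fixed-effect and DerSimonian-Laird
    random-effects pooled estimates, plus Q and I^2 heterogeneity."""
    effects, ses = np.asarray(effects), np.asarray(ses)
    w = 1.0 / ses**2                              # fixed-effect weights
    fixed = np.sum(w * effects) / np.sum(w)
    q = np.sum(w * (effects - fixed) ** 2)        # Cochran's Q
    k = len(effects)
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)            # DL between-study variance
    w_re = 1.0 / (ses**2 + tau2)                  # random-effects weights
    random_eff = np.sum(w_re * effects) / np.sum(w_re)
    i2 = max(0.0, (q - (k - 1)) / q) * 100 if q > 0 else 0.0
    return fixed, random_eff, tau2, i2

# Hypothetical log risk ratios and standard errors from 4 studies
fixed, rand_eff, tau2, i2 = meta_analysis(
    [-0.30, -0.10, -0.45, 0.05], [0.12, 0.15, 0.20, 0.10])
print(fixed, rand_eff, tau2, i2)
```

Note how the random-effects weights flatten toward equality as tau2 grows, which is why the two models can give materially different pooled estimates when heterogeneity is present.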
6.7. PRISMA Items 15 and 22
Specify any assessment of risk of bias that may affect the cumulative evidence. Nonpublication of research findings dependent on the actual results is an important risk of bias to a systematic review and meta-analysis and may result in summary measures that are biased away from the null. The general concept of small-study bias includes publication bias, among other important sources of bias, and should be addressed in any meta-analysis. The rule of thumb is that assessment of these biases is useful only if at least 10 studies are included in the review.
6.8. PRISMA Items 16 and 23
Describe any methods for additional analyses, indicating which are prespecified. Additional analyses may include subgroup analysis and meta-regression, which are often done to try to understand between-study heterogeneity. These analyses are dependent on the number of studies available and thus are not often possible. Sensitivity analyses should be undertaken, whether prespecified or not. Results of these analyses are commonly shown in a summary forest plot in which the overall result is presented along with results, for example, from subgroups or analyses including only low-risk-of-bias studies.
6.9. PRISMA Item 17
Report the number of studies screened, assessed, and included in the review. All reports should include an appropriate PRISMA flowchart to depict the flow of information through the different phases of the systematic review (a template of the PRISMA flow diagram is available online113).
6.10. PRISMA Item 18
Include a table showing characteristics of primary studies. Table 1 in a systematic review and meta-analysis is often a summary of included studies, showing enough detail in study-level characteristics so that the population of interest is sufficiently described and can be compared between studies and assessed for generalizability.
6.11. PRISMA Items 20 and 21
For all outcomes considered, present a forest plot of individual study estimates, with an overall estimate if meta-analyses were undertaken. Forest plots are the most recognizable attribute of a meta-analysis; they show the individual study results and their combined result. It should always be clear which of the included studies contributed to each meta-analysis, ideally showing the weight each component study contributed (and whether they are fixed- or random-effects model weights). For each forest plot, a measure of the consistency of the results from the included studies such as I2 should be provided.
7. SURVIVAL ANALYSES
Many studies have a primary end point of mortality or the occurrence of a composite end point such as major adverse cardiovascular event in participants followed up for a period of time after an index time point (eg, randomization, occurrence of a disease or a particular procedure, initiation of a new medication). These end points are analyzed with techniques referred to as survival analyses, which are appropriate for time-to-event data, and account for censoring (eg, participants reaching the end of study or lost to follow-up without ability to observe the event).
7.1. Reporting Results of Time-to-Event Analyses
The total number of participants at risk (overall and within each study arm that is being compared) at the beginning of the study.
Explicit definition of what constitutes an event in the analysis. Outcomes such as all-cause mortality are typically obvious, but cardiovascular disease literature often includes outcomes such as major adverse cardiovascular event or clinical improvement, which must be clearly defined.
The total number of events that occurred during study follow-up (overall and within each study arm that is being compared).
The time at which participants are considered at risk for the event (randomization, initiation of study medication, start of procedure, etc); see comment below on immortal time.
At least 1 summary index of the amount of time at risk (eg, the total amount of person-time at risk for the outcome or the mean/median available follow-up time).
Summary statistics: When estimating the difference in risk of the time-to-event outcome between groups or the relation between selected risk factors and the outcome, a summary statistic must be reported (eg, hazard ratio, difference in survival proportions or restricted mean survival time at a specific time point, or difference in median survival) with an accompanying CI or credible interval. The statistical model (eg, Cox proportional hazards model) used to estimate the summary statistics should be named or described, and an estimate of uncertainty (eg, CI) should be included for each point estimate (authors should not simply report hazard ratio=1.50, P=0.01 for a variable without including some measure of uncertainty). If the analysis was performed with a Cox model, the proportional hazards assumption should be tested and the results reported.118
If a multivariable model is used, a description of the model building strategy that includes variable selection and methods to verify assumptions should be provided (Section 11, Covariable Adjustment and Propensity Scores).
Graphical presentation: It is common to present a graphical summary of time-to-event data using Kaplan-Meier curves. Recently, statisticians performed the KMunicate study and proposed additions to the standard presentation of Kaplan-Meier curves to enhance interpretability.119,120 First, add shaded regions (representing CIs) to indicate the uncertainty surrounding the Kaplan-Meier estimates; second, add an extended risk table with the number of participants at risk, number censored, and number of events that have occurred to date across the study time points. Because this option is not yet available in most standard statistical software, at minimum, the remaining number at risk over time (within each group represented on the figure) should be presented beneath the horizontal axis. Preferences for the time intervals may vary according to the specific clinical question and available data; for example, a study that reports 5 years of follow-up may choose to report the number at risk at 1-year intervals, and a study that reports outcomes over 90 days may report the number at 30-day intervals.
Another option that may be helpful is presenting a figure that plots the point estimate for the difference in the 2 Kaplan-Meier curves along with a shaded region representing confidence limits for the difference.
If adjusted survival curves are presented, the process by which they are derived must also be clear. (For example, were covariates set to their mean or to some chosen representative values?) Some outcomes that we may analyze as time-to-event data may occur more than once (eg, hospitalization); statistical models (eg, Andersen-Gill, Prentice-Williams-Peterson, Wei-Lin-Weissfeld) allow the inclusion of multiple time-to-event outcomes for the same patient.121–123 If this approach is used, it should be clearly stated, and the model should be described in sufficient detail to appreciate what was done to account for the presence of repeated events from the same participant. (How was each new period at risk defined for patients with >1 occurrence of the event? When did patients reenter the risk set after experiencing an event? What was done with time after the last occurrence of the event during which patients were still alive and potentially at risk for another event but did not experience it? What statistical model was used to analyze the data?)
Time-varying covariates: If the analysis includes ≥1 time-varying covariates, it must be clearly distinguished which variables are treated as time varying.
Competing risks may be a consideration (eg, time to recurrence and similar events with a competing risk of death). In this case, a suitable method that accounts for competing risks (Fine and Gray method, joint models) should be applied, and the number of terminal events must also be reported.
Immortal time: In the context of a prospective study/trial, the start time at which the patient’s time at risk begins is usually apparent (most often at the time of randomization or perhaps initiation of study treatment). In the context of a retrospective/observational study, this may be more difficult to identify, leading to concern about immortal time bias.124 For this reason, authors must be clear in identifying the beginning of time at risk and mindful of whether their analytical strategy introduces the possibility of immortal time for 1 or more of the groups studied.
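The reporting elements above (numbers at risk, events, and censoring over time) are exactly the inputs to the Kaplan-Meier product-limit estimate. A minimal sketch with hypothetical follow-up times, to make the bookkeeping concrete:

```python
import numpy as np

def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate with right censoring.

    times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns the distinct event times and S(t) just after each one."""
    times, events = np.asarray(times), np.asarray(events)
    event_times = np.unique(times[events == 1])
    surv, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)              # still under observation at t
        d = np.sum((times == t) & (events == 1))  # events at t
        s *= 1.0 - d / at_risk                    # product-limit step
        surv.append(s)
    return event_times, np.array(surv)

# Hypothetical follow-up in months; event=0 marks censoring
t = [2, 3, 3, 5, 7, 8, 9, 12]
e = [1, 1, 0, 1, 0, 1, 0, 0]
et, s = kaplan_meier(t, e)
for ti, si in zip(et, s):
    print(f"t={ti}: S={si:.3f}, n at risk={np.sum(np.asarray(t) >= ti)}")
```

In practice, established implementations (eg, the lifelines package or R's survival package) also supply CIs and risk tables; the point of the sketch is to show how censored participants contribute to the at-risk counts without contributing events.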
8. BAYESIAN STATISTICAL APPROACHES
The Bayesian approach to statistical inference allows learning from evidence as it accumulates, beginning with a prior distribution and using the observed data to update the prior beliefs into a posterior probability function. Bayesian approaches allow calculation of the probability that a particular hypothesis is true given the observed data: Pr(H|D). In contrast, null hypothesis significance testing using a frequentist approach computes a P value that is akin to Pr(D|H0), or the probability of having obtained the data D (or even more extreme, unobserved data) if the null hypothesis H0 is true. The majority of clinical outcomes research has historically been performed using frequentist statistical approaches; however, in recent years, some high-profile cardiovascular outcomes trials such as the SURTAVI (Surgical Replacement and Transcatheter Aortic Valve Implantation) and DAWN trials have used a Bayesian approach, so it is essential for researchers to familiarize themselves with the basics of reporting Bayesian statistical analyses.125,126
Despite the increasing popularity of Bayesian inference in empirical research, there are few practical guidelines that provide detailed or standardized guidance on how to apply Bayesian methods and interpret the results.127,128 In 2010, the Center for Devices and Radiological Health of the US Food and Drug Administration finalized guidelines for the application of Bayesian statistics in the design and interpretation of clinical trials of medical devices.129
8.1. Key Standards for Conducting, Reporting, and Interpreting Results of Bayesian Analysis
Specify choice of prior distribution (normal, uniform, binomial, gamma) and the hyperparameter values (eg, mean, variance for a normal distribution), as well as the rationale and justification for this choice.
If an informative prior distribution is used, reviewers are likely to apply extra scrutiny, so authors should give a strong rationale for the choice of an informative prior and justification for the specific choices made.
Specify the structure of higher levels of the model, in the case of hierarchical models.
If the analysis may be used to support a regulatory decision about treatment efficacy, it may be appropriate to define a success criterion (eg, posterior probability of effect >0.95 or another threshold) and to demonstrate by simulation that the approach retains a frequentist type I error of no greater than 2.5% or 5%.
Specify the methods for generating posterior summaries, including the analytical technique or simulation approach (eg, Markov chain Monte Carlo) and the software used for the analysis.
Present numeric or graphical summaries of the prior distribution and the posterior distribution.
Report posterior mean or median and appropriate credible interval for parameters or effects of interest.
Report posterior probabilities of effect sizes of interest, for example, P(hazard ratio >1) for effect >0 or P(hazard ratio >1.2) if that is considered a clinically relevant effect of interest.
Discuss robustness to different choices of prior distributions.
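To make these reporting items concrete, the following sketch works through a conjugate beta-binomial analysis with assumed numbers (uniform prior; 30 responders of 50 in a hypothetical single-arm study), reporting the posterior mean, a 95% credible interval, and a posterior probability of effect:

```python
from scipy import stats

# Conjugate Bayesian update for a response rate (all numbers hypothetical).
# Prior: Beta(a0, b0). Data: x responders of n. Posterior: Beta(a0+x, b0+n-x).
a0, b0 = 1.0, 1.0            # uniform prior -- report and justify this choice
x, n = 30, 50                # observed data

post = stats.beta(a0 + x, b0 + n - x)

post_mean = post.mean()                  # posterior mean of the response rate
ci_95 = post.interval(0.95)              # 95% equal-tailed credible interval
p_gt_half = post.sf(0.5)                 # posterior Pr(rate > 0.5 | data)

print(f"posterior mean = {post_mean:.3f}")
print(f"95% credible interval = ({ci_95[0]:.3f}, {ci_95[1]:.3f})")
print(f"Pr(rate > 0.5 | data) = {p_gt_half:.3f}")
```

A robustness check per the last item above would simply rerun this with different `(a0, b0)` values (eg, a skeptical prior centered below 0.5) and compare the posterior summaries.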
9. MISSING DATA
Missing data can be defined as “values that are not available and that would be meaningful for analysis if they were observed.”130 Missing data are generally categorized as missing completely at random, missing at random, or missing not at random. Unfortunately, missing not at random is most often the case in clinical research, which is problematic if the missing data are related to the clinical presentation or study outcome. Missing data are common and pose considerable challenges in the analyses and interpretation of clinical research.
For clinical trials, a special report in the New England Journal of Medicine provides a useful synopsis of the issue and underscores that, when relevant data are missing, no analytical approach can be guaranteed to produce unbiased estimates of treatment effects.130 The need to prevent missing data at the design stage applies equally outside the context of randomized trials (eg, to epidemiology research and other designs) if data are being collected prospectively. Hence, the best approach is to take preventive steps to minimize missing data during the design phase.
There is no agreed-on method for handling missing data in the analytical phase. It is important to underscore that the assumption that data are missing at random generally cannot be verified from the observed data. Some commonly used methods for dealing with missing data include complete-case analyses, the missing indicator method, single-value imputation, multiple imputation, and inverse probability weighting.131–133
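A small simulation can illustrate why the missing-data mechanism matters: if the probability of being missing depends on the unobserved value itself (missing not at random), a complete-case analysis is biased. All numbers below are assumed purely for illustration:

```python
import numpy as np

# Toy simulation: systolic blood pressure with MNAR missingness.
rng = np.random.default_rng(42)

y = rng.normal(loc=140.0, scale=15.0, size=100_000)   # hypothetical SBP values

# Patients with higher values are more likely to be missing (MNAR mechanism).
p_missing = 1 / (1 + np.exp(-(y - 150.0) / 5.0))
observed = rng.random(y.size) > p_missing

full_mean = y.mean()                       # mean with no missingness
complete_case_mean = y[observed].mean()    # biased downward under MNAR

print(f"true mean          = {full_mean:.1f}")
print(f"complete-case mean = {complete_case_mean:.1f}")
```

No amount of modeling of the observed values alone can recover the true mean here, which is why sensitivity analyses under departures from the primary missing-data assumption are recommended.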
9.1. Standards for Statistical Reporting
Report the frequency of missing data for all primary and secondary outcome measures.
Report the frequency of missing data for all variables used as predictor variables in any modeling.
As much as possible, report reasons for missing data, and use that to inform a primary set of assumptions about the missing-data mechanism (missing completely at random, missing at random, missing not at random).
Conduct analyses using a statistically valid approach under the primary missing data assumption.
Consider sensitivity analyses that demonstrate robustness (or lack thereof) of inferences under departures from the primary missing-data assumption.
Provide sufficient details that your method could be recreated.
10. CORRELATED DATA
A critical assumption of many foundational statistical approaches is that the observations are independent. However, research studies can include data points that are not fully independent but correlated with one another in clusters (eg, repeated blood pressure assessments on the same patients longitudinally over time, spatial assessments of multidimensional cardiac imaging parameters recorded within a single patient, or all the patients treated at the same clinic having been assigned as a group to a treatment arm in a cluster randomized trial). Ignoring the dependence in the analysis of correlated data tends to bias error variance estimates downward and inflate the risk of false-positive results.134 When correlated data are present, care must be taken to ascertain and address the structure of the correlations involved. Similarly, to critically evaluate whether appropriate methods were applied to the correlated data, it is necessary to report the aspects of sampling design and data collection with respect to exposures that may have yielded correlated observations.
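The variance-deflation problem can be demonstrated with a short simulation (all parameters assumed): the naive standard error that treats every observation as independent is markedly smaller than the standard error computed at the cluster level:

```python
import numpy as np

# Toy simulation: 30 clusters of 20 observations, with a cluster random
# effect inducing within-cluster correlation (assumed variance components).
rng = np.random.default_rng(1)

n_clusters, per_cluster = 30, 20
cluster_effect = rng.normal(0.0, 1.0, n_clusters)          # between-cluster
y = cluster_effect[:, None] + rng.normal(0.0, 1.0, (n_clusters, per_cluster))

naive_se = y.std(ddof=1) / np.sqrt(y.size)                 # pretends independence
cluster_means = y.mean(axis=1)
cluster_se = cluster_means.std(ddof=1) / np.sqrt(n_clusters)  # respects clusters

print(f"naive SE   = {naive_se:.3f}")
print(f"cluster SE = {cluster_se:.3f}")   # larger: the honest uncertainty
```

Confidence intervals built from the naive standard error would be far too narrow, which is the mechanism behind the inflated false-positive risk noted above.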
Relevant resources include the “CONSORT 2010 Statement: Extension to Cluster Randomized Trials,”135 Fitzmaurice and Ravichandran’s136 “A Primer in Longitudinal Data Analysis,” the Tooth et al137 checklist of criteria for the design and reporting of observational longitudinal studies, the Niven et al138 report on methods for matched case-control studies, the Brown et al139 best-practices report, and the Altman et al140 report “Statistical Guidelines for Contributors to Medical Journals.”
10.1. Design, Analysis, and Reporting of a Study With Correlated Data
Whether study subjects were assigned, or otherwise exposed, to study conditions individually or in clusters.
The levels of clustering of observations present in the structure of the data (eg, longitudinal observations at the patient level on multiple patients at the clinic level).
Whether clustered observations have no variation in the exposure of interest among observations within clusters (ie, exposure status is nested in each cluster) or if there is variation in the exposure status within cluster.
What measurements were taken and when, including details on the number of measurements, the timing of measurements, and the duration of follow-up.
Details on missing data for missed measurements and mistimed measurements (eg, a 1-year follow-up at 15 months) and how they were handled in the analysis.
All the assumptions made about the correlations of the observations within clusters should be specified.
Key details of the data modeling such as what link function was used, for example, for generalized linear modeling; whether a conditional (eg, mixed effects) or a marginal (eg, generalized estimating equations) modeling strategy was used that appropriately accounts for the correlations141,142; how time (or other structural variables) was handled in the model (eg, continuous and modeled as a polynomial or categorical with separate dummy variables to indicate follow-up assessments); and how the covariance structures were selected (eg, by the Akaike information criterion).143
Key results from the modeling, which should include model parameter estimates with CIs and can also include estimates of variance or correlations within or between the correlated data structures.
For example, consider these points as they apply to the design and analysis of a hypothetical cluster randomized controlled clinical trial of an intervention to reduce blood pressure over time:
The study had 2 distinct levels of correlated data. The first level was the clinic, corresponding to the cluster of consenting patients nested within each clinic randomly assigned, as a group, for exposure to the control treatment or to the intervention treatment. The second level was the individual patient, corresponding to the longitudinal baseline, 3-month, 6-month, and 12-month posttreatment blood pressure measurements drawn from each patient. After verification of modeling assumptions, a conditional modeling strategy was applied by constructing a linear mixed-effects regression model using an identity link for these continuous, multilevel clustered blood pressure data. The model included fixed effects for treatment assignment indicator, for categorical indicators of follow-up assessment time, and, to estimate the treatment effect over time, for interaction indicators between the treatment indicator and the time indicators. The model also included random intercept effects for the clinics, assuming an exchangeable covariance structure in order to account for the correlated patients clustered within clinics and an autoregressive covariance structure on the residual errors to account for the correlated longitudinally clustered blood pressure measures nested within patients. The parameter estimates for the treatment effect over time and their 95% CIs were reported, as well as the intraclass correlation coefficient, which estimates how similar the blood pressure measurements tend to be within clinics.
Depending on the goals and circumstances of such a study, marginal modeling based on generalized estimating equations might be preferable, as described by Brown et al139 and Altman et al.140
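The intraclass correlation coefficient reported in the worked example above can be estimated from a balanced one-way ANOVA decomposition. A minimal sketch with assumed clinic-level and patient-level variance components:

```python
import numpy as np

def icc_oneway(y):
    """ICC(1) from a balanced one-way ANOVA; y has shape (clusters, members)."""
    k, m = y.shape
    grand = y.mean()
    msb = m * ((y.mean(axis=1) - grand) ** 2).sum() / (k - 1)      # between clusters
    msw = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

# Assumed data: clinic random effects (SD 5 mmHg) plus patient noise (SD 10 mmHg),
# so the population ICC is 25 / (25 + 100) = 0.20.
rng = np.random.default_rng(7)
clinic = rng.normal(0, 5, size=(40, 1))
bp = 130 + clinic + rng.normal(0, 10, size=(40, 25))

icc = icc_oneway(bp)
print(f"estimated ICC = {icc:.2f}")
```

In practice, the ICC would usually be extracted from the fitted mixed-effects model's variance components rather than computed by hand; this sketch only shows what the quantity measures.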
11. COVARIABLE ADJUSTMENT AND PROPENSITY SCORES
Multivariable regression is often used to contrast outcomes between ≥2 groups. It is applicable to both randomized and nonrandomized designs and is more commonly used in observational studies.
In randomized trials, although there is no systematic bias in treatment assignment, covariable adjustment may be useful for several reasons. If the randomization is stratified, it is always recommended to adjust for the stratification variables in the primary analysis.144–146 In addition to the stratification variables, adjustment for other strong prognostic covariates typically will increase statistical power to detect true treatment effects (for continuous outcomes, by reducing the variance of the treatment effect estimate; for binary or time-to-event outcomes, by increasing the magnitude of the estimated treatment effect). If any covariate adjustment is performed, authors should be specific about the variables included, why they were chosen, and how they were parameterized in the regression model.
In observational studies, adjustment for differences between groups is critical to evaluate outcomes. Although comparative effectiveness research is often singled out, its basic approach relies on careful analysis of confounding and is vulnerable to the same biases as any observational study.
Most commonly, control of confounding consists of careful application of covariate adjustment with some form of multivariable modeling. Authors are referred to the Strengthening the Reporting of Observational Studies in Epidemiology guidelines,147 which should be followed in their entirety.
11.1. Points of Emphasis
Include a clearly written analytical plan that explains how the study hypotheses or objectives were addressed and details of how this was performed.
Include information on the specification, definition, and ascertainment of the study end point. The number and frequency of these end points should be presented. In the case of time-to-event end points, a mean or median follow-up time also should be presented. When relevant, competing-risk issues should be addressed (see Section 7, Survival Analyses).
Premodeling analyses should address perceived sources of bias and confounding, along with the methods used for their ascertainment and control.
Authors should specify the model form used (eg, logistic, linear, mixed model) and the approach to model development.
A priori identification of covariates used for adjustment is the preferable approach; covariate selection based on univariate analyses is generally discouraged. How these data elements are coded and handled in the model should be specified.
There is some controversy over whether descriptive tables that compare baseline characteristics of treatment/comparison groups should include significance tests. Note that the arguments concerning P values comparing treatment groups in Table 1 of a randomized trial are fundamentally different from those of an observational or nonrandomized study. In a randomized trial, it is known that the participants were drawn from the same underlying distribution and randomly allocated to groups, so a P value computes the probability of something occurring by chance when it is already known by definition to have occurred by chance. In observational studies, it is plausible that patients in 1 interventional group came from a fundamentally different baseline distribution from those in a different interventional group, so the P value arguably retains an inferential meaning in testing whether the treatment groups are drawn from different baseline populations. However, the Strengthening the Reporting of Observational Studies in Epidemiology statement advises against presenting significance tests for tables of descriptive characteristics, in part because univariable analyses ought not to be used to choose which covariates to account for in the analysis.
Present the details of how important model assumptions (eg, linearity, homoscedasticity) were examined and, if any of them influenced the final model building strategy, how they did so. In addition, report whether any preplanned interactions were considered, how they were tested, and the results. For interaction terms that were considered, authors should include a brief justification for why they were considered.
Authors are encouraged to consider the use of flexible regression methods such as restricted cubic splines in the event that relationships between continuous covariates and study outcomes are nonlinear.
Propensity scores are a technique to account for treatment selection bias in estimation of the effect of an intervention.148–151 Whether used as a factor for matching, stratification, weighting (eg, inverse probability of treatment weighting), or covariate adjustment, propensity scores should be based on rigorous modeling techniques. Authors should report the covariates used to construct the propensity score and the criterion used to choose them.
If propensity scores are used for matching, authors must report the caliper distance or matching algorithm used to create the matched pairs of patients and the total number of matched pairs that remained after matching compared with the number of patients in each treatment category from the original prematching cohort. Authors should also report a table comparing the baseline characteristics and assess the standardized differences after propensity-score matching to see whether a suitable balance has been achieved.149
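The standardized difference used to assess balance after matching is straightforward to compute; a sketch for a continuous covariate, with hypothetical treated and control samples:

```python
import numpy as np

def standardized_difference(x_treated, x_control):
    """Standardized mean difference for a continuous covariate (in SD units)."""
    m1, m0 = np.mean(x_treated), np.mean(x_control)
    v1, v0 = np.var(x_treated, ddof=1), np.var(x_control, ddof=1)
    return (m1 - m0) / np.sqrt((v1 + v0) / 2)

# Assumed (hypothetical) data: age in a treated vs control group before matching.
rng = np.random.default_rng(0)
age_treated = rng.normal(68, 10, 500)   # treated patients older on average
age_control = rng.normal(64, 10, 500)

smd = standardized_difference(age_treated, age_control)
print(f"standardized difference = {smd:.2f}")
```

An absolute standardized difference above roughly 0.1 is often taken as a flag of residual imbalance; the table of post-matching standardized differences would report this quantity for every covariate in the propensity model.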
If used for weighting, stratification, or covariate adjustment, authors should justify their approach. When inverse probability of treatment weighting is used, researchers should explicitly describe the type of weight used (eg, standard, stabilized, average treatment effect in the treated).
Note that if the sample size is sufficiently large, direct covariate adjustment (even with a liberal number of covariates) may be preferred to propensity score–based methods.
Instrumental variable methods are increasingly used in cardiovascular research to conduct comparative-effectiveness analyses.152–156 Adopted from econometrics research, these methods are used to help address unobserved confounding when treatment effects in nonrandomized studies are estimated.157,158 An instrumental variable is a factor that is associated with treatment assignment but is not directly related to the study outcome or indirectly related via pathways through unmeasured variables.159 Therefore, instrumental variable methods can address confounding without the need to measure the confounders.160 Many types of instrumental variables exist, such as geographic location, facility or physician preference, date, and time.161
Instrumental variables are required to meet 3 conditions: (1) the instrument is associated with the exposure; (2) the instrument does not affect the outcome except through treatment itself (ie, exclusion restriction assumption); and (3) the instrument is independent of confounders such that it does not share any causes with the outcome (ie, random assignment assumption).160 To address the first condition, authors should provide an assessment of the strength of the instrument. This may include report of the partial R2 or, more commonly, the F statistic of the first-stage regression model.161,162 An F value ≥10 is conventionally taken to indicate a strong instrument.154 For preference-based instruments, authors should also describe or visually display the heterogeneity of the preference-based instrumental variable.155,156 The second and third conditions cannot be empirically verified. Thus, deep subject matter knowledge is required, which authors should describe in their methods to support that these assumptions have been reasonably met. Although the association between the instrument and observed confounders cannot be tested, authors should demonstrate that the instrumental variable balances measured characteristics across different levels of the instrument.154
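For a single instrument, the first-stage F statistic is simply the squared t statistic from the regression of the exposure on the instrument. A toy sketch with assumed data (a binary, facility-preference-like instrument):

```python
import numpy as np

# Toy first-stage regression (all data assumed for illustration).
rng = np.random.default_rng(3)
n = 2000
z = rng.binomial(1, 0.5, n).astype(float)     # eg, facility preference
exposure = 0.3 * z + rng.normal(0, 1, n)      # instrument shifts treatment

X = np.column_stack([np.ones(n), z])          # first-stage design matrix
beta, _, _, _ = np.linalg.lstsq(X, exposure, rcond=None)
resid = exposure - X @ beta
sigma2 = resid @ resid / (n - 2)
se_z = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
f_stat = (beta[1] / se_z) ** 2                # F = t^2 for a single instrument

print(f"first-stage F = {f_stat:.1f}")        # compare against the F >= 10 rule
```

With multiple instruments, the F statistic would instead come from the joint test of all instrument coefficients in the first stage.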
12. POWER AND SAMPLE SIZE CONSIDERATIONS
The design of the study will determine how power and sample size estimation should be approached. Power and sample size for all prospectively planned analyses should be determined before initiation of the study.
12.1. Number of Observations
For all manuscripts, the number of observations collected and analyzed should be featured prominently in both the Abstract and main text.
12.2. Clusters
If the exposure of interest (randomized or observational) is not at the subject/patient level, authors should also ensure that the number of clusters (eg, providers, clinics, hospitals) is reported. If clustering is accounted for in the analysis, assumptions related to the clustering that factor into the sample size calculation (eg, intracluster correlation coefficient) should be specified.
12.3. Primary Analysis
If the manuscript reports data collected for the purpose of assessing the study hypothesis (ie, primary analysis), authors should justify the study sample size.
If the study is a randomized trial, please see guidance in the section Randomized Controlled Trials (Section 5), as well as the DELTA-2 guidance.163
If the study is not a randomized trial, provide justification of the final sample size, whether driven by statistical considerations, logistical considerations, or other factors. For retrospective data collection, administrative databases, or population registry data, the sample size may simply be whatever was available for analysis. If that is the case, the authors should address how they determined that the sample size was sufficient to meet the research objectives.
If a formal power calculation was used, authors should specify the outcome on which the calculation was based, the statistical test to be used, and any additional assumptions made (eg, magnitude of between-group differences, variability of data, duration of follow-up) and provide a brief note on the source or justification for the respective assumptions. All necessary inputs to reproduce the calculation should be provided; the report should be sufficiently detailed that an independent statistician would be able to recreate the calculation.
If the study was designed based on a quantity of precision rather than being powered to detect a specific effect size, details relevant to that approach should be reported.
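As an illustration of reporting every input needed to reproduce a calculation, a standard normal-approximation sample size for comparing two means, with all inputs assumed purely for illustration (detect a 5 mmHg difference, SD 10 mmHg, two-sided α = 0.05, power 0.80):

```python
from math import ceil
from scipy.stats import norm

# Normal-approximation sample size for a two-sample comparison of means.
# Every input below would be reported, with its source or justification.
delta, sd = 5.0, 10.0        # minimum difference of interest; outcome SD
alpha, power = 0.05, 0.80    # two-sided alpha; target power

z_a = norm.ppf(1 - alpha / 2)
z_b = norm.ppf(power)
n_per_group = ceil(2 * (z_a + z_b) ** 2 * sd ** 2 / delta ** 2)

print(f"n per group = {n_per_group}")
```

An independent statistician given these four inputs and the stated test could recreate the result exactly, which is the reporting standard described above.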
12.4. Secondary Analysis
If the study consists of a secondary analysis (ie, the data analyzed were not originally collected for the purpose of addressing the present research question, eg, using the imaging substudy from a randomized trial to address whether a specialized imaging technique had prognostic value for the clinical outcomes), then an explicit power/sample size justification may not be required. Instead, a reference to the original study design may be sufficient.
12.5. Post Hoc Power Calculations
Although post hoc power calculations have lingered for years against the advice of statistical experts, they still merit a brief comment.164 Post hoc power calculations using the observed effect size are a simple transformation of the study P value and should never be used because they do not answer a meaningful question. If a power calculation was not performed (eg, the example above of a secondary analysis using an existing data set), then it may be acceptable to report a “power calculation” illustrating that the available sample size was sufficient to address the study question of interest. Note that this should be based on some minimum effect size of interest for the primary inference, not on the observed data.
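The claim that observed power is merely a transformation of the P value can be checked directly: for a two-sided z test, plugging the observed z statistic back in as the assumed true effect makes power a deterministic function of P, equal to approximately 50% whenever P equals α:

```python
from scipy.stats import norm

# "Observed power" as a function of the P value alone, for a two-sided z-test
# at alpha = 0.05 (approximation ignoring the negligible opposite-tail term):
#   observed power ~= Phi(|z_obs| - z_{alpha/2})
alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)

for p in (0.05, 0.20, 0.80):
    z_obs = norm.ppf(1 - p / 2)              # |z| implied by the P value
    obs_power = norm.cdf(z_obs - z_crit)
    print(f"P = {p:.2f}  ->  observed power ~ {obs_power:.2f}")
```

Because the mapping from P to observed power is one to one, reporting observed power conveys no information beyond the P value itself.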
DISCUSSION
Herein, we provide a summary of guidelines most suitable for reporting statistical analyses in the AHA portfolio of journals. These standards should not be interpreted as absolute rules and technical recommendations. They do not constitute a comprehensive guide on how to select appropriate statistical methods to analyze data. Indeed, no single document can account for every possible study design choice and analytical decision that researchers face, and authors should consult a statistician with experience in biomedical research.
Deliberate attention was given to directing authors to existing published guidelines and recommendations. In recent years, emphasis has been directed to new applications of existing methods such as Bayesian analysis, whereas new methods such as machine learning have emerged as increasingly relevant as data sets get larger and computing capabilities increase. This document should be interpreted as reflecting current knowledge, which will require periodic updates.
Acknowledgments
The authors thank the following colleagues (listed in alphabetic order) who provided rigorous and insightful reviews of each section of this document: Kaleab Z. Abebe, PhD (University of Pittsburgh); Peter C. Austin, PhD (Institute for Clinical Evaluative Sciences); Jesse A. Berlin, ScD (Johnson & Johnson); Laura J. Bonnett, PhD (University of Liverpool); Jody D. Ciolino, PhD (Northwestern University); Cynthia Crowson, PhD (Mayo Clinic); Brandon J. George, PhD, MS (Thomas Jefferson University); Michael O. Harhay, PhD, MPH (University of Pennsylvania); Frank E. Harrell, PhD (Vanderbilt University); Graeme L. Hickey, PhD (Medtronic); Pardeep S. Jhund, MBChB, PhD (University of Glasgow); Charles Kooperberg, PhD (Fred Hutchinson Cancer Research Center); Braxton D. Mitchell, PhD, MPH (University of Maryland); Phillip Schulte, PhD (Mayo Clinic); Ewout W. Steyerberg, PhD (Erasmus MC); and Jonathan G. Yabes, PhD (University of Pittsburgh). Naming these reviewers does not imply endorsement of the full contents of the document.
The authors also thank Denise Kuo, Director of Journal Operations, Circulation journals and JAHA; the AHA for general coordination of this effort; and Deborah Strain for manuscript preparation assistance.
Footnotes
Disclosures
Writing Group and Reviewer Disclosures are available in the Data Supplement.
Contributor Information
Andrew D. Althouse, Center for Research on Health Care Data Center, Division of General Internal Medicine, University of Pittsburgh, PA.
Jennifer E. Below, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN.
Brian L. Claggett, Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, MA.
Nancy J. Cox, Vanderbilt Genetics Institute, Vanderbilt University Medical Center, Nashville, TN.
James A. de Lemos, Division of Cardiology, University of Texas Southwestern Medical Center, Dallas.
Rahul C. Deo, Division of Cardiovascular Medicine, Brigham and Women’s Hospital, Boston, MA.
Sue Duval, Cardiovascular Division, University of Minnesota Medical School, Minneapolis.
Rory Hachamovitch, Department of Cardiovascular Medicine, Heart and Vascular Institute, Cleveland Clinic Foundation, OH.
Sanjay Kaul, Department of Cardiology, Cedars-Sinai Medical Center, and the David Geffen School of Medicine, University of California, Los Angeles.
Scott W. Keith, Division of Biostatistics, Department of Pharmacology and Experimental Therapeutics, Sidney Kimmel Medical College of Thomas Jefferson University, Philadelphia, PA.
Eric Secemsky, Richard A. and Susan F. Smith Center for Outcomes Research in Cardiology, Division of Cardiology, Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA.
Armando Teixeira-Pinto, School of Public Health, Faculty of Medicine and Health, University of Sydney, Australia.
Veronique L. Roger, Department of Cardiovascular Diseases Medicine, Mayo Clinic College of Medicine, Rochester, MN; Epidemiology and Community Health Branch National Heart, Lung and Blood Institute, National Institutes of Health, Bethesda, MD.
REFERENCES
- 1.EQUATOR Network. EQUATOR Reporting Guidelines. Accessed August 11, 2020. http://www.equator-network.org/ [Google Scholar]
- 2.Assel M, Sjoberg D, Elders A, Wang X, Huo D, Botchway A, Delfino K, Fan Y, Zhao Z, Koyama T, et al. Guidelines for reporting of statistics for clinical research in urology. BJU Int. 2019;123:401–410. doi: 10.1111/bju.14640 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 3.Vickers AJ, Assel MJ, Sjoberg DD, Qin R, Zhao Z, Koyama T, Botchway A, Wang X, Huo D, Kattan M, et al. Guidelines for reporting of figures and tables for clinical research in urology. Urology. 2020;142:1–13. doi: 10.1016/j.urology.2020.05.002 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4.Harrell FE Jr. Checklist for authors: statistical problems to document and to avoid. Accessed August 11, 2020. https://discourse.datamethods.org/t/author-checklist/3407 [Google Scholar]
- 5.Jorgensen AL, Williamson PR. Methodological quality of pharmacogenetic studies: issues of concern. Stat Med. 2008;27:6547–6569. doi: 10.1002/sim.3420 [DOI] [PubMed] [Google Scholar]
- 6.Weissgerber TL, Milic NM, Winham SJ, Garovic VD. Beyond bar and line graphs: time for a new data presentation paradigm. PLoS Biol. 2015;13:e1002128. doi: 10.1371/journal.pbio.1002128 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 7.Weissgerber TL, Winham SJ, Heinzen EP, Milin-Lazovic JS, Garcia-Valencia O, Bukumiric Z, Savic MD, Garovic VD, Milic NM. Reveal, don’t conceal: transforming data visualization to improve transparency. Circulation. 2019;140:1506–1518. doi: 10.1161/CIRCULATIONAHA.118.037777 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8.Wasserstein RL, Schirm AL, Lazar NA. Moving to a world beyond “p < 0.05”. Am Statistician. 2019;73:1–19. doi: 10.1080/00031305.2019.1583913 [DOI] [Google Scholar]
- 9.datamethods. Language for communicating frequentist results about treatment effects: datamethods. Accessed August 11, 2020. https://discourse.datamethods.org/t/language-for-communicating-frequentist-results-about-treatment-effects/934 [Google Scholar]; 9a. Center for Open Science website. Accessed August 11, 2020. https://www.cos.io/initiatives/prereg [Google Scholar]
- 10.Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig L, Lijmer JG, Moher D, Rennie D, de Vet HC, et al. ; STARD Group. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. BMJ. 2015;351:h5527. doi: 10.1136/bmj.h5527 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11.McClish DK. Analyzing a portion of the ROC curve. Med Decis Making. 1989;9:190–195. doi: 10.1177/0272989X8900900307 [DOI] [PubMed] [Google Scholar]
- 12.Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation. 2007;115:928–935. doi: 10.1161/CIRCULATIONAHA.106.672402 [DOI] [PubMed] [Google Scholar]
- 13.Popović ZB, Thomas JD. Assessing observer variability: a user’s guide. Cardiovasc Diagn Ther. 2017;7:317–324. doi: 10.21037/cdt.2017.03.12 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14.Altman DG, Bland JM. Measurement in medicine: the analysis of method comparison studies. Statistician. 1983;32:307–317. [Google Scholar]
- 15.Harrell FE, Slaughter JC. Biostatistics for Biomedical Research. Vanderbilt Institute for Clinical and Translational Research Edge for Scholars, Department of Biostatistics. Accessed October 2, 2020. http://hbiostat.org/doc/bbr.pdf [Google Scholar]
- 16.Damen JA, Hooft L, Schuit E, Debray TP, Collins GS, Tzoulaki I, Lassale CM, Siontis GC, Chiocchia V, Roberts C, et al. Prediction models for cardiovascular disease risk in the general population: systematic review. BMJ. 2016;353:i2416. doi: 10.1136/bmj.i2416 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 17.Deo RC. Machine learning in medicine. Circulation. 2015;132:1920–1930. doi: 10.1161/CIRCULATIONAHA.115.001593 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 18.Stevens LM, Mortazavi BJ, Deo RC, Curtis L, Kao DP. Recommendations for reporting machine learning analyses in clinical research. Circ Cardiovasc Qual Outcomes. 2020;13:e006556. doi: 10.1161/CIRCOUTCOMES.120.006556 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19.Collins GS, Reitsma JB, Altman DG, Moons KG; TRIPOD Group. Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement: the TRIPOD Group. Circulation. 2015;131:211–219. doi: 10.1161/CIRCULATIONAHA.114.014508 [DOI] [PMC free article] [PubMed] [Google Scholar]
- 20.Zou H, Hastie T. Regularization and variable selection via the elastic net. J R Statist Soc B. 2005;67:301–320. [Google Scholar]
- 21.Steyerberg EW, Eijkemans MJ, Harrell FE Jr, Habbema JD. Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets. Stat Med. 2000;19:1059–1079.
- 22.Ivanescu AE, Li P, George B, Brown AW, Keith SW, Raju D, Allison DB. The importance of prediction model validation and assessment in obesity and nutrition research. Int J Obes (Lond). 2016;40:887–894. doi: 10.1038/ijo.2015.214
- 23.Sauerbrei W. Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. J R Stat Soc Ser A. 1999;162:71. doi: 10.1111/1467-985X.00122
- 24.Sauerbrei W, Royston P, Binder H. Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Stat Med. 2007;26:5512–5528. doi: 10.1002/sim.3148
- 25.Altman DG, Vergouwe Y, Royston P, Moons KG. Prognosis and prognostic research: validating a prognostic model. BMJ. 2009;338:b605. doi: 10.1136/bmj.b605
- 26.Steyerberg EW, Vickers AJ, Cook NR, Gerds T, Gonen M, Obuchowski N, Pencina MJ, Kattan MW. Assessing the performance of prediction models: a framework for traditional and novel measures. Epidemiology. 2010;21:128–138. doi: 10.1097/EDE.0b013e3181c30fb2
- 27.Van Calster B, McLernon DJ, van Smeden M, Wynants L, Steyerberg EW; Topic Group “Evaluating Diagnostic Tests and Prediction Models” of the STRATOS Initiative. Calibration: the Achilles heel of predictive analytics. BMC Med. 2019;17:230. doi: 10.1186/s12916-019-1466-7
- 28.Austin PC, Harrell FE Jr, van Klaveren D. Graphical calibration curves and the integrated calibration index (ICI) for survival models. Stat Med. 2020;39:2714–2742. doi: 10.1002/sim.8570
- 29.Vickers AJ, Elkin EB. Decision curve analysis: a novel method for evaluating prediction models. Med Decis Making. 2006;26:565–574. doi: 10.1177/0272989X06295361
- 30.National Institutes of Health, Office of Science Policy. NIH genomic data sharing. Accessed October 2, 2020. https://osp.od.nih.gov/scientific-sharing/genomic-data-sharing/
- 31.Mefford J, Witte JS. The covariate’s dilemma. PLoS Genet. 2012;8:e1003096. doi: 10.1371/journal.pgen.1003096
- 32.Greenland S. Quantifying biases in causal models: classical confounding vs collider-stratification bias. Epidemiology. 2003;14:300–306. doi: 10.1097/01.EDE.0000042804.12056.6C
- 33.Paternoster L, Tilling K, Davey Smith G. Genetic epidemiology and mendelian randomization for informing disease therapeutics: conceptual and methodological challenges. PLoS Genet. 2017;13:e1006944. doi: 10.1371/journal.pgen.1006944
- 34.Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–883. doi: 10.1093/bioinformatics/bts034
- 35.Pluzhnikov A, Below JE, Konkashbaev A, Tikhomirov A, Kistner-Griffin E, Roe CA, Nicolae DL, Cox NJ. Spoiling the whole bunch: quality control aimed at preserving the integrity of high-throughput genotyping. Am J Hum Genet. 2010;87:123–128. doi: 10.1016/j.ajhg.2010.06.005
- 36.Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–127. doi: 10.1093/biostatistics/kxj037
- 37.Grove ML, Yu B, Cochran BJ, Haritunians T, Bis JC, Taylor KD, Hansen M, Borecki IB, Cupples LA, Fornage M, et al. Best practices and joint calling of the HumanExome BeadChip: the CHARGE Consortium. PLoS One. 2013;8:e68095. doi: 10.1371/journal.pone.0068095
- 38.Goldstein JI, Crenshaw A, Carey J, Grant GB, Maguire J, Fromer M, O’Dushlaine C, Moran JL, Chambert K, Stevens C, et al; Swedish Schizophrenia Consortium; ARRA Autism Sequencing Consortium. zCall: a rare variant caller for array-based genotyping: genetics and population analysis. Bioinformatics. 2012;28:2543–2545. doi: 10.1093/bioinformatics/bts479
- 39.TOPMed whole genome sequencing project–freeze 5b, phases 1 and 2. 2018. Accessed August 11, 2020. https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?id=phd007493.1
- 40.Luca D, Ringquist S, Klei L, Lee AB, Gieger C, Wichmann HE, Schreiber S, Krawczak M, Lu Y, Styche A, et al. On the use of general control samples for genome-wide association studies: genetic matching highlights causal variants. Am J Hum Genet. 2008;82:453–463. doi: 10.1016/j.ajhg.2007.11.003
- 41.Epstein MP, Duncan R, Broadaway KA, He M, Allen AS, Satten GA. Stratification-score matching improves correction for confounding by population stratification in case-control association studies. Genet Epidemiol. 2012;36:195–205. doi: 10.1002/gepi.21611
- 42.Guan W, Liang L, Boehnke M, Abecasis GR. Genotype-based matching to correct for population stratification in large-scale case-control genetic association studies. Genet Epidemiol. 2009;33:508–517. doi: 10.1002/gepi.20403
- 43.Lee AB, Luca D, Klei L, Devlin B, Roeder K. Discovering genetic ancestry using spectral graph theory. Genet Epidemiol. 2010;34:51–59. doi: 10.1002/gepi.20434
- 44.Patterson N, Price AL, Reich D. Population structure and eigenanalysis. PLoS Genet. 2006;2:e190. doi: 10.1371/journal.pgen.0020190
- 45.Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, et al. Genes mirror geography within Europe. Nature. 2008;456:98–101. doi: 10.1038/nature07331
- 46.Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet. 2010;11:459–463. doi: 10.1038/nrg2813
- 47.Wu C, DeWan A, Hoh J, Wang Z. A comparison of association methods correcting for population stratification in case-control studies. Ann Hum Genet. 2011;75:418–427. doi: 10.1111/j.1469-1809.2010.00639.x
- 48.Conomos MP, Miller MB, Thornton TA. Robust inference of population structure for ancestry prediction and correction of stratification in the presence of relatedness. Genet Epidemiol. 2015;39:276–293. doi: 10.1002/gepi.21896
- 49.Risch N, Burchard E, Ziv E, Tang H. Categorization of humans in biomedical research: genes, race and disease. Genome Biol. 2002;3:comment2007. doi: 10.1186/gb-2002-3-7-comment2007
- 50.Burchard EG, Ziv E, Coyle N, Gomez SL, Tang H, Karter AJ, Mountain JL, Pérez-Stable EJ, Sheppard D, Risch N. The importance of race and ethnic background in biomedical research and clinical practice. N Engl J Med. 2003;348:1170–1175. doi: 10.1056/NEJMsb025007
- 51.Fujimura JH, Rajagopalan R. Different differences: the use of “genetic ancestry” versus race in biomedical human genetic research. Soc Stud Sci. 2011;41:5–30. doi: 10.1177/0306312710379170
- 52.Fernández-Rhodes L, Young KL, Lilly AG, Raffield LM, Highland HM, Wojcik GL, Agler C, Love SM, Okello S, Petty LE, et al. Importance of genetic studies of cardiometabolic disease in diverse populations. Circ Res. 2020;126:1816–1840. doi: 10.1161/CIRCRESAHA.120.315893
- 53.Teng M, Irizarry RA. Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data. Genome Res. 2017;27:1930–1938. doi: 10.1101/gr.220673.117
- 54.Regier AA, Farjoun Y, Larson DE, Krasheninina O, Kang HM, Howrigan DP, Chen BJ, Kher M, Banks E, Ames DC, et al. Functional equivalence of genome sequencing analysis pipelines enables harmonized variant calling across human genetics projects. Nat Commun. 2018;9:4038. doi: 10.1038/s41467-018-06159-4
- 55.Escaramís G, Docampo E, Rabionet R. A decade of structural variants: description, history and methods to detect structural variation. Brief Funct Genomics. 2015;14:305–314. doi: 10.1093/bfgp/elv014
- 56.Larson DE, Abel HJ, Chiang C, Badve A, Das I, Eldred JM, Layer RM, Hall IM. svtools: population-scale analysis of structural variation. Bioinformatics. 2019;35:4782–4787. doi: 10.1093/bioinformatics/btz492
- 57.Howie B, Marchini J, Stephens M. Genotype imputation with thousands of genomes. G3 (Bethesda). 2011;1:457–470. doi: 10.1534/g3.111.001198
- 58.Marchini J, Howie B. Genotype imputation for genome-wide association studies. Nat Rev Genet. 2010;11:499–511. doi: 10.1038/nrg2796
- 59.Jorgensen TJ, Ruczinski I, Kessing B, Smith MW, Shugart YY, Alberg AJ. Hypothesis-driven candidate gene association studies: practical design and analytical considerations. Am J Epidemiol. 2009;170:986–993. doi: 10.1093/aje/kwp242
- 60.Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–575. doi: 10.1086/519795
- 61.Marchini J, Howie B, Myers S, McVean G, Donnelly P. A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007;39:906–913. doi: 10.1038/ng2088
- 62.Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007;447:661–678.
- 63.Staples J, Ekunwe L, Lange E, Wilson JG, Nickerson DA, Below JE. PRIMUS: improving pedigree reconstruction using mitochondrial and Y haplotypes. Bioinformatics. 2016;32:596–598. doi: 10.1093/bioinformatics/btv618
- 64.Staples J, Qiao D, Cho MH, Silverman EK, Nickerson DA, Below JE; University of Washington Center for Mendelian Genomics. PRIMUS: rapid reconstruction of pedigrees from genome-wide estimates of identity by descent. Am J Hum Genet. 2014;95:553–564. doi: 10.1016/j.ajhg.2014.10.005
- 65.Staples J, Witherspoon DJ, Jorde LB, Nickerson DA, Below JE, Huff CD; University of Washington Center for Mendelian Genomics. PADRE: Pedigree-Aware Distant-Relationship Estimation. Am J Hum Genet. 2016;99:154–162. doi: 10.1016/j.ajhg.2016.05.020
- 66.Kang HM, Sul JH, Service SK, Zaitlen NA, Kong SY, Freimer NB, Sabatti C, Eskin E. Variance component model to account for sample structure in genome-wide association studies. Nat Genet. 2010;42:348–354. doi: 10.1038/ng.548
- 67.Thornton T, McPeek MS. ROADTRIPS: case-control association testing with partially or completely unknown population and pedigree structure. Am J Hum Genet. 2010;86:172–184. doi: 10.1016/j.ajhg.2010.01.001
- 68.Zhou W, Nielsen JB, Fritsche LG, Dey R, Gabrielsen ME, Wolford BN, LeFaive J, VandeHaar P, Gagliano SA, Gifford A, et al. Efficiently controlling for case-control imbalance and sample relatedness in large-scale genetic association studies. Nat Genet. 2018;50:1335–1341. doi: 10.1038/s41588-018-0184-y
- 69.King G, Zeng L. Logistic regression in rare events data. Political Analysis. 2001;9:137–163. Accessed August 4, 2020. https://gking.harvard.edu/files/0s.pdf
- 70.Firth EC, Poulos PW. Vascular characteristics of the cartilage and subchondral bone of the distal radial epiphysis of the young foal. N Z Vet J. 1993;41:73–77. doi: 10.1080/00480169.1993.35738
- 71.Lin DY, Tao R, Kalsbeek WD, Zeng D, Gonzalez F 2nd, Fernández-Rhodes L, Graff M, Koch GG, North KE, Heiss G. Genetic association analysis under complex survey sampling: the Hispanic Community Health Study/Study of Latinos. Am J Hum Genet. 2014;95:675–688. doi: 10.1016/j.ajhg.2014.11.005
- 72.Lee S, Abecasis GR, Boehnke M, Lin X. Rare-variant association analysis: study designs and statistical tests. Am J Hum Genet. 2014;95:5–23. doi: 10.1016/j.ajhg.2014.06.009
- 73.Nicolae DL. Association tests for rare variants. Annu Rev Genomics Hum Genet. 2016;17:117–130. doi: 10.1146/annurev-genom-083115-022609
- 74.Winkler TW, Day FR, Croteau-Chonka DC, Wood AR, Locke AE, Mägi R, Ferreira T, Fall T, Graff M, Justice AE, et al; Genetic Investigation of Anthropometric Traits (GIANT) Consortium. Quality control and conduct of genome-wide association meta-analyses. Nat Protoc. 2014;9:1192–1212. doi: 10.1038/nprot.2014.071
- 75.Evangelou E, Ioannidis JP. Meta-analysis methods for genome-wide association studies and beyond. Nat Rev Genet. 2013;14:379–389. doi: 10.1038/nrg3472
- 76.Shim S, Kim J, Jung W, Shin IS, Bae JM. Meta-analysis for genome-wide association studies using case-control design: application and practice. Epidemiol Health. 2016;38:e2016058. doi: 10.4178/epih.e2016058
- 77.Mägi R, Horikoshi M, Sofer T, Mahajan A, Kitajima H, Franceschini N, McCarthy MI, Morris AP; COGENT-Kidney Consortium, T2D-GENES Consortium. Trans-ethnic meta-regression of genome-wide association studies accounting for ancestry increases power for discovery and improves fine-mapping resolution. Hum Mol Genet. 2017;26:3639–3650. doi: 10.1093/hmg/ddx280
- 78.Spain SL, Barrett JC. Strategies for fine-mapping complex traits. Hum Mol Genet. 2015;24:R111–R119. doi: 10.1093/hmg/ddv260
- 79.Schaid DJ, Chen W, Larson NB. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat Rev Genet. 2018;19:491–504. doi: 10.1038/s41576-018-0016-z
- 80.Kichaev G, Roytman M, Johnson R, Eskin E, Lindström S, Kraft P, Pasaniuc B. Improved methods for multi-trait fine mapping of pleiotropic risk loci. Bioinformatics. 2017;33:248–255. doi: 10.1093/bioinformatics/btw615
- 81.Hormozdiari F, van de Bunt M, Segrè AV, Li X, Joo JWJ, Bilow M, Sul JH, Sankararaman S, Pasaniuc B, Eskin E. Colocalization of GWAS and eQTL signals detects target genes. Am J Hum Genet. 2016;99:1245–1260. doi: 10.1016/j.ajhg.2016.10.003
- 82.Wainberg M, Sinnott-Armstrong N, Mancuso N, Barbeira AN, Knowles DA, Golan D, Ermel R, Ruusalepp A, Quertermous T, Hao K, et al. Opportunities and challenges for transcriptome-wide association studies. Nat Genet. 2019;51:592–599. doi: 10.1038/s41588-019-0385-z
- 83.Mooney MA, Nigg JT, McWeeney SK, Wilmot B. Functional and genomic context in pathway analysis of GWAS data. Trends Genet. 2014;30:390–400. doi: 10.1016/j.tig.2014.07.004
- 84.Mooney MA, Wilmot B. Gene set analysis: a step-by-step guide. Am J Med Genet B Neuropsychiatr Genet. 2015;168:517–527. doi: 10.1002/ajmg.b.32328
- 85.de Leeuw CA, Neale BM, Heskes T, Posthuma D. The statistical properties of gene-set analysis. Nat Rev Genet. 2016;17:353–364. doi: 10.1038/nrg.2016.29
- 86.Devlin B, Roeder K. Genomic control for association studies. Biometrics. 1999;55:997–1004. doi: 10.1111/j.0006-341x.1999.00997.x
- 87.Bulik-Sullivan BK, Loh PR, Finucane HK, Ripke S, Yang J, Patterson N, Daly MJ, Price AL, Neale BM; Schizophrenia Working Group of the Psychiatric Genomics Consortium. LD score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet. 2015;47:291–295. doi: 10.1038/ng.3211
- 88.Davey Smith G, Hemani G. Mendelian randomization: genetic anchors for causal inference in epidemiological studies. Hum Mol Genet. 2014;23:R89–R98. doi: 10.1093/hmg/ddu328
- 89.Zheng J, Baird D, Borges MC, Bowden J, Hemani G, Haycock P, Evans DM, Smith GD. Recent developments in mendelian randomization studies. Curr Epidemiol Rep. 2017;4:330–345. doi: 10.1007/s40471-017-0128-6
- 90.Bell KJL, Loy C, Cust AE, Teixeira-Pinto A. Mendelian randomization in cardiovascular research. Circ Cardiovasc Qual Outcomes. 2021;14:e005623. doi: 10.1161/CIRCOUTCOMES.119.005623
- 91.Hicks SC, Townes FW, Teng M, Irizarry RA. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics. 2018;19:562–578. doi: 10.1093/biostatistics/kxx053
- 92.Costa-Silva J, Domingues D, Lopes FM. RNA-seq differential expression analysis: an extended review and a software tool. PLoS One. 2017;12:e0190152. doi: 10.1371/journal.pone.0190152
- 93.Soneson C, Delorenzi M. A comparison of methods for differential expression analysis of RNA-seq data. BMC Bioinformatics. 2013;14:91. doi: 10.1186/1471-2105-14-91
- 94.Finotello F, Di Camillo B. Measuring differential gene expression with RNA-seq: challenges and strategies for data analysis. Brief Funct Genomics. 2015;14:130–142. doi: 10.1093/bfgp/elu035
- 95.Li WV, Li JJ. Modeling and analysis of RNA-seq data: a review from a statistical perspective. Quant Biol. 2018;6:195–209. doi: 10.1007/s40484-018-0144-7
- 96.Hrdlickova R, Toloue M, Tian B. RNA-seq methods for transcriptome analysis. Wiley Interdiscip Rev RNA. 2017;8. doi: 10.1002/wrna.1364
- 97.Conesa A, Madrigal P, Tarazona S, Gomez-Cabrero D, Cervera A, McPherson A, Szcześniak MW, Gaffney DJ, Elo LL, Zhang X, et al. A survey of best practices for RNA-seq data analysis. Genome Biol. 2016;17:13. doi: 10.1186/s13059-016-0881-8
- 98.Wang Z, Gerstein M, Snyder M. RNA-seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10:57–63. doi: 10.1038/nrg2484
- 99.Chan AW, Tetzlaff JM, Altman DG, Laupacis A, Gøtzsche PC, Krleža-Jerić K, Hróbjartsson A, Mann H, Dickersin K, Berlin JA, et al. SPIRIT 2013 statement: defining standard protocol items for clinical trials. Ann Intern Med. 2013;158:200–207. doi: 10.7326/0003-4819-158-3-201302050-00583; 99a. CONSORT. The CONSORT statement. Accessed August 11, 2020. http://www.consort-statement.org/
- 100.Taves DR. The use of minimization in clinical trials. Contemp Clin Trials. 2010;31:180–184. doi: 10.1016/j.cct.2009.12.005
- 101.Berger VW. Minimization, by its nature, precludes allocation concealment, and invites selection bias. Contemp Clin Trials. 2010;31:406. doi: 10.1016/j.cct.2010.05.001
- 102.Taves DR. Minimization does not by its nature preclude allocation concealment and invite selection bias, as Berger claims. Contemp Clin Trials. 2011;32:323. doi: 10.1016/j.cct.2010.12.010
- 103.Taves DR. Rank-minimization with a two-step analysis should replace randomization in clinical trials. J Clin Epidemiol. 2012;65:3–6. doi: 10.1016/j.jclinepi.2011.06.020
- 104.Kahan BC. Rank minimization with a two-step analysis should not replace randomization in clinical trials. J Clin Epidemiol. 2012;65:808–809. doi: 10.1016/j.jclinepi.2012.02.002
- 105.Morris T. Rank minimization with a two-step analysis should not replace randomization in clinical trials. J Clin Epidemiol. 2012;65:810–811. doi: 10.1016/j.jclinepi.2012.02.005
- 106.Datamethods. Accessed August 11, 2020. https://discourse.datamethods.org/t/should-we-ignore-covariate-imbalance-and-stop-presenting-a-stratified-table-one-for-randomized-trials
- 107.Altman DG. Comparability of randomized groups. J R Stat Soc Ser D. 1985;34:125–136.
- 108.Altman DG, Doré CJ. Randomisation and baseline comparisons in clinical trials. Lancet. 1990;335:149–153. doi: 10.1016/0140-6736(90)90014-v
- 109.Begg CB. Suspended judgment: significance tests of covariate imbalance in clinical trials. Control Clin Trials. 1990;11:223–225. doi: 10.1016/0197-2456(90)90037-3
- 110.Senn S. Baseline comparisons in randomized clinical trials. Stat Med. 1991;10:1157–1159. doi: 10.1002/sim.4780100715
- 111.Senn S. Testing for baseline balance in clinical trials. Stat Med. 1994;13:1715–1726. doi: 10.1002/sim.4780131703
- 112.Lineberry N, Berlin JA, Mansi B, Glasser S, Berkwits M, Klem C, Bhattacharya A, Citrome L, Enck R, Fletcher J, et al. Recommendations to improve adverse event reporting in clinical trial publications: a joint pharmaceutical industry/journal editor perspective. BMJ. 2016;355:i5078. doi: 10.1136/bmj.i5078
- 113.PRISMA. Preferred Reporting Items for Systematic Reviews and Meta-Analyses: the PRISMA statement. Accessed August 11, 2020. http://www.prisma-statement.org/PRISMAStatement/Default.aspx
- 114.Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Moher D. Updating guidance for reporting systematic reviews: development of the PRISMA 2020 statement. J Clin Epidemiol. 2021;134:103–112. doi: 10.1016/j.jclinepi.2021.02.003
- 115.Shamseer L, Moher D, Clarke M, Ghersi D, Liberati A, Petticrew M, Shekelle P, Stewart LA; PRISMA-P Group. Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 2015: elaboration and explanation. BMJ. 2015;350:g7647. doi: 10.1136/bmj.g7647
- 116.Rao G, Lopez-Jimenez F, Boyd J, D’Amico F, Durant NH, Hlatky MA, Howard G, Kirley K, Masi C, Powell-Wiley TM, et al; on behalf of the American Heart Association Council on Lifestyle and Cardiometabolic Health; Council on Cardiovascular and Stroke Nursing; Council on Cardiovascular Surgery and Anesthesia; Council on Clinical Cardiology; Council on Functional Genomics and Translational Biology; and Stroke Council. Methodological standards for meta-analyses and qualitative systematic reviews of cardiac prevention and treatment studies: a scientific statement from the American Heart Association. Circulation. 2017;136:e172–e194. doi: 10.1161/CIR.0000000000000523
- 117.PROSPERO International Prospective Register of Systematic Reviews. Accessed August 11, 2020. https://www.crd.york.ac.uk/prospero/; 117a. National Toxicology Program. Risk of bias tools. Accessed August 11, 2020. https://ntp.niehs.nih.gov/go/ohat_tools; 117b. Higgins JTP, Altman DG, Sterne JAC. Assessing risk of bias in included studies. In: Higgins JTP, Green S, eds. Cochrane Handbook for Systematic Reviews of Interventions. Version 5.1.0. The Cochrane Collaboration; 2011:chap 8.
- 118.UCLA Institute for Digital Research & Education Statistical Consulting. Testing the proportional hazard assumption in Cox models. Accessed August 11, 2020. https://stats.idre.ucla.edu/other/examples/asa2/testing-the-proportional-hazard-assumption-in-cox-models/
- 119.Morris TP, Jarvis CI, Cragg W, Phillips PPJ, Choodari-Oskooei B, Sydes MR. Proposals on Kaplan-Meier plots in medical research and a survey of stakeholder views: KMunicate. BMJ Open. 2019;9:e030215. doi: 10.1136/bmjopen-2019-030215
- 120.Pocock SJ, Clayton TC, Altman DG. Survival plots of time-to-event outcomes in clinical trials: good practice and pitfalls. Lancet. 2002;359:1686–1689. doi: 10.1016/S0140-6736(02)08594-X
- 121.Andersen PK, Gill RD. Cox’s regression model for counting processes: a large sample study. Ann Statist. 1982;10:1100–1120.
- 122.Prentice RL, Williams BJ, Peterson AV. On the regression analysis of multivariate failure time data. Biometrika. 1981;68:373–379.
- 123.Wei LJ, Lin DY, Weissfeld L. Regression analysis of multivariate incomplete failure time data by modelling marginal distributions. J Am Stat Assoc. 1989;84:1065–1073.
- 124.Suissa S. Immortal time bias in pharmaco-epidemiology. Am J Epidemiol. 2008;167:492–499. doi: 10.1093/aje/kwm324
- 125.Reardon MJ, Van Mieghem NM, Popma JJ, Kleiman NS, Søndergaard L, Mumtaz M, Adams DH, Deeb GM, Maini B, Gada H, et al; SURTAVI Investigators. Surgical or transcatheter aortic-valve replacement in intermediate-risk patients. N Engl J Med. 2017;376:1321–1331. doi: 10.1056/NEJMoa1700456
- 126.Nogueira RG, Jadhav AP, Haussen DC, Bonafe A, Budzik RF, Bhuva P, Yavagal DR, Ribo M, Cognard C, Hanel RA, et al; DAWN Trial Investigators. Thrombectomy 6 to 24 hours after stroke with a mismatch between deficit and infarct. N Engl J Med. 2018;378:11–21. doi: 10.1056/NEJMoa1706442
- 127.Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian Approaches to Clinical Trials and Health-Care Evaluation. John Wiley & Sons, Ltd; 2003.
- 128.Bittl JA, He Y. Bayesian analysis: a practical approach to interpret clinical trials and create clinical practice guidelines. Circ Cardiovasc Qual Outcomes. 2017;10:e003563. doi: 10.1161/CIRCOUTCOMES.117.003563
- 129.US Food and Drug Administration. Guidance for the use of Bayesian statistics in medical device clinical trials. 2010. Accessed August 11, 2020. https://www.fda.gov/regulatory-information/search-fda-guidance-documents/guidance-use-bayesian-statistics-medical-device-clinical-trials
- 130.Little RJ, D’Agostino R, Cohen ML, Dickersin K, Emerson SS, Farrar JT, Frangakis C, Hogan JW, Molenberghs G, Murphy SA, et al. The prevention and treatment of missing data in clinical trials. N Engl J Med. 2012;367:1355–1360. doi: 10.1056/NEJMsr1203730
- 131.Perkins NJ, Cole SR, Harel O, Tchetgen Tchetgen EJ, Sun B, Mitchell EM, Schisterman EF. Principled approaches to missing data in epidemiologic studies. Am J Epidemiol. 2018;187:568–575. doi: 10.1093/aje/kwx348
- 132.White IR, Royston P, Wood AM. Multiple imputation using chained equations: issues and guidance for practice. Stat Med. 2011;30:377–399. doi: 10.1002/sim.4067
- 133.Li P, Stuart EA, Allison DB. Multiple imputation: a flexible tool for handling missing data. JAMA. 2015;314:1966–1967. doi: 10.1001/jama.2015.15281
- 134.George BJ, Beasley TM, Brown AW, Dawson J, Dimova R, Divers J, Goldsby TU, Heo M, Kaiser KA, Keith SW, et al. Common scientific and statistical errors in obesity research. Obesity (Silver Spring). 2016;24:781–790. doi: 10.1002/oby.21449
- 135.Campbell MK, Piaggio G, Elbourne DR, Altman DG; CONSORT Group. CONSORT 2010 statement: extension to cluster randomised trials. BMJ. 2012;345:e5661. doi: 10.1136/bmj.e5661
- 136.Fitzmaurice GM, Ravichandran C. A primer in longitudinal data analysis. Circulation. 2008;118:2005–2010. doi: 10.1161/CIRCULATIONAHA.107.714618
- 137.Tooth L, Ware R, Bain C, Purdie DM, Dobson A. Quality of reporting of observational longitudinal research. Am J Epidemiol. 2005;161:280–288. doi: 10.1093/aje/kwi042
- 138.Niven DJ, Berthiaume LR, Fick GH, Laupland KB. Matched case-control studies: a review of reported statistical methodology. Clin Epidemiol. 2012;4:99–110. doi: 10.2147/CLEP.S30816
- 139.Brown AW, Li P, Bohan Brown MM, Kaiser KA, Keith SW, Oakes JM, Allison DB. Best (but oft-forgotten) practices: designing, analyzing, and reporting cluster randomized controlled trials. Am J Clin Nutr. 2015;102:241–248. doi: 10.3945/ajcn.114.105072
- 140.Altman DG, Gore SM, Gardner MJ, Pocock SJ. Statistical guidelines for contributors to medical journals. BMJ (Clin Res Ed). 1983;286:1489–1493. doi: 10.1136/bmj.286.6376.1489
- 141.McCulloch CE, Searle SR, Neuhaus JM. Generalized, Linear, and Mixed Models. 2nd ed. Wiley & Sons; 2008.
- 142.Diggle PJ, Heagerty P, Liang KY, Zeger SL. Longitudinal Data Analysis. 2nd ed. Oxford University Press; 2002.
- 143.Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974;19:716–723.
- 144.Hernández AV, Steyerberg EW, Habbema JD. Covariate adjustment in randomized controlled trials with dichotomous outcomes increases statistical power and reduces sample size requirements. J Clin Epidemiol. 2004;57:454–460. doi: 10.1016/j.jclinepi.2003.09.014
- 145.Hernández AV, Eijkemans MJ, Steyerberg EW. Randomized controlled trials with time-to-event outcomes: how much does prespecified covariate adjustment increase power? Ann Epidemiol. 2006;16:41–48. doi: 10.1016/j.annepidem.2005.09.007
- 146.Kahan BC, Jairath V, Doré CJ, Morris TP. The risks and rewards of covariate adjustment in randomized trials: an assessment of 12 outcomes from 8 studies. Trials. 2014;15:139. doi: 10.1186/1745-6215-15-139
- 147.von Elm E, Altman DG, Egger M, Pocock SJ, Gøtzsche PC, Vandenbroucke JP; STROBE Initiative. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: guidelines for reporting observational studies. Ann Intern Med. 2007;147:573–577. doi: 10.7326/0003-4819-147-8-200710160-00010
- 148.Austin PC. An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behav Res. 2011;46:399–424. doi: 10.1080/00273171.2011.568786
- 149.Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat Med. 2009;28:3083–3107. doi: 10.1002/sim.3697
- 150.Austin PC. The relative ability of different propensity score methods to balance measured covariates between treated and untreated subjects in observational studies. Med Decis Making. 2009;29:661–677. doi: 10.1177/0272989X09341755
- 151.Austin PC, Stuart EA. Moving towards best practice when using inverse probability of treatment weighting (IPTW) using the propensity score to estimate causal treatment effects in observational studies. Stat Med. 2015;34:3661–3679. doi: 10.1002/sim.6607
- 152.Secemsky EA, Kirtane A, Bangalore S, Jovin IS, Patel D, Ferro EG, Wimmer NJ, Roe M, Dai D, Mauri L, et al. Practice patterns and in-hospital outcomes associated with bivalirudin use among patients with non-ST-segment-elevation myocardial infarction undergoing percutaneous coronary intervention in the United States. Circ Cardiovasc Qual Outcomes. 2017;10:e003741. doi: 10.1161/CIRCOUTCOMES.117.003741
- 153.Wimmer NJ, Secemsky EA, Mauri L, Roe MT, Saha-Chaudhuri P, Dai D, McCabe JM, Resnic FS, Gurm HS, Yeh RW. Effectiveness of arterial closure devices for preventing complications with percutaneous coronary intervention: an instrumental variable analysis. Circ Cardiovasc Interv. 2016;9:e003464. doi: 10.1161/CIRCINTERVENTIONS.115.003464
- 154.Yeh RW, Vasaiwala S, Forman DE, Silbaugh TS, Zelevinski K, Lovett A, Normand SL, Mauri L. Instrumental variable analysis to compare effectiveness of stents in the extremely elderly. Circ Cardiovasc Qual Outcomes. 2014;7:118–124. doi: 10.1161/CIRCOUTCOMES.113.000476
- 155.Secemsky EA, Ferro EG, Rao SV, Kirtane A, Tamez H, Zakroysky P, Wojdyla D, Bradley SM, Cohen DJ, Yeh RW. Association of physician variation in use of manual aspiration thrombectomy with outcomes following primary percutaneous coronary intervention for ST-elevation myocardial infarction: the National Cardiovascular Data Registry CathPCI Registry. JAMA Cardiol. 2019;4:110–118. doi: 10.1001/jamacardio.2018.4472
- 156.Secemsky EA, Kirtane A, Bangalore S, Jovin IS, Shah RM, Ferro EG, Wimmer NJ, Roe M, Dai D, Mauri L, et al. Use and effectiveness of bivalirudin versus unfractionated heparin for percutaneous coronary intervention among patients with ST-segment elevation myocardial infarction in the United States. JACC Cardiovasc Interv. 2016;9:2376–2386. doi: 10.1016/j.jcin.2016.09.020
- 157.Maciejewski ML, Brookhart MA. Using instrumental variables to address bias from unobserved confounders. JAMA. 2019;321:2124–2125. doi: 10.1001/jama.2019.5646
- 158.Stukel TA, Fisher ES, Wennberg DE, Alter DA, Gottlieb DJ, Vermeulen MJ. Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA. 2007;297:278–285. doi: 10.1001/jama.297.3.278
- 159.Brookhart MA, Rassen JA, Schneeweiss S. Instrumental variable methods in comparative safety and effectiveness research. Pharmacoepidemiol Drug Saf. 2010;19:537–554. doi: 10.1002/pds.1908
- 160.Swanson SA, Hernán MA. Commentary: how to report instrumental variable analyses (suggestions welcome). Epidemiology. 2013;24:370–374. doi: 10.1097/EDE.0b013e31828d0590
- 161.Davies NM, Smith GD, Windmeijer F, Martin RM. Issues in the reporting and conduct of instrumental variable studies: a systematic review. Epidemiology. 2013;24:363–369. doi: 10.1097/EDE.0b013e31828abafb
- 162.Davies NM, Smith GD, Windmeijer F, Martin RM. COX-2 selective nonsteroidal anti-inflammatory drugs and risk of gastrointestinal tract complications and myocardial infarction: an instrumental variable analysis. Epidemiology. 2013;24:352–362. doi: 10.1097/EDE.0b013e318289e024 [DOI] [PubMed] [Google Scholar]
- 163.Cook JA, Julious SA, Sones W, Hampson LV, Hewitt C, Berlin JA, Ashby D, Emsley R, Fergusson DA, Walters SJ, et al. DELTA(2) guidance on choosing the target difference and undertaking and reporting the sample size calculation for a randomised controlled trial. Br Med J. 2018;363:k3750. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 164.Hoenig JM, Heisey DM. The abuse of power: the pervasive fallacy of power calculations for data analysis. Am Statistician. 2001;55:19–24. doi: 10.1198/000313001300339897 [DOI] [Google Scholar]