Indian Journal of Psychological Medicine. 2023 Nov 22;45(6):640–641. doi: 10.1177/02537176231216842

Types of Analysis: Planned (prespecified) vs Post Hoc, Primary vs Secondary, Hypothesis-driven vs Exploratory, Subgroup and Sensitivity, and Others

Chittaranjan Andrade
PMCID: PMC10964884  PMID: 38545527

Abstract

In research, there are different, overlapping ways in which the plan of analysis may be described. This article explains planned (prespecified) vs post hoc, primary vs secondary, hypothesis-driven vs exploratory, and subgroup and sensitivity analyses; intent-to-treat vs per-protocol vs completer analysis was explained in an earlier article in this column. A prespecified analysis is one that is outlined before the study starts; it is usually separated into primary and secondary analyses for the single primary outcome and the many secondary outcomes, respectively. Exploratory analyses examine relationships between variables in the absence of well-defined expectations; these tend to be statistical fishing expeditions. Subgroup analyses examine findings in categories of subjects in the sample. Sensitivity analyses examine whether the findings remain similar if the data are handled in a way that differs from the original plan of analysis. Post hoc analyses examine hypotheses that are conceptualized after the data are seen. All these terms are explained with the help of examples, and strengths and limitations are briefly discussed.

Keywords: Planned (prespecified) vs post hoc analysis, Primary vs secondary analysis, Hypothesis-driven vs exploratory analysis, Intent-to-treat vs per-protocol vs completer analysis, Sensitivity analysis, Subgroup analysis


Among young researchers, understanding terminology improves clarity of thought and hence the quality of studies designed and conducted. There are different, overlapping ways in which research design can be described; these were explained in earlier articles in this column. Likewise, there are different, overlapping ways in which the plan of analysis may be described; one of these, intent-to-treat vs per-protocol vs completer analysis, was explained before.1 This article explains other terms used to describe the plan of analysis in research.

Planned vs Post Hoc Analyses

A planned analysis, also known as a prespecified analysis, is one that is explicitly outlined in the study protocol before the study is started. In a planned analysis, general and specific objectives are stated, main and subsidiary hypotheses are framed, primary and secondary outcomes are defined, and statistical procedures expected to be applied are described. No deviation from this plan is permitted without a formal protocol amendment unless there are unexpected developments that make the original plan of analysis unviable. Planned analyses are desirable because, for example, they allow sample size estimation, and because they reduce the risk of HARKing, cherry-picking, p-hacking, fishing expeditions, and data dredging and mining, all of which are dubious research practices.2,3
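
Sample size estimation is one concrete payoff of prespecifying the primary outcome. The sketch below uses the standard normal-approximation formula for comparing two means with equal arms, n = 2(z_{1-α/2} + z_{1-β})² σ² / δ². The function name, the 5-point difference, and the SD of 10 (as on a T-score scale) are illustrative assumptions, not values from any actual trial.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate patients per arm to detect a mean difference `delta`
    (two-sided test, equal arms, normal approximation)."""
    z = NormalDist()                       # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)     # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)              # quantile matching the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sd ** 2 / delta ** 2
    return ceil(n)                         # round up to whole patients

# Hypothetical primary outcome: a 5-point difference on a scale with SD 10.
print(n_per_group(delta=5, sd=10))
```

Under these assumptions the formula gives about 63 patients per arm for 80% power at a two-sided α of 0.05; calculations based on the t distribution give a slightly larger number. Without a prespecified primary outcome, there is no single δ on which to base such a calculation.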

A post hoc analysis is an unplanned analysis. For example, after the study is completed and the data are examined, something interesting or unexpected in the data may catch the eye; the investigator may then conduct an analysis that was not previously planned in order to pursue the new lead. On the positive side, post hoc analyses may pick up a novelty, leading to new ideas, new hypotheses, and new research. On the negative side, post hoc analyses may identify false positive results; that is, chance findings that exist only in the data set being examined.

Primary vs Secondary Analyses

A prespecified analysis has two components: the primary analysis (singular) and the secondary analyses (plural). The primary analysis is the analysis of the primary outcome, and secondary analyses are analyses of the secondary outcomes. Findings in the primary analysis are deemed important. Findings in secondary analyses, while potentially important, must be viewed with caution for at least two reasons. First, statistically significant findings may be false positives that arise because many analyses were conducted. Second, statistically nonsignificant findings may be false negatives because the analyses were underpowered; the sample size for a study is estimated for the primary outcome measure, not for the secondary outcomes. Large studies, including meta-analyses, may present dozens of secondary analyses in supplementary materials.
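
Both cautions can be put into rough numbers. The first function assumes independent tests in which every null hypothesis is true; the second uses a normal approximation for power. The scenario (10 secondary outcomes, 63 patients per arm, a 5-point primary effect and a hypothetical 3-point secondary effect, SD 10) and the function names are illustrative assumptions.

```python
from statistics import NormalDist

def prob_at_least_one_false_positive(k, alpha=0.05):
    """P(>= 1 false positive) among k independent tests when every null is true."""
    return 1 - (1 - alpha) ** k

def approx_power(delta, sd, n_per_group, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample comparison of means."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5     # SE of the difference in means
    z_crit = z.inv_cdf(1 - alpha / 2)
    # The tiny chance of rejecting in the wrong direction is ignored.
    return z.cdf(abs(delta) / se - z_crit)

print(f"10 secondary tests, P(>=1 false positive): {prob_at_least_one_false_positive(10):.2f}")
print(f"power, primary outcome (5 points):   {approx_power(5, 10, 63):.2f}")
print(f"power, secondary outcome (3 points): {approx_power(3, 10, 63):.2f}")
```

Under these assumptions, 10 independent secondary tests carry about a 40% chance of at least one false positive, and a trial with roughly 80% power for the primary outcome has only about 39% power for the smaller secondary effect.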

Hypothesis-driven vs Exploratory Analyses

Primary and secondary analyses examine prespecified research questions or hypotheses; so, these analyses are hypothesis-driven. Exploratory analyses, whether prespecified or not, examine relationships between variables in the absence of well-defined expectations. Thus, exploratory analyses are usually statistical fishing expeditions and are best discouraged; if they are conducted, their findings should be regarded with much caution.

Subgroup and Sensitivity Analyses

These are usually prespecified and may be either hypothesis-driven or exploratory; sometimes, they are specified post hoc. A subgroup analysis examines whether the findings differ between discrete categories of subjects in the sample. A sensitivity analysis examines whether the findings remain the same if the data are handled in a way that varies slightly from the original analysis; for example, the sample may be restricted by excluding subjects with certain characteristics, or the analysis may be re-run with a different definition of an outcome. In subgroup analysis, the way in which the analyses are run does not change; in sensitivity analysis, subgroups are not compared.

Hypothetical Example

Imagine a randomized controlled trial in which we study whether galantamine, as compared with placebo, improves cognition in schizophrenia. In the prespecified, hypothesis-driven analysis, we plan to examine whether galantamine, relative to placebo, improves the primary outcome, the MATRICS Consensus Cognitive Battery (MCCB) composite score; this will be the single primary analysis. In the prespecified, hypothesis-driven analysis, we also plan to examine whether galantamine, relative to placebo, improves secondary outcomes such as scores in individual MCCB domains, negative symptom ratings on two different scales, a measure of quality of life, a measure of disability, and global functioning; these will be the many secondary analyses.

In exploratory analyses, we examine whether the galantamine vs placebo improvement in the MCCB composite score and improvement in individual MCCB domains depend on age, sex, educational attainment, marital status, employment status, smoking, alcohol use, presence of metabolic syndrome, and other sociodemographic and clinical variables; this is the fishing expedition that is based more on hope than on hypotheses.
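
A small simulation illustrates why such a fishing expedition is likely to "find" something even when nothing is there. It assumes a drug with no effect at all, 10 irrelevant binary covariates, 200 patients per simulated trial, an outcome SD known to be 1, and a z-test for each treatment-by-covariate interaction; all of these settings, and the function names, are hypothetical.

```python
import random
from statistics import NormalDist, fmean

def interaction_z(y, treat, cov):
    """z for the treatment-by-covariate interaction, assuming the outcome SD is known to be 1."""
    def arm_difference(level):
        drug = [yi for yi, ti, ci in zip(y, treat, cov) if ti == 1 and ci == level]
        plac = [yi for yi, ti, ci in zip(y, treat, cov) if ti == 0 and ci == level]
        # Drug-placebo difference within this covariate level, and its variance (SD = 1).
        return fmean(drug) - fmean(plac), 1 / len(drug) + 1 / len(plac)
    d1, v1 = arm_difference(1)
    d0, v0 = arm_difference(0)
    return (d1 - d0) / (v1 + v0) ** 0.5

def fishing_trip(n_trials=500, n_patients=200, n_covariates=10, alpha=0.05, seed=1):
    """Share of null trials in which at least one covariate appears to 'moderate' the drug effect."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    treat = [i % 2 for i in range(n_patients)]                    # 1:1 allocation
    trials_with_a_hit = 0
    for _ in range(n_trials):
        y = [rng.gauss(0, 1) for _ in range(n_patients)]          # the drug does nothing
        for _ in range(n_covariates):
            cov = [rng.getrandbits(1) for _ in range(n_patients)]  # irrelevant covariate
            if abs(interaction_z(y, treat, cov)) > z_crit:
                trials_with_a_hit += 1
                break                                             # one spurious hit is enough
    return trials_with_a_hit / n_trials

print(f"Null trials with >= 1 'significant' moderator: {fishing_trip():.0%}")
```

With these settings the share should come out near 40%, close to 1 - 0.95^10, even though no covariate truly modifies the (nonexistent) drug effect.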

In a prespecified subgroup analysis, we examine whether the galantamine vs placebo improvement in the MCCB composite score is different in different subgroups, such as in men vs women, those with vs without lifetime cannabis use, and those with vs without prominent negative symptoms at baseline.
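
A common pitfall is to conclude that subgroups differ because the drug effect is statistically significant in one subgroup and not in the other. The sketch below instead compares the subgroups directly, with a z-test of the difference between two independent estimates (a test of interaction). The effect estimates and standard errors for men and women, and the function names, are hypothetical.

```python
from statistics import NormalDist

def two_sided_p(z):
    """Two-sided p value for a standard normal test statistic."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def subgroup_interaction(effect_a, se_a, effect_b, se_b):
    """Compare two independent subgroup estimates: difference, its SE, and p value."""
    diff = effect_a - effect_b
    se_diff = (se_a ** 2 + se_b ** 2) ** 0.5   # variances of independent estimates add
    return diff, se_diff, two_sided_p(diff / se_diff)

men = (4.0, 2.0)     # hypothetical galantamine-placebo MCCB difference and SE in men
women = (1.0, 2.2)   # hypothetical galantamine-placebo MCCB difference and SE in women
print(f"men:   p = {two_sided_p(men[0] / men[1]):.3f}")
print(f"women: p = {two_sided_p(women[0] / women[1]):.3f}")
diff, se_diff, p = subgroup_interaction(*men, *women)
print(f"difference {diff:.1f} (SE {se_diff:.2f}), interaction p = {p:.3f}")
```

In this made-up example the effect is significant in men (p ≈ 0.046) but not in women (p ≈ 0.65), yet the direct comparison gives p ≈ 0.31, so the data provide no good evidence that the drug works differently in men and women.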

In a prespecified sensitivity analysis, we examine whether our findings remain the same or change when, for example, we study galantamine vs placebo outcomes only in patients who were not treatment-refractory at baseline, or only in chronically ill patients; that is, we examine whether the findings are sensitive to treatment-refractoriness or chronicity of illness. Sensitivity analyses can also examine whether the findings change when different cut-offs are used to define response, if improvement in MCCB scores is categorized as response vs nonresponse.
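
In practice, a sensitivity analysis of this kind re-runs the same comparison under alternative choices and checks whether the conclusion holds. The sketch uses a tiny, invented data set (arm, MCCB improvement, treatment-refractory flag) and defines response as improvement at or above a cut-off; the data, the cut-offs, and the function name are hypothetical.

```python
# Hypothetical patient records: (arm, MCCB composite improvement, treatment-refractory?)
PATIENTS = [
    ("galantamine", 8, False), ("galantamine", 6, False), ("galantamine", 5, False),
    ("galantamine", 4, False), ("galantamine", 7, False), ("galantamine", 2, True),
    ("galantamine", 9, False), ("galantamine", 3, True),
    ("placebo", 3, False), ("placebo", 5, False), ("placebo", 1, True),
    ("placebo", 2, False), ("placebo", 6, False), ("placebo", 0, True),
    ("placebo", 4, False), ("placebo", 2, False),
]

def responder_difference(patients, cutoff, drop_refractory=False):
    """Drug minus placebo responder rate, where response means improvement >= cutoff."""
    kept = [p for p in patients if not (drop_refractory and p[2])]
    def rate(arm):
        scores = [imp for a, imp, _ in kept if a == arm]
        return sum(imp >= cutoff for imp in scores) / len(scores)
    return rate("galantamine") - rate("placebo")

for cutoff in (3, 5, 7):
    full = responder_difference(PATIENTS, cutoff)
    restricted = responder_difference(PATIENTS, cutoff, drop_refractory=True)
    print(f"cut-off {cutoff}: all patients {full:+.2f}, non-refractory only {restricted:+.2f}")
```

In this toy data set the difference in responder rates stays positive under every cut-off and after excluding treatment-refractory patients, which is the pattern expected when a finding is robust to these analytic choices.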

Suppose that, after the study ends, we note that few patients reached the target dose of galantamine and that different patients took widely different average doses across the course of the trial. Therefore, in a post hoc analysis, we examine whether the galantamine vs placebo improvement in the MCCB composite score varies with the average dose that patients received.
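
One simple form of this post hoc analysis regresses MCCB improvement on the average dose received within the galantamine arm. The sketch uses invented doses and improvements and an ordinary least-squares fit; the function name and data are hypothetical. Because dose was not randomized, a slope found this way is hypothesis-generating at best; for example, patients who tolerate higher doses may differ from others in ways that also affect cognition.

```python
from statistics import fmean

def ols_slope_intercept(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical galantamine-arm data: average daily dose (mg) and MCCB composite improvement.
avg_dose = [8, 12, 16, 16, 20, 24]
improvement = [2, 3, 5, 4, 6, 7]
slope, intercept = ols_slope_intercept(avg_dose, improvement)
print(f"improvement changes by about {slope:.3f} points per mg/day of average dose")
```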

Parting Notes

In meta-analysis, sensitivity analyses could involve excluding from analysis studies with outlying results, studies that were conducted in a particular geographic region, studies conducted on patients with specific characteristics such as treatment refractoriness, studies conducted with subtherapeutic drug dosing, and so on. Readers are encouraged to pay attention to how prespecified, primary, secondary, exploratory, subgroup, sensitivity, intent-to-treat, completer, per-protocol, and other analyses are defined in studies that are published.
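
Leave-one-out analysis is one common way to check whether a pooled result hinges on a single, possibly outlying, study. The sketch assumes fixed-effect, inverse-variance pooling and five invented studies reported as standardized mean differences with standard errors; a real meta-analysis might instead use a random-effects model, and the function names are hypothetical.

```python
def pool_fixed(effects, ses):
    """Fixed-effect, inverse-variance pooled estimate and its standard error."""
    weights = [1 / se ** 2 for se in ses]
    estimate = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return estimate, (1 / sum(weights)) ** 0.5

def leave_one_out(effects, ses):
    """Pooled estimate recomputed with each study omitted in turn."""
    return [pool_fixed(effects[:i] + effects[i + 1:], ses[:i] + ses[i + 1:])
            for i in range(len(effects))]

EFFECTS = [0.30, 0.25, 0.35, 1.20, 0.28]   # hypothetical SMDs; study 4 is an outlier
SES = [0.10, 0.10, 0.10, 0.10, 0.10]       # hypothetical standard errors

est, se = pool_fixed(EFFECTS, SES)
print(f"all studies: {est:.3f} (SE {se:.3f})")
for i, (e, s) in enumerate(leave_one_out(EFFECTS, SES), start=1):
    print(f"without study {i}: {e:.3f} (SE {s:.3f})")
```

In this made-up example the pooled estimate is 0.476, but omitting the fourth study (effect 1.20) lowers it to 0.295, so the overall result would be reported as sensitive to that single study.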

Footnotes

The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author received no financial support for the research, authorship and/or publication of this article.

References

1. Andrade C. Intent-to-treat (ITT) vs completer or per-protocol analysis in randomized controlled trials. Indian J Psychol Med, 2022; 44(4): 416–418.
2. Andrade C. The primary outcome measure and its importance in clinical trials. J Clin Psychiatry, 2015; 76(10): e1320–e1323.
3. Andrade C. HARKing, cherry-picking, p-hacking, fishing expeditions, and data dredging and mining as questionable research practices. J Clin Psychiatry, 2021; 82(1): 20f13804.
