Editorial
American Journal of Public Health. 2015 Jul;105(Suppl 3):S371–S373. doi: 10.2105/AJPH.2014.302267

Small Is Essential: Importance of Subpopulation Research in Cancer Control

Shobha Srinivasan 1, Richard P Moser 1, Gordon Willis 1, William Riley 1, Mark Alexander 1, David Berrigan 1, Sarah Kobrin 1
PMCID: PMC4455491  PMID: 25905825

The ability to harness the benefits of “big data” has had a revolutionary impact on science, with its focus on the volume and variety of data sources, and application of both traditional and innovative analytic methods appropriate for large, aggregated data sets. We are concerned, however, about the opposite: “small data,” for which the size, dispersion, or accessibility of the population of interest makes it difficult to obtain adequate sample sizes to test specific research questions. Examples include racial or ethnic subpopulations (e.g., Honduran Latin Americans), populations occurring in specific geographic areas (e.g., reservations), and populations that have relatively rare characteristics (e.g., transgender persons). A great challenge is determining when a small group is of practical or theoretical interest (Figure 1). We define “practical and theoretical interest” broadly to include issues involving social justice, biological or geographic factors, and disease burden.1 Ultimately, it is critical to ensure that all segments of the US population benefit from this research and from the latest technologic advances in cancer care services and delivery.

FIGURE 1. Research with small data: identifying challenges.

INCLUDING UNDERREPRESENTED GROUPS IN RESEARCH

An example of the potential negative ramifications of excluding underrepresented groups from research, or of inappropriately aggregating them across groups, comes from the study of racial and ethnic health disparities and issues of equity in the United States. Intervention research often does not include a wide range of racial/ethnic subgroups, so it is not feasible to test whether an intervention created specifically for the majority group is also efficacious for the subgroups. Likewise, it is often not possible to test whether an intervention can be adapted for a particular subgroup. Epidemiological and surveillance research usually includes “minority or underserved populations” in addition to White or non-Hispanic White (NHW) groups. Although this has allowed for a better understanding of these smaller populations and provides some progress toward addressing health inequities, there remain pockets of communities that are severely underrepresented within the broader “minority and underserved populations.”2–6

As a further example, although Asian Americans as a whole have high incomes and good health outcomes when compared with NHWs, Hispanics, African Americans, and American Indian/Alaska Natives, this aggregate picture masks the fact that some subgroups of Asian Americans, such as Cambodians and Hmong, lag severely behind other Asian Americans.3,4,7,8 Even within the NHW population there are communities that have long been disadvantaged (such as those living in Appalachian states), with low income, low literacy, and poor health outcomes.9–11 These subgroups have generally been omitted or excluded from the research process because of challenges with identification and recruitment. Through this commentary, we hope to encourage research in subpopulations; we recommend both the development of new methods and the innovative use of existing methodological and analytic strategies across both intervention and epidemiological research.

ALTERNATIVE STUDY DESIGNS AND ANALYSIS PROCEDURES

There is growing recognition that to implement interventions in small populations, it may be necessary to consider alternative study designs, such as single-case designs, designs that use propensity scores, and randomized group designs. In 2013, many of the studies on subpopulation research submitted to the Division of Cancer Control and Population Sciences at the National Cancer Institute (NCI) that did not score well in peer review received comments that the randomized clinical trial design was not appropriate because the sample size was insufficient to detect the effect of the intervention. This criticism raises the question of whether these studies would be better suited to alternatives to the standard randomized controlled trial design, such as single-case designs, within-subject controls, and a variety of quasi-experimental designs.
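The reviewers' concern about sample size can be made concrete with a rough calculation. The following is a minimal sketch, not a method described in this editorial: it assumes a two-arm parallel trial comparing means with equal variances, a two-sided test, and the usual normal approximation. The function name `n_per_arm` and the default values for `alpha` and `power` are illustrative choices.

```python
from math import ceil
from statistics import NormalDist


def n_per_arm(effect_size: float, alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate participants needed per arm in a two-arm parallel trial.

    Assumption: normal approximation for comparing two means with equal
    variances. effect_size is the standardized mean difference (Cohen's d).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)          # quantile for target power
    return ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


if __name__ == "__main__":
    # Small, medium, and large standardized effects (conventional labels).
    for d in (0.2, 0.5, 0.8):
        print(f"d = {d}: about {n_per_arm(d)} participants per arm")
```

Under these assumptions, a small standardized effect calls for several hundred participants per arm, which may exceed what can be recruited from a small or dispersed subpopulation. Exact calculations based on the t distribution give slightly larger numbers.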

One solution for testing interventions in small samples is to focus on within-group rather than between-group designs. Because a within-group design uses the sample as its own control, there is no need for a separate control group, reducing by up to half the sample size required for accurate statistical comparisons. Among group designs, there are a number of quasi-experimental approaches that could be considered, including interrupted time series12–15 and stepped wedge designs, the latter being particularly useful for studies in which there are distinct and dispersed cohorts or communities in which the intervention can be rolled out in a staggered manner.16,17 Single-case studies involving a series of N-of-1 trials could be used to test intervention adaptations in an iterative manner, and Bayesian estimates can be produced from this series of trials to evaluate the potential generalizability of the findings to the subpopulation.18 Within-subject designs require more intensive longitudinal data than are typically obtained through between-subject designs, but the advent of technologies for capturing temporally dense data, such as ecological momentary assessment and passive sensor technologies, makes these approaches more viable. Such data could also be used in conjunction with multilevel analyses of behavior across different spatial areas. This kind of study design can be statistically powerful, even with modest numbers of samples per geographic unit.19
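To illustrate how Bayesian estimates might be produced from a series of N-of-1 trials, the sketch below pools per-person effect estimates with a conjugate normal update. It is a deliberately simplified illustration and not the approach of the cited reference: it assumes that every participant shares one true effect and that each standard error is known, whereas hierarchical models can also allow the effect to vary between people. The function names, the prior, and the numbers in the demonstration are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def pool_n_of_1(estimates, std_errors, prior_mean=0.0, prior_sd=10.0):
    """Combine per-person effect estimates with a conjugate normal update.

    Assumptions: one common true effect and known standard errors.
    Returns the posterior mean and posterior standard deviation.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need matching, non-empty estimates and std_errors")
    precision = 1.0 / prior_sd ** 2          # start from the prior precision
    weighted_sum = prior_mean / prior_sd ** 2
    for estimate, se in zip(estimates, std_errors):
        precision += 1.0 / se ** 2           # each trial adds its precision
        weighted_sum += estimate / se ** 2   # precision-weighted estimate
    return weighted_sum / precision, sqrt(1.0 / precision)


def prob_benefit(post_mean, post_sd):
    """Posterior probability that the pooled effect is greater than zero."""
    return 1.0 - NormalDist(post_mean, post_sd).cdf(0.0)


if __name__ == "__main__":
    # Hypothetical effect estimates from five N-of-1 trials.
    effects = [0.8, 1.4, 0.3, 1.1, 0.9]
    errors = [0.6, 0.7, 0.5, 0.8, 0.6]
    mean, sd = pool_n_of_1(effects, errors)
    print(f"posterior mean {mean:.2f}, sd {sd:.2f}, "
          f"P(effect > 0) = {prob_benefit(mean, sd):.3f}")
```

Each additional trial adds precision, so the posterior narrows as trials accumulate even when every individual trial is imprecise on its own.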

For epidemiological research, innovative recruitment methods may be very useful. For example, respondent-driven sampling20,21 has been successfully employed to identify and recruit groups for studies in which there is no existing sampling frame, such as people who use drugs or ethnic subgroups. Innovative analytic approaches, such as integrative data analysis,22 could be employed, in which independent data sets are combined and analyzed as a whole to produce adequate representation and sample sizes. Integrative data analysis can also be used to combine data across multiple iterations of the same national survey when no single iteration provides an adequate sample size.
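The gain from combining iterations of a survey can be sketched with simple arithmetic. The example below is only a rough illustration under strong assumptions: it treats the pooled subgroup respondents as a simple random sample and ignores survey weights, design effects, differences between iterations, and the measurement harmonization that a full integrative data analysis would need to handle. The function names and all counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score interval for a proportion (simple random sample assumed)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width


def pool_waves(waves):
    """Sum subgroup respondents and outcome counts across survey iterations.

    Each wave is a (respondents_in_subgroup, respondents_with_outcome) pair.
    """
    total_n = sum(n for n, _ in waves)
    total_x = sum(x for _, x in waves)
    return total_n, total_x


if __name__ == "__main__":
    # Hypothetical subgroup counts from three iterations of one survey.
    waves = [(40, 12), (35, 9), (45, 15)]
    for n, x in waves:
        low, high = wilson_interval(x, n)
        print(f"single iteration n={n}: {x / n:.2f} ({low:.2f} to {high:.2f})")
    n, x = pool_waves(waves)
    low, high = wilson_interval(x, n)
    print(f"pooled n={n}: {x / n:.2f} ({low:.2f} to {high:.2f})")
```

The pooled interval is narrower than any single-iteration interval because the subgroup sample is roughly three times larger, which is the basic motivation for combining iterations.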

ADDRESSING THE CHALLENGE OF SMALL DATA

The National Institutes of Health (NIH), and by extension the NCI, has an obligation to conduct research to improve the health of all Americans, not just the health of the majority population or those who are easy to identify. We therefore recommend the development and use of methodological and analytic procedures that allow subpopulations to be meaningfully included in research. Figure 1 illustrates a model for determining when a “small” group is of research interest. However, it is also clear that other entities, not just those responsible for grant funding decisions, need to be involved in identifying populations of interest and in developing initiatives to address these groups. For example, at the NIH, training for peer reviewers in study sections may be needed to ensure that they are knowledgeable about these innovative methods so that sound, rigorous scientific applications that employ them are understood and scored appropriately.

To address these issues, the NCI is planning a workshop covering three areas related to small populations:

  (1) identification, recruitment, and retention strategies;
  (2) epidemiological design and analytic approaches for small samples; and
  (3) intervention design and analytic approaches for subpopulations.

Based on the products of this workshop and responses to this editorial, the NCI will explore next steps to strengthen subpopulation research.

Acknowledgments

The authors would like to thank and acknowledge the guidance and support of Robert T. Croyle, PhD. The authors would also like to thank Kathleene Ulanday, MPH, for assistance with the citations.

References

  1. Waller LA, Gotway CA. Applied Spatial Statistics for Public Health Data. Hoboken, NJ: John Wiley & Sons; 2004.
  2. Yancey AK, Ortega AN, Kumanyika SK. Effective recruitment and retention of minority research participants. Annu Rev Public Health. 2006;27:1–28. doi: 10.1146/annurev.publhealth.27.021405.102113.
  3. Islam NS, Khan S, Kwon S, Jang D, Ro M, Trinh-Shevrin C. Methodological issues in the collection, analysis, and reporting of granular data in Asian American populations: historical challenges and potential solutions. J Health Care Poor Underserved. 2010;21(4):1354–1381. doi: 10.1353/hpu.2010.0939.
  4. Ghosh C. Healthy People 2010 and Asian Americans/Pacific Islanders: defining a baseline of information. Am J Public Health. 2003;93(12):2093–2098. doi: 10.2105/ajph.93.12.2093.
  5. Institute of Medicine. The Health of Lesbian, Gay, Bisexual, and Transgender Aging: Biographical Approaches for Inclusive Care and Support. London, UK: Jessica Kingsley; 2012.
  6. Fredriksen-Goldsen KI, Kim HJ, Barkan SE, Muraco A, Hoy-Ellis CP. Health disparities among lesbian, gay, and bisexual older adults: results from a population-based study. Am J Public Health. 2013;103(10):1802–1809. doi: 10.2105/AJPH.2012.301110.
  7. Chen MS, Jr, Hawks BL. A debunking of the myth of healthy Asian Americans and Pacific Islanders. Am J Health Promot. 1995;9(4):261–268. doi: 10.4278/0890-1171-9.4.261.
  8. National Heart, Lung, and Blood Institute. Addressing Cardiovascular Health in Asian American and Pacific Islanders: a Background Report. 2000. Available at: http://www.nhlbi.nih.gov/health/prof/heart/other/aapibkgd/aapibkgd.pdf. Accessed June 25, 2014.
  9. Behringer B, Friedell GH. Appalachia: where place matters in health. Prev Chronic Dis. 2006;3:A113.
  10. Appalachian Regional Commission. Underlying Socioeconomic Factors Influencing Health Disparities in the Appalachian Region. Final Report. 2008. Available at: http://www.arc.gov/assets/research_reports/SocioeconomicFactorsInfluencingHealthDisparitiesinAppalachianRegion5.pdf. Accessed June 25, 2014.
  11. Vanderpool RC, Gainor SJ, Conn ME, Spencer C, Allen AR, Kennedy S. Adapting and implementing evidence-based cancer education interventions in rural Appalachia: real-world experiences and challenges. Rural Remote Health. 2011;11(4):1807.
  12. Cook TD, Campbell DT. Quasi-Experimentation: Design & Analysis Issues for Field Settings. Boston, MA: Houghton Mifflin Company; 1979.
  13. Hartmann DP, Gottman JM, Jones RR, Gardner W, Kazdin AE, Vaught RS. Interrupted time-series analysis and its application to behavioral data. J Appl Behav Anal. 1980;13(4):543–559. doi: 10.1901/jaba.1980.13-543.
  14. Gillings D, Makuc D, Siegel E. Analysis of interrupted time series mortality trends: an example to evaluate regionalized perinatal care. Am J Public Health. 1981;71(1):38–46. doi: 10.2105/ajph.71.1.38.
  15. Biglan A, Ary D, Wagenaar AC. The value of interrupted time-series experiments for community intervention research. Prev Sci. 2000;1(1):31–49. doi: 10.1023/a:1010024016308.
  16. Mdege ND, Man MS, Taylor Nee Brown CA, Torgerson DJ. Systematic review of stepped wedge cluster randomized trials shows that design is particularly used to evaluate interventions during routine implementation. J Clin Epidemiol. 2011;64(9):936–948. doi: 10.1016/j.jclinepi.2010.12.003.
  17. Woertman W, de Hoop E, Moerbeek M, Zuidema SU, Gerritsen DL, Teerenstra S. Stepped wedge designs could reduce the required sample size in cluster randomized trials. J Clin Epidemiol. 2013;66(7):752–758. doi: 10.1016/j.jclinepi.2013.01.009.
  18. Zucker DR, Ruthazer R, Schmid CH. Individual (N-of-1) trials can be combined to give population comparative treatment effect estimates: methodologic considerations. J Clin Epidemiol. 2010;63(12):1312–1323. doi: 10.1016/j.jclinepi.2010.04.020.
  19. Raudenbush SW, Bryk AS. Hierarchical Linear Models. 2nd ed. Thousand Oaks, CA: Sage Publications; 2002.
  20. Heckathorn DD. Respondent-driven sampling: a new approach to the study of hidden populations. Soc Probl. 1997;44(2):174–199.
  21. Salganik MJ, Heckathorn DD. Sampling and estimation in hidden populations using respondent-driven sampling. Sociol Methodol. 2004;34:193–239.
  22. Curran PJ, Hussong AM. Integrative data analysis: the simultaneous analysis of multiple data sets. Psychol Methods. 2009;14(2):81–100. doi: 10.1037/a0015914.

Articles from American Journal of Public Health are provided here courtesy of American Public Health Association
