Question: What are the key characteristics of the cohort study design and its varied applications, and how can this research design be utilized in health sciences librarianship?
Data Sources: The health, social, behavioral, biological, library, earth, and management sciences literatures were used as sources.
Study Selection: All fields except for health sciences librarianship were scanned topically for either well-known or diverse applications of the cohort design. The health sciences library literature available to the author, principally for the years 1990 to 2000, was reviewed and supplemented by papers or posters presented at annual meetings of the Medical Library Association.
Data Extraction: A narrative review for the health, social, behavioral, biological, earth, and management sciences literatures and a systematic review for health sciences librarianship literature for the years 1990 to 2000, with three exceptions, were conducted. The author conducted principally a manual search of the health sciences librarianship literature for the years 1990 to 2000 as part of this systematic review.
Main Results: The cohort design has been applied to answer a wide array of theoretical or practical research questions in the health, social, behavioral, biological, and management sciences. Health sciences librarianship also offers several major applications of the cohort design.
Conclusion: The cohort design has great potential for answering research questions in the field of health sciences librarianship, particularly evidence-based librarianship (EBL), although that potential has not been fully explored.
A young obstetrician took a position at the equivalent of a county hospital a number of years ago. Not long after beginning his new job in one of the two maternity wards at the hospital, the physician noticed that patients in his particular ward were contracting a mysterious infection at a rate of 11.4%. Yet, patients in the other maternity ward contracted the infection at a rate of only 2.7%. Four years prior to his arrival, the annual infection rate in his ward actually had been nearly 16%. The young obstetrician could find no obvious reason for this difference between the wards, because conditions in both wards were essentially the same, except for one possible factor: in his ward, only physicians and medical students delivered babies, whereas only midwives and nurse midwifery students delivered babies in the other ward. The obstetrician attempted to equalize conditions between the two wards in every conceivable way. He even instructed the physicians and medical students in his ward to imitate the seemingly gentler delivery methods used in the midwifery ward. Physicians both at his hospital and at other hospitals where outbreaks of the same infection had occurred had described their experiences in case reports, but none were able to understand what caused these infections.
A year later, while the obstetrician was still grappling with this seemingly unsolvable problem, one of his physician friends contracted the identical infection. This friend had accidentally cut himself with a scalpel while working on a cadaver during an autopsy. The obstetrician soon recognized that the physicians and medical students from his maternity ward, who had been working on cadavers elsewhere in the hospital during slow periods in the maternity ward, had inadequately washed their hands before delivering babies. These physicians and medical students consequently were unintentionally infecting the postpartum women. In contrast, the midwives and nurse midwifery students rarely came into contact with cadaveric material.
The obstetrician, Ignaz Semmelweis, began work at the Vienna Lying-in Hospital in 1846. The clinical research method that he employed, now known as the cohort study design, enabled him to detect the cause of the frequently fatal puerperal (childbed) fever in his maternity patients. This brief biographical sketch [1–3] illustrates the effectiveness of the cohort study in observing possible causal relationships among variables. All cohort studies contain a population (e.g., pregnant women delivering at the Vienna Lying-in Hospital), an exposure (e.g., cadaveric material), and an outcome (e.g., risk of childbed fever). The cohort design will be defined and described in detail later in this article.
The British physician James Lind pioneered the use of the cohort design in 1747, predating Semmelweis by ninety-nine years, while determining the effective treatment of scurvy with citrus fruits in sailors [4]. PCA Louis of France in 1835 published his famous study Recherches sur les effets de la saignée, based upon a cohort design, which disproved the effectiveness of bleeding patients for diseases such as pneumonia and subsequently influenced both British and U.S. medicine [5]. In the United States, the cohort design first appeared in a study involving a cohort of tuberculosis patients, who had been treated during the years 1885 to 1901 at the Adirondack Cottage Sanitarium in upstate New York [6]. Another early cohort study linked increased dietary animal protein to a cure for pellagra [7]. Epidemiologists later employed the cohort design to detect patterns of familial contagion of tuberculosis during the 1930s in Kingsport, Tennessee [8]. Beginning in 1949, citizens of Framingham, Massachusetts, were enrolled in a now famous prospective cohort study intended to detect causal relationships between lifestyle and heart disease [9]. Samet and Munoz provide interested readers with an excellent history of the development of cohort studies in epidemiology [10].
The cohort study design continues to be a popular research method in medicine, public health, and other scientific disciplines [11]. One commonly encounters research results from cohort designs reported in the contemporary clinical and public health literatures. Table 1 provides brief descriptions of the defined populations, exposures, and outcomes of health sciences cohort studies. These descriptions illustrate the diversity of research questions in the health sciences that can be addressed by utilizing the cohort study design.
Table 1 Examples of cohort study designs in the health sciences
Figure 1 offers a generic cohort study design with its three major components: (1) a defined population, (2) exposure status, and (3) outcomes. Figure 1 pertains primarily to adapting health sciences uses of the cohort design as a vehicle for practicing evidence-based librarianship. The author will explain below how the same basic cohort design has been adapted to the social, behavioral, biological, and earth sciences. Most importantly, this article will demonstrate that the cohort design also has wide potential applicability to evidence-based librarianship.
Figure 1. Generic cohort study
Defined population
All cohort studies begin with a defined cohort. A cohort may be most generally defined as a “band” or a “group,” harking back to the earliest known use of the term, which designated one of the ten divisions of a Roman legion [22]. Rothman and Greenland observe that “In epidemiology, the word cohort is often used to designate a group of people who share a common experience or condition” [23]. This group may comprise all of the defined population or simply a sample from that defined population. Regardless of data collection method, no member of the cohort should have the outcome of interest at the chronological beginning of the cohort study. This factor aids later inferences of causality as researchers observe changes over time. The first column in Table 1 lists a wide variety of defined populations studied in the health sciences. Cohorts can be composed of people who belong to any conceivable categorical group including, but not limited to, religious sects, vegetarians, smokers, occupational groups, lower socioeconomic status groups, victims of traumatic injury, adults who experienced adverse childhood events, former cancer or surgery patients, or those living in defined geographic regions [24]. Cohort studies even have been deployed for humorous purposes in mock studies of cohorts consisting of soap opera characters, jazz musicians, and teething infants [25–27].
Outside of the health sciences, cohorts generally are defined more broadly. Glenn, for example, defines a cohort as “Those people within a geographically or otherwise delineated population who experienced the same significant life event within a given period of time” [28]. Ryder defines a cohort even more generically: “A cohort may be defined as the aggregate of individuals (within some population definition) who experienced the same event within the same interval” [29]. Social and behavioral scientists have defined cohorts of every conceivable grouping of people: students, graduates, professionals, married couples, parents, divorced people, twins, blended families, children raised in dysfunctional families, adulterers, prostitutes, substance abusers, and agents of social change to name a few familiar variations. Table 2 lists diverse defined populations who have been studied with the cohort design in the social, behavioral, biological, and earth sciences. A recent dictionary by Bégaud defines a cohort as a “Group of subjects selected according to one or more common characteristic(s) and followed over time in order to identify, describe or quantify an event” [47].
Table 2 Diverse examples of cohort studies from outside the health sciences
Cohorts need not be composed of humans either. Antarctic seals, Atlantic salmon, shellfish, dogs, rabbits, and evergreen trees also can be cohorts. Biological cohort studies tend to focus upon a group of individuals born at the same time, particularly in species with shorter life spans than humans [48–52]. “There are many different sources of cohorts and the most appropriate cohort often depends on the study questions, disease frequency, and study financial resources,” observes one author [53]. For this article, it must be noted that a defined population in a cohort need neither be alive nor present for its changes to be measured through surrogates or artifacts, such as fossils [54].
Exposure or non-exposure
Figure 1 utilizes the triangle, as did the ancient Greeks, to symbolize change. In this diagram, the triangle can represent the existence of an exposure or non-exposure. Health sciences cohort designs normally focus upon a single exposure or intervention (i.e., change) that researchers suspect causes an outcome. In the opening medical example of Semmelweis's discovery of the cause of childbed fever, the exposure would be the manual contact of cadaveric tissue with postpartum women by physicians and medical students in one of the two maternity wards. Lind noted the exposure of sailors to citrus fruit as a cure for scurvy [55]. In the health sciences, these exposures or non-exposures can span a range of conditions as diverse as the ionizing radiation from an atomic bomb blast to a prescribed drug treatment to the avoidance of certain unhealthy lifestyles in the case of vegetarians. An exposure can include personal characteristics such as the presence of a psychiatric condition [56]. An exposure can be either an uncontrolled event or a conscious decision to make a medical intervention [57]. Or, an exposure actually can include people's behavior [58]. Cohort studies can measure exposures in different ways: intensity, duration, regularity, or even variability [59]. Different members of a cohort might be exposed over time in varying degrees to the potential cause of the outcomes. For example, not everyone who survived the bombings of Nagasaki or Hiroshima was exposed to identical amounts of ionizing radiation due to either their distance from ground zero or their shielding from the blast by structures. As another example, vegetarians vary in their consumption choices: some might be vegans, while others might engage in potentially risky behaviors such as excessive alcohol consumption or tobacco use.
The social and behavioral sciences, perhaps reflecting the complexity of human activity, tend to focus upon multiple rather than just one identified set of exposures or interventions hypothesized to cause an outcome. Table 2 indicates that the concept of exposure or non-exposure outside the health sciences may be best described in terms of a variable that causes change.
Outcomes
Figure 1 clarifies that not all hypothesized outcomes actually occur in a cohort study. Sometimes the hypotheses, null hypotheses, or alternative hypotheses fail to predict what turns out to be a completely unexpected outcome. A cohort study of soft-shelled clams by Belding, for instance, revealed the surprising discovery that clam beds in polluted waters were “extremely productive” [60]. One cohort study produced the unexpected finding that moderate amounts of alcohol might have beneficial health effects [61]. Another cohort study on the risk of developing Parkinson disease noted the possible preventive effect of caffeine [62]. In other instances, cohort studies yield no observed outcomes.
Semmelweis could have calculated that the relative risk for patients to contract childbed fever in his ward was 4.2 times greater than in the other ward. The relative risk ratio represents a probabilistic statement that helps decision makers in the health sciences, and other disciplines, recognize the chances that a certain outcome might occur [63]. Gordis indicates that in epidemiology, “The essential characteristic in the design of cohort studies is the comparison of outcome(s) in an exposed group and a non-exposed group (or a group with a certain characteristic and a group without that characteristic)” [64]. Samet and Munoz observe that “The dynamic nature of many risk factors and their relation in time to disease occurrence can only be captured in the cohort design” [65]. Semmelweis was able to compare the incidence rate between his ward and the other ward at the Vienna Lying-in Hospital. When cohort studies make comparisons, these comparisons can be within the cohort of interest (intracohort) or against other cohorts (intercohort) [66]. Some suggest that intracohort arrangements are more desirable [67]. Cohort studies sometimes can be combined through either systematic reviews or possibly in a more rigorous meta-analysis to make even greater generalizations [68].
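The relative risk quoted above can be reproduced directly from the two ward infection rates given in the opening example (11.4% and 2.7%). The following is a minimal sketch in Python; the function name `relative_risk` is an illustrative choice and does not come from the cited sources.

```python
def relative_risk(exposed_rate: float, unexposed_rate: float) -> float:
    """Ratio of the outcome rate in the exposed group to the rate in the unexposed group."""
    if unexposed_rate <= 0:
        raise ValueError("unexposed_rate must be positive")
    return exposed_rate / unexposed_rate


# Infection rates reported in the text for the two maternity wards.
physicians_ward_rate = 0.114  # ward staffed by physicians and medical students
midwives_ward_rate = 0.027    # ward staffed by midwives and midwifery students

rr = relative_risk(physicians_ward_rate, midwives_ward_rate)
print(f"Relative risk: {rr:.1f}")  # prints "Relative risk: 4.2"
```

A ratio near 1 would indicate little difference between the wards. This sketch works from the published rates only, so it cannot supply a confidence interval, which would require the underlying patient counts.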
Outside the health sciences, these comparisons between exposed and non-exposed groups and the group or groups with an outcome of interest simply do not seem as important. At least these aspects are not emphasized to the same extent. An outcome needs only to be descriptive to be of interest to social, behavioral, or biological scientists [69]. As already noted, researchers in the health sciences have been keenly interested in making comparisons between outcomes and then linking these varied outcomes to differences in exposures. Outside the health sciences, interest in these dimensions of cohort studies spans a continuum from simply describing differences in outcomes all the way to the aforementioned more rigorous statistical analyses found in the health sciences.
The use of cohort studies apparently arose to meet different needs within various disciplines. In the social sciences, Karl Mannheim has been credited for first recognizing the utility of the cohort study design as early as 1928. The social sciences then began to harness this study design in the 1930s through the 1950s, with further theoretical investigations by Talcott Parsons in 1959. Political scientists found new applications of the cohort design with voting studies in 1954. The popular media also used the cohort design in more casual ways during the 1950s to describe social phenomena [70–74]. The cohort design has wide applicability in the biological sciences due to the interest in studying the “natural history” of species. The author serendipitously, rather than systematically, identified a 1903 publication describing basic growth experiments at Cold Spring Harbor on Long Island as possibly the first known use of the cohort design in biology [75].
Data collection
Cohort studies collect data with either prospective or retrospective approaches, depending upon the sequence in which researchers begin to study the cohort. Prospective cohort studies, sometimes called “concurrent cohort studies,” such as the Framingham Study, normally begin to measure relevant indicators of variables prior to an exposure of interest. These measurements continue throughout the study until a certain endpoint. Retrospective cohort studies, sometimes called “historic cohort studies,” are follow-up studies that identify the cohort, its exposure, and its outcomes after an exposure has occurred.
Longitudinal studies are a specific subtype of cohort study involving multiple measurements of individuals in a cohort, sometimes continuously [76, 77]. Some researchers from the biological and social sciences tend to use the terms “longitudinal studies” and “cohort studies” interchangeably, however. The biological sciences tend to describe cohort studies as longitudinal studies, even when only one baseline and one follow-up measurement of each individual in a cohort occur [78]. This semantic issue has implications for any literature searches across different disciplines, although the Dictionary of Epidemiology [79] and Medical Subject Headings (MeSH) both subsume longitudinal studies categorically under cohort studies.
The cohort design specific to epidemiology, and more generally applicable to other fields, thus may be readily summarized. A cohort design includes members of a defined population who are exposed to a factor of interest over a period of time. These members are compared to members of the same population (or a similar population) who have not been exposed to the factor of interest. The data analysis then compares the outcomes over two or more periods of time between the two or more groups [80].
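The summary above can be sketched as a small calculation: group cohort members by exposure status, then compare the proportion in each group that developed the outcome during follow-up. The records below are invented purely for illustration, and the names `Member` and `incidence_by_exposure` are hypothetical rather than taken from any cited study.

```python
from dataclasses import dataclass


@dataclass
class Member:
    exposed: bool  # exposure status during the follow-up period
    outcome: bool  # whether the outcome appeared by the end of follow-up


def incidence_by_exposure(cohort):
    """Return {exposure_flag: proportion of that group with the outcome}."""
    counts = {True: [0, 0], False: [0, 0]}  # [members with outcome, total members]
    for member in cohort:
        counts[member.exposed][1] += 1
        if member.outcome:
            counts[member.exposed][0] += 1
    return {
        flag: (hits / total if total else float("nan"))
        for flag, (hits, total) in counts.items()
    }


# Invented data: 4 exposed members (2 with the outcome), 4 unexposed (1 with the outcome).
cohort = [
    Member(True, True), Member(True, True), Member(True, False), Member(True, False),
    Member(False, True), Member(False, False), Member(False, False), Member(False, False),
]

incidence = incidence_by_exposure(cohort)
print(incidence[True], incidence[False])    # 0.5 0.25
print(incidence[True] / incidence[False])   # 2.0
```

In a library setting, the same shape could hold, for example, trainees versus non-trainees and a later skills measure. Any real analysis would also need to address the selection and confounding biases that non-randomized exposure groups can carry.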
Librarianship both borrows from and influences other fields [81]. Librarianship employs cohort studies more closely resembling the social sciences models, which tend to describe phenomena rather than make inferences about possible causality. By presenting the diverse examples in the preceding text and in Tables 1 and 2, the author hopes to encourage librarians to apply the cohort study design to new research questions in librarianship. The author further hopes that some of the examples from the health sciences are particularly inspiring. References in the text and the tables refer readers to the actual reported studies as well as other resources for calculating the statistics appropriate to this design.
Methods for identifying cohort studies
The examples of cohort studies in Tables 1 and 2 represent well-known historic cohort studies or more recent diverse applications of the cohort study design outside librarianship. The author identified many of the health sciences examples by searching PubMed during March 2000, utilizing MeSH terms such as “Cohort Studies,” “Follow-Up Studies,” and “Prospective Studies” plus keywords such as “cohort” or “cohorts.” The author identified most of the cohort studies outside the health sciences by searching FirstSearch during April and May 2000 using keywords like “cohort” or “cohorts.” FirstSearch databases used included Agricola, BIOSIS, EconLit, Education Abstracts, Geobase, Humanities Abstracts, Legal Periodicals, Library Literature, PAIS, Social Sciences Abstracts, and Wilson Business. Some articles listed references to historically significant cohort studies in various fields. This approach represented a less scientific, narrative literature review methodology [82].
The search strategy for identifying possible cohort studies in librarianship, particularly those from health sciences librarianship, began with searching the FirstSearch Library Literature database and PubMed during April 2000. The author used the aforementioned MeSH terms and keywords in PubMed and the keywords “cohort” and “cohorts” in Library Literature. These approaches were almost completely fruitless: the controlled subject headings and keyword approaches used for other subject areas yielded hardly any possible studies in librarianship. The author ultimately had to engage in the kind of handsearching found in systematic reviews to supplement the database searches. The author made no special effort to extend his search beyond the library literature for library and information science articles, even though he already was aware of at least a few relevant articles published in other fields. The author also drew examples from papers and posters presented at the 1998 through 2001 Medical Library Association (MLA) annual meetings, the MLA South Central Chapter meetings during 1999 to 2000, and the 2000 Joint Meeting of the Medical Library Group of Southern California and Arizona (MLGSCA) and the Northern California and Nevada Medical Library Group (NCNMLG). In this respect, the search for diverse illustrative examples of cohort studies in health sciences librarianship followed a systematic review [83] rather than a narrative review approach, although not all identified studies ultimately appeared in this article, because they either featured redundant adaptations or had been described already in a previous article by the author [84].
Researchers reporting in the library literature hardly ever categorize these studies by their formal name of cohort studies [85]. This lack of categorization thereby thwarts efforts to identify studies using standard controlled vocabulary or keyword search strategies. Consequently, the author manually searched the following journals specifically for the noted years: Bulletin of the Medical Library Association (BMLA), 1990 to 2000; Bibliotheca Medica Canadiana (BMC), 1995 to 2000; Health Libraries Review, 1995 to 2000; and Medical Reference Services Quarterly, 1994 to 2000. The restricted local availability of the latter three titles explains the limited coverage for a manual search. Following this systematic review, the author identified more recently published reports through his professional reading in these journals, attendance at MLA 2001, and reading of back and current issues of journals such as Hypothesis.
Collection resources cohort studies
Table 3 summarizes some applications of the cohort study design in health sciences librarianship concerning collection resources. The most common and enduring application of the collection resources cohort design traditionally has been the book or journal usage study. These studies can be found throughout the professional literature and even in library newsletters. Beginning in 1939, Postell pioneered adapting the cohort study design for collection resources use studies [98]. Health sciences librarians at Yale University, the Philadelphia College of Physicians, the Mayo Clinic, the National Library of Medicine, and Columbia University [99–105] also adapted the basic cohort design to answer important collection resources questions during the 1950s and early 1960s.
Table 3 Collection resources cohort studies
This usage studies genre of cohort studies endures, because it answers many practical (and sometimes a few theoretical) research questions. Table 3 includes most collection resources cohort studies identified through the author's systematic review. In Table 3, nearly all of the populations under study consist of users of various libraries or information centers. The exposures mainly consist of access to these collection resources. The observable outcome usually consists of the aggregate usage of the resources. In this adaptation, the use measurement becomes the locus for observing actual changes in the user population. Thus, while populations of humans are involved in collection resources cohort studies, their activity is recorded in the form of collection resources usage, much as laboratory results, medical images, and other data become surrogates in health sciences cohort studies. Usage becomes the surrogate measure for changes in the user population without linking individuals to their specific usage data, however. Privacy policies and professional librarians' codes of ethics usually do not allow linking materials used with specific users, so humans are not studied per se, just the artifact of their use, in most of these cohort studies. The collection resources cohort studies described in Table 3 offer a practical advantage over the other types described in Tables 4 and 5: they normally do not require approval from human subjects research committees, because individuals are not linked to their confidential usage data. All of these examples are meant to convey the wide applicability of the cohort design rather than to offer a complete inventory or to endorse the quality of the presented examples.
Table 4 User population cohort studies: user education studies
Table 5 User population cohort studies: information-seeking behavior cohort studies
User population cohort studies
Tables 4 and 5 summarize various types of user population cohort studies utilized by health sciences librarianship to answer research questions. Within this major categorization, Tables 4 and 5 distinguish between two subcategories: user education cohort studies and information-seeking behavior cohort studies. User education cohort studies typically involve a population of students or professionals acquiring additional library or informatics skills. The exposure becomes training of the population by librarians or other experts. The outcome or outcomes describe the post-exposure changes in trainees' library or informatics skills. The information-seeking behavior cohort studies outlined in Table 5 have diverse user populations. The exposures in Table 5 range from the proximity of users to a health sciences library to the running of mediated MEDLINE searches to personality profiles of potential users. The outcomes also vary due to the diverse exposures suspected of causing the changes. The cohort studies in Tables 4 and 5 either link user changes directly at the individual level of analysis or employ surrogates for these changes, as typically occurs in the cohort studies in Table 3.
Other cohort studies in librarianship
Tables 3, 4, and 5 exclude at least one additional health sciences librarianship cohort study, which simply does not fit into the two major typologies. This study further suggests the potential diversity of applications of the cohort study design. This retrospective cohort study by Newcomer and Piscotti examines the career paths of academic health sciences library directors [132]. Such career path cohort studies also exist in other fields [133].
By now, readers can recognize that cohort studies are adaptable to studying a great variety of circumstances involving observable change. The basic orientation of a cohort study can range from simply descriptive to actually aiding predictions about the future. This article has shown that cohort studies moreover can be combined with other methodologies such as surveys, which may be a more familiar research method for many librarians. Finally, cohort designs are capable of addressing important questions involving probabilities of observed outcomes, such as projected use of collection resources or use of acquired library or informatics skills, that face the profession [134] as librarians enter an era of evidence-based librarianship (EBL). In the EBL hierarchy of evidence for decision making, cohort studies occupy one of the highest levels, just below randomized controlled trials (RCTs). Under most circumstances, a well-constructed cohort study simply has greater validity and probably greater reliability than even a series of well-done case studies.
Cohort studies, as observational studies, can be used in situations in which it would be unethical either to subject participants to harmful exposures or to deny them beneficial exposures. RCTs, while offering the advantages of confirmatory or experimental research approaches, simply may be unethical under certain circumstances. Librarianship normally does not involve the high-stakes situations found in medicine, in which an exposure could cause dramatic harm. Yet, ethical issues do arise in the profession about denying a potentially beneficial exposure (to a library resource or a service) that may best be resolved with a cohort study instead of an RCT. Cohort studies also observe changes in naturally occurring circumstances, which lends a practical as well as ethical advantage over RCTs. Of course, the EBL question should drive the choice of best research design.
Cohort studies provide great versatility across different exposures, which might produce either dramatic or subtle outcomes [135]. The historic cohort study examples of both Semmelweis and Lind yielded dramatic results. Semmelweis's obstetrics patients stopped dying at a terrifying rate from childbed fever, and Lind's sailors nearly all made dramatic recoveries from scurvy. Cohort studies also can measure far more subtle and long-term effects without too much trouble from confounding, such as in the Framingham Study or the other varied examples listed in Table 1. Established statistical methods are available for analyzing results in either the dramatic or the more subtle instances [136, 137].
The most controversial argument for the strength of cohort studies resulted from an actual comparison study reported in the New England Journal of Medicine during 2000. This study by Benson and Hartz suggested that, in their review, cohort studies and RCTs produced similar results for seventeen of the nineteen health conditions. Their suggestion was vigorously disputed by Pocock and Elbourne in the same issue, although both Moses and Feinstein previously had pointed to the realistic use of cohort studies [138–141] in place of RCTs. Additionally, psychological evidence indicated that “Experimenter Expectancy Effects” might lead scientific observers conducting RCTs as well as cohort studies to mistakenly perceive anticipated outcomes in study situations in which actual events moved in radically different directions. Two authors reviewing this impressive body of evidence have written that “The result of hundreds of studies demonstrate that experimenters find what they expect to find.” This body of evidence contributed to the implementation of double blinding [142].
The principal problems with the cohort design revolve around the possible introduction of bias in these kinds of studies. Pocock and Elbourne make this point when they note that cohort studies do not prove causality; they only suggest probabilities of causality [143]. Many other researchers have made similar points. Cohort studies do not, as RCTs do, randomly assign members of the population to exposure status so that some receive no exposure while others receive some variation of exposure. This leads to the possibility of population members either self-selecting exposures on their own or haphazardly becoming exposed (or non-exposed) in ways that bias study outcomes. RCTs obviously are not immune from various forms of bias, either. In general, cohort studies seem to be more susceptible to either the aforementioned selection bias or confounding bias. Confounding bias occurs when variables other than the exposure identified in the cohort design really cause the outcome. Library science researchers can avoid confounding bias, in part, by articulating many diverse alternative hypotheses at the outset when designing their research to anticipate the existence of other explanatory variables. The many forms of bias need not be summarized here, but readers need to be keenly aware that no study design, whether cohort or RCT, can be immune to the many forms of bias. Interested readers are encouraged to refer to two chapters in the Encyclopedia of Epidemiologic Methods for concise discussions of bias relevant to cohort studies [144, 145].
Most library science examples described in this article do not match the methodological rigor of cohort studies utilized in contemporary clinical medicine. Cohort studies in health sciences librarianship more closely resemble studies in the biological, behavioral, or social sciences disciplines in most instances. The presence of a study here furthermore does not signify an endorsement of its validity. Yet, librarians have successfully adapted cohort study designs to solve practical problems, frequently with limited resources, to improve their collection resources or services. This article provides numerous references to guide interested readers to specific examples of cohort studies or to methodological discussions concerning cohort studies. This article and the cohort studies it references should inspire health sciences librarians to apply the cohort design to both familiar and novel adaptations. The cohort design offers librarians a powerful tool to improve their libraries.
The author wishes to thank Thomas Becker, M.D., Ph.D., and Kristine Tollestrup, Ph.D., for their explanations of the subtle intricacies of health sciences cohort designs. The author also appreciates David Bennahum, M.D., for alerting him to the first known cohort study carried out by Dr. James Lind.
