Document co-citation analysis to enhance transdisciplinary research

Caleb M Trujillo; Tammy M Long

doi:10.1126/sciadv.1701130

Sci Adv. 2018 Jan 3;4(1):e1701130.

PMCID: PMC5752411 PMID: 29308433

Strategic literature searches have the potential to connect disciplines.

Abstract

Specialized and emerging fields of research infrequently cross disciplinary boundaries and would benefit from frameworks, methods, and materials informed by other fields. Document co-citation analysis, a method developed by bibliometric research, is demonstrated as a way to help identify key literature for cross-disciplinary ideas. To illustrate the method in a useful context, we mapped peer-recognized scholarship related to systems thinking. In addition, three procedures for validation of co-citation networks are proposed and implemented. This method may be useful for strategically selecting information that can build consilience about ideas and constructs that are relevant across a range of disciplines.

INTRODUCTION

Discipline-oriented scholarship has led to specialization to the extent that programs and policies now value pursuits that transcend traditional boundaries of scholarly inquiry (1). Transdisciplinary research synthesizes methods and ideas across several distinct academic disciplines to pursue a problem or purpose that is broader than a single discipline (1). Team science pursues deep knowledge integration through transdisciplinary approaches but faces challenges of demanding time and effort of participants and managing different values, languages, and norms of different members (2). For specialized scholars and team science members in search of transdisciplinary methods, concepts, or research frameworks, document co-citation analysis (DCA) is a method that may be useful for avoiding isolation in scholarship, expediting knowledge integration, and, ultimately, building consilience across disciplines. Specifically, DCA enables identification of relevant literature and scholarly communities that may be overlooked in standard approaches to literature searching. Resulting networks help visualize gaps between published research areas. The intended contribution of this report is to demonstrate how scholars can (i) leverage DCA as a potentially useful methodology for promoting transdisciplinarity and (ii) validate the results of a document co-citation network.

As an example of a specialized program in need of transdisciplinarity, research in undergraduate science education, also known as discipline-based education research (DBER), is facing similar issues after recent growth in the United States. In 2012, the National Research Council published a consensus report to establish DBER as a goal-oriented scholarship and express target areas to enhance its pursuits (3). Singer (4), the author of the report, commented, “Research on undergraduate science learning is currently a loose affiliation of related fields. The common feature is the focus on undergraduate teaching and learning within a discipline, using a range of methods with deep grounding in the discipline’s priorities, worldview, knowledge, and practices.” Unlike Science, Technology, Engineering, and Math (STEM) education, which prioritizes education of these broad disciplinary fields across levels, DBER focuses scholarship primarily within a particular discipline at the postsecondary level to understand teaching, learning, and development of expertise. In addition to clarifying the goals of DBER, the authoring committee recommended future engagement in “studies of cross-cutting concepts and cognitive processes” that connect disciplines (3). Whereas integration of concepts and approaches is defined as a priority, Talanquer (5) has commented that DBER infrequently breaks the traditional boundaries that separate the disciplines and, because of this, is at risk of fragmentation and isolation in scholarship. Talanquer (5) also warned that without cross-fertilization, the potential impact of research on teaching and learning would be limited and contribute little to resolve core educational issues. Despite the potential for broader conversations to improve systemic educational change, a review of the literature (6) indicated that scholarship in DBER, faculty development, and higher education policy are relatively independent and disconnected from each other.

To avoid isolation, DBER scholars would benefit from borrowing theoretical, conceptual, and methodological frameworks from psychology, education, social sciences, and other disciplines to strengthen scholarship in undergraduate teaching and learning (5, 7). Some intersectional work across disciplines is already occurring in DBER (8), but there remains a need to grow transdisciplinarity to support connections across subfields and build upon the foundational knowledge generated by researchers in complementary disciplines.

Background on DCA

DCA is one of many methods developed in bibliometric research to visualize and measure scholarship across disciplinary fields. DCA is used to identify scholarship that has received peer recognition, as indicated by citation patterns. For instance, when an author cites a particular document, the citation may indicate, among other properties, an idea or other resource that is important to the author’s scholarly engagement with the cited text (9). Similarly, when a group of authors cites a common set of documents, these co-citations indicate documents that may contain concept symbols, that is, the ideas, experiments, or methods that have received peer recognition, as indicated by the co-occurrence of their citations (9, 10). Therefore, studies of how documents are cited together can help researchers and practitioners understand important past contributions made within a field.

DCA measures the frequency of jointly cited documents (11). Figure 1 visualizes the steps to make a co-citation network from bibliographic data. Authors of source documents I, II, and III jointly cite documents C, D, and E (Fig. 1A). Lines connecting these jointly cited documents represent the co-citation relationship in a network (Fig. 1, B and C) where nodes represent cited documents and edges represent instances of co-citation. Edge weights represent the number of times that two documents were jointly cited. For example, documents D and E were jointly cited by I, II, and III, and therefore, D and E are connected by an edge with a weight of three (Fig. 1C). The degree of a co-cited document is equal to the number of edges (Fig. 1D) and represents the number of neighboring documents, which can be used to rank co-cited documents. Tightly connected groups of co-citations allow one to infer communities. Removing edges below a weight threshold can reveal highly co-cited documents or separate communities. For instance, Fig. 1E shows a network after trimming edges that are less than three in weight, revealing the most frequent co-citations. Unlike bulk citation counts, DCA can be used to identify peer-recognized documents and to visualize the relationships among works.

Fig. 1 — DCA. Illustration of the conversion of citation data (A) to a co-citation network (B) and the resulting edge (C) and node (D) metrics before and after trimming (E).
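The conversion described above can be sketched in a few lines of Python. This is a minimal illustration and not the workflow used in the study: the citation data are hypothetical (they are not the data behind Fig. 1), and the function names `cocitation_weights` and `degrees` are introduced here only for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citation data: each source document maps to the set of
# documents it cites (illustrative only, not the data behind Fig. 1).
citations = {
    "I": {"A", "C", "D", "E"},
    "II": {"B", "D", "E"},
    "III": {"C", "D", "E"},
}


def cocitation_weights(citations):
    """Count how many source documents cite each pair of documents together."""
    weights = Counter()
    for cited in citations.values():
        # Sorting gives each unordered pair one canonical key, e.g. ("D", "E").
        for pair in combinations(sorted(cited), 2):
            weights[pair] += 1
    return weights


def degrees(weights):
    """Degree of a document = number of distinct documents co-cited with it."""
    degree = Counter()
    for a, b in weights:
        degree[a] += 1
        degree[b] += 1
    return degree


weights = cocitation_weights(citations)
degree = degrees(weights)
print(weights[("D", "E")])  # edge weight between D and E
print(degree["D"])          # number of neighbors of D
```

In this toy data set, D and E are cited together by all three source documents, so their edge carries a weight of three, mirroring the example given in the text.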

Researchers have used co-citation networks to map science scholarship. Early studies of DCA mapped literature in collagen research (12) and nuclear physics (13) to reveal specialization underlying the social and cognitive organization of science fields. Co-citation clusters correspond to aggregate word profiles of citing documents, which suggests that DCA can represent research foci as coherent but different specializations (14). Within STEM education, DCA has been used to map and group co-cited documents (15, 16), but little has been done to connect between education and non-education disciplines using DCA. Furthermore, unlike studies that use other literature mapping techniques [for example, that of Boyack et al. (17)], the results of co-citation networks are rarely validated. Some previous attempts to validate DCA used questionnaires sent to leading researchers to confirm historical significance of identified documents (12) and comparison to word profiles (14).

Here, DCA is illustrated using “systems thinking” as an example of a concept symbol that crosses traditional disciplinary boundaries. Systems thinking literature represents a wide breadth and diversity of disciplines, which makes the prospect of conducting a comprehensive literature review across fields intimidating. For instance, over 20 years ago, one annotated bibliography grouped 68 documents into seven different systems thinking approaches to summarize the breadth across many disciplines (18). Since then, additional disciplines have contributed to the systems thinking scholarship. Information on systems thinking is of particular interest to educators who wish to implement curricular recommendations to teach systems (19). Because of these conditions, the systems thinking literature provides a useful case for testing the validity and usefulness of DCA as a means to access recognized literature from different disciplines for a given topic.

RESULTS

Identification of key communities and publications

Document co-citation networks were generated to visualize the diversity of documents and communities in the systems thinking literature. From the Web of Science Core Collection, 229 source articles from a range of disciplines that contained system(s) thinking in their titles were identified. The source documents varied in their publication dates, with 41% published between 2010 and 2015 and 94% published after 1990, reflecting the recent growth of this area. These documents cited 7048 unique documents. Of these, 246 documents were co-cited three or more times with at least one other document (≥3 network), 71 were co-cited five or more times (≥5 network), 35 were co-cited seven or more times (≥7 network), and 19 were co-cited nine or more times (≥9 network). Visualizations of the four trim levels of co-citation networks appear in Fig. 2. These graphs represent co-cited documents as nodes and the frequency of co-citation as weighted edges. Shapes and coloring of nodes denote the assigned communities.

Fig. 2 — Results trimmed at the following co-citation frequency levels: (A) ≥3, (B) ≥5, (C) ≥7, and (D) ≥9. A key is provided in the lower right panel (E). Nodes represent co-cited documents with top co-cited documents among a community labeled by author(s) and year published. Node shape and color represent assigned community determined by smart local moving (SLM) detection for each network. Edges represent co-citations between documents with frequencies represented by width and color tone. Communities in the ≥3 network of fewer than three documents were not included in the visual because these four small communities were complete and isolated. Visualization was made with organic layout in Cytoscape (34).

From these networks, the three most frequently co-cited documents by degree were listed for each identified community. For instance, the ≥3 network suggests 11 communities, but only 7 contain three or more documents (communities 0 to 6). Table 1 reports bibliographic information for the top co-cited documents in the ≥3 network, their community assignments, and their co-citation metrics. Two metrics are reported: “times cited” is the number of source documents citing the document, and “degree” is the number of other documents jointly cited with it at the respective trim level. For instance, Senge (20) represents a book assigned to community 0, which was cited by 62 source documents and co-cited at least three times with 90 other documents. The resulting co-cited documents and communities from the ≥3 network suggest literature that received recognition among scholars of systems thinking. The ≥3 network data are presented in detail because they included all co-cited documents present at other trim levels, and as reported below, they performed better than the other trim levels during the validation stage.

Table 1. Highly co-cited documents.

Top three co-cited documents among seven assigned communities for the ≥3 network in Fig. 2A. Communities containing fewer than three documents are omitted.

Community	Reference to co-cited document	Times cited	Degree
0	P. M. Senge, The Fifth Discipline: The Art and Practice of the Learning Organization (Doubleday and Company, 1990).	62	90
	J. W. Forrester, Industrial Dynamics (Massachusetts Institute of Technology, 1961).	29	46
	J. D. Sterman, Business Dynamics: Systems Thinking and Modeling for a Complex World (Irwin/McGraw-Hill, 2000).	28	17
1	P. Checkland, Systems Thinking, Systems Practice (Wiley, 1981).	59	111
	R. L. Ackoff, Creating the Corporate Future: Plan or be Planned for (Wiley, 1981).	18	47
	P. Checkland, J. Scholes, Soft Systems Methodology in Action (Wiley, 1990).	25	42
2	W. Ulrich, Critical Heuristics of Social Planning: A New Approach to Practical Philosophy (P. Haupt, 1983).	23	84
	C. W. Churchman, The Systems Approach (Delacorte Press, 1968).	18	65
	C. W. Churchman, The Design of Inquiring Systems: Basic Concepts of Systems and Organization (Basic Books, 1971).	15	49
3	O. Ben Zvi Assaraf, N. Orion, Development of system thinking skills in the context of earth system education. J. Res. Sci. Teach. 42 (5), 518–560 (2005).	9	23
	M. J. Jacobson, U. Wilensky, Complex systems in education: Scientific and educational importance and implications for the learning sciences. J. Learn. Sci. 15 (1), 11–34 (2006).	8	21
	M. Frank, Engineering systems thinking and systems thinking. J. Syst. Eng. 3 (3), 163–168 (2000).	8	21
4	M. C. Jackson, Systems Methodology for the Management Sciences (Plenum Press, 1991).	24	83
	R. L. Flood, M. C. Jackson, Creative Problem Solving: Total Systems Intervention (Wiley, 1991a).	24	69
	R. L. Flood, M. C. Jackson, Critical Systems Thinking: Directed Readings (J. Wiley, 1991b).	14	50
5	L. von Bertalanffy, General System Theory: Foundations, Development, Applications (George Braziller, 1968).	26	34
	M. Mulej, R. Espejo, M. C. Jackson, S. Kajzer, J. Mingers, P. Mlakar, N. Mulej, V. Potočan, M. Rebernik, A. Rosicky, B. S. Umpleby, D. Uršič, R. Vallee, Dialektična in druge mehkosistemske teorije: (podlage za celovitost in uspeh managementa) (Ekonomsko-poslovna fakulteta, 2000).	4	10
	M. Davidson, Uncommon Sense: The Life and Thought of Ludwig von Bertalanffy (1901–1972), Father of General Systems Theory (Tarcher, 1976).	4	9
6	S. J. Leischow, A. Best, W. M. Trochim, P. I. Clark, R. S. Gallagher, S. E. Marcus, E. Matthews, Systems thinking to improve the public’s health. Am. J. Prev. Med. 35 (2), S196–S203 (2008).	7	5
	J. B. Homer, G. B. Hirsch, System dynamics modeling for public health: Background and opportunities. Am. J. Public Health 96 (3), 452–458 (2006).	4	5
	W. M. Trochim, D. A. Cabrera, B. Milstein, R. S. Gallagher, S. J. Leischow, Practical challenges of systems thinking and modeling in public health. Am. J. Public Health 96 (3), 538–546 (2006).	5	4

Validation of co-cited documents and communities

Validation of co-citation networks is not commonly practiced. Here, we report three approaches to test the validity of inferences following from network results (Table 2).

Table 2. Validation.

Validation results of the systems thinking network are displayed for each of the following trim levels: three, five, seven, and nine or more co-citations. The number of documents and the number of co-citations in each network are indicated. Internal consistency is reported as Spearman’s rank correlations of times cited by source documents to degree of co-citation. Community validity was tested using a χ² test for independence between assigned network communities and subject communities. Stability was measured as the number of co-cited documents in the comprehensive network also found in the systems thinking network, and a Spearman’s rank correlation of the degree of co-citation for documents matched between the two networks.

	Systems thinking network trim levels
	≥3	≥5	≥7	≥9
Network metrics
Co-cited documents (number of nodes)	246	71	35	19
Co-citations (number of edges)	1,292	271	105	44
Internal consistency
Spearman’s value (S)	1,099,369	15,468	2151	584
P value	<0.001	<0.001	<0.001	0.034
ρ	0.56	0.74	0.70	0.49
Community validity
χ²	494.55	85.40	45.16	17.47
Degrees of freedom (df)	280	48	30	12
P value	<0.001	<0.001	0.037	0.13
Stability to comprehensive network
Number of matching documents	68	36	24	18
S	26,613	4,459	1716	510
P value	<0.001	0.0095	0.25	0.047
ρ	0.45	0.42	0.25	0.47

To test internal consistency, a Spearman’s rank correlation was conducted between the times cited by the source documents and the degree of co-citation (fig. S1). The inference that the degree of co-citation and the times cited correlate was supported by a rejection of the null hypothesis (P < 0.05) at each trim level (Table 2). Intuitively, co-citation depends on citation, so the correlation between these metrics is not surprising, but an absence of a correlation would suggest that the results are internally inconsistent and would undermine the validity of the network.
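As a rough check on how the reported statistics relate to one another, the sketch below recomputes ρ from the S values and node counts in Table 2. It assumes that S is the sum of squared rank differences (the quantity that routines such as R's `cor.test` report for Spearman's test), so that ρ = 1 − 6S / (n(n² − 1)); the text does not define S, so this reading is an assumption, and the function name is hypothetical.

```python
def spearman_rho_from_s(s, n):
    """Spearman's rho from S (assumed: sum of squared rank differences)
    and n, the number of paired observations."""
    return 1 - 6 * s / (n * (n**2 - 1))


# (trim level, number of co-cited documents n, reported S), taken from Table 2.
internal_consistency = [
    (3, 246, 1_099_369),
    (5, 71, 15_468),
    (7, 35, 2_151),
    (9, 19, 584),
]

for level, n, s in internal_consistency:
    print(f">={level}: rho = {spearman_rho_from_s(s, n):.2f}")
```

Under this assumption, the four values round to 0.56, 0.74, 0.70, and 0.49, which agrees with the ρ row reported for internal consistency in Table 2.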

To test the validity of the identified communities as areas of scholarship, we compared the co-citation network results to subject communities identified by WorldCat, which contained entries for more than 95% of the co-cited documents. After processing to remove qualifiers, subordinate topics, duplicates, and non-English terms, 241 subjects were attributed to the set of co-cited documents. Figure 3 indicates the top three WorldCat subject labels attributed to documents for each of the seven communities in the ≥3 network. The subjects grouped within documents to form 29 subject communities, which were cross-tabulated with the communities identified in the document co-citation network (fig. S2 and table S1). The results of a χ² test suggest that subject communities are likely related to co-citation communities in the ≥3, ≥5, and ≥7 networks but are independent of co-citation communities in the ≥9 network (Table 2).
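The cross-tabulation test can be illustrated with a small, dependency-free sketch. The counts below are hypothetical and the function name is introduced only for this example; the study's own cross-tabulation appears in table S1. The sketch returns the χ² statistic and the degrees of freedom, and leaves the P value to a statistics library.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table.

    `table` is a list of rows of observed counts. Rows or columns that
    sum to zero must be removed beforehand to avoid division by zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical cross-tabulation: rows are subject communities, columns are
# co-citation communities, cells are document counts (not the study's data).
crosstab = [
    [12, 1, 0],
    [2, 9, 1],
    [0, 2, 8],
]
statistic, df = chi_square_independence(crosstab)
print(round(statistic, 2), df)
```

As a consistency note, a table of 29 subject communities by 11 co-citation communities would give (29 − 1) × (11 − 1) = 280 degrees of freedom, which matches the value reported for the ≥3 network in Table 2.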

Fig. 3 — The top three WorldCat subject labels are shown for each of the main communities. The color, shape, and bolded number correspond to the co-citation communities in the ≥3 systems thinking network in Fig. 2A. Numbers of co-cited documents within each community that have a topic label are reported.

To test the stability of the system(s) thinking co-citation network results, a second document co-citation network with more comprehensive search terms was created and visualized in Fig. 4. For the comprehensive network, 542 source document entries were extracted to identify 20,032 unique cited documents, of which 149 documents were co-cited three or more times. Of the co-cited documents in the comprehensive network, 46% also appeared in the ≥3 systems thinking network (Table 2). When visualized, many of the co-cited documents in the systems thinking network can also be found in the largest subnetwork of the comprehensive network (Fig. 4). In addition, in terms of coverage, 18 of the 19 co-cited documents that are found in the ≥9 systems thinking network are also present in the comprehensive network, suggesting stability in the findings. Furthermore, the stability of the results was supported by a Spearman’s rank correlation of the degree of co-citation between the comprehensive network and the systems thinking network at trim levels of 3, 5, and 9 but not at a level of 7 (Table 2). Together, these data support the inference that the co-cited documents can be repeatedly identified in queries using more comprehensive search terms related to systems thinking.
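The two stability measures, document overlap and rank correlation of degree for the matched documents, can be sketched as follows. The degree values and document labels are hypothetical, and the helper names are introduced only for this example; ties are handled with average ranks, which is a common convention but an assumption about how the study's statistics were computed.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def stability(degree_a, degree_b):
    """Return (matching documents, Spearman's rho of their degrees)."""
    matched = sorted(set(degree_a) & set(degree_b))
    rho = pearson(
        average_ranks([degree_a[d] for d in matched]),
        average_ranks([degree_b[d] for d in matched]),
    )
    return matched, rho


# Hypothetical degree values keyed by document label (not the study's data).
focused = {"doc_a": 90, "doc_b": 111, "doc_c": 84, "doc_d": 21}
comprehensive = {"doc_a": 70, "doc_b": 60, "doc_c": 30, "doc_e": 12}

matched, rho = stability(focused, comprehensive)
print(len(matched), round(rho, 2))
```

Applied to the real networks, the first return value would correspond to the "number of matching documents" row of Table 2 and the second to the ρ row under stability.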

Fig. 4 — A co-citation network generated from comprehensive search criteria with edges trimmed to frequencies of three or more co-citations. Documents matching those in the ≥3 systems thinking network of Fig. 2A are colored blue. Top co-cited documents from Table 1 are labeled.

DISCUSSION

Implications for research

DCA is one resource that may enhance transdisciplinary pursuits by helping scholars and practitioners to identify peer-recognized documents and communities of scholarship. First, DCA may be useful to explore patterns in influential literature developed across different disciplines. For instance, community structure was detected within the co-citation networks for systems thinking. Despite von Bertalanffy’s (21) early proposal of general systems theory as a unifying foundation to transcend disciplines that study systems, the results suggest segregation among the communities of scholars that study systems thinking. The science and engineering education literature, namely the studies of Frank (22), Ben Zvi Assaraf and Orion (23), and Jacobson and Wilensky (24), was assigned to community 3 (in the ≥3, ≥5, and ≥7 networks) and can be seen as a smaller group of documents that is weakly connected to the other communities. According to Small (11), high co-citation suggests documents that have received peer recognition for contributing a concept symbol. Applying this interpretation to the observations, the results suggest that the concept symbols of systems thinking, as recognized by education scholars, may differ from the symbols recognized by authors outside of education. Thus, in the context of work on systems thinking, the co-citation networks may substantiate a claim made by Talanquer (5) and Henderson et al. (6) that DBER, and perhaps STEM education, is at risk of fragmentation and isolation.

However, document co-citation networks have the potential to help users supplement standard literature reviews by accessing scholarship that bridges historically segregated communities. Comprehensive literature reviews are resource-intensive and, because they tend to originate within the researcher’s discipline, risk reproducing the existing representation of literature in that field. Students and those new to a discipline may not be well situated to identify influential literature within a domain. DCA has the potential to both expedite the literature review process and bridge gaps that segregate subject communities by helping researchers, practitioners, and students select documents strategically. Networks, such as the one above (for example, Fig. 2A), provide a map of key literature that can focus initial analysis and provide guidance for in-depth searching. Scholars and educators wanting to survey highly recognized contributions could focus their reading on documents that have a high degree of co-citation such as those in Table 1. Alternatively, and perhaps more importantly, co-citation networks identify key documents across research domains that can be leveraged to integrate perspectives and advance transdisciplinary research. If this is an aim of research, then one could calculate network metrics such as betweenness centrality to find documents that bridge scholarly communities or to prioritize potential “must-read” documents from different disciplinary perspectives. Similarly, the co-citation edges that link documents from different communities (that is, edges bridging color groups in Fig. 2) could indicate engagement across scholarly disciplines. Scholars may also compare the subject communities (fig. S2 and table S1) to co-citation communities to identify areas of diverse subjects, overlapping topics, or specialization. STEM education research, in particular, could benefit from information strategies that promote consilience in terms of understanding ideas that transcend disciplines, including crosscutting concepts, research methodologies, and theories of learning (3, 5, 7).
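One simple way to look for bridging documents, short of computing betweenness centrality with a graph library, is to list the co-citation edges whose endpoints belong to different communities. The sketch below is only a proxy for the idea discussed above, not an analysis performed in the study; the network, the community labels, and the function names are hypothetical.

```python
from collections import Counter


def bridging_edges(weights, community):
    """Return co-citation edges whose endpoints lie in different communities."""
    return {
        (a, b): w
        for (a, b), w in weights.items()
        if community[a] != community[b]
    }


def bridging_strength(weights, community):
    """Sum, per document, the weight of its cross-community co-citations."""
    strength = Counter()
    for (a, b), w in bridging_edges(weights, community).items():
        strength[a] += w
        strength[b] += w
    return strength


# Hypothetical weighted edges and community assignments.
weights = {("a", "b"): 5, ("b", "c"): 3, ("c", "d"): 4, ("a", "d"): 1}
community = {"a": 0, "b": 0, "c": 1, "d": 1}

print(bridging_edges(weights, community))
print(dict(bridging_strength(weights, community)))
```

Documents with a high bridging strength are candidates for reading that connects two scholarly communities; betweenness centrality would additionally account for indirect paths through the network.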

Finally, this report proposes procedures to validate results of a co-citation network to support appropriate inferences. Previously published document co-citation networks have used statistics to report and explore their results such as frequency of co-citation, word profiles, and clusters of documents (13, 14), but less has been done to test the validity of these results. Here, an approach for validation of DCA is offered by using network data to understand internal consistency, by comparing network data to external bibliographic catalogs to interpret communities, and by using different search conditions to provide estimates to understand the stability and scope of the results. Validity measures to support or reject document co-citation network results will be useful for quality reports and evaluations. As with other validity measures, there are limitations surrounding the proposed approaches.

Limitations of co-citation networks

The use of co-citation networks faces limitations in terms of source data, the assumptions underlying the interpretation of co-citation patterns, and the long-term stability of results. First, although the study of networks has made great progress in recent decades, the methods used to build networks greatly depend on accurate and reliable source data. Hence, the results are limited by the source database, the Web of Science Core Collection. Although the database is well maintained, many journals and documents are not indexed in it because its curators tend to favor the inclusion of high-impact literature over a large breadth of literature. For the purpose of identifying key documents from co-citation patterns, this was an acceptable limitation because the consistent format of cited references in Web of Science greatly expedited the processing stage when generating networks with the objective of identifying highly recognized documents that may contain noteworthy concept symbols. Consequently, some literature communities could appear in data from databases that are more inclusive but not as consistently formatted as the Web of Science Core Collection, and these may have been overlooked in the network presented. Similarly, the search term system(s) thinking may not fully capture the diversity of scholarship doing similar work by another name. However, effort was made to address this limitation by validating findings against more comprehensive search terms.

Second, the meaning and interpretation of co-citation have limitations. Leydesdorff (9) has noted that authors may include a citation in their writing for many reasons. By studying co-citations, it is assumed that the observed patterns reflect how multiple authors recognize a common set of documents in terms of the concept symbols represented in these records. Ultimately, the context and meaning expressed by authors when they cite the identified documents will provide insight into the document’s contribution to scholarship. Therefore, future work would benefit from analysis of both cited and citing works to understand what ideas, findings, or experiments are being communicated and the meaning attributed to the co-cited documents. In addition, surveying leading researchers or analyzing word profiles may reveal deeper patterns underlying co-citations (12, 14).

Third, using co-citation networks to focus on highly co-cited documents may not accurately represent the contributions of authors for a variety of reasons. Focusing on the most co-cited documents means that the method overlooks minor dialogs or conferences that may have been significant to shaping discourse in a field (14). Similarly, documents that have been recently published are less likely to receive citations in the source documents and are therefore less likely to be co-cited. Because of this, researchers should take caution when interpreting networks informed from retrospective bibliographic data. Recent publications may one day also gain high co-citation frequencies once they receive recognition. Repeating the presented method with the same resources in 10 years will likely yield an altered network with new documents appearing as nodes, others being lost, and co-citation strengths rising or falling. The presented method could be considered sensitive to time, and thus, claims made about the identified key documents and communities are prone to dynamic changes. The inclusion of source documents published over longer periods of time may overlook unique trends visible only in a shorter time span. Alternative methods, such as document bibliographic coupling, may be better suited to understand the evolution of research trends (25, 26). A number of previously developed resources and software are available to help researchers construct document co-citation networks and other bibliographic visualizations. Annotations of some useful starting places for scholars and students who wish to further explore and implement co-citation analysis are provided in table S2.

MATERIALS AND METHODS

To map the systems thinking literature, bibliographic data were collected to construct co-citation networks. These data were used to identify key documents recognized by systems thinking authors and then validated by testing their internal consistency, external validity, and stability. The specific procedures implemented for generation and validation of co-citation networks are detailed below.

Generating a co-citation network

Network science has developed approaches and software to map a broad range of knowledge domains using diverse data sources (17, 27–29). Drawing from these multiple resources, a framework was adapted and developed for constructing co-citation networks and is summarized in Table 3. Table 3 shows both the general steps and the specific implementation of each to generate and analyze a network of co-cited documents for systems thinking.

Table 3. Steps adapted from a general process for mapping knowledge domains were implemented to build a co-citation network from bibliographic data.

Step	General process (29)	Implementation in this study
1. Data acquisition	Select an appropriate data source.	Search the Web of Science Core Collection for articles from different research areas whose titles contain “system(s) thinking” to export database entries.
2. Processing	Select a unit of analysis and extract the necessary data from the selected sources.	Select cited reference list from each document’s bibliographic entry and use R to merge duplicate citations for co-citation analysis.
3. Analysis	Choose an appropriate similarity measure and then calculate similarity values.	Calculate co-citation network using Science of Science (Sci2) and apply multiple thresholds to reveal different co-citation levels.
4. Visualization	Create a data layout using a clustering or ordination algorithm.	Perform SLM community detection to group co-cited documents and use Cytoscape to visualize the network, communities, and co-cited documents.

Data acquisition

Data were gathered by querying the Web of Science Core Collection (Thomson Reuters), a large bibliographic database. This database was selected because it has complete and consistently formatted citation information for its entries. The Web of Science Core Collection searches the Science Citation Index and Social Sciences Citation Index traditionally used in co-citation analysis and is maintained to prioritize historically impactful publications. Articles and review documents published between the years 1950 and 2015 that contained system(s) thinking in the title were searched and retrieved. The year range was intended to include both recent and historically significant documents. Although the term system(s) thinking may limit the retrieval of scholarship doing similar work by another name, this search term was used to gather source documents likely to be most relevant to the purpose of this report and, by using this constraint, exclude irrelevant work from the scope of the search. To ensure representation from diverse disciplines in the search results, the complete bibliographic records were exported for the top 25 most cited documents from the top 30 research areas by record count (n = 229). This set of document records is hereafter referred to as “source documents.” Each source document record lists cited references.

Data processing

Reference lists were extracted from the source documents, and duplicate references were merged using a script developed in R (30). This process also combined entries of different editions of the same book.
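The merging of duplicate references can be illustrated as follows. The original script was written in R and is not reproduced here; this Python sketch only shows one plausible approach, in which citation strings that differ in case, punctuation, or spacing are mapped to a single canonical form. The function names and example strings are assumptions, and this simple key would not combine different editions of the same book, which would need an additional rule (for example, ignoring the year for books).

```python
import re

def normalize(ref):
    """Crude matching key: lowercase, drop punctuation, collapse whitespace."""
    key = re.sub(r"[^a-z0-9 ]", " ", ref.lower())
    return " ".join(key.split())

def merge_duplicates(reference_lists):
    """Replace each reference by the first spelling seen for its key."""
    canonical = {}
    merged = []
    for refs in reference_lists:
        seen = set()
        for ref in refs:
            key = normalize(ref)
            canonical.setdefault(key, ref)
            seen.add(canonical[key])
        merged.append(sorted(seen))
    return merged

# Illustrative cited-reference strings, one list per source document.
lists = [
    ["Senge PM, 1990, FIFTH DISCIPLINE", "senge pm, 1990, fifth discipline."],
    ["SENGE PM, 1990, Fifth Discipline", "Bertalanffy LV, 1968, GEN SYSTEM THEORY"],
]
merged = merge_duplicates(lists)
```

After merging, each source document lists a given reference once, under one spelling, so it cannot be co-cited with itself in the next step.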

Analysis

The processed records were converted into an undirected co-citation network using the software Sci2 (31). Once data were in a network format, the edges of weights less than three, five, seven, or nine in co-citation frequency were trimmed, and any isolated nodes were removed to create four trim levels of the network to be used for comparison. The degree and weight metrics were calculated as illustrated above in Fig. 1.
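The conversion and trimming can be sketched as below. This is not the Sci2 implementation: it is a minimal Python illustration in which every pair of references appearing in the same reference list gains one unit of edge weight, edges below a threshold are dropped, and isolated nodes disappear because nodes are derived from the remaining edges. Degree is taken here as the number of distinct co-cited partners, which is an assumption about the metric defined in Fig. 1. All names and the toy lists are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cocitation_edges(reference_lists):
    """Count, for each pair of references, the lists that cite both."""
    weights = Counter()
    for refs in reference_lists:
        for a, b in combinations(sorted(set(refs)), 2):
            weights[(a, b)] += 1
    return weights

def trim(weights, min_weight):
    """Keep only edges co-cited at least min_weight times."""
    return {pair: w for pair, w in weights.items() if w >= min_weight}

def degree(edges):
    """Number of co-citation partners per node; isolated nodes never appear."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)

# Toy reference lists, one per source document.
lists = [["A", "B", "C"], ["A", "B"], ["A", "B", "D"], ["C", "D"]]
weights = cocitation_edges(lists)
trimmed = {level: trim(weights, level) for level in (3, 5, 7, 9)}
```

Here only the pair (A, B) is co-cited three times, so the ≥3 network holds a single edge and the higher trim levels are empty.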

Visualization

To visualize the networks of co-cited documents as communities, an SLM community detection algorithm (32) was applied within the Sci2 platform to identify modular groupings. Although many algorithms that group nodes and edges into clusters, modules, cliques, or community groups have been published (33), the SLM approach was implemented to place each node into a community because this algorithm functions well across different scales with low computational resources. This algorithm works by rearranging nodes of community groups to find a high modularity, a measure of the number of edges within a grouping compared to the edges between groupings. Using the local moving heuristic, individual nodes are regrouped if and only if a different community assignment increases modularity. The function stops when a community reassignment no longer increases modularity. Because calculations are performed in a random order, different groupings are possible with multiple iterations.
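The local moving heuristic can be sketched as follows. This is a simplified illustration rather than the full SLM algorithm (32), which adds further steps on top of local moving, and it is not the Sci2 code. It assumes the standard weighted modularity, Q = Σ_c [L_c/m − (D_c/2m)²], where L_c is the edge weight inside community c, D_c is the summed degree of its nodes, and m is the total edge weight. Modularity is recomputed from scratch after each trial move for clarity, not speed; the function names, the `seed` parameter, and the toy network are hypothetical.

```python
import random

def modularity(edges, community):
    """edges: {(u, v): weight}; community: {node: label}."""
    m = sum(edges.values())
    internal, strength = {}, {}
    for (u, v), w in edges.items():
        cu, cv = community[u], community[v]
        strength[cu] = strength.get(cu, 0) + w
        strength[cv] = strength.get(cv, 0) + w
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + w
    return sum(internal.get(c, 0) / m - (s / (2 * m)) ** 2
               for c, s in strength.items())

def local_moving(edges, seed=0):
    """Move single nodes between communities while modularity increases."""
    rng = random.Random(seed)
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    community = {node: node for node in neighbors}  # start from singletons
    best = modularity(edges, community)
    improved = True
    while improved:
        improved = False
        nodes = sorted(neighbors)
        rng.shuffle(nodes)  # random visiting order, so results can vary by seed
        for node in nodes:
            current = community[node]
            candidates = sorted({community[n] for n in neighbors[node]} - {current})
            for target in candidates:
                community[node] = target
                q = modularity(edges, community)
                if q > best + 1e-12:  # accept only strict improvements
                    best, current, improved = q, target, True
            community[node] = current  # settle on the best assignment found
    return community, best

# Toy network: two triangles joined by one edge.
edges = {("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1,
         ("d", "e"): 1, ("d", "f"): 1, ("e", "f"): 1, ("c", "d"): 1}
communities, q = local_moving(edges, seed=0)
```

Running the sketch with different `seed` values shows why repeated runs may yield different groupings: the visiting order changes which moves are tried first.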

After community detection, the networks were then graphed in Cytoscape (34) using the organic layout option. For reporting, the top three co-cited documents were listed by degree for each identified community.

Validation of identified documents and communities

Previous studies on co-citation networks have reported results, but little work has been done to validate the resulting networks. Without validation, inferences made about the documents and communities identified in a co-citation network may not be supported by evidence, leading to improper interpretations of results. The results were validated by evaluating the network for the following criteria: (i) the internal consistency of co-cited documents, (ii) the validity of the assigned communities as meaningful subject communities, and (iii) the stability of the results across different search terms. These three validation approaches may help support or refute claims made from the network results and, in doing so, contribute to strengthening co-citation networks as a research method.

Internal consistency

To measure internal consistency, the hypothesis that citation patterns were related to co-citation patterns was tested by calculating Spearman’s rank correlation between the number of times each co-cited document was cited by the source documents and its co-citation degree. If the bibliographic data had failed to transform into a co-citation network format, then these metrics would be unlikely to correlate, and a null hypothesis of independence would therefore be supported. Internal consistency was tested on trimmed networks with edge weights greater than or equal to three, five, seven, and nine.
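For readers unfamiliar with the statistic, Spearman’s rank correlation is the Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is a plain Python illustration of that definition, not the software used in this report; the function names and the example values are invented.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))

# Invented values: times cited and co-citation degree for five documents.
times_cited = [12, 9, 9, 4, 3]
degree = [30, 22, 25, 10, 11]
rho = spearman(times_cited, degree)
```

A value of rho near 1 would indicate that frequently cited documents also tend to have many co-citation partners.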

Validity of communities

Previous research in co-citation networks has made prima facie attempts to identify and visualize meaningful correspondence between the network structure and specialty disciplines (13) and compared results to word profiles (14). To assess the validity of the assigned communities in a data-driven manner in this report, a test of independence was conducted between assigned communities and subject communities identified from a database that differed from Web of Science. To find subject communities, WorldCat.org (35) was queried for each of the identified co-cited documents to gather the subject labels for each document. WorldCat.org is a union catalog that contains the world’s largest database of bibliographic records from libraries. These data were processed to remove qualifiers, subordinate topics, duplicate labels, and non-English terms. Then, to produce “subject communities,” SLM community detection was conducted to cluster documents and subject labels (31) so that each document belonged to one subject community. To test the validity of the results, the identified co-citation communities were cross-tabulated with the subject communities, and a χ² test of independence was performed. Inferences were considered supported for any P < 0.05.
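The cross-tabulation test can be illustrated with a short calculation. The sketch below computes the χ² statistic and degrees of freedom for a contingency table whose rows are co-citation communities and whose columns are subject communities. It is an illustration of the standard formula only; the function name and the 2 × 2 counts are invented and are not results from this report.

```python
def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Invented counts: documents cross-tabulated by two kinds of community.
table = [[10, 2],
         [3, 9]]
stat, dof = chi_square_independence(table)
```

With one degree of freedom, a statistic above the critical value of 3.841 corresponds to P < 0.05, so this toy table would lead to rejecting independence. In practice, a statistics package would report the exact P value.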

Stability of results

The stability of results was assessed by repeating the creation of a co-citation network with a comprehensive set of search terms within the Web of Science Core Collection. To collect bibliographic data, the following phrases from the abovementioned annotated bibliography (18) were used to search entries by title: general systems theory, soft systems thinking, system dynamics, emancipatory systems thinking, cybernetics, hard systems thinking, organizations as systems, critical systems thinking, system thinking, or systems thinking. In addition, entries were searched for the following Web of Science topics: systems, system theory, system dynamics, system analysis, system thinking, systems thinking, systems engineering, system models, social systems, system design, or systems approach. From the search results, the top 25 most relevant articles, reviews, and proceedings papers according to Web of Science were selected from each of the top 30 research areas by record count (n = 542). Documents were processed and analyzed as described for the initial “system(s) thinking” network. The resulting “comprehensive” co-citation network was then trimmed to co-citation frequencies of three or more co-citations (n = 149). When possible, the co-cited documents were matched to the original results of the trimmed systems thinking networks. To estimate stability, the number of shared co-cited documents between the networks was calculated, and Spearman’s rank correlation of degree of co-citation was conducted between the corresponding documents of the networks. A low correlation would imply that patterns are different across the networks, whereas a high correlation might imply maintenance of co-citation patterns.
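The matching step can be sketched as follows, assuming each network has already been reduced to a mapping from document identifier to co-citation degree. The function name, identifiers, and values are hypothetical; the paired degrees returned here are what a rank correlation would then be computed on.

```python
def match_networks(degree_a, degree_b):
    """Return the documents present in both networks and their paired degrees."""
    shared = sorted(set(degree_a) & set(degree_b))
    pairs = [(degree_a[doc], degree_b[doc]) for doc in shared]
    return shared, pairs

# Invented degrees for two small networks.
systems_thinking = {"doc1": 40, "doc2": 35, "doc3": 12}
comprehensive = {"doc1": 22, "doc3": 30, "doc4": 5}
shared, pairs = match_networks(systems_thinking, comprehensive)
```

The length of `shared` gives the number of co-cited documents common to both networks.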

Acknowledgments

We would like to acknowledge the contributions of P. Zdziarska for the development of scripts that facilitated data processing and were instrumental to the success of the reported work. We would also like to thank S. Chan, J. Momsen, E. Bray-Speth, and S. Wyse, as well as members of the Long Laboratory and the CREATE for STEM (Collaborative Research in Education, Assessment and Teaching Environments for the fields of Science, Technology, Engineering and Mathematics) postdoctoral group at Michigan State University for providing intellectual support and critiques to improve the development of this research report.

Funding: This material is based on work supported by the NSF under grant no. DRL 1420492.

Author contributions: C.M.T. and T.M.L. designed the study and wrote and edited the manuscript. C.M.T. collected the data and performed the analysis. T.M.L. managed the project.

Competing interests: The authors declare that they have no competing interests.

Data and materials availability: All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials and can be retrieved through subscription-based and nonsubscription-based services including the Web of Science Core Collection and WorldCat.org using the presented search parameters. Additional data and computer codes related to this paper may be requested from the authors.

SUPPLEMENTARY MATERIALS

Supplementary material for this article is available at http://advances.sciencemag.org/cgi/content/full/4/1/e1701130/DC1

fig. S1. Scatterplot of times cited and degree of co-citation for ≥3 systems thinking network.

fig. S2. Network representation of co-cited documents organized as subject communities.

table S1. Tabulation of documents from identified co-cited communities (≥3 network) to identified subject communities (fig. S2).

table S2. An annotated bibliography of six useful resources for understanding DCA and other types of bibliographic networks.

REFERENCES AND NOTES

1. Aboelela S. W., Larson E., Bakken S., Carrasquillo O., Formicola A., Glied S. A., Haas J., Gebbie K. M., Defining interdisciplinary research: Conclusions from a critical review of the literature. Health Serv. Res. 42, 329–346 (2007).
2. National Research Council, Enhancing the Effectiveness of Team Science (National Academies Press, Washington, DC, 2015).
3. National Research Council, Division of Behavioral and Social Sciences and Education, Board on Science Education, Committee on the Status, Contributions, and Future Directions of Discipline-Based Education Research, in Discipline-Based Education Research: Understanding and Improving Learning in Undergraduate Science and Engineering, S. R. Singer, N. R. Nielsen, H. A. Schweingruber, Eds. (National Academies Press, 2012).
4. Singer S. R., Advancing research on undergraduate science learning. J. Res. Sci. Teach. 50, 768–772 (2013).
5. Talanquer V., DBER and STEM education reform: Are we up to the challenge? J. Res. Sci. Teach. 51, 809–819 (2014).
6. Henderson C., Beach A., Finkelstein N., Facilitating change in undergraduate STEM instructional practices: An analytic review of the literature. J. Res. Sci. Teach. 48, 952–984 (2011).
7. Dolan E. L., Biology education research 2.0. CBE Life Sci. Educ. 14, ed1 (2015).
8. Brewe E., Pelaez N. J., Cooke T. J., From vision to change: Educational initiatives and research at the intersection of physics and biology. CBE Life Sci. Educ. 12, 117–119 (2013).
9. Leydesdorff L., Theories of citation? Scientometrics 43, 5–25 (1998).
10. Small H. G., Cited documents as concept symbols. Soc. Stud. Sci. 8, 327–340 (1978).
11. Small H., Co-citation in the scientific literature: A new measure of the relationship between two documents. J. Am. Soc. Inf. Sci. 24, 265–269 (1973).
12. Small H. G., A co-citation model of a scientific specialty: A longitudinal study of collagen research. Soc. Stud. Sci. 7, 139–166 (1977).
13. Small H., Griffith B. C., The structure of scientific literatures I: Identifying and graphing specialties. Sci. Stud. 4, 17–40 (1974).
14. Braam R. R., Moed H. F., Van Raan A. F., Mapping of science by combined co-citation and word analysis I. Structural aspects. J. Am. Soc. Inf. Sci. 42, 233–251 (1991).
15. Yu Y.-C., Chang S.-H., Yu L.-C., An academic trend in STEM education from bibliometric and co-citation method. Int. J. Inf. Educ. Technol. 6, 113–116 (2016).
16. Tang K.-Y., Wang C.-Y., Chang H.-Y., Chen S., Lo H.-C., Tsai C.-C., The intellectual structure of metacognitive scaffolding in science education: A co-citation network analysis. Int. J. Sci. Math. Educ. 14, 249–262 (2016).
17. Boyack K. W., Klavans R., Börner K., Mapping the backbone of science. Scientometrics 64, 351–374 (2005).
18. Lane D. C., Jackson M. C., Only connect! An annotated bibliography reflecting the breadth and diversity of systems thinking. Syst. Res. Behav. Sci. 12, 217–228 (1995).
19. American Association for the Advancement of Science, Vision and Change: A Call to Action (AAAS, 2010).
20. Senge P. M., The Fifth Discipline: The Art and Practice of the Learning Organization (Doubleday and Company, 1990).
21. von Bertalanffy L., General System Theory: Foundations, Development, Applications (George Braziller, 1968).
22. Frank M., Engineering systems thinking and systems thinking. Syst. Eng. 3, 163–168 (2000).
23. Ben Zvi Assaraf O., Orion N., Development of system thinking skills in the context of earth system education. J. Res. Sci. Teach. 42, 518–560 (2005).
24. Jacobson M. J., Wilensky U., Complex systems in education: Scientific and educational importance and implications for the learning sciences. J. Learn. Sci. 15, 11–34 (2006).
25. Kessler M. M., Bibliographic coupling between scientific papers. Am. Doc. 14, 10–25 (1963).
26. Boyack K. W., Klavans R., Co-citation analysis, bibliographic coupling, and direct citation: Which citation approach represents the research front most accurately? J. Am. Soc. Inf. Sci. Technol. 61, 2389–2404 (2010).
27. Shiffrin R. M., Börner K., Mapping knowledge domains. Proc. Natl. Acad. Sci. U.S.A. 101 (Suppl. 1), 5183–5185 (2004).
28. Börner K., Chen C., Boyack K. W., Visualizing knowledge domains. Annu. Rev. Inf. Sci. Technol. 37, 179–255 (2003).
29. Börner K., Sanyal S., Vespignani A., Network science. Annu. Rev. Inf. Sci. Technol. 41, 537–607 (2007).
30. R Core Team, R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing, 2013); www.R-project.org/.
31. Sci2 Team, Science of Science (Sci2) Tool (Indiana University and SciTech Strategies, 2009); https://sci2.cns.iu.edu.
32. Waltman L., van Eck N. J., A smart local moving algorithm for large-scale modularity-based community detection. Eur. Phys. J. B 86, 471 (2013).
33. Fortunato S., Community detection in graphs. Phys. Rep. 486, 75–174 (2010).
34. Shannon P., Markiel A., Ozier O., Baliga N. S., Wang J. T., Ramage D., Amin N., Schwikowski B., Ideker T., Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
35. Online Computer Library Center, WorldCat.org Services. 2001–2016 OCLC Online Computer Library Center Inc. (2016); retrieved from http://worldcat.org.
