Abstract
This study investigates how research data contribute to non-academic impacts, using a secondary analysis of high-scoring impact case studies from the UK’s Research Excellence Framework (REF). A content analysis was conducted to identify patterns linking research data and impact. The most prevalent type of research data-driven impact related to “practice” (45%), which included changing how professionals operate, changing organizational culture and improving workplace productivity or outcomes. The second most common category was “government impacts”, including reducing the cost of government services and enhancing government effectiveness or efficiency. Impacts from research data were developed most frequently through “improved institutional processes or methods” (40%), followed by pre-analyzed or curated information presented in reports (32%) and “analytic software or methods” (26%). The analysis found that research data on their own rarely generate impacts. Instead, they require analysis, curation, product development or other forms of significant intervention to leverage broader non-academic impacts.
1 Introduction
Making a positive difference in the world, or “impact”, has long been a driving force for researchers across the disciplinary spectrum, whether they generate research data or not. There is now also growing interest in the impact of research from funders who want evidence of the value of their research investments to society [1, 2]. This interest has been driven, in part, by successive economic crises that have intensified the need to justify continued public investment in research. In response, the first national assessment of research impact was conducted by the UK via its Research Excellence Framework in 2014 (REF2014), and there are now national assessments of research impact in The Netherlands, Sweden, Italy, Spain, Norway, Poland, Finland, Hong Kong, Australia, New Zealand and the USA (for details, see Reed et al. [1]). The analysis of impact case studies included in this research is based on the definition of impact used in REF2014: “an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia” [10] (p. 26). According to REF2014 guidance, the primary functions of an impact case study were to articulate and evidence the significance and reach of impacts arising from research beyond academia, clearly demonstrating the contribution that research from a given institution made to those impacts [10].
Investment in research data capacity and infrastructure has been a long-standing and important pillar of public funding for research. However, evaluations of the impact of research data have tended to focus on assessments of return on investment. For example, Beagrie and Houghton [2] estimated the return on investment for the British Atmospheric Data Centre, the Economic and Social Data Service and the Archaeology Data Service in the UK, and for the European Bioinformatics Institute (EMBL-EBI), which was established to manage public life-science data on a large scale. While EMBL-EBI’s annual operating cost was £47 million, Beagrie and Houghton [2] estimated that the annual return on investment from using EMBL-EBI data was £920 million, with another £1 billion from efficiency gains arising from its data and services. Similarly, AuScope is an Australian national infrastructure investment in geoscience and geospatial data capability to support data-intensive research and industries. Using similar economic modelling, Lateral Economics estimated that the economic benefits of AuScope were AUD 3,912 million against a total economic cost of AUD 261 million, amounting to roughly AUD 15 of benefits for every AUD 1 invested [3].
Other approaches have also been considered for evaluating the impact of research data, including the use of bibliometrics and altmetrics [4, 5] and usage and end-user surveys [6]. There is clearly a broad interest in understanding how research data and infrastructures contribute to broad socio-economic impacts. This interest is also reflected in the development of an OECD reference framework in 2019 for assessing the scientific and socio-economic impact of research infrastructures [7]. However, each of the metrics used so far to evaluate the impact of research data provides only: a narrow economic perspective on the value of research data (in the case of monetary valuation methods); crude proxies for wider benefits (in the case of bibliometrics and altmetrics); or incomplete assessments of benefits, biased towards the immediate users of data (who are often researchers).
Based on a wider critique of metrics for evaluating impact [8], REF2014 adopted a case study approach, which has since been widely replicated and adapted for impact evaluation internationally (Reed et al. [1]). In the context of REF, case studies are historical narratives that reconstruct past events and claim that research by a submitting institution has had, or is having, causal impacts on society [9]. The use of case studies enabled the integration of metrics with evaluation data generated from a range of other methods (including qualitative ones), as part of a narrative description of the widest possible impacts arising from research [10]. The resulting publication of over 6,000 impact case studies in 2014 was unique in its size and scope. Although the scores of individual case studies were not made public, it is possible to infer high (3* or 4*) versus low (unclassified, 1* or 2*) scores from an analysis of the published results. This provides a unique opportunity to conduct secondary analysis of impact case studies that were judged by evaluation panels to have successfully demonstrated impact (i.e., high-scoring cases) in order to extract new insights about impact generation processes.
In this paper, we have identified a sub-set of case studies that relied heavily on research data, to provide a more comprehensive evaluation of the wider benefits of research data than has previously been possible. The sample is not representative of UK research institutions or disciplines, and the case studies are self-selected and written to showcase positive impacts rather than to provide balanced, independent evaluations of outcomes. However, the breadth of case studies in the database makes it possible to evaluate the value of research data across a range of institutions (compared to economic evaluations, which tend to focus on individual institutions), considering data generated and used across the disciplinary spectrum in an exceptionally wide range of application contexts. Whilst imperfect, the analysis of these case studies offers insights into the societal benefits of investing in research data that have not previously been possible. In particular, we seek to: 1) understand the pathways and processes through which research data were used to generate impacts; and 2) evaluate the range of different impacts generated from research data, considering the potential for multiple impacts arising over time in many different domains.
2 Methods
The research evaluates the accounts contained in impact case studies submitted to the UK’s Research Excellence Framework in 2014 (REF2014). It focuses on the content of REF2014 impact narratives, investigating how research data delivered positive societal outcomes and what factors enabled such outcomes to develop. As a linguistic corpus, the detailed impact descriptions of all impact cases comprise over 6,000 distinct cases totalling 6.3 million words. Impact case studies submitted to the exercise followed a set structure: 1 – Summary of the impact; 2 – Underpinning research; 3 – References to the research; 4 – Details of the impact; 5 – Sources to corroborate the impact [10]. Using this structure, case studies had four pages within which to show how a body of work (consisting of up to six research outputs by the submitting institution) contributed, whether directly or indirectly, to “an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia”. In REF2014, research outputs and impact were peer reviewed in 36 disciplinary ‘Units of Assessment’. Panels consisted of academics and research users recruited to help evaluate impacts.
As a first step, we applied a standard Information Retrieval (IR) technique (Manning et al. [11]; Baeza-Yates and Ribeiro-Neto [12]) to identify candidate cases where data featured prominently in the impact narratives. The aim was to reduce the volume of cases to a manageable size on which a quantitative content analysis could be performed. We focused on cases where the term “data” or its variants “dataset”, “datasets”, “database” and “databases” were mentioned explicitly in the impact narratives. The underlying procedure was implemented in R (under MIT license) and is available at https://github.com/squarcleconsulting/ukrefimpactmined.
The application of IR involved creating a simplified representation of the corpus: the impact narratives were converted into a document-term matrix and the term frequency-inverse document frequency (TF-IDF) was computed for each term in each impact case. Word occurrences play a fundamental role in this simplified model and in the definition of TF-IDF. Term frequency (TF) is the ratio between the number of occurrences of a given term in a given impact case and the total number of term instances in that case. Inverse document frequency (IDF) is the logarithm of the ratio of the total number of cases to the number of cases with at least one occurrence of the term. TF-IDF is the product of TF and IDF. Intuitively, a word frequently mentioned in an impact case (e.g., “health”) can provide information about the case. However, if the word is also mentioned in every impact case, then it becomes less useful for differentiating the content of different cases. By comparing the TF-IDF value of the term “data” (and its variants) against the TF-IDF values of other terms in an impact case, we obtain a rough measure of how prominently “data” features in the impact narrative: a higher TF-IDF value indicates a more prominent role. More specifically, the TF-IDF values of all terms in a given impact case were ranked by quartiles (25th, 50th and 75th percentiles). This allowed us to identify 484 impact cases in which the TF-IDF value of the term “data” (or its variants) was above the third quartile of the ranking. This set was further filtered to retain only those cases with a REF impact rating of 3* or 4* (i.e., high-scoring cases), leaving 148 cases, as noted in Table 1. A manual review of these 148 cases was conducted to ensure their relevance.
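Written out with the definitions above (notation ours, not the paper’s), the score computed for each term t in each impact case d is:

```latex
\mathrm{TFIDF}(t,d) \;=\; \mathrm{TF}(t,d)\times\mathrm{IDF}(t)
\;=\; \frac{f_{t,d}}{\sum_{t'} f_{t',d}} \;\times\; \log\frac{N}{n_t}
```

where f_{t,d} is the number of occurrences of term t in case d, N is the total number of impact cases and n_t is the number of cases in which t occurs at least once.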
Table 1. Different stages of information retrieval to identify candidate cases.
| Summary | Value |
|---|---|
| Number of impact cases available | 6637 |
| Number of cases with “data” mentioned in impact section | 2213 |
| Number of cases where TF-IDF (data) is above Q3 | 484 |
| Number of cases where TF-IDF (data) is above Q3 and REF rating is 3* or 4* | 148 |
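In practical terms, this screening step can be sketched in R with the tidytext and dplyr packages. The sketch below illustrates the logic only and is not the released pipeline in the repository above; the data frame `cases` and its columns (`case_id`, `impact_text`, `ref_rating`) are hypothetical, and the variant terms are scored as separate tokens for simplicity.

```r
# Minimal sketch of the TF-IDF screening step (hypothetical data frame and column names).
library(dplyr)
library(tidytext)

data_terms <- c("data", "dataset", "datasets", "database", "databases")

term_scores <- cases %>%                      # `cases`: one row per impact case
  unnest_tokens(word, impact_text) %>%        # tokenise the "Details of the impact" text
  count(case_id, word) %>%                    # term frequency counts per case
  bind_tf_idf(word, case_id, n)               # adds tf, idf and tf_idf columns

candidates <- term_scores %>%
  group_by(case_id) %>%
  mutate(q3 = quantile(tf_idf, 0.75)) %>%     # third quartile of TF-IDF within each case
  filter(word %in% data_terms, tf_idf > q3) %>%
  ungroup() %>%
  distinct(case_id)

high_scoring <- cases %>%                     # keep only 3*/4* cases among the candidates
  semi_join(candidates, by = "case_id") %>%
  filter(ref_rating %in% c("3*", "4*"))
```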
The second step of this study was conducted using quantitative content analysis, a well-established process for converting unstructured data into structured, numerical data in a reliable and accurate way (Krippendorff [13]; Neuendorf [14]). For this analysis, a subset of the sample (approximately 20%) was used to develop an analytic framework with specific categories, definitions and examples. The framework was built by manually assessing this subset, extracting relevant content and refining the categories over the course of the analysis. This process provided key operational definitions and was designed to address the research questions for the project.
The categories were applied systematically by research assistants independently coding randomly allocated content. More than one category could be allocated to each impact case study, if applicable. To keep the content analysis focused, impact-related text passages were extracted manually from the longer impact narratives before coding. Finally, the different dimensions uncovered through the quantitative analysis were supplemented by the extraction of case examples. The detailed content analysis codebook (including specific analysis protocols) is available as supplementary material and the full data are available on the open access repository Zenodo.
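Because more than one category could be attached to a case, prevalence figures of the kind reported later (Tables 4 and 6) are most naturally computed from a long-format coding table with one row per case-category assignment, so that percentages can sum to more than 100%. A minimal sketch with invented example codes (not the study data):

```r
# Hypothetical long-format coding output: one row per (case, category) assignment.
library(dplyr)
library(tibble)

coding <- tribble(
  ~case_id,   ~impact_type,
  "case_001", "Practice Impact",
  "case_001", "Economic Impact",                     # a case may carry several categories
  "case_002", "Other Government / Policy Impact",
  "case_003", "Practice Impact"
)

n_cases <- n_distinct(coding$case_id)

coding %>%
  distinct(case_id, impact_type) %>%                 # guard against duplicate assignments
  count(impact_type) %>%
  mutate(percent = round(100 * n / n_cases))         # percentages may sum to more than 100%
```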
The content analysis process yielded quantifiable results, underpinned by intercoder reliability checks on randomly selected subsets. Intercoder reliability refers to the extent to which independent analysts (or ‘coders’) evaluating the same content reach the same conclusions about its characteristics. A high level of agreement is taken as evidence that the content analysis has identified characteristics that are objectively evident in the texts being analysed (e.g. Jensen & Laurie [15]).
The first step in evaluating inter-coder reliability is to have the members of the coding team independently code the same (randomly selected) sub-set of sample cases. In accordance with this, a randomly selected 10% of the cases was coded by at least two analysts and tested for inter-coder reliability using Krippendorff’s Alpha (or ‘Kalpha’). This reliability sub-sample comprised 52 units, which were analysed statistically. A value of ‘1.0’ in the Kalpha table means that there were no disagreements between the two analysts.
The results (Table 2) show very good inter-coder reliability scores across the variables (all above 0.8 Kalpha, the established benchmark for good reliability). All differences were resolved by the lead researcher.
Table 2. Krippendorff’s alpha for variables analysed in this study.
| Variable Name | Kalpha |
|---|---|
| Impact Type Overall Variable (see Table 3) | 0.9288 |
| Impact Pathways Overall Variable (see Table 5) | 1.0 |
| Impact Pathway: Searchable Database | 1.0 |
| Impact Pathway: Report or Static Information | 0.9618 |
| Impact Pathway: Mobile App | 1.0 |
| Impact Pathway: Analytic Software or Methods | 0.9478 |
| Impact Pathway: Improved Institutional Processes / Methods | 1.0 |
| Impact Pathway: Sharing of Raw Data | 1.0 |
| Impact Pathway: Sharing of Tech / Software | 1.0 |
| Impact Pathway: Other Impact Instrument | 1.0 |
| Impact Pathway: Unclear / Uncertain | 1.0 |
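For reference, Krippendorff’s Alpha for a pair of coders rating nominal categories can be computed in R with the irr package. The values below are invented for illustration and are not the study’s reliability data.

```r
# Minimal Kalpha sketch: rows are coders, columns are the double-coded reliability units
# (illustrative category codes only).
library(irr)

codes <- rbind(
  coder_1 = c(1, 3, 2, 1, 2, 4),
  coder_2 = c(1, 3, 2, 1, 1, 4)
)

kripp.alpha(codes, method = "nominal")   # Krippendorff's alpha for nominal-level codes
```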
Finally, it is important to highlight a key methodological limitation of this study. We are using impact case narratives that were crafted to tell stories about the impact of research. This means that there may be an incentive for case study authors to emphasize how essential the research outputs (and the research data) were to the impacts described in the case. We must be cautious, therefore, in generalizing to all research-linked impacts that may be taking place, many of which may have been systematically omitted from REF impact case studies precisely because multi-dimensional pathways to impact make causal claims more tenuous.
3 Results
3.1 Types of impact associated with research data
The first level of analysis in this research was designed to identify the types of impacts that are most linked to research data. The initial analysis of the cases revealed a set of different types of impact linked to research data, defined in Table 3.
Table 3. Identified types of impact.
| Impact | Description |
|---|---|
| Government Spending / Efficiency Impact | Reducing the cost of delivering government services; increasing impact/quality of government service without raising cost. |
| Other Government / Policy Impact | Changing public policy or government regulations, or how either of these are implemented. |
| Practice Impact | Changing the ways that professionals operate; changing organizational culture; improving workplace productivity or outcomes; improving the quality of products or services through better methods, technology, understanding of the problems, etc. |
| General Public Awareness Impact | Improving public knowledge about a topic or increasing public visibility or attention for an issue. |
| Justice / Crime Reduction / Public Safety Impact | Reducing crime; increasing efficiency in reducing crime; improving justice outcomes (i.e., fairer, less costly, better social outcomes). |
| Public Health Impact | Improvements to the health of the population or a part of the population. |
| Economic Impact | Greater revenue, profit or market share is developed for a company or sector using research data. |
| Environmental Impact | Improvements in the natural environment, or reductions in threats or harm. |
| Other Kind of General Public Impact | Benefits for the public (not professionals/government) that are not explicitly stated above in another category. |
| Other Non-Academic Impact | REF-eligible non-academic impacts not falling into any of the categories above. (i.e., cannot include academic publications or improvements to teaching within a researcher’s own institution, as these would not be REF-eligible). |
| Unclear / Uncertain | Not enough detail or clarity to clearly identify. |
The most prevalent types of research data-driven impact in our sample related to Practice (45%) and Government (21%), the latter comprising Government Spending / Efficiency impacts (6%) and Other Government / Policy impacts (15%). Other types of research data-driven impact, such as Economic impacts (13%) and General Public Awareness impacts (10%), were also represented in a noteworthy minority of cases (Table 4). Impact case content that fitted more than one category was coded separately for each applicable impact type.
Table 4. Prevalence of different types of impact associated with research data.
| Impact Type | Percentage |
|---|---|
| Practice Impact | 45% |
| Other Government / Policy Impact | 15% |
| Economic Impact | 13% |
| General Public Awareness Impact | 10% |
| Government Spending / Efficiency Impact | 6% |
| Public Health Impact | 5% |
| Other Kind of General Public Impact | 3% |
| Remaining | 4% |
The findings show that 45% of research data-linked impacts focused on practice (Table 4). In these cases, the research data were used to change the ways that professionals operate. These changes had a direct (or indirect) impact on organisational culture, improving workplace productivity or outcomes, or improving the quality of products or services through better methods, technology or understanding of the problems. For example, because of the application of research data collected using the Work-Related Quality of Life (WRQoL) scale, a marked improvement in workplace wellbeing was documented in one impact case study:
“Survey results at organisation level “allowed us to focus on sickness absence", prompting the introduction of a new sickness absence procedure: "days lost […] for stress/mental ill health for academic staff have more than halved from 1453 days to 604 days, a[n annual] saving estimated to be in the region of £100,000”.
Twenty-one percent of impacts were related to government and policy impacts, including at least one of the following benefits:
- Reducing the cost of delivering government services
- Increasing the impact/quality of government services without raising costs
- Changing public policy or government regulations
An example of a case that focused on reducing the cost of delivering government services, or increasing the impact/quality of government services without raising costs, involved the UK government agency the Office for National Statistics (ONS). Here, an analysis of research data from the ONS was used as a basis to develop a ’disclosure framework and software package’ for the release of microdata to a vast range of end-users, including central and local government departments and the health service. That is, ONS data were used to develop methods and software that in turn could be used to leverage value from ONS data for a wide range of stakeholders across government.
Research data were also used to develop impacts on both the economy (13%) and general public awareness (10%). Examples of economic impact included cases that delivered health care cost savings while improving patient health outcomes. General public awareness impacts included changes to how people perceived and understood information related to contemporary topics, such as bullying and social media use, among others. Such general public-oriented impacts often focused on improving public knowledge about a topic or increasing public visibility or attention for an important issue.
3.2 How impact develops from research data
Impacts were developed from research data in several different ways, which we refer to as impact pathways. Here, we analyse the nature of these different interventions, which can be understood as the means of developing impact, or the impact-generating activities. The impact development approaches we identified are summarized in Table 5.
Table 5. Identified ways of developing impact, or impact pathways.
| Impact pathways | Description |
|---|---|
| Searchable Database | A database that can be accessed to view the research data in a dynamic way (that is, offers the ability to select variables/filters, allowing for customized information to be accessed by users to use for their own purposes). |
| Reports or static information | Report containing pre-analysed/curated information, a static database, results tables or other methods of presenting the research data as processed information to be used without customisation or filtering of the data. |
| Mobile App | An application designed for smartphone or tablet to access the research data or an analysis/results of the data. |
| Analytic Software or Methods | Research data used to generate or refine software or research/analytic methods. |
| Improved Institutional Processes / Methods | Research data used to make an institution’s way of operating better/more efficient or more effective at delivering outcomes. |
| Sharing of Raw Data | Research data has an impact via being shared with others (in raw or minimally anonymized form) outside of the research team that generated the data so that they can do something with it (e.g. further analysis, etc.). |
| Sharing of Tech / Software | The research data have an impact via sharing technology or software that was created using the research data or that uses the research data somehow. |
| Other Impact Instrument | A clearly identifiable impact instrument that does not fit into any of the categories listed above. |
| Unclear / Uncertain | Impact instrument that is not detailed enough to clearly place into any pre-specified category. |
The most common ways of developing impact from research data were ‘Improved Institutional Processes or Methods’ (40%), ‘Reports or static information’ (32%) and ‘Analytic Software or Methods’ (26%) (Table 6). As multiple ways of developing impact could be used in tandem, the analysis allowed for multiple impact instruments to be identified for a single impact.
Table 6. Prevalence of impact development pathways.
| Impact pathways | Percentage |
|---|---|
| Improved Institutional Processes or Methods | 40% |
| Reports (or other static, curated information) | 32% |
| Analytic Software or Methods | 26% |
| Sharing of Tech / Software | 14% |
| Searchable Database | 10% |
| Sharing of Raw Data | 9% |
| Mobile App | 1% |
| Other Impact Pathways | 4% |
Improving ‘institutional processes or methods’ was a major pathway to developing impact. One example of this category of impact development comes from research on regulatory practice:
“[On regulatory practice] Raab’s research has also had a more direct impact on the regulation of information privacy, informing specific policies and frameworks in the UK and beyond. [. . .] His distinct contribution [to the Scottish Government’s Identity Management and Privacy Principles report] is evident in principles that warn against discrimination and social exclusion in identification processes; that encourage organisations to raise awareness of privacy issues; and that promote transparency [. . .] These have been widely cited and applied in Scottish Government policy”.
Other examples where research data were used to enhance institutional processes include changes in data privacy practices and enhancements in the transparency of companies’ operating procedures.
Research data were often used to develop impact through the production of ‘reports’ or other similar types of prepared information. Such reporting distils research data in a way that makes them intelligible for institutions, making the data useful for a wider user base outside of academia.
‘Analytic software or methods’ were also used to develop impact from research data. An example of this category of impact development can be drawn from the earlier ‘government’ impact relating to the UK’s Office for National Statistics. In that case, the researchers developed impact by using research data to create a “disclosure framework and software package”, which was then used to generate impact:
“Analysis of research data from the Office for National Statistics used as a basis to develop a ’disclosure framework and software package’ for the release of microdata to a vast range of end-users, including central and local government departments, and the health service.”
That is, the mechanism for developing impact in this case was improved methods for data disclosure and new software to implement those improved methods.
4 Discussion
The link between data and impact may be explained by a re-purposing of the Knowledge Hierarchy [16, 17], in which data are transformed into information when analysed and interpreted, information becomes knowledge when it is understood in a given context, and knowledge becomes “wisdom” or “enlightenment” when it is used effectively or “for good”. Definitions of data in the literature emphasise the lack of meaning or value in data that have not been processed, analysed, contextualised or interpreted [18, 19]. The definitions of wisdom and enlightenment by Zeleny [16] and Ackoff [17] have much in common with definitions of impact, as they suggest that the purpose of data is ultimately to be transformed into something “effective”, that “adds value” and is “socially accepted” [20]. Jashapara [21] focuses on the ethical judgement required to use knowledge to “act critically or practically in any given situation” (pp. 17–18). It is clear that the perception of whether knowledge has been applied “wisely” will depend on the ontological and epistemological perspective of the evaluator. As Reed et al. [1] suggest, “impact is in the eye of the beholder; a benefit perceived by one group at one time and place may be perceived as harmful or damaging by another group at the same or another time or place”.
Such value judgements and assumptions are rarely explicit in definitions of either wisdom or research impact. However, for data to generate impact, they must first be transformed into information and knowledge, and knowledge is inherently subjective. The interpretation of information is strongly influenced by the mode of transmission (e.g. socially mediated via peers versus mass media; REF), context (e.g. Bandura’s idea [22] that all knowledge is formed through “social learning”, even when it is not socially mediated, as the interpretation of information is influenced by prevailing social, cultural and political contexts and worldviews) and power relations (e.g. determining who has access to data, how particular interpretations of data are managed to restrict the range of information available, and how information is used to drive decisions through overt and covert forms of power over and power with [23, 24]). In this light, Reed et al. [1] define research impact as “demonstrable and/or perceived benefits to individuals, groups, organisations and society (including human and non-human entities in the present and future) that are causally linked (necessarily or sufficiently) to research” (emphasis added).
The research findings pertaining to impact pathways show how data from multiple disciplines contributed towards the generation of impact, and emphasize the importance of both analysing and interpreting data and making this information easily accessible to decision-makers. Indeed, ‘reports or static information’ (32%) and ‘analytic software or methods’ (26%) were among the most frequent ways in which REF impact case studies claimed impacts arose from data. It is likely that these reports, software and methods played an important role in establishing causal relationships between research and impact, which has been shown to be a strong predictor of high scores in REF2014 case studies (Reichard et al. [25]). Reichard et al. found that high-scoring case studies contained more attributional phrases connecting research to impact, more explicit causal connections between ideas, and more logical connectives than low-scoring cases. The most common way in which data led to impact was via ‘improved institutional processes or methods’ (40%), suggesting the importance of adaptively (re-)interpreting information in specific operational contexts. This is consistent with the finding of Bonaccorsi et al. [9] that institutional agents (including universities, government, the NHS and the European Union) played an important role in demonstrating claims that research was causally related to impacts in REF2014 case studies (especially in the social sciences and humanities, which cited more agents per case study than STEM subjects). Louder et al. [26] have argued that it is essential for such process-related impacts to be captured in case studies, and Gow and Redwood [27] contend that they should have an impact category of their own, namely “procedural impacts”.
This study is the first attempt to analyse the development of impact from data, focusing on examples of research where data played a major role in the generation of both research findings and impact. It suggests that analysis, curation, product development or other strong interventions are needed to leverage value from research data. These interventions help to bridge the gap between research data and potential users or beneficiaries. While good data management, open data and streamlined access to data are necessary, further interventions are needed to maximize impact from research data. Extending the use of research data beyond academia requires not only traditional academic research skills, but also capabilities in public communication, entrepreneurship and boundary-crossing [27]. The findings show how impact can be facilitated through closer links between government, industry and researchers, through capacity building, and through funding both for researchers to use research data effectively for developing impact and for potential beneficiaries to establish links with researchers and make research data available in usable formats.
Acknowledgments
We would like to express deep appreciation for the time, consideration and input of the ARDC Advisory Group. The project also benefited from the contributions of the UK REF’s impact case study authors and the open publication of these case studies. Finally, researchers contributing to this project include: Dr Jessica Norberto, Dr Benjamin Smith, Dr Aaron Jensen, Lars Lorenz, Christian Moll.
Data and code availability
The underlying computer code and data for this research are available at
https://github.com/squarcleconsulting/ukrefimpactmined (MIT licence)
https://doi.org/10.5281/zenodo.3543505 (CC-BY 4.0 International Attribution)
This work has benefited from a large number of R packages, including the R core language version 3.4.4 [28], dplyr [29], easypackages [30], geometry [31], ggplot2 [32], ggpubr [33], huge [34], igraph [35], LDAvis [36], mallet [37], NLP [38], readr [39], refimpact [40], rJava [41], rsvd [42], Rtsne [43–45], stm [46], stmCorrViz [47], stringr [48], textmineR [49], tidyr [50], tidytext [51], tm [52, 53], and topicmodels [54].
Data Availability
Computer code is available at https://github.com/squarcleconsulting/ukrefimpactmined (MIT licence). Please note that, due to the Terms of Use issued by REF UK, the source data for the impact case studies cannot be redistributed via GitHub and must be retrieved directly from https://impact.ref.ac.uk/casestudies/. Analysed data are available at https://doi.org/10.5281/zenodo.3543505 (CC-BY 4.0 International Attribution).
Funding Statement
Eric Jensen received funding from the Australian Research Data Commons (ARDC) for this study. The ARDC played no role in the design, data collection or analysis of this research.
References
- 1. Reed M., et al. Evaluating research impact: a methodological framework. Res. Policy (in press).
- 2. Beagrie N. & Houghton J. W. The Value and Impact of the European Bioinformatics Institute. (2016).
- 3. Lateral Economics Ltd. AuScope Infrastructure Program – evaluation of impacts. (2016).
- 4. Urquhart C. & Dunn S. A bibliometric approach demonstrates the impact of a social care data set on research and policy. Health Info. Libr. J. 30, 294–302 (2013). doi: 10.1111/hir.12040
- 5. Belter C. W. Measuring the value of research data: A citation analysis of oceanographic data sets. PLoS One 9 (2014).
- 6. Collins E. Use and Impact of UK Research Data Centres. Int. J. Digit. Curation 6, 20–31 (2011).
- 7. OECD. Reference Framework for Assessing the Scientific and Socio-Economic Impact of Research Infrastructures. https://www.oecd-ilibrary.org/docserver/3ffee43b-en.pdf?expires=1554432241&id=id&accname=guest&checksum=B5561A1B326FC5BFA51B342D89E7B97E (2019). doi: 10.1787/23074957
- 8. Wilsdon J., et al. The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment and Management. http://www.hefce.ac.uk/pubs/rereports/Year/2015/metrictide/Title,104463,en.html (2015). doi: 10.13140/RG.2.1.4929.1363
- 9. Bonaccorsi A., Melluso N., Chiarello F. & Fantoni G. Sci. Public Policy (2021). doi: 10.1093/scipol/scab037
- 10. HEFCE. Research Excellence Framework (REF): Assessment framework and guidance on submissions. http://www.ref.ac.uk/media/ref/content/pub/assessmentframeworkandguidanceonsubmissions/GOS including addendum.pdf (2011).
- 11. Manning C. D., Raghavan P. & Schütze H. Introduction to Information Retrieval. (Cambridge University Press, 2008).
- 12. Baeza-Yates R. & Ribeiro-Neto B. Modern Information Retrieval: The Concepts and Technology behind Search. (Addison-Wesley, 2011).
- 13. Krippendorff K. Content Analysis: An Introduction to Its Methodology. (Sage, Thousand Oaks, CA, 2004).
- 14. Neuendorf K. A. The Content Analysis Guidebook. (Sage, Thousand Oaks, CA, 2017). doi: 10.4135/9781071802878
- 15. Jensen E. & Laurie C. Doing Real Research: A Practical Guide to Social Research. (Sage, London, 2016).
- 16. Zeleny M. Management Support Systems: Towards Integrated Knowledge Management. Hum. Syst. Manag. 7, 59–70 (1987).
- 17. Ackoff R. L. From Data to Wisdom. J. Appl. Syst. Anal. 16, 3–9 (1989).
- 18. Jessup L. M. & Valacich J. S. Information Systems Today. (Prentice Hall, Upper Saddle River, NJ, 2003).
- 19. Bocij P., Chaffey D., G A. & Hickie S. Business Information Systems: Technology, Development and Management for the e-Business. (FT Prentice Hall, Harlow, 2003).
- 20. Rowley J. The wisdom hierarchy: Representations of the DIKW hierarchy. J. Inf. Sci. 33, 163–180 (2007).
- 21. Jashapara A. Knowledge Management: An Integrated Approach. (FT Prentice Hall, Harlow, 2005).
- 22. Bandura A. Social Learning Theory. (Prentice Hall, Englewood Cliffs, NJ, 1977).
- 23. Clegg S. R. Frameworks of Power. (Sage, London, 1989).
- 24. Lukes S. Power: A Radical View. (Palgrave Macmillan, 2005).
- 25. Reichard B., et al. Writing impact case studies: a comparative study of high-scoring and low-scoring case studies from REF2014. Palgrave Commun. 6, 31 (2020).
- 26. Louder E., Wyborn C., Cvitanovic C. & Bednarek A. T. A synthesis of the frameworks available to guide evaluations of research impact at the interface of environmental science, policy and practice. Environ. Sci. Policy 116, 258–265 (2021).
- 27. Gow J. & Redwood H. Impact in International Affairs. (Routledge, 2020). doi: 10.4324/9781003023081
- 28. R Core Team. R: A Language and Environment for Statistical Computing. (2020).
- 29. Wickham H., François R., Henry L. & Müller K. dplyr: A Grammar of Data Manipulation. (2020).
- 30. Sherman J. easypackages: Easy Loading and Installing of Packages. (2016).
- 31. Habel K., Grasman R., Gramacy R. B., Mozharovskyi P. & Sterratt D. C. geometry: Mesh Generation and Surface Tessellation. (2019).
- 32. Wickham H. ggplot2: Elegant Graphics for Data Analysis. (Springer-Verlag, New York, 2016).
- 33. Kassambara A. ggpubr: ‘ggplot2’ Based Publication Ready Plots. (2020).
- 34. Jiang H., et al. huge: High-Dimensional Undirected Graph Estimation. (2020).
- 35. Csardi G. & Nepusz T. The igraph software package for complex network research. InterJournal Complex Systems, 1695 (2006).
- 36. Sievert C. & Shirley K. LDAvis: Interactive Visualization of Topic Models. (2015).
- 37. Mimno D. mallet: A wrapper around the Java machine learning tool MALLET. (2013).
- 38. Hornik K. NLP: Natural Language Processing Infrastructure. (2020).
- 39. Wickham H. & Hester J. readr: Read Rectangular Text Data. (2020).
- 40. Stephenson P. refimpact: API Wrapper for the UK REF 2014 Impact Case Studies Database. (2017).
- 41. Urbanek S. rJava: Low-Level R to Java Interface. (2020).
- 42. Erichson N. B., Voronin S., Brunton S. L. & Kutz J. N. Randomized Matrix Decompositions Using R. J. Stat. Softw. 89, 1–48 (2019).
- 43. van der Maaten L. J. P. & Hinton G. E. Visualizing High-Dimensional Data Using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).
- 44. van der Maaten L. J. P. Accelerating t-SNE using Tree-Based Algorithms. J. Mach. Learn. Res. 15, 3221–3245 (2014).
- 45. Krijthe J. H. Rtsne: T-Distributed Stochastic Neighbor Embedding using a Barnes-Hut Implementation. (2015).
- 46. Roberts M. E., Stewart B. M. & Tingley D. stm: An R Package for Structural Topic Models. J. Stat. Softw. 91, 1–40 (2019).
- 47. Coppola A., Roberts M., Stewart B. & Tingley D. stmCorrViz: A Tool for Structural Topic Model Visualizations. (2016).
- 48. Wickham H. stringr: Simple, Consistent Wrappers for Common String Operations. (2019).
- 49. Jones T. textmineR: Functions for Text Mining and Topic Modeling. (2019).
- 50. Wickham H. tidyr: Tidy Messy Data. (2020).
- 51. Silge J. & Robinson D. tidytext: Text Mining and Analysis Using Tidy Data Principles in R. JOSS 1 (2016).
- 52. Feinerer I. & Hornik K. tm: Text Mining Package. (2020).
- 53. Feinerer I., Hornik K. & Meyer D. Text Mining Infrastructure in R. J. Stat. Softw. 25, 1–54 (2008).
- 54. Grün B. & Hornik K. topicmodels: An R Package for Fitting Topic Models. J. Stat. Softw. 40, 1–30 (2011).
