PLOS ONE. 2020 Aug 28;15(8):e0238071. doi: 10.1371/journal.pone.0238071

Research data sharing in the Australian national science agency: Understanding the relative importance of organisational, disciplinary and domain-specific influences

Claire M. Mason*, Paul J. Box, Shanae M. Burns
Editor: Laurentiu Rozylowicz
PMCID: PMC7454993; PMID: 32857794

Abstract

This study delineates the relative importance of organisational, research discipline and application domain factors in influencing researchers’ data sharing practices in Australia’s national scientific and industrial research agency. We surveyed 354 researchers and found that the number of data deposits made by researchers was related to the openness of the data culture and to the contractual inhibitors experienced by researchers. Multi-level modelling revealed that organisational unit membership explained 10%, disciplinary membership explained 6%, and domain membership explained 4% of the variance in researchers’ intentions to share research data. However, only the organisational measure of openness to data sharing explained significant unique variance in data sharing. Thus, whereas previous research has tended to focus on disciplinary influences on data sharing, this study suggests that factors operating within the organisation have the most powerful influence on researchers’ data sharing practices. The research received approval from the organisation’s Human Research Ethics Committee (no. 014/18).

Introduction

Even though most researchers agree that data sharing supports scientific progress [1], actual levels of data sharing amongst researchers are relatively low [2–5]. In 2014, Tenopir et al. reported that fewer than 16% of researchers made all of their data available. In spite of new policies, infrastructure and initiatives to promote research data sharing, more recent research confirms that less than one third of researchers share their data publicly [5].

Surveys exploring the barriers and enablers for sharing of research data have now been carried out in a number of research organisations, countries and disciplinary communities. These studies tell us, for instance, that researchers are less willing to share their research data if other researchers might use their data to publish before them, if they have to expend significant effort in order to share the data, and if they believe that their data could be misinterpreted [4, 6–8]. While these factors represent the most immediate or proximal concerns of researchers, they reflect the institutional arrangements surrounding research practice, such as the systems of rewards, rules and regulations, external legal systems and informal, epistemic community norms and conventions [9, 10]. While the digitisation of data and work has made research data sharing possible, the slow take-up of research data sharing and data re-use has led to the recognition that optimal levels of data sharing will not occur without reshaping institutional arrangements so that data sharing is both facilitated and incentivised. To this end, research institutions (Universities, government, grant agencies, journal publishers) are now redesigning their infrastructure, regulation, policies and reward systems. Within government, industry and research, significant investment is being made in initiatives to support research data sharing, including, for example, the adoption of the FAIR data principles [11] and the CoreTrustSeal [12]. In spite of these efforts, most research data is still not publicly available [5, 7]. The limited impact of these initiatives may reflect the fact that the research carried out to date does not allow us to determine which institutions (those operating within research organisations, those associated with disciplinary communities or those from the application domain) have the greatest impact on researchers’ data sharing practices. This study represents a first effort to answer this question using multi-level modelling. Using survey data collected from researchers working in a national science organisation, we carried out modelling to estimate the proportion of variance in researchers’ intentions to share data that could be explained by their organisational unit, discipline and application domain.

Organisational unit

Organisational work units are known to be an important source of influence on employees’ attitudes and behaviours [13–15]. Data sharing research confirms that this proposition also applies to researchers’ data sharing behaviour. For example, Huang et al. [16] found that, amongst biodiversity scientists, respondents whose institute/university or funding agencies encouraged data sharing were more willing to share their research data, and Sayogo and Pardo [8] reported that organisational involvement (in particular, organisational support for data management practices) predicted the likelihood that researchers would publish their data sets. Tenopir et al.’s research [7, 17] finds that the majority of respondents believe that their organisation could do more to facilitate data sharing, by providing processes, training and funding to support long-term data management. Organisational factors are therefore seen to play a role in either facilitating or hindering data sharing.

Discipline

Disciplinary influences on data sharing have also been reported by researchers, and these influences have been estimated to explain between 5% and 19% of the variance in researchers’ data sharing practices [18, 19]. Researchers who work with data from human participants (e.g., social scientists and health researchers) are known to have lower levels of data sharing than researchers from other disciplines [1], which can be attributed to the special ethical and legal requirements pertaining to human data [5, 20]. Another driver of differences between research disciplines in their data sharing practices is the distinction between ‘big science’ and ‘small science’. Heidorn [21] argues that the large datasets collected by researchers from big science disciplines (such as astronomy, oceanography and physics) are more economical to curate than are the many small datasets captured in small science disciplines (such as biogeochemistry and psychology). Furthermore, researchers working in big science projects often use the same instruments and have a greater need and motivation to coordinate efforts around data. To enable these big science initiatives, scientific infrastructure is put in place to support data storage and reuse of data [21]. In small science disciplines, research is generally driven by individual investigators or small teams who tend to collect small-scale and more heterogeneous datasets. These factors militate against data sharing amongst researchers from small science disciplines. Rules surrounding disciplinary data repositories, funders’ policies (requiring researchers to provide a data management plan) and journal policies (requiring authors to share their data by submitting it to a data repository) are another source of differences between disciplines which have been found to be correlated with researchers’ willingness to share data [17, 22, 23]. Together, the differences in ethics requirements, the economics of data storage and sharing (in big science versus small science disciplines) and the discipline-specific nature of data repositories, funders’ policies and journal policies will lead to disciplinary differences in data sharing practices.

Application domain

The third potentially important source of institutional influence on data sharing practices is the researcher’s application domain (or ‘domain’). Whereas the researcher’s discipline represents the academy or branch of knowledge that the researcher uses, the domain is the field or industry sector in which the knowledge is being applied [24]. This domain is formally classified as the field of research nominated by the researcher (e.g., in a grant application) but is also reflected in the type of organisation funding the research (e.g., a health provider vs a transport company). Each industry sector has unique legislative, regulatory and policy frameworks which establish rules and norms for data management and sharing which are reflected in the data sharing policies and IP requirements of research funders [9].

There is already evidence to suggest that these factors influence data sharing practices. While government agencies are enacting policies to mandate data sharing (because they generally fund research to inform public policy or achieve public good outcomes), private sector organisations usually fund research for private benefit and retain intellectual property rights which limit research data sharing [25]. Requirements from funding agencies to share data have been identified by researchers as important influences on their data sharing behaviour [10, 16, 22]. On the other hand, the growth in economic opportunities for commercialising data has led some industry actors to exploit new legal rights and mechanisms which allow them to maintain control over scientific data that used to be more accessible [26]. Tenopir et al.’s [17] study found that researchers working in different sectors (e.g., government, not-for-profit, academic, commercial) reported significantly different levels of organisational involvement, training and assistance with data management and storage. Researchers working for government were more likely to report that their organisation had processes for data management and storage, and researchers working in the commercial sector were slightly less willing to share their data than researchers employed in other sectors [17].

Thus, external influences on researchers’ data sharing practices can be delineated into three types: organisational, disciplinary and domain. From the surveys and interviews that have been carried out with researchers to date, we know that all three are seen to be important sources of influence on data sharing practices [7, 8, 10, 16, 22, 23]. The goal of this study was to establish the relative importance of these three sources by differentiating their impacts on research data sharing empirically.

Methodological approach

Differentiating the effects of organisational unit, disciplinary background and research domain on data sharing behaviour is a nontrivial methodological problem. When there is dependence among observations (e.g., when individuals are subject to the same higher-level influences), standard statistical formulas will underestimate the sampling variance and therefore lead to biased significance tests with an inflated Type I error rate. In such instances, multi-level modelling is required [27].

Two attempts to model the institutional influences on data sharing using multi-level modelling have been carried out to date, but this work has focused on identifying variance in data sharing associated with the researcher’s disciplinary background. Kim and Stanton [19] found that 19% of the total variance in data sharing behaviours could be explained by researchers’ disciplinary membership. A second study by Kim and Yoon [18] found that 5% of the variance could be explained by disciplinary membership (with the availability of disciplinary repositories explaining 13% of the disciplinary variance in data sharing). However, this modelling assumes that researchers with the same disciplinary background experience the same institutional influences. In practice, researchers from the same discipline may work in different application domains and (with the growth of multidisciplinary and transdisciplinary research) there may be researchers from multiple disciplines within the same organisational unit. In consequence, to understand institutional influences on data sharing we need to model disciplinary, organisational and domain factors separately.

Some combinations of discipline and domain will occur more commonly than others (e.g., astronomy researchers are not likely to work in the environmental domain), so we used partially crossed multi-level modelling [28] to estimate the relative importance of disciplinary, organisational and domain factors. This analysis allows us to quantify the proportion of variance in data sharing explained by organisational unit, discipline and domain membership. We also tested the explanatory power of specific organisational (data culture and peer encouragement), discipline (journal publishers’ requirements, availability of data repositories) and domain (contractual and regulatory inhibitors) factors in explaining the variance in research data sharing. Since our goal was to delineate external influences on research data sharing, we did not introduce any individual-level (researcher) predictors in the model.
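To make the partially crossed structure concrete, the sketch below shows how such a model can be specified with the lme4 package used later in the paper. This is a minimal illustration, not the study’s actual code; the data frame and variable names (df, intent, org_unit, discipline, domain) are assumptions.

    # Minimal sketch: random intercepts for three (partially) crossed
    # grouping factors; no individual-level predictors.
    library(lme4)
    m0 <- lmer(intent ~ 1 + (1 | org_unit) + (1 | discipline) + (1 | domain),
               data = df, REML = TRUE)
    VarCorr(m0)  # variance attributable to each institutional grouping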

Method

Sample

This research was approved by the CSIRO Social and Interdisciplinary Science Human Research Ethics Committee (approval number 014/18). Participants took part in an online survey, reading the information sheet approved by the ethics committee and then clicking "I consent" to confirm that they were willing to take part in the research. Participants who selected "I do not consent" went to an exit page and did not complete the survey.

The online survey was sent to research employees at the CSIRO, a national science agency with offices across Australia. The CSIRO has characteristics in common with both Universities and government organisations. Many of CSIRO’s employees have spent time working in Universities and they commonly collaborate with University researchers (e.g., in the delivery of research projects, the writing of research publications and the supervision of PhD and postdoctoral students). However, whereas researchers at Universities have a teaching role, researchers in the CSIRO, an Australian federal government agency, are funded under the Science and Industry Research Act 1949 to address national objectives and support Australian industry, the community and the relevant Minister. The CSIRO is Australia’s largest patent holder, and CSIRO employees publish more than 3,000 refereed journal articles and reviews per year [29]. Based on normalised citation impact, CSIRO’s areas of research strength include social sciences, environment/ecology, plant and animal science, biology and biochemistry, engineering, microbiology, agricultural science, space science, and clinical medicine [29]. The CSIRO also manages infrastructure (e.g., the Australia Telescope National Facility) and biological collections (e.g., the Atlas of Living Australia, Australia’s national biodiversity database) for the benefit of research and industry. On 30 June 2017, CSIRO had a total of 5,565 staff (FTE of 4,990), of whom approximately 27% were classified as research scientists, 19% as research managers and 32% as research project staff [30]. Based on advice from CSIRO’s Information Services team, organisational units and roles that were not likely to be dealing with research data (e.g., finance and human resources) were removed from the organisation’s email list before the email containing the link to the online survey was sent out (by CSIRO’s Chief Scientist) to 3,658 CSIRO employees.

Eight hundred and six employees agreed to participate, representing a 22% response rate, but only 381 respondents provided sufficient data to match them to an organisational unit, discipline and application domain. The sample for the multi-level analyses was further reduced because tests of inter-rater agreement require that the number of respondents in each group be equal to or greater than the number of response options on the Likert scales [31]; requiring that all organisational, discipline and domain groups be represented by seven or more respondents reduced the sample size to n = 354. The gender, age, discipline and sector diversity of the survey respondents is reported in Table 1. The sample was more male-dominated (81%) than the organisation as a whole (60%), but this may reflect the gender composition of employees who work with research data. The sample provided representation from a range of research disciplines (n = 12) and application domains (n = 11).

Table 1. Characteristics of survey participants (n = 381).

Characteristics Number of Participants
Gender
    Male 242
    Female 100
    Prefer not to answer 11
    Missing 28
Age
    Under 25 years 3
    25 to 34 years 30
    35 to 44 years 120
    45 to 54 years 123
    55 to 64 years 60
    65 or more years 13
    Missing 32
Research Discipline
    Mathematics 11
    Physical Sciences 22
    Chemical Sciences 26
    Earth Sciences 41
    Environmental Sciences 106
    Biological Sciences 47
    Agricultural and Veterinary Sciences 43
    Information and Computing Sciences 19
    Engineering 27
    Technology 11
    Medical and Health Sciences 20
    Studies in Human Society 8
Application Domain
    Environment (private sector) 15
    Environment (public sector) 89
    Health Care (private sector) 23
    Health Care (public sector) 14
    Manufacturing (private sector) 23
    Natural resources (private sector) 42
    Natural resources (public sector) 37
    Primary Industries (private sector) 50
    Primary Industries (public sector) 42
    Science (private sector) 17
    Science (public sector) 29

Procedure

The survey was conducted online, with a link to the survey provided in an email from the organisation’s Chief Scientist. The email explained that the survey would inform the organisation’s data governance strategy and that double movie passes would be awarded to ten randomly identified survey participants. The survey was kept open for three weeks and two reminder emails were sent prior to closing the survey. CSIRO’s Information Services team provided a report on the number of data deposits made to the DAP (the CSIRO’s Data Access Portal) by each organisational unit. The research team integrated these data with the survey responses, using organisational unit as the linking variable.

Measures

Organisational unit, discipline and application domain

Survey participants were asked to select (from a list) the option which best described the organisational unit that they worked in, their research discipline (using the Australian and New Zealand Standard Research Classification [32]) and their application domain (using the Australian Governments’ Interactive Functions Thesaurus [33]). Since we expected domain factors to vary depending on whether researchers were working in the public sector or the private sector, participants also specified whether they primarily worked with either Industry or Government. Thus, application domain was coded according to both the sphere in which researchers were operating (e.g., primary industries) and whether they were working in the public or private sector.

Intentions to share data

Intentions to share data were assessed by asking researchers to report the likelihood that they would share research data with a list of potential targets outside of the relevant project team (researchers in their own organisational unit, researchers in other organisational units, researchers in their own discipline, research collaborators outside of CSIRO, research funders and the general public). The items were rated on a 5-point Likert scale ranging from “Extremely unlikely” to “Extremely likely”.

Peer support

We adapted three items from Curty’s [34] measure of social influence for data re-use to create a measure of peer support for data sharing, e.g., “My peers (in CSIRO) encourage me to share data”. The items were rated on a 7-point Likert scale ranging from “Strongly disagree” to “Strongly agree”.

Open data culture

To assess shared attitudes towards data sharing within organisational units we developed six items, each reflecting the belief that data should be made as openly available as possible in order to support scientific integrity and public benefit, for example, “Open data improves scientific integrity”. Each item was rated on a 7-point Likert scale ranging from “Strongly disagree” to “Strongly agree”.

Regulative pressure by journal publishers

Kim and Stanton’s [19] four-item measure was used to assess this disciplinary factor, namely whether or not journals require researchers to share their data when their work is published (e.g., “Journals require researchers to share data”). The items were rated on a 7-point Likert scale ranging from “Strongly disagree” to “Strongly agree”.

Data repositories

Kim and Stanton’s [19] three-item ‘Data repository’ measure was used to assess the availability of data repositories. To ensure that the items reflected a disciplinary factor, we introduced the scale with the words “In my discipline…”, which preceded each item (e.g., “Data repositories are available for researchers to share data”). The items were rated on a 7-point Likert scale ranging from “Strongly disagree” to “Strongly agree”.

Contractual and regulatory inhibitors

To assess the impact of domain factors on data sharing, we asked researchers to rate the extent to which factors in their industry sector or government area inhibited their ability to share data, using a 7-point Likert scale ranging from “Not at all” to “A great deal”. A principal component analysis carried out on the five items revealed that they formed two separate factors. We labelled the first factor contractual inhibitors (e.g., “Contractual conditions i.e., the terms of the contract under which the data were generated or used”) and the second factor regulatory inhibitors (e.g., “Privacy requirements”).

Data deposits in the organisational repository

The CSIRO has an organisational repository known as the Data Access Portal which provides a secure mechanism for depositing data and software code. When employees publish their data on the platform, they are required to report which organisational unit they work in. We obtained a report which allowed us to count how many collections had been published on the portal for each organisational unit, and we linked this measure with the survey data. However, the measure was highly positively skewed (many organisational units had no deposits or very few deposits, whereas a small number of units made very frequent deposits). Since extreme scores can exert undue influence in analyses, we converted the measure of data deposits into a categorical variable (organisational units were classified as having no deposits, fewer than five deposits, five to nine deposits, or ten or more deposits).
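A minimal sketch of this recoding in R follows; the data frame and variable names (units, deposits) are illustrative assumptions, not the study’s data.

    # Hypothetical recoding of a skewed deposit count into four categories.
    # Intervals: 0; 1-4 ("fewer than five"); 5-9; 10 or more.
    units$deposit_cat <- cut(units$deposits,
                             breaks = c(-Inf, 0, 4, 9, Inf),
                             labels = c("none", "<5", "5-9", "10+"))
    table(units$deposit_cat)  # check the resulting distribution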

Statistical analysis

The statistical modelling was carried out in R [35]. The multilevel package [36] was used for tests of within-group agreement and reliability, while the lme4 package [37] was used to test the multi-level model since it is particularly well-suited to dealing with partially crossed data structures [38]. Since the data were not balanced (not all combinations of organisational unit, disciplinary and domain membership were represented in the data), we fitted our models using restricted maximum likelihood estimation (REML), a modification of maximum likelihood estimation that is more precise for mixed-effects modelling. However, when comparing the fit of alternative models it was necessary to use standard maximum likelihood estimation. We used the anova() function in lmerTest [39] to obtain p-values for each explanatory factor in the multi-level model, since Luke [40] reports that the Kenward-Roger and Satterthwaite approximations produce acceptable Type I error rates even for smaller samples. For all statistical procedures, an alpha level of 0.05 (two-tailed) was used to determine statistical significance.
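The sketch below illustrates these estimation choices; the model and variable names are again assumed for illustration only.

    # lmerTest masks lme4::lmer, adding denominator-df approximations.
    library(lmerTest)
    m <- lmer(intent ~ openness + (1 | org_unit) + (1 | discipline) +
                (1 | domain), data = df, REML = TRUE)
    anova(m)                          # Satterthwaite F tests (the default)
    anova(m, ddf = "Kenward-Roger")   # alternative approximation
    # Comparing alternative models requires ML rather than REML;
    # anova(m1, m2) on two lmer fits refits them with ML automatically.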

Prior to carrying out our analyses we cleaned the data, checking that the data were normally distributed (as noted above, the measure of data deposits was highly skewed and was converted to a categorical variable) and removing records of participants who (a) did not specify which organisational unit, discipline and domain they worked in or (b) came from organisational units where none of the respondents reported working with research data. This gave us a sample of 381 respondents and 31 organisational units for the initial analyses (factor analysis and correlations). The sample size for the multi-level analyses was further reduced because we removed data from organisational units, disciplines and domains that were represented by fewer than seven respondents. This left us with survey data from 354 researchers, who collectively represented 28 organisational units, 12 disciplinary groups and 11 application domains.

Additional decisions regarding our statistical procedures are described in the results section, as they emerged in the course of our analyses.

Results

Before commencing the multi-level modelling, we performed a principal components analysis to check the construct validity of the measurement items. The initial solution suggested extracting seven factors from the data, and when we re-ran the principal components analysis with varimax rotation the seven factors explained 73% of the variance. The rotated factor solution exhibited good simple structure, with all items loading above 0.62 on their intended construct and none loading above 0.27 on other factors (see Table 2). Alpha coefficients for each scale are reported in the diagonal of Table 3. All measures displayed satisfactory reliability and validity (alpha coefficients of 0.70 or higher).
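An analysis of this kind might be run with the psych package, as in the following sketch; item_cols (a vector naming the survey items) and the data frame df are assumptions.

    # Hypothetical sketch: seven-component PCA with varimax rotation.
    library(psych)
    pca <- principal(df[, item_cols], nfactors = 7, rotate = "varimax")
    print(pca$loadings, cutoff = 0.3)  # inspect simple structure
    pca$Vaccounted                     # variance explained per component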

Table 2. Pattern matrix generated from the principal components analysis.

Item F1 F2 F3 F4 F5 F6 F7 h2
The following statements represent alternative ways of thinking about the role of data in the CSIRO. Please choose the response option which best describes your level of agreement with this statement.
    Data should be open and accessible by default (access to data should only be restricted where privacy, confidentiality or IP issues require it) 0.73 0.22 0.11 0.02 0.14 -0.07 0.07 0.63
    Our data should be harnessed for public good 0.71 0.15 0.01 0.13 -0.02 -0.07 -0.06 0.55
    Data sharing can have global and intergenerational benefits 0.73 0.24 0.02 0.06 0.07 -0.10 0.03 0.61
    The integrity of our research is improved when our data are available for others to review 0.84 0.10 0.07 0.07 -0.03 0.01 -0.09 0.74
    Open data improves scientific integrity 0.87 0.03 0.06 0.04 0.08 0.04 -0.07 0.78
    Making our data more accessible will improve the quality of our research 0.81 0.11 0.07 0.02 0.02 -0.05 0.01 0.67
In the next 12 months, how likely is it that you will share data from one of your research projects with:
    Colleagues in my business unit 0.06 0.66 0.08 0.27 0.03 0.20 -0.16 0.59
    Colleagues in other business units 0.17 0.74 -0.03 0.12 0.07 0.15 0.03 0.62
    Colleagues in my research discipline 0.18 0.82 0.12 0.13 0.11 -0.18 -0.12 0.80
    Other researchers 0.16 0.80 0.10 0.11 0.14 -0.13 -0.10 0.73
    Research collaborators/partners outside of CSIRO 0.10 0.70 0.05 0.05 0.06 -0.17 -0.06 0.54
    General public 0.25 0.64 0.10 0.02 0.12 -0.20 -0.04 0.54
In my discipline…
    Data sharing is mandated by journals’ policy 0.11 0.10 0.87 0.10 0.20 -0.01 -0.04 0.83
    Data sharing policy of journals is enforced 0.06 0.08 0.89 0.09 0.14 0.02 0.03 0.83
    Journals require researchers to share data 0.07 0.10 0.92 0.01 0.13 -0.02 -0.05 0.87
    Journals can penalize researchers if they do not share data 0.06 0.04 0.83 0.00 0.01 -0.07 0.00 0.69
In my opinion, my peers (in CSIRO)…
    Encourage me to share data 0.10 0.17 0.03 0.89 0.11 -0.06 -0.07 0.85
    Are supportive of the sharing of data 0.14 0.14 0.06 0.90 0.09 -0.08 -0.07 0.87
    Often share data 0.05 0.18 0.09 0.86 0.12 0.00 -0.07 0.81
In my discipline…
    Researchers can easily access data repositories 0.08 0.16 0.18 0.12 0.85 -0.05 -0.07 0.81
    Data repositories are available for researchers to share data 0.11 0.16 0.13 0.12 0.89 -0.07 -0.06 0.86
    Researchers have the data repositories necessary to share data 0.01 0.09 0.15 0.08 0.89 -0.04 -0.07 0.84
Within this area, how much do each of the following factors inhibit your ability to share data?
    Contractual conditions i.e., the terms of the contract under which the data were generated or used -0.10 -0.17 -0.08 -0.07 -0.03 0.84 0.17 0.79
    Constraints imposed by the ownership or licensing arrangements of third party data -0.05 -0.02 0.04 -0.04 -0.06 0.85 0.16 0.76
    Ethical restrictions 0.05 -0.14 -0.01 -0.18 -0.11 -0.02 0.84 0.78
    Privacy requirements -0.07 -0.21 -0.08 -0.14 -0.09 0.47 0.63 0.70
    Other legislation, regulation and policy (e.g., anti-trust concerns) -0.10 -0.08 0.01 0.07 -0.05 0.29 0.75 0.67

Alpha coefficients for each scale are shaded grey.

Table 3. Organisational unit-level correlations among survey measures and data deposits (N = 31).

Measure Mean (SD) 1 2 3 4 5 6 7
1. Intentions to share data 3.90 (0.59) 0.86
2. Peer support 4.39 (0.50) 0.56** 0.91
3. Open data culture 3.46 (0.77) 0.21 0.15 0.89
4. Journals 3.85 (0.74) 0.36 0.25 0.48** 0.92
5. Repositories 3.67 (0.49) 0.15 -0.06 0.50** 0.43* 0.90
6. Contractual inhibitors 4.21 (0.49) 0.03 0.17 -0.44* -0.03 -0.19 0.79
7. Regulatory inhibitors 1.90 (0.42) -0.11 -0.19 0.01 -0.08 -0.07 0.41* 0.73
8. Data deposits 1.58 (0.46) 0.21 0.08 0.39* 0.03 0.07 -0.38* -0.35†

* p <0.05

** p <0.01.

† p <0.10.

Standardized alpha coefficients for each survey measure are reported in the diagonal.

We also aggregated these measures to the organisational unit level so that we could check whether they were correlated with the real-world measure of data sharing (number of deposits on the organisational data repository, classified as none, fewer than five, five to nine, or ten or more deposits). The correlations among the survey measures and the categorical measure of data deposits are shown in Table 3. We found that openness of the data culture (r = 0.39, p <0.05) and contractual inhibitors (r = -0.38, p <0.05) correlated significantly with organisational unit deposits. Regulatory inhibitors were marginally significantly correlated with data deposits (r = -0.35, p <0.10), but intentions to share data were not significantly correlated with data deposits (r = 0.21, p >0.05) and nor were the disciplinary measures (journals and repositories).

Estimating within-group agreement and reliability

Before carrying out multi-level analyses it is necessary to check whether the measures exhibit within-group agreement and between-group variance. If researchers from the same discipline provide similar ratings when asked about the level of regulative pressure from journals, but differ in their ratings from researchers in other disciplines, this supports treating the measure as a construct pertaining to the research discipline. To assess the level of within-group agreement, we calculated the multi-item rwg(j) statistic [41]. By convention, values at or above 0.70 are considered good agreement [38], but we also tested the statistical significance of the rwg(j) values by simulating rwg(j) values from a uniform null distribution for user-supplied values of (a) average group size, (b) number of items in the scale, and (c) number of response options on the items. The results of these tests (see Table 4) indicated that there was greater agreement within organisational unit, disciplinary and domain groups for the relevant organisational, disciplinary and domain factors than would be expected by chance.
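A sketch of these agreement tests using the multilevel package follows; the item set (open_items), the grouping variable and the average group size of 12 are illustrative assumptions.

    # Multi-item within-group agreement for six openness items rated on a
    # 7-point scale, grouped by organisational unit (names illustrative).
    library(multilevel)
    rwg_out <- rwg.j(x = df[, open_items], grpid = df$org_unit,
                     ranvar = 4)  # uniform null variance: (7^2 - 1)/12 = 4
    mean(rwg_out$rwg.j)
    # Simulated null distribution of rwg(j) for significance testing:
    sim <- rwg.j.sim(gsize = 12, nitems = 6, nresp = 7, nrep = 10000)
    summary(sim)  # 95th and 99th percentile cut-offs under the null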

Table 4. Mean rwg(j) values and intraclass correlations for survey measures (n = 354).

Variable rwg(j) ICC(1) ICC(2)
Organisational unit groups
    Intentions to share data 0.7609** 0.1916 0.7634
    Peer support 0.6795* 0.1165 0.6421
    Openness 0.8834** 0.1680 0.7331
Disciplinary groups
    Intentions to share data 0.7096** 0.1234 0.8171
    Journals 0.8115** 0.0404 0.5722
    Repositories 0.7368** 0.1185 0.8102
Domain groups
    Intentions to share data 0.7264** 0.1063 0.8048
    Regulatory inhibitors 0.5838** 0.1444 0.8539
    Contractual inhibitors 0.5602** 0.2295 0.9116

* Denotes that the rwg(j) value is above the upper expected 95% confidence interval estimated using rwg.j.sim.

** Denotes that the rwg(j) value is above the upper expected 99% confidence interval estimated using rwg.j.sim.

Intraclass correlations (ICCs) were calculated to check that each measure had significant between-group variance (see Table 4). The ICC(1) statistic represents the proportion of variance in the measure which is explained by the grouping factor, whereas the ICC(2) indexes the reliability of the group means. According to James [42], the median ICC(1) reported for group-level constructs is 0.12 and values between 0.05 and 0.20 are acceptable. All but the measure of regulative pressure from journals met this standard. The ICC(2) values were also acceptable for all measures except regulative pressure by journal publishers and peer support (values above 0.70 are generally agreed to represent sufficiently high agreement to support aggregation [43]). Based on the low intraclass correlations for the measure of regulative pressure by journal publishers, we did not include this measure in the multi-level modelling. We retained the measure of peer support since it demonstrated acceptable within-group agreement on the rwg(j) statistic and acceptable between-group variance.
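The multilevel package computes both statistics from a one-way ANOVA, as in this hypothetical sketch (variable names assumed):

    # ICC(1): proportion of variance attributable to group membership;
    # ICC(2): reliability of the group means.
    library(multilevel)
    mod <- aov(intent ~ as.factor(org_unit), data = df)
    ICC1(mod)
    ICC2(mod)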

Testing the random effects model

The first step in building a multi-level model is to estimate the random effects model (in which there are no predictors, but there is a random intercept variance term for each grouping variable and all combinations thereof). The model explains intentions to share data of researcher i in organisational unit j, discipline k and domain l as follows:

Y_ijkl = γ_0 + u_j + v_k + w_l + x_jk + y_jl + z_kl + a_jkl + e_ijkl    (1)

There are eight random effects in Equation (1):

u_j ~ N(0, σ²_u), the random effect of organisational unit j

v_k ~ N(0, σ²_v), the random effect of discipline k

w_l ~ N(0, σ²_w), the random effect of domain l

x_jk ~ N(0, σ²_x), the random effect of the interaction between organisational unit j and discipline k

y_jl ~ N(0, σ²_y), the random effect of the interaction between organisational unit j and domain l

z_kl ~ N(0, σ²_z), the random effect of the interaction between discipline k and domain l

a_jkl ~ N(0, σ²_a), the random effect of the three-way interaction between organisational unit j, discipline k and domain l, and

e_ijkl ~ N(0, σ²_e), the random effect of researcher i in organisational unit j, discipline k and domain l.

This model provides estimates of the variability in the intercepts, or in other words, the proportion of variance in intentions to share data that is associated with organisational unit, research discipline and application domain (and potentially all possible combinations of these variables).

While it is generally recommended that the possibility of interactions between crossed factors be tested in cross-classified random effects modelling [44], it is also well known that complex models incorporating all possible interactions often fail to converge [45]. When we attempted to fit the ‘maximal’ full random effects model, the fit was singular, a common outcome when the random effects structure is too complex for the data. In such cases, if there is no theoretical reason for expecting a random effect to be significant, the most complex element should be removed from the model [45]. Following these guidelines, we tested the random effects model again, gradually removing the most complex elements (first the three-way interaction, then the organisational unit by discipline interaction and finally the discipline by domain interaction term). At this point (with the three random effects of interest and a significant organisational unit by domain interaction included) the model converged.

The resulting model included the three random effects of interest (organisational unit, domain and discipline) and an organisational unit by domain interaction. The organisational unit by domain interaction was not of theoretical interest so we tested whether the fit of the model worsened when the organisational unit by domain interaction was removed (filtering the dataset to ensure that there were at least five respondents for each unique organisational unit and domain combination) and we found that it did not (χ2 = 0.8392, p = 0.36). With three random effects remaining in the model, we then checked whether the fit of the model worsened if the random effect for domain (which explained the least variance) was removed. This test revealed that model fit worsened significantly when the random effect for domain was not included (χ2 = 4.38, p <0.05). Importantly, this test supports our hypothesis that data sharing reflects the combined effect of individual, organisational unit, discipline and domain influences.
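In lme4 syntax, this reduction sequence might look like the following sketch (grouping variables assumed to be factors; all object names illustrative):

    # Maximal model with all interaction intercepts: singular fit here.
    m_max <- lmer(intent ~ 1 + (1 | org_unit) + (1 | discipline) +
                    (1 | domain) + (1 | org_unit:discipline) +
                    (1 | org_unit:domain) + (1 | discipline:domain) +
                    (1 | org_unit:discipline:domain),
                  data = df, REML = TRUE)
    isSingular(m_max)  # TRUE, so simplify, most complex term first
    m1 <- update(m_max, . ~ . - (1 | org_unit:discipline:domain))
    m2 <- update(m1, . ~ . - (1 | org_unit:discipline))
    m3 <- update(m2, . ~ . - (1 | discipline:domain))  # converges
    # Likelihood-ratio tests of further simplifications (ML refits):
    m4 <- update(m3, . ~ . - (1 | org_unit:domain))
    anova(m3, m4)  # n.s. in this study, so the interaction is dropped
    m5 <- update(m4, . ~ . - (1 | domain))
    anova(m4, m5)  # significant in this study, so domain is retained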

The output from this random effects model is presented in Table 5 below. We can calculate the proportion of variance explained by each grouping factor by dividing the corresponding variance component by the total of all variance components in the model. For example, the variance in intercepts for organisational unit membership is 0.07986, which represents 10.42% of the total variance (0.07986 + 0.04697 + 0.03048 + 0.60909) in intentions to share data. Organisational unit membership is therefore the most important grouping factor. Disciplinary membership explains 6.13% of the variance in data sharing and domain membership explains 3.98%. The remaining 79.47% is residual variance, attributable to either individual researcher factors or error. It is worth noting the difference between these variance estimates and the intraclass correlations (ICC(1)) reported earlier (19%, 12% and 11% respectively). The intraclass correlations overestimate the variance attributable to each grouping factor because they do not take into account the effects due to the other (correlated) grouping factors. Similarly, Kim and Stanton’s higher estimate of variance due to disciplinary factors (19.1%) is probably inflated because they did not model other grouping factors (university or departmental membership, domain membership) that would have contributed to non-independence in their data.

Table 5. Random effects model explaining intentions to share data (n = 354).

Random effects Name Variance Std.Dev.
    Organisational unit (Intercept) 0.07986 0.2826
    Discipline (Intercept) 0.04697 0.2167
    Domain (Intercept) 0.03048 0.1746
    Residual 0.60909 0.7804
Fixed effects Coefficient SE t ratio
    Data sharing (γ00) 3.4978 0.1114 31.41***

*** p <0.001.

† The 354 researchers in the dataset represented 28 organisational units, 12 research disciplines and 11 application domains.
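These proportions can be read directly off the fitted model’s variance components, as in this sketch (continuing the hypothetical objects above):

    # Proportion of variance attributable to each grouping factor.
    vc <- as.data.frame(VarCorr(m4))  # m4: the final random effects model
    round(100 * vc$vcov / sum(vc$vcov), 2)
    # e.g., 10.42 (org unit), 6.13 (discipline), 3.98 (domain), 79.47 (residual)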

Testing the full model

The next step involves testing the full model, in which the explanatory variables (open data culture, peer support, data repositories, contractual inhibitors and regulatory inhibitors) are entered as predictors of organisational-, disciplinary- and domain-specific variance in intentions to share data.

With the explanatory variables included, the total unexplained variance was reduced from 0.7664 (random effects model) to 0.6763 (full model), indicating that the organisational unit, disciplinary and domain factors together explained 12% of the variance in data sharing (see Table 6). However, only one of the predictors explained significant unique variance in data sharing. The openness of the data culture (organisational unit members agreeing that data sharing can have global and intergenerational benefits and that data sharing supports scientific integrity) was a significant predictor of intentions to share data (t = 2.19, p <0.05). None of the other factors in the model explained significant unique variance.

Table 6. Full model explaining intentions to share data (n = 354).

Random effects Variance Std.Dev.
    Organisational unit 0.0255 0.1598
    Discipline 0.0171 0.1308
    Domain 0.0157 0.1253
    Residual 0.6180 0.7861
Fixed effects Beta Standard error t
    Data sharing (γ00) 0.7818 1.0735 0.73
    Peer support 0.1529 0.1389 1.10
    Openness 0.3363 0.1534 2.19*
    Data repositories 0.1947 0.1293 1.51
    Contractual inhibitors -0.0447 0.1049 -0.42
    Regulatory inhibitors -0.1982 0.1511 -1.31

* p <0.05.

† The 354 researchers in the dataset represented 28 organisational units, 12 research disciplines and 11 application domains.
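A sketch of how this full model might be specified, with the unit-level explanatory variables entered as fixed effects (predictor names illustrative):

    # Full model: five explanatory variables plus the three random
    # intercepts; lmerTest supplies Satterthwaite-based t-tests.
    library(lmerTest)
    m_full <- lmer(intent ~ openness + peer_support + repositories +
                     contractual + regulatory +
                     (1 | org_unit) + (1 | discipline) + (1 | domain),
                   data = df, REML = TRUE)
    summary(m_full)  # in this study, only openness reached p < 0.05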

Discussion

The goal of this research was to delineate the relative importance of organisational, disciplinary and domain effects on research data sharing. To date, these factors have not been clearly differentiated and in consequence, estimates of the importance of different variables pertaining to these factors (such as disciplinary norms or organisational rewards or contractual conditions) are likely to have been biased. By using a multilevel modelling technique which differentiates these influences statistically, we found that there is an independent effect from organisational unit, research discipline and application domain on researchers’ intentions to share data. Furthermore, whereas previous research has tended to focus on the role of disciplinary norms and resources in influencing data sharing practices [e.g., 18, 19, 46, 47], this study suggests that factors operating at the organisational unit level have the most powerful influence on researchers’ data sharing practices.

This study is also the first to demonstrate that self-report measures pertaining to data sharing are correlated with real-world data sharing behaviour. The measure of intentions to share data was not correlated with organisational units’ data deposits, but this finding is likely to reflect the fact that researchers share their data via a wide range of channels (the researchers in our survey reported using emails, ftp sites, Dropbox and internal shared drives to share their research data). In light of the sample size for this analysis (n = 31), it was extremely encouraging to find that some of the survey measures (namely, openness of data culture and contractual inhibitors) were correlated with this real-world data sharing behaviour.

The findings from the random effects model (which revealed that researchers’ organisational unit, disciplinary and domain membership each explain unique variance in intentions to share data) are just as important as the findings from the full model. The statistical power of multi-level models is influenced both by the number of groups in the sample and the number of individuals in each group [48], making it especially challenging to achieve high statistical power when assessing effects for multiple, crossed institutional factors. That open data culture emerged as a significant predictor reflects not only the greater share of variance explained by organisational unit but also the larger number of organisational units represented in our sample. Power in a multilevel analysis reflects the number of observations at the level of the effect being detected [49], and our sample provided observations from 28 organisational units but only 12 research disciplines and 11 application domains. Therefore, the failure to observe significant effects for the disciplinary and domain variables in the model may be due to low statistical power, and we recommend retaining these factors for further investigation with a larger sample.

Limitations

Some limitations associated with this study should be acknowledged before considering the implications of the research. Since we collected data from only one organisation, we were not able to investigate how much influence between-organisation factors have on research data sharing. In addition, we were not able to model the full organisational structure. In the organisation where this study was conducted, researchers are structured within Business Units, Research Programs and Research Groups. We carried out our analyses on Programs because our initial analyses indicated that little additional variance was explained by Business Units and Groups. Even with these compromises, the power for our analyses was low.

Second, as is common in surveys of this nature [19, 22], the survey had a low response rate (22%), which means that the responses may not have been representative of all researchers in the organisation. Perhaps more importantly, our model was tested in a government research organisation. Researchers working in government organisations may experience less autonomy than researchers in Universities, who are generally free to direct their own research agenda. Researchers in government organisations may also be more likely to work in multidisciplinary organisational units than researchers in a university setting. Both factors might explain why the organisational unit emerged as such an important predictor of variance in data sharing in this study. Furthermore, government employees may be under less pressure to publish in academic journals, making the effect of disciplinary factors less important. Nevertheless, we believe it has been useful to explore institutional factors affecting research data sharing in a government science organisation, since most research on this subject has focused on researchers employed in Universities. Future research is needed to explore whether our findings generalise to other organisations and settings.

Practical implications

This study was intended to provide insights that could be used to guide the design of interventions to facilitate research data sharing within CSIRO. The fact that we were able to differentiate the effect of discipline, organisational unit and domain membership on intentions to share data suggests that a whole-of-ecosystem approach will be needed to achieve optimal levels of data sharing. However, of the three sources of influence that we investigated, it was those within the organisation (i.e., organisational unit membership) which were most strongly related to intentions to share data. This understanding has some important implications for those seeking to improve organisational approaches to maximising the utility of data resources.

First, it suggests that improvements in data sharing practices can be achieved within organisational units, without having to rely on or influence change in external organisations or institutions (e.g., clients, academies or professional bodies). Second, it suggests that a one-size-fits-all approach to improving data sharing within an organisation is not likely to be most effective. Instead, initiatives should be co-designed with researchers since they need to reflect local conditions and work practices. Fortunately, research suggests that there are multiple levers which organisational units can choose from to support data sharing, such as training, rewards, policies, and infrastructure and services to support data management [1, 7, 8, 17, 22, 50, 51].

Third, our findings point to the importance of culture (specifically, shared beliefs about the public and scientific value of data sharing) as a driver of data sharing practices. Having an open data culture was correlated with the real-world measure of data sharing (deposits in the organisation’s data repository) and explained significant unique variance in researchers’ intentions to share data. This finding suggests that initiatives to support sharing are likely to be more successful when they emphasise the intrinsic benefits of data sharing (scientific integrity and public benefit) rather than extrinsic reasons for sharing data (such as funder requirements or organisational efficiencies). Hard interventions (such as rules, rewards and policies) may serve as a signal which helps to shape the data culture [51], but they should not crowd out intrinsic motivations to support data sharing. The importance of intrinsic motivation for data sharing has been found in other studies besides this one. For example, Brooks, Heidorn, Stahlman and Chong [52] found that researchers emphasise the common good and the potential for transformative science when explaining their efforts to support data sharing in the context of institutionalised pressures and economic pressures constraining data sharing.

Theoretical implications and future directions

This study replicates and extends Kim and Stanton’s [19] efforts to model the role of institutional factors in influencing researchers’ data sharing. Not only did we replicate their finding that researchers’ disciplinary backgrounds can explain variance in their intentions to share data; we also showed that this variance can be differentiated from that explained by organisational unit and domain membership. Our study also extends prior research by testing these factors as drivers of data sharing in a non-traditional research organisation (i.e., not in a university setting) and demonstrating that self-report measures pertaining to data sharing (data culture and lack of contractual inhibitors) are correlated with real-world data sharing.

Data culture appears to be an especially important determinant of research data sharing. Culture captures a shared view of ‘how we do things around here’ and, because it reflects taken-for-granted assumptions and norms, it tends to be a good predictor of discretionary behaviours (such as data sharing). However, our findings are based on research carried out within one organisation. Further research is needed both to test the generalisability of our findings and to determine whether data culture is most powerful at the organisational unit level or whether between-organisation differences in data culture also influence data sharing practices. Exploring data culture across organisations may reveal other dimensions of data culture (e.g., risk-avoidance) that are relevant for data practices.

Conclusion

Research data sharing is important because of the scientific and broader public benefits which flow from this behaviour. However, it is also of interest because of the challenges associated with inducing researchers to invest personal effort towards sharing data (so that its inherent value can be realised) when the benefits flow to others (other researchers, society, future generations, [53]). In such contexts, it is appropriate to consider how organisational, disciplinary and domain factors can be utilised to facilitate the desired behaviour. However, ultimately, shared beliefs and values within the researcher’s local work environment may be most influential in shaping this socially-valued outcome.

Supporting information

S1 File. Discipline aggregation.

(XLSX)

S2 File. Domain aggregation.

(XLSX)

S3 File. Orgunit aggregation.

(XLSX)

Data Availability

The raw data are stored on CSIRO's Data Access Portal. However, under the conditions of the original ethics application, only members of the research team approved by the ethics committee are allowed to access these data. Approval would need to be granted by the ethics committee (who can be contacted at csshrec@csiro.au) for researchers seeking to verify our findings. We have published the aggregated data (for all organisational units, disciplines and domain groups with 10 or more respondents) which is allowed under the informed consent arrangements. These data are publicly available on CSIRO's Data Access Portal at https://doi.org/10.25919/5ed5e83bb35b3.

Funding Statement

The author(s) received no specific funding for this work.

References

1. Tenopir C, Dalton EE, Allard S, Frame M, Pjesivac I, Birch B, et al. Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PLoS ONE. 2015;10(8):e0134826.
2. Andreoli-Versbach P, Mueller-Langer F. Open access to data: An ideal professed but not practised. Research Policy. 2014;43(9):1621–33.
3. Douglass K, Allard S, Tenopir C, Wu L, Frame M. Managing scientific data as public assets: Data sharing practices and policies among full-time government employees. Journal of the Association for Information Science and Technology. 2014;65(2):251–62.
4. Fecher B, Friesike S, Hebing M, Linek S, Sauermann A. A reputation economy: Results from an empirical survey on academic data sharing. arXiv preprint arXiv:1503.00481. 2015.
5. Unal Y, Chowdhury G, Kurbanoğlu S, Boustany J, Walton G. Research data management and data sharing behaviour of university researchers. Information Research: An International Electronic Journal. 2019;24(1).
6. Lawrence-Kuether MA. Beyond the paywall: Examining open access and data sharing practices among faculty at Virginia Tech through the lens of social exchange. Virginia Tech; 2017.
7. Tenopir C, Allard S, Douglass K, Aydinoglu A, Wu L, Read E, et al. Data sharing by scientists: Practices and perceptions. PLoS ONE. 2011;6(6):e21101. doi: 10.1371/journal.pone.0021101
8. Sayogo DS, Pardo TA. Exploring the determinants of scientific data sharing: Understanding the motivation to publish research data. Government Information Quarterly. 2013;30:S19–S31.
9. David PA. Towards a cyberinfrastructure for enhanced scientific collaboration: Providing its 'soft' foundations may be the hardest part. Oxford, United Kingdom: Oxford Internet Institute; 2004. Research Report No. 4.
10. Pham-Kanter G, Zinner DE, Campbell EG. Codifying collegiality: Recent developments in data sharing policy in the life sciences. PLoS ONE. 2014;9(9):e108451. doi: 10.1371/journal.pone.0108451
11. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3:160018. doi: 10.1038/sdata.2016.18
12. CoreTrustSeal Board. CoreTrustSeal Foundation Statutes and Rules of Procedure (Version 1.0). Zenodo; 2018, February 14.
13. Mason CM, Griffin MA. Group absenteeism and positive affective tone: A longitudinal study. Journal of Organizational Behavior. 2003;24:667–87.
14. George JM. Personality, affect, and behavior in groups. Journal of Applied Psychology. 1990;75:107–16.
15. Gibson CB. Do they do what they believe they can? Group efficacy and group effectiveness across tasks and cultures. Academy of Management Journal. 1999;42(2):138–52.
16. Huang X, Hawkins BA, Lei F, Miller GL, Favret C, Zhang R, et al. Willing or unwilling to share primary biodiversity data: Results and implications of an international survey. Conservation Letters. 2012;5:399–406.
17. Tenopir C, Rice NM, Allard S, Baird L, Borycz J, Christian L, et al. Data sharing, management, use, and reuse: Practices and perceptions of scientists worldwide. PLoS ONE. 2020;15(3).
18. Kim Y, Yoon A. Scientists' data reuse behaviors: A multilevel analysis. Journal of the Association for Information Science and Technology. 2017;68(12):2709–19.
19. Kim Y, Stanton JM. Institutional and individual factors affecting scientists' data-sharing behaviors: A multilevel analysis. Journal of the Association for Information Science and Technology. 2016;67:776–99.
20. Carusi A, Jirotka M. From data archive to ethical labyrinth. Qualitative Research. 2009;9(3):285–98.
21. Heidorn PB. Shedding light on the dark data in the long tail of science. Library Trends. 2008;57(2):280–99.
22. Enke N, Thessen A, Bach K, Bendix J, Seeger B, Gemeinholzer B. The user's view on biodiversity data sharing: Investigating facts of acceptance and requirements to realize a sustainable use of research data. Ecological Informatics. 2012;11:25–33.
23. Piwowar HA, Chapman WW. A review of journal policies for sharing research data. In: Open Scholarship: Authority, Community, and Sustainability in the Age of Web 2.0. Proceedings of the 12th International Conference on Electronic Publishing; 25–27 June 2008; Toronto, Canada; 2008.
24. Bourdieu P. Le champ scientifique. Actes de la recherche en sciences sociales. 1976;2/3:88–104.
25. Campbell EG, Bendavid E. Data-sharing and data-withholding in genetics and the life sciences: Results of a national survey of technology transfer officers. J Health Care L & Pol'y. 2002;6:241.
26. Reichman JH, Uhlir PF. A contractually reconstructed research commons for scientific data in a highly protectionist intellectual property environment. Law and Contemporary Problems. 2003;66(1/2):315–462.
27. Hox JJ, Maas CJM. Multilevel analysis. Encyclopedia of Social Measurement. 2005;2:785–93.
28. Luo W, Kwok O-m. The impacts of ignoring a crossed factor in analyzing cross-classified data. Multivariate Behavioral Research. 2009;44(2):182–212. doi: 10.1080/00273170902794214
29. CSIRO. CSIRO Annual Report 2018–19. Canberra, Australia: CSIRO; 2019.
30. CSIRO. Our People. 2019, 23 January. Available from: https://www.csiro.au/en/About/Our-impact/Reporting-our-impact/Annual-reports/16-17-annual-report/part3/Our-people
31. Brown RD, Hauenstein NM. Interrater agreement reconsidered: An alternative to the rwg indices. Organizational Research Methods. 2005;8(2):165–84.
32. Australian Bureau of Statistics. 1297.0: Australian and New Zealand Standard Research Classification (ANZSRC). Canberra, Australia: Australian Bureau of Statistics; 2008.
33. National Archives of Australia. Australian Governments' Interactive Functions Thesaurus. 3rd ed.; 2013.
34. Curty R. Beyond "data thrifting": An investigation of factors influencing research data reuse in the social sciences [Ph.D.]. Syracuse University; 2015.
35. R Core Team. R: A language and environment for statistical computing [Internet]. 2018. Available from: https://www.R-project.org/
36. Bliese P. multilevel: Multilevel functions [Internet]. 2016. Available from: https://CRAN.R-project.org/package=multilevel
37. Bates D, Maechler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. Journal of Statistical Software. 2015;67(1):1–48.
38. Bliese P. Multilevel modeling in R (2.2): A brief introduction to R, the multilevel package and the nlme package. October 2016.
39. Kuznetsova A, Brockhoff PB, Christensen RHB. lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software. 2017;82(13).
40. Luke SG. Evaluating significance in linear mixed-effects models in R. Behavior Research Methods. 2017;49(4):1494–502. doi: 10.3758/s13428-016-0809-y
41. James LR, Demaree RG, Wolf G. Estimating within-group interrater reliability with and without response bias. Journal of Applied Psychology. 1984;69:85–98.
42. James LR. Aggregation bias in estimates of perceptual agreement. Journal of Applied Psychology. 1982;67:219–29.
43. Lindell MK, Brandt CJ, Whitney DJ. A revised index of interrater agreement for multi-item ratings of a single target. Applied Psychological Measurement. 1999;23(2):127–35.
44. Shi Y, Leite W, Algina J. The impact of omitting the interaction between crossed factors in cross-classified random effects modelling. British Journal of Mathematical and Statistical Psychology. 2010;63:1–15. doi: 10.1348/000711008X398968
45. Bates D, Kliegl R, Vasishth S, Baayen H. Parsimonious mixed models. 2015.
46. Jeng W. Qualitative data sharing practices in social sciences [Ph.D.]. University of Pittsburgh; 2017.
47. Faniel IM, Zimmerman A. Beyond the data deluge: A research agenda for large-scale data sharing and reuse. The International Journal of Digital Curation. 2011;6(1):58–69.
  • 48.Maas CJM, Hox JJ. Sufficient Sample Sizes for Multilevel Modeling. Methodology. 2005;1(3):86–92. [Google Scholar]
  • 49.Snijders T. Power and sample size in multilevel linear models. In: Everitt BS, Howell DC, editors. Encyclopedia of Statistics in Behavioral Science. 3 Chicester: Wiley; 2005. p. 1570–3. [Google Scholar]
  • 50.Haendel MA, Vasilevsky NA, Wirz JA. Dealing with data: A case study on information and data management literacy. PLoS biology. 2012;10(5):e1001339 10.1371/journal.pbio.1001339 [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Neylon C. Building a culture of data sharing: policy design and implementation for research data management in development research. Research Ideas and Outcomes. 2017;3:e21773. [Google Scholar]
  • 52.Brooks CF, Bryan Heidorn P, Stahlman GR, Chong SS. Working beyond the confines of academic discipline to resolve a real-world problem: A community of scientists discussing long-tail data in the cloud. First Monday. 2016;21(2). [Google Scholar]
  • 53.Sanderson T, Reeson A, Box P. Understanding and unlocking the value of public research data: OzNome social architecture report. Canberra, Australia: CSIRO; 2017. [Google Scholar]

Decision Letter 0

Laurentiu Rozylowicz

16 Apr 2020

PONE-D-20-07169

Modelling organisational, disciplinary and domain-specific sources of influence in research data sharing

PLOS ONE

Dear Dr Mason,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Both reviewers made valuable comments on the manuscript, and I advise you to address as many of them as possible in the revised version. The title of the manuscript needs to be more focused, and the paper's rationale and objectives should be clearer and in line with the reported results. The methodology section needs extensive revision, as some analyses are not reported or are unclear, and the sampling is not well explained to readers. As the reviewers pointed out, the data behind the analyses are not available to readers. Please include the dataset in the next version, e.g., by anonymizing the data. If that is not possible, please be more explicit about why this is the case.

We would appreciate receiving your revised manuscript by May 31 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as a separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as a separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as a separate file and labeled 'Manuscript'.

Please note, while forming your response, that if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Laurentiu Rozylowicz, Ph.D.

Academic Editor

PLOS ONE

Journal Requirements:

When submitting your revision, we need you to address these additional requirements.

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at

http://www.journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and http://www.journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include additional information regarding the survey or questionnaire used in the study and ensure that you have provided sufficient details that others could replicate the analyses. For instance, if you developed a questionnaire as part of this study and it is not under a copyright more restrictive than CC-BY, please include a copy, in both the original language and English, as Supporting Information.

3.  We note that you have indicated that data from this study are available upon request. PLOS only allows data to be available upon request if there are legal or ethical restrictions on sharing data publicly. For more information on unacceptable data access restrictions, please see http://journals.plos.org/plosone/s/data-availability#loc-unacceptable-data-access-restrictions.

In your revised cover letter, please address the following prompts:

a) If there are ethical or legal restrictions on sharing a de-identified data set, please explain them in detail (e.g., data contain potentially sensitive information, data are owned by a third-party organization, etc.) and who has imposed them (e.g., an ethics committee). Please also provide contact information for a data access committee, ethics committee, or other institutional body to which data requests may be sent.

b) If there are no restrictions, please upload the minimal anonymized data set necessary to replicate your study findings as either Supporting Information files or to a stable, public repository and provide us with the relevant URLs, DOIs, or accession numbers. For a list of acceptable repositories, please see http://journals.plos.org/plosone/s/data-availability#loc-recommended-repositories.

We will update your Data Availability statement on your behalf to reflect the information you provide.


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Partly

Reviewer #2: No

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The manuscript addresses a research topic relevant to today's science: data-sharing practices. Overall, the manuscript is presented in an intelligible fashion and written in standard English, and the statistical analysis was performed appropriately and rigorously. Although the manuscript has merit, there are several aspects that need the authors' attention and impede potential publication.

I would start by stressing the irony surrounding this paper when one compares the results reported by the authors with their justification regarding the availability of the dataset supporting this manuscript.

#1. The title of the manuscript is too general (e.g. it should stress that this is research conducted within an Australian government agency). From a certain perspective, some would argue that it could potentially mislead readers.

#2. The problem, or the rationale, of the paper is clouded by diffuse writing. It should clearly answer "why the effort?" and also state the contribution to the field.

#3. The abstract contains extremely detailed information on the results (e.g. that organisational unit membership explained 10%, disciplinary membership explained 6%, and domain membership explained 4% of the variance in researchers' intentions to share research data) and ignores, for instance, the limitations, the contribution to the field, etc. Additionally, it refers to specific concepts that cannot be understood appropriately unless the entire content is read (e.g. contractual and regulatory inhibitors).

#4. The theory section does not support the variables introduced into the statistical models. Recent literature on the topic is missing. I suggest that the authors re-shape and re-frame the theory part of their paper.

#5. The authors do not explain why academic and non-academic data are equivalent (see section "Application domain").

#6. It is safe to delete Figure 1: it is redundant and does not provide any useful information to readers.

#7. Background information on the respondents is missing (publication records, research impact, tenure, experience, etc.). Also, it is not self-evident how representative the respondents are of the Australian research community.

#8. The paragraph between lines 170–180 is not clear. The authors should make their point clear there.

#9. The authors should make their methodology clear. See the questions below for illustration:

- How were the employees recruited? Why CSIRO? Is it relevant for Australia?

- What is the total number of employees at CSIRO? Did the authors send emails to everybody? What is the meaning of 22%? Is the sample relevant for the entire agency? And for Australia as a whole?

- The authors should explain the methodological limitations of this paper's results and findings in the context of working with a convenience sample.

- Some of the disciplines are better represented in the sample. Does that impact upon the results? What is the structure of the agency's workforce on these research disciplines and application domains?

- Lines 200–205: How many were there? What were the criteria applied by the Information Services expert in selecting the "active research roles"? Did he do a good job, given that some of the respondents reported 0 time spent on research data?

- Some of the text in the Methods section is redundant or common knowledge, e.g. "The statistical modelling was carried out using R [29], an opensource language and environment for statistical computing."

- Text between lines 298–305 should be moved into the section above, as it does not report any results; it describes only technical procedure.

#10. Some parts of the text in the discussion section are not supported by the results, e.g. "The failure to observe significant effects for the other variables in the model may simple reflect low statistical power."

#11. What is the share of research productivity in Australia accounted for by government organizations rather than universities? The authors do not provide any contextual information.

Reviewer #2: This paper considers whether organisational units and application domains could influence data sharing practices among researchers. This is clearly a very important issue, as data sharing can elevate levels of transparency in scientific endeavours and even increase overall trust and confidence in scientific methods and conclusions. I applaud the authors' efforts to investigate this important question in a unique sample of non-academic researchers.

I must, however, admit that I find it a bit ironic that a manuscript discussing data sharing practices is submitted for publication without openly shared data. I have noted that this is due to the ethics application stating that data would not be shared unless it is for validation. Could the data be uploaded to a password-protected website, with access given to researchers when necessary? I think this could be a good compromise between ethical restrictions and ensuring that the data are ready to be shared when needed. If this is not possible, would the authors be willing to at least share their R code along with the outputs?

Personally, I do not find the paragraph regarding the influence of discipline on data sharing very convincing. The authors argue that data sharing is more common in "big science" disciplines than in "small science" disciplines. By default, data would be shared within teams, with bigger teams being associated with data being shared with a bigger group of people. It would be helpful if the authors could describe in detail at least one study that supports the argument that small science disciplines are less likely to share data than big science disciplines, beyond referring to explanations relating to norms. The analyses conducted by the authors do not shed more light on whether this argument is true. In this case, perhaps it would be better to focus on how norms within disciplines can influence data sharing practices.

In lines 221–228 (intentions to share data), the authors report that all items loaded on a single factor, yet no factor analysis is reported in the manuscript. I am somewhat surprised that the authors would treat these as a single factor, especially as there may be different motivations for sharing data within one's own organisational unit (e.g., collaboration) vs with research funders (e.g., requirement) vs with the general public (e.g., seeing data sharing as important). The alpha of 0.71 may be reflective of this. Have the authors considered treating these items as separate outcome variables? A factor analysis should also be reported for the contractual and regulatory inhibitors (relating to the domain influences).
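The checks requested here are straightforward to carry out in R. The following is a minimal sketch, not the authors' code: it assumes a hypothetical data frame survey holding the intention items, and the item names are illustrative, not the authors' variable names.

    # Sketch only: check a single-factor structure and internal consistency
    # for the intention-to-share items, using the psych package.
    library(psych)

    # Hypothetical item names for intentions to share with one's unit,
    # with funders, and with the public.
    items <- survey[, c("share_unit", "share_funder", "share_public")]

    # One-factor exploratory factor analysis: inspect the loadings
    fa_fit <- fa(items, nfactors = 1, fm = "ml")
    print(fa_fit$loadings)

    # Cronbach's alpha (the manuscript reports 0.71)
    print(alpha(items)$total$raw_alpha)

Treating the items as separate outcome variables, as suggested, would simply mean fitting the multilevel model once per item rather than once on the composite score.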

Finally, I cannot comment much on the analytic approach, as I have not been trained in multilevel modelling, but could the authors report what sample sizes would be required to reach adequate power (i.e., 80% or above) in the current analysis? Also, I am not sure why the values relating to explained variance in lines 405–408 do not match those in Table 4.
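Power for multilevel models is usually gauged by simulation rather than closed-form formulas. The sketch below is a hedged illustration, not the authors' analysis: it estimates power for a fixed effect in a two-level random-intercept model with lme4, and every parameter value (group count, group size, effect size, variance components) is an assumption chosen for illustration; the full cross-classified model in the paper would add further random factors.

    # Simulation-based power estimate for a two-level random-intercept model.
    # Sketch only; all parameters are illustrative assumptions.
    library(lme4)

    sim_power <- function(n_groups = 30, n_per_group = 12, beta = 0.3,
                          sd_group = 0.3, sd_resid = 1, nsim = 200) {
      pvals <- replicate(nsim, {
        g <- rep(seq_len(n_groups), each = n_per_group)   # group membership
        x <- rnorm(n_groups * n_per_group)                # person-level predictor
        y <- beta * x + rnorm(n_groups, 0, sd_group)[g] + # group random intercepts
             rnorm(n_groups * n_per_group, 0, sd_resid)   # residual noise
        fit <- lmer(y ~ x + (1 | g))
        t_val <- coef(summary(fit))["x", "t value"]
        2 * pnorm(abs(t_val), lower.tail = FALSE)         # Wald approximation
      })
      mean(pvals < 0.05)  # proportion of significant replications = estimated power
    }

    set.seed(42)
    sim_power()  # a value of 0.80 or above would indicate adequate power

Repeating the call over a grid of n_groups and n_per_group values shows what sample sizes would be needed for the assumed effect size.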

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Aug 28;15(8):e0238071. doi: 10.1371/journal.pone.0238071.r002

Author response to Decision Letter 0


22 Jun 2020

Thank you to all for your helpful feedback. Our detailed responses are in the attached cover letter.

Attachment

Submitted filename: Response to Reviewers PONE-D-20-07169(b).docx

Decision Letter 1

Laurentiu Rozylowicz

24 Jul 2020

PONE-D-20-07169R1

Research data sharing in the Australian national science agency: Understanding the relative importance of organisational, disciplinary and domain-specific influences

PLOS ONE

Dear Dr. Mason,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

The paper is much improved and can be accepted. However, there are some formatting issues that need to be considered during a quick revision:

- check that the manuscript is formatted according to the PLOS ONE Submission Guidelines (see the instructions for authors at https://journals.plos.org/plosone/s/submission-guidelines). Please pay attention to tables (see the instructions at https://journals.plos.org/plosone/s/tables)

- please use leading zeros in the paper (0.27, not .27; see, e.g., ln 319, but also many other places, including some tables)

- please proofread the paper to be sure everything is in the right place and correct.

Please submit your revised manuscript by Sep 07 2020 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

We look forward to receiving your revised manuscript.

Kind regards,

Laurentiu Rozylowicz, Ph.D.

Academic Editor

PLOS ONE


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: (No Response)

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: I Don't Know

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: No

Reviewer #2: No

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: I thank the authors for the time invested in improving their manuscript. They have done a decent job of addressing my previous comments (except for the *data availability* remarks). Consequently, on my part, I am satisfied with the current form of the manuscript.

However, I would *stress* a few rather formal aspects that the authors should seriously take into consideration prior to the potential publication of their manuscript.

#1. The authors are strongly encouraged to adhere to the PLOS ONE writing style format (details on how to prepare the manuscript are available in the PLOS ONE Submission Guidelines).

#2. The statistical reporting as well as table format should correspond to the requirements of PLOS ONE (see the Submission guidelines)

#3. The authors must adhere to the *Data Availability* policy of PLOS ONE.

#4. The data supporting the reported results should be available to any interested party without restrictions. The authors made their data available on a data access portal, but with restrictions (e.g. any interested party will receive the dataset files within 48 hours). On top of that, the data files should allow the replication of *all* the results reported in the manuscript.

#5. While preparing their data files, authors should make sure that information on human subjects is properly anonymized. Details on preparing the dataset files are available on PLOS ONE webpage (see: *Data Availability* section).

#6. While assessing the final form of their manuscript, the authors are encouraged to follow PLOS ONE Submission Guidelines.

Reviewer #2: I understand the limitations associated with ethical approval and I am happy in this case with authors including data availability statement.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No

Reviewer #2: Yes: Karolina Urbanska

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Aug 28;15(8):e0238071. doi: 10.1371/journal.pone.0238071.r004

Author response to Decision Letter 1


6 Aug 2020

Our Cover letter details our response to reviewer and editor comments. In summary:

We have revisited PLOS ONE's submission guidelines and made the following improvements to the manuscript:

1. Minor improvements to readability

2. A leading zero has been inserted for all decimal numbers less than one.

3. Use of square vs round brackets for citations in the text has been corrected

4. Minor formatting errors in the reference list have been corrected.

5. Vertical gridlines and single spacing have been applied to the tables.

6. The “Analyses” subsection of the Method is now labelled “Statistical Analysis” and contains additional information regarding the statistical procedures that were employed and our data screening process.

Decision Letter 2

Laurentiu Rozylowicz

10 Aug 2020

Research data sharing in the Australian national science agency: Understanding the relative importance of organisational, disciplinary and domain-specific influences

PONE-D-20-07169R2

Dear Dr. Mason,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Laurentiu Rozylowicz, Ph.D.

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

Acceptance letter

Laurentiu Rozylowicz

17 Aug 2020

PONE-D-20-07169R2

Research data sharing in the Australian national science agency: Understanding the relative importance of organisational, disciplinary and domain-specific influences

Dear Dr. Mason:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Laurentiu Rozylowicz

Academic Editor

PLOS ONE

Associated Data

    This section collects any data citations, data availability statements, or supplementary materials included in this article.

    Supplementary Materials

    S1 File. Discipline aggregation.

    (XLSX)

    S2 File. Domain aggregation.

    (XLSX)

    S3 File. Orgunit aggregation.

    (XLSX)

    Attachment

    Submitted filename: Response to Reviewers PONE-D-20-07169(b).docx

    Data Availability Statement

    The raw data are stored on CSIRO's Data Access Portal. However, under the conditions of the original ethics application, only members of the research team approved by the ethics committee are allowed to access these data. Approval would need to be granted by the ethics committee (who can be contacted at csshrec@csiro.au) for researchers seeking to verify our findings. We have published the aggregated data (for all organisational units, disciplines and domain groups with 10 or more respondents), which is allowed under the informed consent arrangements. These data are publicly available on CSIRO's Data Access Portal at https://doi.org/10.25919/5ed5e83bb35b3.
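    For readers interested in how such an aggregated release can be produced, the following is a minimal sketch (an illustration only; the data frame survey and the column names orgunit and intention are hypothetical, not the published dataset's schema) of computing group-level summaries while suppressing groups with fewer than 10 respondents:

        # Sketch only: release group-level means solely for groups of 10+ respondents
        library(dplyr)

        aggregate_for_release <- function(df, group_col, min_n = 10) {
          df %>%
            group_by(.data[[group_col]]) %>%
            summarise(n = n(),
                      mean_intention = mean(intention, na.rm = TRUE),
                      .groups = "drop") %>%
            filter(n >= min_n)  # suppress small groups to protect respondent identity
        }

        # Usage (hypothetical): aggregate_for_release(survey, "orgunit")

    The size threshold trades detail for de-identification: the larger the minimum group size, the harder it is to attribute a group mean to any individual respondent.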

