PLoS One. 2020 Jul 8;15(7):e0230697. doi: 10.1371/journal.pone.0230697

Assessing the impact of introductory programming workshops on the computational reproducibility of biomedical workflows

Ariel Deardorff
Editor: Vasilis J Promponas
PMCID: PMC7343163  PMID: 32639955

Abstract

Introduction

As biomedical research becomes more data-intensive, computational reproducibility is a growing area of importance. Unfortunately, many biomedical researchers have not received formal computational training and often struggle to produce results that can be reproduced using the same data, code, and methods. Programming workshops can be a tool to teach new computational methods, but it is not always clear whether researchers are able to use their new skills to make their work more computationally reproducible.

Methods

This mixed methods study consisted of in-depth interviews with 14 biomedical researchers before and after participation in an introductory programming workshop. During the interviews, participants described their research workflows and responded to a quantitative checklist measuring reproducible behaviors. The interview data were analyzed using a thematic analysis approach, and the pre- and post-workshop checklist scores were compared to assess the impact of the workshop on the computational reproducibility of the researchers’ workflows.

Results

Pre- and post-workshop scores on a checklist of reproducible behaviors did not change in a statistically significant manner. The qualitative interviews revealed that several participants had made small changes to their workflows, including switching to open source programming languages for their data cleaning, analysis, and visualization. Overall, many of the participants indicated higher levels of programming literacy and an interest in further training. Factors that enabled change included supportive environments and an immediate research need, while barriers included collaborators who were resistant to new tools and a lack of time.

Conclusion

While none of the workshop participants completely changed their workflows, many of them did incorporate new practices, tools, or methods that helped make their work more reproducible and transparent to other researchers. This indicates that the programming workshops offered by libraries and other organizations can contribute to computational reproducibility training for researchers.

Introduction

Scientific reproducibility–the idea that a scientific finding can be reproduced by others–has been called the “supreme court of the scientific system” [1]. However, there is a growing body of evidence that much of the research currently produced in academia cannot be reproduced. In a 2005 article, “Why Most Published Research Findings are False,” John Ioannidis argued that most research results were likely false due to factors such as flawed study designs and inadequate statistical power [2]. This article led to several large-scale reproducibility studies, including the Psychology Reproducibility Project, which analyzed 100 psychology studies and found that a large portion of them had findings weaker than originally claimed [3]. Failure to reproduce the results of a scientific study is not limited to one scientific domain. A 2016 survey in Nature found that more than 70% of researchers have tried and failed to reproduce another scientist’s study, and, even more alarming, more than half had failed to reproduce their own research [4]. One of the challenges of addressing the “reproducibility crisis” is that there are many competing definitions of reproducibility, each with its own solution. This paper focuses on computational reproducibility, defined as “the ability of a second researcher to receive a set of files, including data, code, and documentation, and to recreate or recover the outputs of a research project, including figures, tables, and other key quantitative and qualitative results” [5]. Computational reproducibility is a growing focus within the biomedical sciences, and this aspect of reproducibility was the target of a recent National Academies report [6].

Computational reproducibility is particularly important in the biomedical sciences as nearly all fields and subdisciplines have grown increasingly data-intensive. Biomedical researchers increasingly must apply many of the skills that have historically belonged to computer scientists, whether their work involves processing complicated files from electronic health records, collecting large genomic datasets, or building models of complex cellular relationships [6]. As their work changes, researchers must learn new methods and tools. It is no longer possible to rely on manual workflows in Excel; instead, researchers must learn new programming-based workflows to process and analyze their data. When these new workflows rely on open source programming languages such as R and Python and incorporate version control and detailed documentation, this work can be far more computationally reproducible [7–9]. Unfortunately, scientific training in this area has not always kept pace, and many biomedical scientists do not learn basic programming or computational best practices as part of their graduate training. These researchers often struggle to engage in emerging areas of research and must seek out additional courses and workshops in order to use these new techniques [10].

At the University of California, San Francisco (UCSF), the library has partnered with the non-profit organization The Carpentries to teach introductory programming workshops to biomedical researchers with the goal of improving programming literacy and computational reproducibility. These Software Carpentry workshops are two-day hands-on events that cover basic computing skills including the programming languages R or Python, version control with Git, and scripting in Unix [11]. The Carpentries regularly assesses participant learning in the workshops and has conducted a long-term assessment measuring uptake of certain behaviors associated with computational reproducibility, including use of programming languages and version control. The April 2020 analysis of the overall long-term survey results revealed that after a workshop 66% of respondents had started using a programming language or the command line [12]. In addition, 57% of participants “agreed” or “strongly agreed” that they had been able to make their analysis more reproducible. While this data is encouraging, these measures are self-reported, and it is difficult to tell how participants are using their skills to improve the computational reproducibility of their work. Since partnering with the Carpentries in 2016, UCSF has taught 12 workshops to over 700 UCSF researchers. While demand for the workshops is always high, there is a lack of evidence regarding the extent to which participation in the workshops leads researchers to adopt better computational practices. The aim of this study was to assess the impact of introductory programming workshops on biomedical researchers’ workflows. More specifically, the goal was to discover if participants would change their workflows to incorporate new tools and methods learned in the workshops and thereby make their research practices more computationally reproducible.

Methods

To learn about the impact of programming workshops on researcher workflows, this study used a mixed methods approach, consisting of semi-structured in-depth interviews that included questions about the researchers’ workflows as well as a quantitative checklist measuring reproducible behaviors. The author interviewed fourteen biomedical researchers about their workflows before they participated in a two-day introductory programming workshop, and again three months after they had completed the workshop. This study was approved by the University of California, San Francisco Human Research Protection Program Institutional Review Board (#18–25691). All participants provided written informed consent prior to participation. An analysis of a subset of the pre-workshop qualitative data as well as the methods below were reported in an earlier publication [13].

Study recruitment

The author recruited fourteen UCSF researchers who registered for a two-day library-led introductory programming workshop in March 2019. These workshops covered an introduction to Git, Unix, and either R or Python (with approximately 36 registered for the R track and 36 registered for the Python track) and were open to anyone affiliated with the university. The number of participants was selected based on research indicating that 12 interviews is generally sufficient to gather most major themes in a qualitative study [14]. The inclusion criteria specified that participants must be currently involved in research and planning on staying at UCSF for 6 months (in order to reach them for follow-up interviews). These criteria left a total of 59 possible participants out of the 72 enrolled in the course. The author used stratified random sampling to select potential participants, and contacted 7 participants registered for the R workshop and 7 registered for the Python workshop. Of the initial 14 participants selected, 7 did not respond and 2 declined to participate. These were replaced by an additional 9 random participants until 14 were reached.

The interviews

In January and February 2019, the author performed in-depth semi-structured interviews with participants before they took the programming workshop. Participants were asked to draw their research process and describe the tools and methods they used, pain points in their workflows, and what they were hoping to learn in the workshop (workflow drawing template in S1 File, pre-workshop interview protocol in S2 File). At the end of the interview, the author administered a brief 6-question checklist of computationally reproducible practices (checklist in S1 Checklist). This checklist was compiled based on a literature review of recommended practices for computational reproducibility, as well as the Carpentries long-term survey [5,7,12,15–23]. The checklist questions asked whether, as part of their workflows, participants used programming languages like R, Python, or the command line for data acquisition, processing, or analysis; transformed step-by-step workflows into scripts or functions; used version control to manage code; used open source software; shared their code publicly; and shared their computational workflows or protocols publicly. Each question was worth 1 point, for a total possible score of 6 points for a workflow that incorporated all of the elements.
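
As an illustration of this scoring scheme only (not the study’s actual instrument; the item wording below is abbreviated and the example responses are hypothetical), the following sketch shows how six yes/no items translate into a 0–6 score:

```python
# The six reproducible-practice items from the checklist, worth 1 point each (max 6).
CHECKLIST_ITEMS = [
    "uses R, Python, or the command line for data acquisition, processing, or analysis",
    "transforms step-by-step workflows into scripts or functions",
    "uses version control to manage code",
    "uses open source software",
    "shares code publicly",
    "shares computational workflows or protocols publicly",
]

def checklist_score(responses: dict) -> int:
    """Return the total score: 1 point for each item answered 'yes'."""
    return sum(1 for item in CHECKLIST_ITEMS if responses.get(item, False))

# Hypothetical participant who uses R and open source tools but nothing else.
example = {CHECKLIST_ITEMS[0]: True, CHECKLIST_ITEMS[3]: True}
print(checklist_score(example))  # -> 2
```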

In June of 2019, three months after they had completed the workshop, the author performed follow-up interviews with the original participants. The follow-up interviews focused on their thoughts on the workshop, changes they had made to their research workflows, plans for future changes, factors that enabled or prevented workflow changes, and any suggestions they had for improving future workshops (post-workshop interview protocol in S3 File). Participants responded once again to the reproducibility checklist to see if there had been a measurable change.

The pre and post workshop interviews were conducted at either the UCSF Parnassus or Mission Bay campus and ranged from 20 to 45 minutes. Before the first interview, participants signed a consent form stating that they agreed to be recorded and that they understood that their anonymized data would be shared. The interviews were recorded and the audio was transcribed by the online service Rev.com. After transcription, the author read through each interview transcript while listening to the audio recordings to ensure the content was faithfully reproduced, addressing any errors (for example, “are” instead of “R”) as they were discovered. Finally, the author redacted names of people and groups and generalized research topics to preserve anonymity.

Data analysis

To analyze the quantitative data, the author converted the checklist totals to numeric scores and performed a two-tailed t-test to test for statistically significant changes in behaviors before and after the workshop.
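
A minimal sketch of this comparison is shown below, assuming a paired design consistent with the reported t(11); the score vectors are hypothetical, and the study’s actual data and analysis files are in the Dryad deposit.

```python
from scipy import stats

# Hypothetical pre- and post-workshop checklist totals (0-6) for the 12
# researchers who completed both interviews; real values are in the Dryad data.
pre_scores = [2, 1, 0, 3, 2, 1, 2, 0, 1, 3, 2, 1]
post_scores = [2, 2, 1, 6, 2, 1, 3, 0, 1, 4, 2, 2]

# Paired, two-tailed t-test on the change in scores (df = n - 1 = 11).
result = stats.ttest_rel(pre_scores, post_scores)
print(f"t({len(pre_scores) - 1}) = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```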

The qualitative data was analyzed using the applied thematic analysis framework–a methodology inspired by grounded theory, positivism, interpretivism, and phenomenology [24]. For the pre and post workshop interviews, the author used an inductive approach to read through the transcripts, identify major themes, and create corresponding structural and thematic codes (code book available in S1 Data). These codes were then elaborated in the codebook and applied using an iterative approach. All coding was performed using the online data analysis tool Dedoose.

Results

Participant demographics

The majority (9 of 14) of research participants were postdoctoral researchers, followed by three research staff, one graduate student, and one faculty member. These demographics were in line with the typical audience of a UCSF programming workshop. The departmental representation was also similar to a typical workshop, with a larger group from neurology (3 of 14), developmental and stem cell biology (3 of 14), and immunology (2 of 14), and the balance coming from orthopedic surgery, neuroscience, neurological surgery, anatomy, pharmacy, and bioethics. As the workshops were marketed to beginning programmers, 13 of 14 described themselves as a “novice” or “beginner” programmer, and only one participant considered themselves to be an “intermediate” programmer. During the second round of interviews in June 2019, one of the original participants declined to participate as they had not attended both days of the workshop, and another did not respond to emails. Therefore, only 12 researchers participated in the post-workshop interviews.

Checklist scores before and after the workshop

The average score for the pre-workshop checklist was 1.6 out of 6 and ranged from 0 to 3, indicating low levels of applied best practices in computational reproducibility. Of the six questions, participants scored highest on the use of a programming language like R, Python, or the command line at some point in their workflow (n = 7) and on using an open source tool (n = 7) (see Table 1). Five participants said they shared their code publicly, three said they shared their computational workflows publicly, one person said they transformed step-by-step workflows into scripts, and none of the participants used version control.

Table 1. Total checklist scores before and after the workshop.

Question | Pre-test total (n = 14) | Post-test total (n = 12)
Use programming languages like R, Python, or the command line for data acquisition, processing, or analysis | 7 | 8
Transform step-by-step workflows into scripts or functions | 1 | 2
Use version control to manage code | 0 | 2
Use open source software | 7 | 10
Share your code publicly | 5 | 2
Share your computational workflow or protocols publicly | 3 | 2

Three months after completing the workshop, the average score increased from 1.6 to 2.2 and ranged from 0 to 6; however, this was not a statistically significant difference (t(11) = -1.04, p = 0.318). The post-workshop scores revealed increases in using programming languages like R or Python (n = 8), transforming step-by-step workflows into scripts (n = 2), using version control (n = 2), and using open source tools (n = 10), and decreases in sharing code publicly (n = 2) and sharing computational workflows publicly (n = 2). The interviews revealed that these decreased scores were likely due to a better understanding of what it means to share code and computational workflows, rather than an actual change in workflows. Previously, participants had mostly considered sharing to mean exchanging code or protocols with others in their lab or team. After the workshop, participants had a better understanding of what public code sharing could look like (for example, sharing code on GitHub or in a repository like Zenodo) and realized that this was not part of their current workflow.

Changes in participant workflows

While their overall checklist scores didn’t change much, many of the participants indicated in the qualitative interviews that they had changed their workflow to incorporate programming languages in new ways throughout, including switching the tools they used for data cleaning, visualization, and analysis. One participant reported that they started using R to clean their data instead of their previous manual data cleaning in Excel. Two of the Python workshop attendees started using Python to plot figures and visualize data (although one said that they still preferred R). For data analysis, two of the participants shared that they had switched from proprietary tools like SPSS and Stata to R, saying that attending the workshop had helped them make the case to their collaborators that R was just as powerful a tool. Finally, two of the participants who enrolled in the Python class with an R background decided that R would actually be adequate for their work, with one stating that “actually my Python workshop has convinced me to go to quit Python…”

Participants had also made changes to make their workflows more programmatic, using the command line to download large datasets or using GitHub to find and share code. Four of the researchers reported that they had started using Unix at some point to search for and retrieve data files on their computers or to download large datasets. One shared that “I didn’t know anything about Terminal before so now I know what that is and how to basically use that. I think that has helped.” Three participants stated that they had looked for code on GitHub, and one of the more computationally advanced researchers had changed their workflow to incorporate sharing code with their collaborators on GitHub, explaining:

“I'm using GitHub as well to share code with the people I work, on the team I am working. So it's also an interesting thing because for instance, some of the things that they do is I code something in R. I may build R shiny, I can use an interface for someone that is not fluent in R that can use whatever process. And, and now you can host that in the GitHub and they can directly just run one line of code and run that differently from GitHub and that's very useful. So to share code and so forth. So I've been using that as well.”

Plans for future workflow changes

While not all the researchers had been able to make changes in the three months since the workshop, many of them had ideas for ways they would change their workflows in the future. At least half of the participants shared that they wanted to use R or Python for their data analysis going forward. Specific plans included using R to characterize cell types for sequencing data, analyzing images in Python, and trying Python to analyze single-cell data. For some researchers, this meant taking ownership of a step that had previously been done by a collaborator. Describing their relationship with their team’s bioinformaticist, one said:

“So, I think, I don't know if I'll be doing all of it, but at least more together. Really, I was handing everything off to her. She'd do everything on her computer and then I'd only see figures weeks later. And so this would actually be like handling the data myself, doing some in R, if I can't, or am having issues like helping her, helping me kind of troubleshoot those things. Or even if she eventually does do some of the analysis, I'll know what she's done specifically.”

Other researchers planned to switch to R or Python for their data visualization, indicating that they thought they could make better publication-quality figures. Two of the Python learners decided that they would eventually make the switch from R to Python for most of their analysis, as they felt it was a simpler language that would have more applicability outside of academia. Finally, four of the participants expressed interest in using GitHub to share and version their scripts once they were writing more of their own code or had more autonomy. One researcher shared:

“But in the future, if I were running my own projects, and I were more familiar, I would definitely try to use GitHub and all that because from what I saw, it's a good way to connect with other people on projects and I know people who use it as well. So just for project and practice, it would have been good for a personal growth thing. But for research with my PI, yeah. She's not familiar, so I don't think we would use it.”

Increased programming literacy

While not every participant made changes to their workflow, the majority of them came out of the workshops with new insight into the language and fundamentals of programming. When asked about their biggest takeaway, several participants spoke about the fact that the workshop had helped de-mystify a complicated topic. One reported that they felt more comfortable talking about programming with their collaborators, and that they felt “like I could understand a little bit more what the more informatics people in my lab are doing day to day and then they talk about stuff and I'm like ‘Oh, I know those words’ like you sort of get to know the techniques that they're using a little bit more and the different software and stuff.” Another felt they had started to see new ways to apply programming to their work. A third shared that learning Unix had helped them understand how their computer actually worked: “It's like being given a map of where you live suddenly and then, oh, there, that’s where stuff is. So that was very useful.”

Interest in further training

A major takeaway from the study was that the workshop prompted many of the participants to seek more training opportunities. For some participants, the workshops helped clarify what they needed to learn. One person said, “I do feel like before it just all felt like very much a black box and now I at least kind of feel like I know what I need to learn.” For others, the workshop helped them lay a foundation on which they wanted to build. One participant indicated that after learning the basics of R, “I'm not as anxious so I think that's also why I have been comfortable signing up for courses [that] are just like more data intensive. So that's definitely useful. And I want to attend more of those.” Almost all participants said that they had explored further training, and many had already taken another workshop. Three participants mentioned a specific single-cell sequencing data analysis workshop offered at UC Davis, another said they had registered for an edX data science course, and one more said they were looking into UC Berkeley Extension courses.

Enablers and barriers to change

When asked what helped them implement the workshop content into their workflows, the participants cited a number of factors. Some participants reported being able to quickly apply their new programming skills because they had an immediate research need. One postdoctoral researcher shared that they wanted to analyze RNA sequencing data and that they were the only one on their team who had the necessary programming skills. Others were able to practice on relevant datasets and scripts and see how they might apply these skills to their own work. One of the most commonly cited factors was a supportive research environment; examples included a principal investigator (PI) who didn’t know programming but saw the benefit for the lab, and a group of colleagues who already knew programming. One person shared, “I think it's both support from people in the lab. Seeing postdocs and graduate students who either have learned this and are really good or are actively learning it right now and working toward these goals. Made me be like, ‘Oh, I would really like to have that skill.’ I think that was really vital.”

The participants mentioned many different barriers to effectively incorporating more programming into their workflows. For some, it was because their PI or collaborators were resistant to new tools or needed to be convinced to adopt new methods. In one instance, the researcher recounted that they were unable to use GitHub because their PI preferred email. Another shared that their PI “doesn't have the knowledge of this, so he probably doesn't know how useful it can be, and it doesn't matter for them, they just need the results. If you visualize the data nicely, that's enough for them.” Several participants also shared that they hadn’t been able to implement new procedures because they didn’t want to make changes in the middle of a project or had been stuck in the wet lab stage of their project. Overall, the biggest barrier to change was a lack of time. While several participants saw the benefit of programming, they were not able to make time to learn enough to implement the new tools. One researcher summed up this problem:

“So after, right after attending the workshop, I started using Python. But the thing is obviously I had to fix some problems. So, because I had the, I knew the basics, but then when I needed to do my own things and ask specific questions, I was not able to. And so, in order to be quick and to get my things done without wasting too much time …. I just continued with my usual way of proceeding.”

Discussion

The researchers in this study did not register for the programming workshops because they wanted their work to be more computationally reproducible; they registered so they could understand the work of their collaborators, work with new kinds of research, or analyze their data independently [13]. Many of the participants, however, did come away from the workshops with new practices, tools, or methods that made their work more reproducible and transparent to other researchers. Some participants switched from manual data cleaning in Excel to scripted data cleaning in R. Others moved from expensive proprietary software to open source tools. Some realized that they could share their code and protocols outside their labs. Finally, some started versioning their code to better track their work. While very few of the participants completely changed their workflows or adopted all of the techniques after attending the workshop, all of them learned or tried something new. Computational reproducibility, like many aspects of reproducibility, is often achieved through the gradual adoption of better practices, and every participant made at least a small adjustment in that direction.

The results of this study are similar to many of the findings of the Carpentries’ own long-term survey, which included respondents from a wide variety of disciplines and career stages. As in the Carpentries’ survey, most of the workflow changes the UCSF researchers implemented involved integrating R and Python in new ways (66% of Carpentries respondents) rather than using version control to manage code (31%) or transforming step-by-step workflows into scripts (25%) [12]. The increase in programming literacy evident in many of the UCSF researchers’ comments parallels the 79% of Carpentries respondents who indicated more confidence using the tools taught in the workshop. Finally, like the UCSF researchers, respondents to the Carpentries survey were also highly motivated to seek further instruction, with 89% reporting that they “agreed” or “strongly agreed” that they were motivated to seek more knowledge. As the Carpentries survey does not ask about what might have helped or hindered participants from implementing new skills, it may be that the factors identified in this study–including the enabling factor of an immediate research need and the barrier of a lack of time–also apply to researchers outside the biomedical sciences.

Beyond concrete workflow changes, one of the major outcomes of the workshop was enabling the biomedical researchers who participated to start using the tools and approaches of computer scientists. Many of them expressed new understanding of how their computers work, how to write a usable script, or what a scripted analysis could look like. They shared moments of revelation when finally understanding the language of their collaborators, or getting a glimpse of new ways of working. As biomedical research becomes increasingly data intensive, it is important that biomedical researchers–many of whom never learned to program–see tools like programming as attainable and essential parts of their workflow. While the participants in this study appear to have started down that path, they also struggled to fully integrate programming, citing a lack of time to truly learn new practices. Wrapped up in these statements was the idea that the “science” always took first priority and that learning to program was less essential, a potentially dangerous assumption that could prevent biomedical researchers from making full use of the tools available to them.

While some biomedical graduate programs are teaching programming as part of the curriculum, there is still a need to train the current postdoctoral researchers, research staff, and faculty who need these crucial skills in order to continue excelling in their areas of research. Introductory programming workshops like the ones offered by the Carpentries can be an excellent way for libraries and other organizations to jumpstart this learning process, but organizers and attendees should keep in mind that one workshop cannot teach all the skills necessary, and learners might need ongoing support to make the switch to new practices. Ideally, an introductory programming workshop should give researchers a taste of programming basics and possibilities and point them in the direction of further learning. It will likely be a gradual process, but any new practices can help make research more reproducible.

Limitations

The results of this study are drawn from a small cohort of biomedical researchers at UCSF. The data are representative of UCSF Carpentries workshops, but given the overrepresentation of postdoctoral researchers they might not reflect the experience of other biomedical researchers or researchers outside of the health sciences. While the participants in this study had similar demographic characteristics to the workshops overall (in terms of roles and departments), it is possible that the recruited participants who declined to participate or did not respond to the research invitation might have had different goals or expectations than those who did. The analysis and coding for this project was also performed solely by the author, and a different researcher might have interpreted slightly different themes. Finally, because of the small sample size, the quantitative analysis lacked appropriate power. A more well-defined checklist administered to a larger group might have revealed richer, more accurate data.

Conclusion

Introductory programming workshops can be an excellent way for libraries and other organizations to contribute to biomedical research reproducibility. While none of the participants in this study completely transformed their workflows, many adopted new tools and practices, including switching from proprietary to open source tools, incorporating more programmatic elements, and versioning their code. Participants also gained new insight into the fundamentals of programming, as well as ideas and plans for changing their workflows in the future. These results indicate that researchers who learn new computational skills are able to improve their workflows, produce more transparent results, and contribute to more reproducible science.

Supporting information

S1 Checklist. Reproducibility checklist.

(PDF)

S1 File. Workflow drawing template.

(PDF)

S2 File. Pre-workshop interview protocol.

(PDF)

S3 File. Post-workshop interview protocol.

(PDF)

S1 Data. Qualitative analysis codebook.

(CSV)

Acknowledgments

The author would like to thank Kristine R. Brancolini, Marie Kennedy, and the Institute for Research Design in Librarianship (IRDL) for guidance on this project. Additional thanks to Savannah Kelly, Jill Barr-Walker, and Catherine Nancarrow for feedback on this manuscript.

Data Availability

All data and accompanying metadata files are available from the Dryad repository at https://doi.org/10.7272/Q6RV0KW6

Funding Statement

The author received no specific funding for this work.

References

  • 1. Stodden V. The Scientific Method in Practice: Reproducibility in the Computational Sciences. Rochester, NY: Social Science Research Network; 2010 Feb. Report No.: ID 1550193. Available: https://papers.ssrn.com/abstract=1550193
  • 2. Ioannidis JPA. Why Most Published Research Findings Are False. PLOS Medicine. 2005;2: e124. doi:10.1371/journal.pmed.0020124
  • 3. Open Science Collaboration. Estimating the reproducibility of psychological science. Science. 2015;349: aac4716. doi:10.1126/science.aac4716
  • 4. Baker M. 1,500 scientists lift the lid on reproducibility. Nature News. 2016;533: 452. doi:10.1038/533452a
  • 5. Kitzes J, Deniz F, Turek D. The Practice of Reproducible Research: Case Studies and Lessons from the Data-Intensive Sciences. Oakland, CA: University of California Press; 2018. Available: https://www.practicereproducibleresearch.org/
  • 6. National Academies of Sciences, Engineering, and Medicine. Reproducibility and Replicability in Science. 2019. doi:10.17226/25303
  • 7. Sandve GK, Nekrutenko A, Taylor J, Hovig E. Ten Simple Rules for Reproducible Computational Research. PLOS Computational Biology. 2013;9: e1003285. doi:10.1371/journal.pcbi.1003285
  • 8. Markowetz F. Five selfish reasons to work reproducibly. Genome Biology. 2015;16: 274. doi:10.1186/s13059-015-0850-7
  • 9. Samsa G, Samsa L. A Guide to Reproducibility in Preclinical Research. Acad Med. 2018. doi:10.1097/ACM.0000000000002351
  • 10. Wilson G, Aruliah DA, Brown CT, Hong NPC, Davis M, Guy RT, et al. Best Practices for Scientific Computing. PLOS Biology. 2014;12: e1001745. doi:10.1371/journal.pbio.1001745
  • 11. About Us. In: Software Carpentry [Internet]. [cited 1 Dec 2017]. Available: http://software-carpentry.org//about/
  • 12. Jordan KL, Michonneau F. Analysis of The Carpentries Long-Term Surveys (April 2020) v2. Zenodo; 2020 Apr. doi:10.5281/zenodo.3753528
  • 13. Deardorff A. Why do biomedical researchers learn to program? An exploratory investigation. Journal of the Medical Library Association. 2020;108: 29–35. doi:10.5195/jmla.2020.819
  • 14. Guest G, Bunce A, Johnson L. How Many Interviews Are Enough?: An Experiment with Data Saturation and Variability. Field Methods. 2006;18: 59–82. doi:10.1177/1525822X05279903
  • 15. Jordan K. Analysis of The Carpentries Long-Term Impact Survey. Zenodo; 2018 Jul. doi:10.5281/zenodo.1402200
  • 16. Stodden V, McNutt M, Bailey DH, Deelman E, Gil Y, Hanson B, et al. Enhancing reproducibility for computational methods. Science. 2016;354: 1240–1241. doi:10.1126/science.aah6168
  • 17. Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, Breckler SJ, et al. Promoting an open research culture. Science. 2015;348: 1422–1425. doi:10.1126/science.aab2374
  • 18. Garijo D, Kinnings S, Xie L, Xie L, Zhang Y, Bourne PE, et al. Quantifying Reproducibility in Computational Biology: The Case of the Tuberculosis Drugome. PLOS ONE. 2013;8: e80278. doi:10.1371/journal.pone.0080278
  • 19. Science Code Manifesto. [cited 6 Jul 2018]. Available: http://sciencecodemanifesto.org/
  • 20. Gorgolewski KJ, Poldrack RA. A Practical Guide for Improving Transparency and Reproducibility in Neuroimaging Research. PLOS Biology. 2016;13.
  • 21. Lakens D, Hilgard J, Staaks J. On the reproducibility of meta-analyses: six practical recommendations. BMC Psychology. 2016;4. doi:10.1186/s40359-016-0126-3
  • 22. Denaxas S, Direk K, Gonzalez-Izquierdo A, Pikoula M, Cakiroglu A, Moore J, et al. Methods for enhancing the reproducibility of biomedical research findings using electronic health records. BioData Mining. 2017;10. doi:10.1186/s13040-017-0151-7
  • 23. Stodden V, Miguez S. Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research. Journal of Open Research Software. 2014;2. doi:10.5334/jors.ay
  • 24. Guest G, MacQueen K, Namey E. Applied Thematic Analysis. Thousand Oaks, CA: SAGE Publications, Inc; 2012. doi:10.4135/9781483384436

Decision Letter 0

Vasilis J Promponas

28 Apr 2020

PONE-D-20-06264

Assessing the impact of introductory programming workshops on the computational reproducibility of biomedical workflows

PLOS ONE

Dear Ms Deardorff,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

In particular, even though both expert reviewers find the work interesting, they highlight the fact that the data used for the analysis is quite limited. Please, consider using data already available from the Software Carpentry impact assessment programme; if this is not possible (due to the different goals of your study) at least try to discuss/compare your findings in the context of the more generic assessments available therein. I would expect that a revised version of the manuscript will carefully address all points raised by the reviewers, which I believe will strengthen the main message of your work.

We would appreciate receiving your revised manuscript by Jun 12 2020 11:59PM. When you are ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter.

To enhance the reproducibility of your results, we recommend that if applicable you deposit your laboratory protocols in protocols.io, where a protocol can be assigned its own identifier (DOI) such that it can be cited independently in the future. For instructions see: http://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols

Please include the following items when submitting your revised manuscript:

  • A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). This letter should be uploaded as separate file and labeled 'Response to Reviewers'.

  • A marked-up copy of your manuscript that highlights changes made to the original version. This file should be uploaded as separate file and labeled 'Revised Manuscript with Track Changes'.

  • An unmarked version of your revised paper without tracked changes. This file should be uploaded as separate file and labeled 'Manuscript'.

Please note while forming your response, if your article is accepted, you may have the opportunity to make the peer review history publicly available. The record will include editor decision letters (with reviews) and your responses to reviewer comments. If eligible, we will contact you to opt in or out.

We look forward to receiving your revised manuscript.

Kind regards,

Vasilis J Promponas

Academic Editor

PLOS ONE

Journal requirements:

When submitting your revision, we need you to address these additional requirements:

1.    Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at http://www.plosone.org/attachments/PLOSOne_formatting_sample_main_body.pdf and http://www.plosone.org/attachments/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please provide additional details regarding participant consent. In the ethics statement in the Methods and online submission information, please ensure that you have specified whether consent was informed.

3. Thank you for stating the following in the Competing Interests section:

"I have read the journal's policy and the author of this manuscript has the following competing interests: the author is the co-chair of the Library Carpentry Advisory Group, one of the advisory groups of the Carpentries organization. Since completing this research, the author has taken on a project expanding outreach for the Carpentries in California academic libraries."

Please confirm that this does not alter your adherence to all PLOS ONE policies on sharing data and materials, by including the following statement: "This does not alter our adherence to  PLOS ONE policies on sharing data and materials.” (as detailed online in our guide for authors http://journals.plos.org/plosone/s/competing-interests).  If there are restrictions on sharing of data and/or materials, please state these. Please note that we cannot proceed with consideration of your article until this information has been declared.

Please include your updated Competing Interests statement in your cover letter; we will change the online submission form on your behalf.

Please know it is PLOS ONE policy for corresponding authors to declare, on behalf of all authors, all potential competing interests for the purposes of transparency. PLOS defines a competing interest as anything that interferes with, or could reasonably be perceived as interfering with, the full and objective presentation, peer review, editorial decision-making, or publication of research or non-research articles submitted to one of the journals. Competing interests can be financial or non-financial, professional, or personal. Competing interests can arise in relationship to an organization or another person. Please follow this link to our website for more details on competing interests: http://journals.plos.org/plosone/s/competing-interests

4. Thank you for stating the following in the Acknowledgments Section of your manuscript:

"The author would like to thank the Institute for Research Design in Librarianship (IRDL) for support with this project. IRDL is partially funded by the Institute of Museum and Library Services grant RE-40-16- 0120-16."

We note that you have provided funding information that is not currently declared in your Funding Statement. However, funding information should not appear in the Acknowledgments section or other areas of your manuscript. We will only publish funding information present in the Funding Statement section of the online submission form.

Please remove any funding-related text from the manuscript and let us know how you would like to update your Funding Statement. Currently, your Funding Statement reads as follows:

"The author received no specific funding for this work."

 


Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: The work presented here focuses on the impact that computational training workshops can have to the reproducibility practices employed by researchers in Life Sciences. In particular, the work describes a study performed by surveying the practices of biomedical researchers at UCSF before and after their participation to an introductory programming workshop. The survey itself was performed through short interviews (20-40 minutes long), with a randomized selection of 14 participants from within a set of 52 relevant cases (i.e. people involved in research, and planning on staying at UCSF for 6 months at least). The interviews were transcribed and both quantitative and qualitative data were analyzed using the online data analysis tool Dedoose. The overall analysis showed that there were notable (but not necessarily statistically significant) changes to the participants' research practices. These include a transition from commercial to open source tools, increased programming literacy and a definite interest for further training.

Overall the paper addresses an open question within the training communities, and especially within Life Sciences, which is to quantify and assess the impact of participation to a computational workshop regarding the practices employed by researchers. The work reflects similar studies being done by initiatives such as ELIXIR (within Europe) and the Carpentries (at a more global level), with the additional value of including a short interview aspect to the more standardized surveys. Overall it's an interesting work, but there are a few points that may be worth re-assessing

1. A main concern is about the overall impact/power of this approach, especially keeping the following two points in mind:

- The survey was performed on learners that participated in Software Carpentry workshops (although this point is not specifically mentioned in the text, I believe that this is an implied statement, also supported by the description of the workshops - page 6, line 125).

- The Carpentries maintain a fairly rigorous impact assessment programme, with a number of reports and analyses available on their website (https://carpentries.org/assessment/) under Zenodo DOIs, and including an "Analysis of Software and Data Carpentry’s Pre- and Post-Workshop Surveys" and a fairly recent "Analysis of The Carpentries Long-Term Surveys" (April 2020)

Although the work present here is much more detailed and focused on the reproducibility aspects (as opposed to the more streamlined and "templated" analysis that the Carpentries offer), at the same time the main driving point do share a significant overlap. Given that, due to the small sample size, the quantitative analysis lacks appropriate power, as also highlighted within the manuscript, it may be worth investigating how the interview data could be complemented by data gathered through the Carpentries impact assessment programme. In particular, this would hopefully underline the insights gained through this work, and connect to a larger number of research communities that are transitioning to more computational approaches (such as the Social Sciences, similarly to Life Sciences)

2. Another point is with regards to the existing bias due to the academic background of the learners. Although there is a representation of several different levels within the cohort, there is a significant bias in the "postdoctoral researcher" level with significantly less participation from other levels. Although this is not surprising nor unexpected, it would also make sense to include a stratified analysis of the results as it would be interesting to see the differences in impact from the workshop at different levels of academia.

3. A final point to highlight is that the actual analysis of the data (i.e. the computational notebook and/or the script) is not available. It is understood that the Dedoose platform was used for analyzing such data; however, I believe that, if the specific computational steps were available, it would greatly increase the further re-use of both data and insights gained through this work.

Reviewer #2: This is an interesting study. We all know that it is good for biomedical researchers to have some computing knowledge in order to make their research reproducible and make the right choice in terms of tools (and libraries), but I have not come across many studies that actually talk about it.

Initially I was a bit worried about the sample and it's small size, in terms of statistical power, but then the authors have actually mentioned about them as limitations. So, readers will be aware of this when they read this. I checked all the forms etc uploaded in DRYAD.

**********

6. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Fotis E. Psomopoulos

Reviewer #2: Yes: Shakuntala Baichoo

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files to be viewed.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email us at figures@plos.org. Please note that Supporting Information files do not need this step.

PLoS One. 2020 Jul 8;15(7):e0230697. doi: 10.1371/journal.pone.0230697.r002

Author response to Decision Letter 0


18 May 2020

I have detailed my response to reviewer comments in the file called "Response to Reviewer Comments." In brief, I added more information from the Carpentries long-term survey to compare the results of my study with that data. I have also published additional stages of my analysis process in Dryad to make the process more transparent.

Attachment

Submitted filename: ResponseToReviewers.docx

Decision Letter 1

Vasilis J Promponas

23 Jun 2020

Assessing the impact of introductory programming workshops on the computational reproducibility of biomedical workflows

PONE-D-20-06264R1

Dear Dr. Deardorff,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information, please contact onepress@plos.org.

Kind regards,

Vasilis J Promponas

Academic Editor

PLOS ONE

Additional Editor Comments (optional):

I would like to bring to your attention that Plos One does not copyedit accepted manuscripts. Therefore, you should make sure to thoroughly check your manuscript for any remaining typos etc.

One such case is at "Line 56" where there is a period missing in the end of the sentence.

Reviewers' comments:

Reviewer's Responses to Questions

Comments to the Author

1. If the authors have adequately addressed your comments raised in a previous round of review and you feel that this manuscript is now acceptable for publication, you may indicate that here to bypass the “Comments to the Author” section, enter your conflict of interest statement in the “Confidential to Editor” section, and submit your "Accept" recommendation.

Reviewer #1: All comments have been addressed

Reviewer #2: All comments have been addressed

**********

2. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes

Reviewer #2: Yes

**********

3. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes

Reviewer #2: Yes

**********

4. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data—e.g. participant privacy or use of data from a third party—those must be specified.

Reviewer #1: Yes

Reviewer #2: Yes

**********

5. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes

Reviewer #2: Yes

**********

6. Review Comments to the Author

Please use the space provided to explain your answers to the questions above. You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: (No Response)

Reviewer #2: Author has addressed all comments, especially w.r.t. Software Carpentry workshops' info and has worked on the list of references.

**********

7. PLOS authors have the option to publish the peer review history of their article (what does this mean?). If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: Yes: Fotis Psomopoulos

Reviewer #2: No

Acceptance letter

Vasilis J Promponas

26 Jun 2020

PONE-D-20-06264R1

Assessing the impact of introductory programming workshops on the computational reproducibility of biomedical workflows

Dear Dr. Deardorff:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,

PLOS ONE Editorial Office Staff

on behalf of

Dr. Vasilis J Promponas

Academic Editor

PLOS ONE
