ABSTRACT
Methods for analyzing data in a reproducible manner are often viewed as impenetrable to scientists more familiar with laboratory research. The Riffomonas YouTube channel is committed to teaching these scientists and others how to engage in reproducible research using modern data science tools.
ANNOUNCEMENT
As high-throughput data generation becomes more common in microbiology and other disciplines, there is a significant need for laboratory scientists to develop data science skills (1). Unfortunately, traditional undergraduate and graduate biology training programs are often deficient in opportunities for scientists to develop the skills necessary to analyze large datasets in a reproducible and robust manner (2, 3). Numerous organizations seek to fill this void, including the Carpentries, Codecademy, and DataCamp (4). There are also numerous video tutorials available on YouTube. Although the content available through these platforms is popular, there has been a gap in content that emphasizes project-based learning.
The Riffomonas YouTube channel (https://www.youtube.com/c/RiffomonasProject) seeks to fill this gap. I started consistently posting videos in April 2020, at the beginning of the coronavirus disease 2019 (COVID-19) pandemic. As of the end of November 2022, the channel had 11,327 subscribers and included 285 videos that had been viewed 635,947 times. Most of these videos (264) are in the “Code Club” playlist (5) (Table 1). Other videos are related to a previously described tutorial series on reproducible research (6) and to series in which reproducible research practices are used to address topical questions. Code Club videos are typically between 20 and 30 min long. The code that is developed in the videos is available through a website (https://riffomonas.org/code_club/) and the channel’s GitHub-hosted account (https://github.com/riffomonas).
TABLE 1. Playlists found on the Riffomonas YouTube channel^a

| Topic | Playlist title | No. of videos |
|---|---|---|
| Data science | Data visualization with R’s tidyverse and allied packages | 146 |
| | Data manipulation within R’s tidyverse and other packages | 116 |
| | Data analysis with base R | 39 |
| | Tools for reproducible data analysis | 33 |
| | Working at the command line | 26 |
| | Literate programming with R markdown | 18 |
| | Machine learning with mikropml R package | 16 |
| | Version control with Git and GitHub | 15 |
| | Scientific writing | 15 |
| | Project organization | 3 |
| Project-based series | All Code Club videos since 2 April 2020 | 265 |
| | Microbiome data analysis and visualization | 86 |
| | ASV/OTU^b sensitivity and specificity analyses | 67 |
| | Visualizing COVID-19 vaccination attitudes | 31 |
| | Climate change data visualization | 29 |
| | Evaluating rarefaction and its alternatives | 18 |
| | Drought index visualization | 17 |
| | Reproducible research tutorial series | 14 |
| | Commemorating Juneteenth 2022 with a visualization | 5 |
| | 2018 MLB All Star Break data analysis sprint | 4 |

^a Because most videos cover more than one topic, they are found in multiple playlists. Playlists and counts were current as of 1 December 2022. Playlists can be found under the Playlists tab at https://www.youtube.com/c/RiffomonasProject.
^b ASV, amplicon sequence variant; OTU, operational taxonomic unit.
The channel name, Riffomonas, comes from the concept of riffing, in which musical themes are adapted to achieve a similar sound, albeit perhaps in different contexts (6). The name emphasizes the value of reproducibility not only for recreating a set of results but also for applying a method to a different data set (7). The channel covers topics related to reproducible data analysis practices, including R programming, data visualization, project organization, version control, command line programming, workflow tools, and scientific publishing (Table 1). Each video includes a brief introduction, after which I live code to achieve a goal. I emphasize live coding to modulate the rate of instruction and to show viewers my own coding practices. Observing an experienced analyst make mistakes normalizes some level of failure and demonstrates strategies that viewers can use to resolve their own mistakes. Viewers are encouraged to follow along with each video and to apply the new information to their own projects.
Each video emphasizes a specific topic but also reviews topics covered in recent videos. Although videos can be watched individually, they often form a project arc (Table 1). For example, between July 2020 and July 2021, I formulated a research question, obtained and analyzed data to answer the question, and wrote a paper that was published in mSphere (8). This series of 67 videos covered every step, from creating the initial directory on my computer to house the project files through reviewing the proofs of the published manuscript. Other project arcs have included visualizing microbiome data, modeling microbiome data using machine learning tools, and analyzing the impacts of rarefying microbiome data. Going forward, the Riffomonas channel will continue to post project-based content to help researchers develop their reproducible research skills.
Data availability.
The Riffomonas YouTube channel is available at https://www.youtube.com/c/RiffomonasProject. The code developed in the Code Club videos is available at https://riffomonas.org/code_club/ and the channel’s GitHub-hosted account (https://github.com/riffomonas).
ACKNOWLEDGMENTS
I am grateful to the audience of the Riffomonas channel for their feedback on topics that I should cover in future episodes.
Contributor Information
Patrick D. Schloss, Email: pschloss@umich.edu.
Irene L. G. Newton, Indiana University, Bloomington
REFERENCES
1. Barone L, Williams J, Micklos D. 2017. Unmet needs for analyzing biological big data: a survey of 704 NSF principal investigators. PLoS Comput Biol 13:e1005755. doi: 10.1371/journal.pcbi.1005755.
2. Schloss PD. 2018. Identifying and overcoming threats to reproducibility, replicability, robustness, and generalizability in microbiome research. mBio 9:e00525-18. doi: 10.1128/mBio.00525-18.
3. Williams JJ, Drew JC, Galindo-Gonzalez S, Robic S, Dinsdale E, Morgan WR, Triplett EW, Burnette JM, III, Donovan SS, Fowlks ER, Goodman AL, Grandgenett NF, Goller CC, Hauser C, Jungck JR, Newman JD, Pearson WR, Ryder EF, Sierk M, Smith TM, Tosado-Acevedo R, Tapprich W, Tobin TC, Toro-Martínez A, Welch LR, Wilson MA, Ebenbach D, McWilliams M, Rosenwald AG, Pauley MA. 2019. Barriers to integration of bioinformatics into undergraduate life sciences education: a national study of US life sciences faculty uncover [sic] significant barriers to integrating bioinformatics into undergraduate instruction. PLoS One 14:e0224288. doi: 10.1371/journal.pone.0224288.
4. Wilson G. 2016. Software carpentry: lessons learned. F1000Res 3:62. doi: 10.12688/f1000research.3-62.v2.
5. Hagan AK, Lesniak NA, Balunas MJ, Bishop L, Close WL, Doherty MD, Elmore AG, Flynn KJ, Hannigan GD, Koumpouras CC, Jenior ML, Kozik AJ, McBride K, Rifkin SB, Stough JMA, Sovacool KL, Sze MA, Tomkovich S, Topcuoglu BD, Schloss PD. 2020. Ten simple rules to increase computational skills among biologists with code clubs. PLoS Comput Biol 16:e1008119. doi: 10.1371/journal.pcbi.1008119.
6. Schloss PD. 2018. The Riffomonas reproducible research tutorial series. J Open Source Educ 1:13. doi: 10.21105/jose.00013.
7. Leek JT, Peng RD. 2015. Reproducible research can still be wrong: adopting a prevention approach. Proc Natl Acad Sci USA 112:1645–1646. doi: 10.1073/pnas.1421412111.
8. Schloss PD. 2021. Amplicon sequence variants artificially split bacterial genomes into separate clusters. mSphere 6:e0019121. doi: 10.1128/mSphere.00191-21.
