Abstract
The need for portable and reproducible genomics analysis pipelines is growing globally as well as in Africa, especially with the rise of collaborative projects like the Human Heredity and Health in Africa (H3Africa) Consortium. The Pan-African H3Africa Bioinformatics Network (H3ABioNet) recognized the need for portable, reproducible pipelines adapted to heterogeneous compute environments, and for the nurturing of technical expertise in workflow languages and containerization technologies. To address this need, in 2016 H3ABioNet organized its first Cloud Computing and Reproducible Workflows Hackathon, with the purpose of building key genomics analysis pipelines able to run on heterogeneous computing environments and to meet the needs of H3Africa research projects. This paper describes the preparations for this hackathon and reflects upon the lessons learnt about its impact on building the technical and scientific expertise of African researchers. The workflows developed were made publicly available in GitHub repositories and deposited as container images on Quay.io.
Keywords: Bioinformatics, hackathon, workflow, reproducible, pipeline, capacity building
Introduction
As an inherently interdisciplinary science, bioinformatics depends upon complementary expertise from biomedical scientists, statisticians and computer scientists 1. This opportunity for collaborative projects also creates a need for avenues to exchange knowledge 1. Hackathons, along with codefests and sprints, are emerging as an efficient means of driving successful projects 2. They can take the form of science hackathons aimed at developing research plans and scientific write-ups 3, community-driven software development events 4, or data hackathons ("datathons") 5. In addition to their scientific and technical outcomes, these intensive and focused activities offer valuable skills development and networking opportunities to young and early-career scientists.
On the African continent, there is generally limited access to such events. However, with the growing capacity to generate genomic data in Africa, the need for African scientists to analyze these data locally is also growing. H3ABioNet 6, the Bioinformatics Network within the H3Africa initiative 7, has invested in capacity building via different approaches 8. The H3ABioNet Cloud Computing hackathon was a natural extension of the network's efforts in developing Standard Operating Procedures (SOPs) via its Network Accreditation Task Force (NATF) 9, which aimed at building and assessing capacity in genomic analysis. It also followed other efforts by the H3ABioNet Infrastructure Working Group (ISWG) towards setting up infrastructure at various H3ABioNet Nodes at the hardware, software, networking and personnel levels. The H3ABioNet Cloud Computing hackathon therefore provided an excellent opportunity to assess the network's development of computational skills capacity through training, learning and the adoption of novel technologies (Figure 1). These technologies included workflow languages for reproducible science, containerization of software, and the creation of computational products usable in the heterogeneous computing environments encountered by African and international scientists: standalone servers, cloud allocations and High Performance Computing (HPC) resources.
In this paper, we discuss the organization of the H3ABioNet Cloud Computing hackathon, the interactions between the participants, and the lessons learnt. A paper describing the technical aspects of the pipelines developed is in preparation (unpublished paper, Baichoo et al. 2018 1), whereas the code and pipelines themselves have been made publicly available via H3ABioNet's GitHub page in the following repositories: h3agatk, h3abionet16S, h3agwas and chipimputation, as well as container images hosted on Quay.io.
H3ABioNet Cloud Computing Hackathon Activities
Prior to the H3ABioNet Cloud Computing hackathon, H3ABioNet, via its Infrastructure Working Group (ISWG), formed a Cloud Computing task force to investigate cloud computing technologies, familiarize H3ABioNet members with current cloud implementations, and gauge their suitability for H3Africa data analyses. The H3ABioNet Cloud Computing hackathon was one of the first deliverables of this task force, with the specific objective of implementing and testing four analysis workflows that could be ported to multiple compute platforms. Figure 1 places this hackathon within the broader H3Africa context and provides a broad overview of the planning and execution of this activity, with details in the following subsections.
Pre-hackathon preparations
The computational pipelines put forward for development during the H3ABioNet Cloud Computing hackathon were identified based on the data being generated by different H3Africa projects and the SOPs used for the H3ABioNet Node Accreditation exercises. Reproducibility and portability were also identified as key features for the workflows, given the heterogeneous computational platforms available in Africa. H3ABioNet Nodes that used or had helped develop current H3ABioNet workflows and SOPs were part of the planning team, as were other Nodes with technically strong scientists who were willing to extend their skills.
In the course of planning for the H3ABioNet Cloud Computing hackathon, two technical areas were identified where additional expertise was required: containerization technologies such as Docker, and the writing of genomic pipelines in popular workflow languages and newly emerging community standards such as Nextflow 11 and the Common Workflow Language (CWL) 10. While expertise in Nextflow already existed within the network, two collaborators from outside of Africa were interested in joining the project given their expertise in running genomic pipelines in cloud environments, containerization of code 12 and developing CWL 10. They subsequently joined the planning and participated in the hackathon.
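To make the combination of these two technologies concrete, the minimal sketch below shows a single containerized quality-control step written in classic (DSL1) Nextflow syntax, as used at the time of the hackathon. It is purely illustrative rather than an excerpt from any of the hackathon pipelines, and the FastQC container image name is an assumption based on publicly available BioContainers builds:

```nextflow
#!/usr/bin/env nextflow

// Minimal, illustrative example (not one of the hackathon pipelines):
// one quality-control step whose software is packaged in a Docker
// image, so the step runs identically wherever Docker is available.

params.reads = "data/*.fastq.gz"

process fastqc {
    // Pins the tool and all its dependencies to a single container image;
    // the image name is an assumption (a public BioContainers build).
    container 'quay.io/biocontainers/fastqc:0.11.9--0'

    input:
    file reads from Channel.fromPath(params.reads)

    output:
    file "*_fastqc.zip" into qc_reports

    script:
    """
    fastqc ${reads}
    """
}
```

Invoking such a script with `nextflow run main.nf -with-docker` executes the step inside the declared container, which is the property that makes the same pipeline reproducible across a desktop machine, an HPC node or a cloud virtual machine.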
The H3ABioNet Cloud Computing hackathon was announced on the internal H3ABioNet consortium mailing list as a call for interested applicants, and in some cases individuals were invited based on their specific expertise. Most of the participants selected were early-career scientists with strong computational skills, an understanding of genomic pipelines and a willingness to work in teams. The pipelines for the Cloud Hackathon were divided into four "streams": 1) Stream A: variant calling from whole genome sequencing (WGS) and whole exome sequencing (WES) data (https://github.com/h3abionet/h3agatk); 2) Stream B: 16S rDNA diversity analysis (https://github.com/h3abionet/h3abionet16S); 3) Stream C: genome-wide association studies (Illumina array data) (https://github.com/h3abionet/h3agwas); and 4) Stream D: SNP imputation and phasing using different reference panels (https://github.com/h3abionet/chipimputation). Successful applicants were given a choice to select a project stream based on their skills and interest or, if unsure, were assigned to a specific stream. Streams A and B decided to use CWL for their pipeline development, whereas Streams C and D opted for Nextflow, due to their prior experience with it.
Vital in setting up the teams was that each stream had a balanced composition. Each team therefore included bioinformaticians with strong computational skills to create the Docker containers and implement the workflow languages, members with knowledge of the specific genomic analyses and computational tools required, and members with strong system administration skills to assist with the installation of numerous software components. We also included bioinformaticians with experience in running the workflows or their components, and software developers who could assist with creating Docker containers, troubleshooting, and implementing the workflow languages.
To maximize the learning experience, upon selection participants were given prerequisite tutorials and materials (GitHub, Nextflow, CWL, Docker and the SOPs) to work through. Communication and planning infrastructure in the form of Slack channels and Trello boards was created beforehand, with all the participants added so that they could brainstorm and share ideas with team members before the hackathon began (Table 1). Fortnightly planning meetings were held starting three months in advance so that participants could get involved in planning their proposed tools, get to know one another and develop a working rapport before the start of the hackathon.
Table 1. Communication channels used for the hackathon.
| Channel | Link | Purpose |
|---|---|---|
| Mailing list | - | Group-wide announcements and communications |
| Mconf | https://mconf.sanren.ac.za/ | Online meetings |
| Slack | https://slack.com/ | Inner-group discussions and chat |
| Trello | https://trello.com/ | Plan goals and activities, and track progress |
| GitHub | https://github.com/ | Code repository and version control |
The hackathon ran in August 2016 and was hosted at the University of Pretoria Bioinformatics and Computational Biology Unit in South Africa. The venue was chosen for its availability of Unix/Linux desktop machines with sudo/root access, enabling participants to install software and deploy Docker containers for testing. Besides the local machines, participants also had access to cloud computing platforms: Microsoft Azure, Amazon Web Services, Nebula (made available by the National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign) and the African Research Cloud (through a collaboration with the University of Cape Town eResearch initiative). After the hackathon, further testing was done on EGI Federated Cloud resources (through an agreement with the University of Khartoum).
Hackathon week activities
The initial day of the H3ABioNet Cloud Computing hackathon was dedicated to introductions, participants' expectations, and practical tutorials covering the use of CWL, Nextflow and the creation of Docker containers, to ensure all participants had the same basic level of knowledge. The teams then had a breakout session in which the overall milestones for each stream during the hackathon week were refined, tasks were identified and assigned to team members, and Trello boards were updated with the specific tasks. Each stream reported back on its progress and overall work plan for the coming hackathon days. For the remaining days, participants split into their respective streams to work on developing and containerizing their pipelines, as well as creating the related documentation. To ensure a successful hackathon with concrete outcomes, the streams spent the first 30 minutes of each hackathon day reviewing their prior progress, updating their Trello boards and reporting to the group what they would be working on. At the end of each day, every stream provided a progress report to the whole group on what it had achieved, what it had struggled with and what it would work on next. This start- and end-of-day reporting proved useful, as it allowed groups that had encountered and solved an issue to share the implemented solution with another stream, and different streams to work together on any shared issues, thus speeding up the development of the pipelines. Area experts and collaborators moved between the streams to provide the necessary technical expertise.
Communication during the hackathon was facilitated by Slack integration with Trello (for task management and progress tracking), and code developed was pushed to GitHub (for live code integration). Table 1 lists the various communication media used during the hackathon. Some groups also used Google Docs for documenting their progress prior to migrating the documentation into GitHub README files. Remote participation in the hackathon was facilitated through the Mconf conference system. One stream had a participant with very strong coding skills working remotely from the USA, who made progress on the corresponding workflow when the other group members were not working, owing to the large time difference between the USA and South Africa (SA). The team in SA would provide a to-do list when clocking off, which the USA-based participant worked through, ensuring continuous development on the workflow. Noticeable during the hackathon were the team spirit created and the increasingly late end to each day, with most days ending at 8:30 pm as participants continued working after the different streams had provided their daily reports. All participants wished for an extra day or two to complete their pipelines.
Post-hackathon feedback and actions
After the week-long hackathon at the University of Pretoria, members of each stream continued working on their respective pipelines, communicating via Slack and Trello. Meetings were held over Mconf every two weeks to report on the progress of each pipeline. Upon completion, each group handed its pipeline to the other groups for testing on different platforms, to avoid any bias in implementation, improve the documentation and thereby facilitate the ease of use of the four pipelines developed.
Discussion
The H3ABioNet Cloud Computing Hackathon aimed to produce portable, cloud-deployable Docker containers for a variety of bioinformatics workflows, including variant calling, 16S rDNA diversity analysis, quality control, genotype calling, and imputation and phasing for genome-wide association studies. Dockerization provides a method to package and manage software, tools and workflows within a portable environment (container), similar to virtualization but with a smaller computing overhead. Docker containers can easily be developed and deployed on computing environments (especially cloud-based infrastructure) and can be used by a variety of groups to ensure reproducible analyses using the same tools, software versions and workflows.
The novelty of the H3ABioNet Cloud Computing Hackathon was that all the participants selected were involved in the latter stages of the planning and in setting some of the outcomes for the hackathon. A critical recommendation from the hackathon planning meetings was that the resulting Docker containers and pipelines should be compatible with heterogeneous African research compute environments, with portability and good documentation being key. This is especially important considering that access to cloud computing environments within Africa is still in its infancy. Hence, it was decided that development and testing of the pipelines should occur on a single machine, with the ability to be ported to a cluster or an HPC environment, and that they should ultimately be tested and deployed on cloud-based platforms (Amazon, Microsoft Azure, EGI FedCloud, IBM Bluemix, and the new African Research Cloud initiative).
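As a sketch of how this kind of portability can be expressed in practice, the hypothetical Nextflow configuration file below (Nextflow being the language adopted by Streams C and D) defines one profile per target environment; the executor and queue names are placeholder assumptions, not the hackathon's actual settings:

```nextflow
// nextflow.config -- hypothetical profiles for the target environments
// described above; executor and queue names are placeholders.

profiles {
    // Standalone server or a single cloud virtual machine: run each
    // process locally, inside its Docker container.
    standard {
        process.executor = 'local'
        docker.enabled   = true
    }

    // HPC cluster: submit each process as a job to the scheduler
    // (could equally be 'slurm' or 'sge', depending on the site).
    cluster {
        process.executor = 'pbs'
        process.queue    = 'batch'   // placeholder queue name
    }
}
```

Switching environments then becomes a matter of, for example, `nextflow run main.nf -profile cluster`, with no change to the pipeline code itself.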
Lessons learnt and concluding remarks
The opportunity to bring people together physically to focus solely on one project was highly effective in producing the main outlines and proof-of-concept outputs. However, once people were back home, continuing the tasks proved challenging. Clearly defining the roles and commitments of all the participants in the papers reporting the results should encourage them to complete the work and increase their accountability.
The communication and management tools used for this hackathon were important, as they facilitated interaction between and across teams and enabled the participants to continue working in a structured manner once back at their respective institutions, despite time zone differences.
The H3ABioNet Cloud Computing Hackathon has been an important milestone for the Network as it brought together people with various skills to work on focused projects. It signalled the shift from capacity building to utilizing the capacity developed in order to tackle problems specific to the heterogeneous African computing environments, as defined and implemented by the mostly African participants.
As software packages and computing environments evolve with varying build cycles and new bioinformatics tools become available, we envision future hackathons to keep these pipelines current, to adopt new technology implementations such as Singularity, and to develop new workflows, for example for RNA-Seq analysis. The pipelines developed during the H3ABioNet Cloud Computing hackathon will be used for training and data analyses in intermediate-level bioinformatics workshops, and for scientific collaborations requiring bioinformatics expertise for data analysis, such as the H3Africa genotyping chip and GWAS analyses. Future H3ABioNet hackathons would also provide an opportunity for trained bioinformaticians at intermediate and advanced levels, who would not otherwise attend bioinformatics training workshops, to come together and derive practical solutions that benefit the African and wider scientific community.
Data and software availability
All data underlying the results are available as part of the article and no additional source data are required.
The four pipelines are publicly available via H3ABioNet's GitHub organization page (https://github.com/h3abionet) in the following repositories: h3agwas, chipimputation, h3agatk and h3abionet16S, as well as container images on Quay.io at https://quay.io/organization/h3abionet_org.
All code is available under the MIT license, except for the h3agatk pipeline, which is available under the Apache 2.0 license.
Note
1 Full author list for the unpublished study: Shakuntala Baichoo, Yassine Souilmi, Sumir Panji, Gerrit Botha, Ayton Meintjes, Scott Hazelhurst, Hocine Bendou, Eugene de Beste, Phelelani T. Mpangase, Oussema Souiai, Mustafa Alghali, Long Yi, Brian D. O’Connor, Michael Crusoe, Don Armstrong, Shaun Aron, Fourie Joubert, Azza E. Ahmed, Mamana Mbiyavanga, Peter van Heusden, Lerato E. Magosi, Jennie Zermeno, Liudmila Sergeevna Mainzer, Faisal M. Fadlelmola, C. Victor Jongeneel and Nicola Mulder
Acknowledgments
We acknowledge the advice and help from Ananyo Choudhury from Sydney Brenner Institute for Molecular Bioscience, University of the Witwatersrand, Johannesburg, South Africa.
Funding Statement
H3ABioNet is supported by the National Institutes of Health Common Fund [U41HG006941]. H3ABioNet is an initiative of the Human Heredity and Health in Africa (H3Africa) Consortium programme of the African Academy of Sciences (AAS). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
References
1. Yanai I, Chmielnicki E: Computational biologists: moving to the driver’s seat. Genome Biol. 2017;18(1):223. 10.1186/s13059-017-1357-1
2. Möller S, Afgan E, Banck M, et al.: Community-driven development for computational biology at Sprints, Hackathons and Codefests. BMC Bioinformatics. 2014;15 Suppl 14:S7. 10.1186/1471-2105-15-S14-S7
3. Groen D, Calderhead B: Science hackathons for developing interdisciplinary research and collaborations. eLife. 2015;4:e09944. 10.7554/eLife.09944
4. Crusoe MR, Brown CT: Channeling Community Contributions to Scientific Software: A Sprint Experience. J Open Res Softw. 2016;4(1):e27. 10.5334/jors.96
5. Aboab J, Celi LA, Charlton P, et al.: A “datathon” model to support cross-disciplinary collaboration. Sci Transl Med. 2016;8(333):333ps8. 10.1126/scitranslmed.aad9072
6. Mulder NJ, Adebiyi E, Alami R, et al.: H3ABioNet, a sustainable pan-African bioinformatics network for human heredity and health in Africa. Genome Res. 2016;26(2):271–7. 10.1101/gr.196295.115
7. H3Africa Consortium, Rotimi C, Abayomi A, et al.: Research capacity. Enabling the genomic revolution in Africa. Science. 2014;344(6190):1346–8. 10.1126/science.1251546
8. Aron S, Gurwitz K, Panji S, et al.: H3ABioNet: developing sustainable bioinformatics capacity in Africa. EMBnet J. 2017;23:e886. 10.14806/ej.23.0.886
9. Jongeneel CV, Achinike-Oduaran O, Adebiyi E, et al.: Assessing computational genomics skills: Our experience in the H3ABioNet African bioinformatics network. PLoS Comput Biol. 2017;13(6):e1005419. 10.1371/journal.pcbi.1005419
10. Amstutz P, Crusoe MR, Tijanić N, et al.: Common Workflow Language, v1.0. figshare. 2016. 10.6084/m9.figshare.3115156.v2
11. Di Tommaso P, Chatzou M, Floden EW, et al.: Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017;35(4):316–9. 10.1038/nbt.3820
12. O’Connor BD, Yuen D, Chung V, et al.: The Dockstore: enabling modular, community-focused sharing of Docker-based genomics tools and workflows [version 1; referees: 2 approved]. F1000Res. 2017;6:52. 10.12688/f1000research.10137.1