Abstract
Background
With exponential growth in biological data and computing power, bioinformatics has become an in-demand skill set in both academia and industry. There is a need to build students’ competencies for bioinformatics careers and to familiarize them with data science professions and the academic training required to pursue them, in a field where demand outweighs supply.
Methods
Here we implemented a set of bioinformatic activities in a protein structure and function course of a graduate program. Briefly, students were given hands-on opportunities to explore bioinformatics-based analyses of biomolecular data and structural biology via a semester-long case study structured as inquiry-based bioinformatics exercises. Towards the end of the term, the students also designed and presented an assignment project documenting the unknown protein that they had identified using the bioinformatic knowledge gained during the term.
Results
The post-module survey responses and students’ performance in the lab module suggest that it fostered an in-depth knowledge of bioinformatics. Despite having little prior knowledge of bioinformatics before taking this module, students gave positive feedback.
Conclusion
The students became familiar with cross-indexed databases that interlink important data about proteins, enzymes, and genes. The essential skill sets honed by this research-based bioinformatic pedagogical approach will enable students to leverage this knowledge in their future endeavours in the bioinformatics field.
Keywords: Bioinformatics, amino acid sequence, nucleotide sequence, Chimera, homology modeling, pedagogy
A modular inquiry-based semester theme that integrates data science education and bioinformatics in protein structure function courses.
Abbreviations
- BLAST
Basic local alignment search tool
- MSA
Multiple sequence alignment
- ORFs
Open reading frames
- OER
Open educational resource
- PDB
Protein databank
- MMB
Master of Medical Biotechnology
Introduction
A seemingly endless stream of big biological data, averaging 2.5 million quintillion/day (Desjardins 2019) and generated by next-generation sequencing techniques, is pouring out of life sciences research. Coupled with exponential growth in computational power and cloud computing capabilities, data are being produced at a rate much faster than they can be analyzed, creating a need for expeditious, effective, and efficient analyses.
This practical omnipresence is creating ample career opportunities in bioinformatics, with a compound annual growth rate of 14.10% reaching 28.04 billion by 2028 (Newsmantraa 2023) and a much-faster-than-average 33% job outlook from 2016 to 2026 (U.S. Bureau of Labor Statistics 2024) across multiple industries, especially in the big biotech and pharma sectors. Even academia is seeing an exponential uptick in bioinformatic career prospects as the discipline itself intensifies (Black and Stephan 2004, 2005). Bioinformatics data analyses are also identified as one of the most urgent unmet needs for the successful completion of academic research projects (Black and Stephan 2004, 2005).
However, biotechnology and biomedical students, though at the crossroads of protein structural biology, drug discovery, biotech-inspired entrepreneurship, and laboratory skills (Amtul et al. 2023), are largely unprepared to take on these career pathways. First, biochemistry textbooks do not keep pace with rapidly advancing bioinformatics and data science. Second, biochemistry and biology students lack a bridging opportunity to narrow the gap between the biotechnological skill set acquired through their academic programs and the highly in-demand training in bioinformatic data analysis (Wu et al. 2016). Third, many of the foundational bioinformatics data types are used in a segregated, poorly integrated manner in biology curricula, creating a gulf between theory and practice (Hack and Kendall 2005). Lastly, there is a lack of instructors who are both willing and able to teach even basic bioinformatics in class and who emphasize students’ quantitative skill development, such as basic literacy in computer science or statistics (Rosenwald et al. 2012).
As the national numbers of science majors in medical science and biotechnology continue to increase (Board 2016), a solution to this gridlock would be to develop built-in laboratory modules focusing on supplemental research-driven, data-mining bioinformatics exercise tutorials/tools so more students can benefit.
Goals
Thus, the goals of this study are (i) to incorporate bioinformatics into a protein structure and function course or biochemistry curriculum to make the interrelatedness among these courses more palpable to students, (ii) to introduce biomedical graduates to some of the common, primarily Internet-based bioinformatics skills and basic command-line computational tools used by biochemists, (iii) to stimulate students’ appreciation of bioinformatics methods for addressing biological enquiries via a semester-long theme to identify an unknown protein, (iv) to strengthen instructors’ competencies in introducing bioinformatics resources and tools in their classes, and (v) to better prepare science graduates to pursue careers or undertake bioinformatic scientific investigations in the pharma and biotech sectors, and to seize opportunities once they have become an effective part of the bioinformatic workforce.
Hence, using a research-oriented, problem-solving bioinformatic approach, we devised and built a bioinformatics lab module in a protein structure and function course, which is part of a highly in-demand professional Master of Medical Biotechnology (MMB) program (Amtul 2023, 2024). Over the course of the semester, students explored computer-based analyses of biomolecular data and structural biology, structured as five bioinformatics exercises spanning the entire semester.
Semester-long theme
A set of cohesive exercises was designed to address the semester-long theme through a treasure hunt for different pieces of information about an unknown amino acid and/or nucleotide sequence from various bioinformatics databases. This strategy has at least three main benefits. First, it provides a consistent view of multiple bioinformatics aspects and analyses for a single protein, so students can gain valuable information by looking at the same protein from different biological perspectives. Second, students acquire the skill set to critically evaluate the interpretation and reporting of data on a protein implicated in a biomedical condition. Lastly, giving each student a unique protein as the basis of these exercises provides further confirmation that students genuinely followed the instructions and performed the exercises independently of each other. This strategy also familiarizes students with the data types contained in various database entries, so that later in the term they can return to these sites to find associated or additional information for their end-of-the-term group assignments.
Bioinformatic module
Briefly, the module was developed in plain language, using freely available computing tools, algorithms, apps, and data, so that students as well as novice instructors can use it to learn and teach protein structure and function capstone concepts at the upper level of biology education. Bioinformatic exercises were designed to find, recognize, and characterize (that is, manipulate, understand, and compare) the amino acid sequence and three-dimensional structure of the urease enzyme as the model protein, using the Basic Local Alignment Search Tool (BLAST) against an assemblage of non‐redundant protein databases (Altschul et al. 1990). Similarly, protein-protein interactions, biochemical pathways, and ligands were searched using the direct links to UniProt, ExPASy, ChEMBL, and GenBank entries from the BLAST output at the PDB site (Guex and Peitsch 1997). Conservation analyses using multiple sequence alignments (MSA) and phylogenetic analyses were performed by aligning FASTA‐formatted sequences from the UniProt and ExPASy sites with the ClustalW program (Thompson et al. 1994). Enzyme activities were compared and active sites determined by analysing the kinetic data compiled in the BRENDA and ENZYME databases as well as on the NiceZyme page. Three-dimensional structure construction, ligand identification, and docking were carried out using SWISS-MODEL linked to DeepView (also known as Swiss-PdbViewer), and drug screening was performed using UCSF Chimera (Pettersen et al. 2004, Rizvi et al. 2013). Students also became familiar with the R programming language. To parallel the course content with the bioinformatics exercises as much as possible, one major bioinformatics task was incorporated into each exercise.
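As an illustration of the conservation analysis described above, the following is a minimal sketch of how per-column conservation could be scored once an alignment is available. It is written in Python for illustration only (the module itself relied on web tools and R), the function name `column_conservation` is hypothetical, and the three short sequences are invented toy data, not real urease sequences.

```python
from collections import Counter


def column_conservation(aligned):
    """Score each alignment column by the fraction of sequences
    that share the most common residue (gaps never count as a match)."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        if not residues:
            scores.append(0.0)  # column made of gaps only
            continue
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(column))
    return scores


# Toy alignment, invented for illustration (hypothetical data).
toy_alignment = [
    "MKHLT-",
    "MKHIT-",
    "MRHLTA",
]
conservation = column_conservation(toy_alignment)
for position, score in enumerate(conservation, start=1):
    print(position, round(score, 2))
```

A score of 1.0 marks a fully conserved column; in a real exercise, the input would be the aligned sequences exported from the alignment program rather than hand-typed strings.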
Learning objectives
Through these tasks students (i) reinforce their existing concept knowledge about genome (codons), proteins, their mutations, and their structural, functional partners, activators, and ligands, (ii) learn new concepts such as e-value, bit score, open reading frames, FASTA sequence, synteny, and functional genomics; and (iii) improve computational skills which then increases their digital literacy, such as the ability to find and select information, and familiarity with bioinformatic terms and databases.
Learning outcomes
The specific learning outcomes for this lab module are based on the learning outcomes for the course, which are to (i) discuss how protein structure is related to protein function, (ii) apply knowledge of protein-structure relationships to predict the function of a protein or vice versa, (iii) use the Protein Databank (PDB) to obtain information on protein structure, (iv) appraise currently published scientific references and determine protocols for potential future studies, (v) write scientific reports on how the structures of proteins are related to diseased states, (vi) present findings regarding the relationship between protein structure and the diseased state, and (vii) know protein structure relationships and assess how they relate to other diseased states.
This lab module would allow students, specifically, to explore the computational biology territory within the protein structure and function curriculums, in addition to bringing routine practices to the classroom context driven by research, motivation, engagement, and innovation. Student feedback is collected through the discussion board, and possible modifications to the exercises are also discussed.
Materials and methods
Ethics statement
The Research Ethics Board of the University of Windsor approved this study. The students were asked to complete the survey only if they consented to the use of their course performance data, survey responses, discussion board posts, and email comments. Consent was built into the survey. Students were also given access, via the course site, to a detailed consent letter outlining the study. The principles of the Declaration of Helsinki were followed throughout the study.
Cohort selection
To balance the need for a fast enough study time, and a large enough sample size, two classes, both taught by the same instructor, and two different graduate assistants, in successive fall terms from September to December (n = 60) were chosen as the student cohort to participate.
Course synopsis and format
The protein structure and function course is a 1.5 credit-hour (3.0 units) course consisting of two weekly lectures (3 hours) and a bi-weekly laboratory (1.5 hours) component over a 12-week fall semester, with a weeklong break in mid-October, optimally between weeks 5 and 6. Our average course enrollment is a little over a hundred students, with a prerequisite of an undergraduate degree in biology, biotechnology, or a related discipline. Students are required to have a basic knowledge of biological sciences, biochemistry, and enzymology. The bioinformatics module was developed essentially as a project-based, problem-oriented, hands-on lab module (Amtul 2024) for graduate students enrolling in the MMB program at the University of Windsor.
Computer/Software required
Windows, Macintosh, or Linux; a web browser such as Google Chrome, Safari, or Microsoft Edge; UCSF Chimera (https://www.cgl.ucsf.edu/chimera/download.html); and R (https://cran.r-project.org) with RStudio (https://www.rstudio.com/products/rstudio/download/#download). These tools and software packages are freely available via the Internet. Students only need some guidance about the websites and the appropriate hyperlinks, which can be uploaded to the course site.
Exercise design
The format of each of the six exercises was designed to fit a 1.5-hour lab session and was explained as a series of self-guided tasks. Each student was given an unknown protein at the beginning of the term in the form of an unknown amino acid and/or nucleotide sequence. The amino acid sequences were taken from ADDRESS (https://zhanglab.ccmb.med.umich.edu/ADDRESS): A Database of Disease-associated Human Variants Incorporating Protein Structure and Folding Stabilities (Pettersen et al. 2004). Once the protein was identified, students could easily collect background material on it to be included in the end-of-the-term assignment. Each exercise was accompanied by a tutorial PowerPoint lecture to help the students achieve its goals. Each exercise focused on one aspect of the required skill set and guided students through the steps needed to find the information required for the problem at hand. Each exercise began with learning objectives and detailed instructions. These instructions also described the information to look for, the expected output, the time required, troubleshooting, and common mistakes. The instructions became less detailed and more task-oriented towards the end of the term, once students had grown confident and accustomed to working with these databases and were able to find the answers independently.
Exercises scope
The scope of these exercises covered students’ ability to find information about the unknown amino acid and nucleotide sequences using BLAST, MSA, and phylogenetic tools. It also covered finding the proteins that their protein interacts with through gene and protein interaction networks, understanding the concept of open reading frames (ORFs), finding mutations, visualizing and manipulating 3D structures, and finding target-associated assays and ligand efficiency data. Students also learnt to perform protein-ligand docking to find a lead against the target protein. Students were also required to look into the meaning of bit score and E-value, and to learn about the enzyme classification system and database accession numbers. Students also learnt to create heatmaps of their proteins, find bad bonds and angles in the Ramachandran plot, retrieve protein interaction data, and perform gene ontology analysis. Last but not least, students learnt to download and install R, a free statistics and graphing programming language commonly used in data analysis and scientific research, and to leverage the bio3d package to perform protein analysis (Fig. 1). A video module of these exercises has also been developed (Amtul et al. 2024).
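To make the open reading frame concept listed above more concrete, here is a minimal sketch of an ORF scan. It is an illustrative simplification, not the tool students used in the module: it searches only the forward strand, treats an ORF as an ATG followed in the same frame by the first stop codon, and uses a short invented DNA string. The function name `find_orfs` and the `min_codons` parameter are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) tuples for ATG-to-stop ORFs found in
    the three forward reading frames. Coordinates are 0-based, end-exclusive;
    min_codons counts the start and stop codons as well."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF at the first ATG
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next ATG in this frame
    return orfs


# Toy sequence, invented for illustration (hypothetical data).
toy_dna = "CCATGAAATAGGG"
orfs = find_orfs(toy_dna)
print(orfs)
```

A full analysis would also scan the reverse complement and apply a realistic minimum length; both are left out here to keep the idea visible.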
Figure 1.
Schematic overview of the bioinformatics exercises and their division into five general themes.
Open-ended, self-guided assessment sheet questions
Below are samples of the open-ended, self-guided assessment sheet questions and hints that were built into each exercise to enhance students’ reflection: (i) Did the backward or mutated peptide entry give you a close match to any protein in the UniProt database? (ii) From what organism was it isolated? (iii) Why are there multiple output hits for apparently the same proteins? (iv) Which protein do you find to be the closest match to your unknown sequence? (v) How is Clustal Omega different from the MDGA program in performing MSA? (vi) Do you see any amino acid conservation patterns across organisms? (vii) Do you find a correlation between amino acid conservation patterns and protein structure? Some of the questions made the students think critically about the impact of their searches, for example, (viii) How many possible codons can we have if we make a modification that excludes the possibility of having two consecutive nucleotides of the same type in DNA? Hint: Find the answer by looking at a codon chart and removing any codons that have consecutive nucleotides of the same type.
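The codon question above can also be checked by enumeration. The sketch below follows the hint literally: it lists all 64 triplets and removes those in which two adjacent positions carry the same nucleotide. It assumes the restriction applies within each codon (not across codon boundaries), and the function name is hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def codons_without_adjacent_repeats():
    """List the triplets in which no two neighbouring positions
    hold the same nucleotide (e.g. ATA is kept, AAT is removed)."""
    all_codons = ["".join(triplet) for triplet in product(NUCLEOTIDES, repeat=3)]
    return [c for c in all_codons if c[0] != c[1] and c[1] != c[2]]


allowed = codons_without_adjacent_repeats()
print(len(allowed))  # 4 choices x 3 x 3 = 36 of the 64 codons remain
```

The same count follows by reasoning: the first position has four options, and each later position has three, since it only has to differ from its left neighbour.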
Course evaluation
To ascertain if the module let students develop a better insight and understanding of bioinformatics approaches and implications, two evaluation tools were created and used during the module, (i) the use of a post-module survey questionnaire, and (ii) students’ performances in various aspects of the course elements and activities.
Survey questionnaire
At the end of the term, when the grades were finalized and the instructor had no control over them, all students were encouraged to participate in an online laboratory exit survey offered via Qualtrics (https://uwindsor.ca1.qualtrics.com), a cloud-based survey software platform. Most of the survey questions focused on students’ evaluations and perceptions of the course format, self-assessments, confidence in achieving particular learning outcomes, attitudes toward the learning process, and career prospects. There were 23 questions regarding the course. Seven were specifically about the exercises, for example, how long students took to gather the required information in the specified database, the usefulness of the bioinformatics module, acquired skill sets, level of satisfaction, and the broader sense of purpose of the knowledge gained for their future career (Amtul 2021). Six were about students’ level of agreement with different aspects of the laboratory assessment types, class, and lab activities; six were specifically about the research project; and the remaining four were about students’ perception of the knowledge and attitudinal benefits. The students selected their responses and level of satisfaction on a scale from 1 to 5, where 5 = strongly agree/high, 4 = agree/good, 3 = neutral/moderate, 2 = disagree/poor, and 1 = strongly disagree/very poor.
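For readers who want to tabulate responses on the five-point scale described above, a minimal sketch follows. The label mapping mirrors the agreement wording of the scale, the function name `likert_percentages` is hypothetical, and the response list is invented toy data, not the study's survey results.

```python
from collections import Counter

# Agreement labels for the 1-5 scale described in the text.
LIKERT_LABELS = {
    5: "strongly agree",
    4: "agree",
    3: "neutral",
    2: "disagree",
    1: "strongly disagree",
}


def likert_percentages(responses):
    """Convert a list of 1-5 scores into the percentage of respondents
    per category, ordered from strongly agree to strongly disagree."""
    if not responses:
        raise ValueError("no responses to summarize")
    unknown = set(responses) - set(LIKERT_LABELS)
    if unknown:
        raise ValueError(f"scores outside the 1-5 scale: {sorted(unknown)}")
    counts = Counter(responses)
    total = len(responses)
    return {
        LIKERT_LABELS[score]: 100.0 * counts[score] / total
        for score in sorted(LIKERT_LABELS, reverse=True)
    }


# Toy responses to a single question (hypothetical data).
toy_responses = [5, 5, 4, 4, 4, 3, 2, 5, 4, 1]
summary = likert_percentages(toy_responses)
for label, percent in summary.items():
    print(f"{label}: {percent:.0f}%")
```

The per-category percentages are the values a stacked bar graph of one survey item would display.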
Student performance
Discussion board comments, worksheets, and an end-of-the-term team-oriented assignment were used to assess students’ performance in the bioinformatic laboratory module.
Discussion forum participation or comments
It consisted of bi-weekly discussion forum participation, based on students’ respectful engagement in lab activities, discussions, Q&A/brainstorming sessions, preparedness in learning bioinformatic skills, posts, and peer-to-peer review/responses in the course.
Worksheets
The formal worksheets were collected and graded every week. In worksheets, students write the answers to the questions asked while navigating the databases, as per the rubrics.
Assignment
This capstone assignment project allowed students to apply what they had learned in the lab module and served as one of the main measures of the impact of the bioinformatic exercises and related resources on student learning and knowledge gain, as reflected in final course grades. The assignment simultaneously evaluated students’ ability to navigate bioinformatic databases independently, in the absence of explicit instructions, and demonstrated their learning curve in data-mining skills during the term.
Data analysis
Survey responses from the students who completed and consented to the survey instrument were used. Consent was built into the survey questionnaire. Student responses are presented as stacked bar graphs on a five-point Likert-type scale and/or in percentages. Student performance was defined by the student's percentage marks, with 70% as the lowest passing grade.
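As a small worked example of the performance measure described above, the sketch below summarizes a set of percentage marks against the 70% passing grade. The marks are invented toy values, the function name is hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which form was used.

```python
from statistics import mean, stdev

PASSING_GRADE = 70.0  # lowest passing grade stated in the text


def summarize_grades(marks):
    """Return the mean, sample standard deviation, and pass rate (%)
    for a list of percentage marks."""
    if len(marks) < 2:
        raise ValueError("need at least two marks to compute a standard deviation")
    passed = sum(1 for mark in marks if mark >= PASSING_GRADE)
    return {
        "mean": mean(marks),
        "sd": stdev(marks),  # sample SD; an assumption, see lead-in
        "pass_rate": 100.0 * passed / len(marks),
    }


# Toy marks for one assessment domain (hypothetical data).
toy_marks = [68.0, 75.0, 82.0, 90.0, 71.0]
grade_summary = summarize_grades(toy_marks)
print(grade_summary)
```

Running the same summary separately for the discussion forum, worksheets, and assignment would give the per-domain averages and error bars of the kind plotted for course performance.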
Results
Students’ survey responses showed positive outcomes of the bioinformatic lab module on student learning, such as their working knowledge of bioinformatics concepts, methods, perception, and interest in applied bioinformatics.
Bioinformatic module overall
A great majority of the students provided positive feedback through the survey questionnaire and comments regarding the bioinformatic exercises and the way those exercises improved their knowledge and understanding of the various state-of-the-art tools, algorithms, software, and programs built into bioinformatics databases for identifying an unknown amino acid and/or nucleic acid sequence (Figs. 2–5). Many students even commented that the bioinformatics lab module was a rewarding experience and a valuable addition to the protein structure and function course.
Figure 2.
Module format: Bars showing students’ level of satisfaction with different aspects of bioinformatics laboratory module, such as content, pace, workload (A), course assessments, such as discussion forum participation, worksheet, and assignment (B). Students were asked to rate the course material by utilizing a five-point Likert-type scale by choosing one of the five options strongly agree (blue), agree (green), moderate (gray), disagree (red), and strongly disagree (maroon) depicted as a divergent stacked bar graph. Results of the analysis of students’ course performance in 3 separate assessment domains; discussion forum, worksheets, and assignments. The plot shows the average performance of the two student cohorts in percentage, where error bars represent the standard deviations (C).
Figure 5.
Skill acquisition: Bars showing autonomy to strategize and execute an experiment to answer a research question of students’ choice (A), the overarching semester-long project made learning more interesting (B), the impact of bioinformatics on medicine and human health (C), mapping out objectives and methodology gave students a sense of broader purpose and ownership (D), writing the assignment required students to use scientific reasoning throughout the semester (E), students enjoyed exploring, navigating, & analyzing bioinformatic scientific data (F), prefer to introduce bioinformatic exercises separately (G), most likely will pursue further education in bioinformatics (H), and future students should perform this semester-long biomolecule-oriented lab (I). Students were asked to answer the questions by selecting yes (blue), or no (orange) as a divergent stacked bar graph.
Module format
Students were satisfied with different aspects of the bioinformatics laboratory module and its quality, such as content, pace, workload, and assessment (Fig. 2A). Students also liked the diverse types of assessments, such as participating in discussion forums, completing the worksheets, and writing the assignments (Fig. 2B).
Students’ performance in the course
We also assessed knowledge gains concerning proteins and proteomics using different types of assessments as outlined in Fig. 2C. The overall impact of the module was found to be uniformly effective for students at all levels of assessments as determined by their overall course performance in discussion forum participation, worksheet preparation, and assignment writing (Fig. 2C).
Background and bioinformatics knowledge
Students came from diverse academic backgrounds, as illustrated in Fig. 3. Students’ answers to some of the questions differed markedly after exposure to the module compared with before it (Fig. 4). A great majority of students had no previous hands-on bioinformatic experience or prior knowledge of any of the taught bioinformatics topics (Fig. 4A); this improved drastically after they completed the module (Fig. 4B). This module gave students their first opportunity for hands-on experience with important and emerging bioinformatic tools, algorithms, and programs, besides boosting their confidence. In particular, students became more confident in identifying an unknown amino acid or nucleotide sequence, homology modelling, ligand docking, multiple sequence alignment, phylogenetic analysis, and R programming, in that order (Fig. 4C). One of the objectives was to introduce students to the intricacies of using R as a programming language and computational platform. To address the problems students might face in using the platform for the first time, we provided the code as R commands to help them get comfortable using R (Fig. 4C); surprisingly, however, none of the students showed any need for it.
Figure 3.
Academic background: Pie chart showing students’ academic background in different science disciplines, including pharmacology, medicine, organic chemistry, genetic engineering, bioinformatics, dental surgery, biotechnology, biochemistry, veterinary, medical lab science, botany, and life science.
Figure 4.
Bioinformatics knowledge: Bars showing students’ knowledge about navigating different types of bioinformatic databases before (A) and after (B) taking the semester-long bioinformatics lab module. Bars showing students’ level of confidence in performing R programming, homology modelling, protein-ligand docking, multiple sequence alignment (MSA), phylogenetic analyses, and identifying an unknown amino acid or nucleotide sequence after taking the bioinformatic lab module (C). Students were asked to rate their responses by utilizing a five-point Likert-type scale by choosing one of the five options very good/high (blue), good/high (green), poor/moderate (gray), low/very poor (red), and not at all (maroon) depicted as a divergent stacked bar graph.
Skill acquisition
It is especially interesting that students found the end-of-the-term self-designed assignment, with its semester-long theme, helpful in promoting their learning and engagement (Fig. 5A). Documenting a semester-long bioinformatics research project made learning more interesting (Fig. 5B), showed how bioinformatics research can impact human health and medicine (Fig. 5C), gave students a better sense of control and ownership of the assignment (Fig. 5D), and guided them to use scientific reasoning (Fig. 5E) and to navigate and analyze scientific data (Fig. 5F). These data suggest that students’ engagement in the assignment provided practice and subsequent learning across a wide range of bioinformatics tools. This points to gains in all six categories of higher-order cognitive skills, as outlined in Bloom's taxonomy (Engelhart et al. 1956, Anderson et al. 2000). Interestingly, students did not like the idea of spoon-fed, step-by-step instructions for performing the exercises, or of a semester-long project with a semester-long theme being broken down into several independent themes and exercises. Around half of them preferred a bit more independence and flexibility in doing those assignments (Fig. 5G).
Education, and career prospects
Two of the questions were about students’ impressions about the usefulness of the module and its impact on their decision to pursue bioinformatics education further. Students’ responses implied that the bioinformatics module presented them with new avenues, and their insight about the module had expanded, and they would more likely explore the opportunities to pursue bioinformatic education (Fig. 5H), as well as recommend that future students should perform these exercises (Fig. 5I).
In short, the assessments suggested that the bioinformatic module achieved most of its goals, predominantly with respect to an increased awareness of the presented bioinformatics tools, improved proficiency in applying them (as shown by the change in students’ responses, for example, in Fig. 4A and 4B), greater confidence in mastering these tools (Fig. 4C), and a greater curiosity to know more about the bioinformatics discipline (Fig. 5E), including the importance of bioinformatic tools and students' eagerness to try these tools to answer biological questions (Fig. 5).
Discussion
Blending the bioinformatics lab module into the existing outline of the protein structure and function course familiarized students with cross‐indexed bioinformatic databases that contain many relevant, interrelated facts about protein sequences and interactions, structural predictions, metabolic reactions, ligand interactions, and enzyme catalysis, with links to metabolic maps, enzyme kinetics, ligand efficiencies, and assay data in tabulated form. The module also boosted students’ overall problem-solving abilities because students could not easily look up solutions to the problems in a textbook. Instead, students learnt to apply the concepts learnt in the course and labs to a new problem in an unfamiliar context. The ability to search, analyse, and gauge information or data from such databases played a critical role in students’ ongoing learning in the discipline and served as an exceptionally effective tool for those who intend to take up careers in the bioinformatics sciences. Students also became acquainted with the research aims of the bioinformatics-related sub-disciplines of structural informatics, genomics, and protein modelling.
The survey questionnaire results showed that, despite coming from diverse academic backgrounds, a large majority of students enjoyed the content of the lab exercises and gained enough knowledge to apply it to their end-of-the-term research assignments. Additionally, student-designed, original, research-themed assignments improved their quantitative analysis, critical thinking, and reasoning skills, qualities that are highly desired by prospective employers and research mentors (Acemoglu and Autor 2011, Börner et al. 2018, Biasi et al. 2022).
Bioinformatics does not replace experimental enquiry and is regarded as a foundational and predictive discipline. Nevertheless, student feedback on the discussion forum suggested that the students enjoyed this venture and found it as challenging as the wet labs. Students found the lab module invaluable for their educational pursuits and future prospects. In discussion board comments, they further emphasized the need for more such modules. Several students were motivated enough by the lab module to consider continuing their bioinformatic education, as reflected in the comment below from one of the students:
“I was motivated enough to take some certificate courses such as Molecular Docking, Drug Design, R Programming for Bioinformatics, and Python Programming for Bioinformatics from a private institute and completed them successfully. It is definitely a great addition to my resume and will increase my chances to get the job. Your bioinformatic lab module was no doubt the first spark and my main motivation to take these courses.”
Another interesting observation concerned the level of granularity of the tutorials for the individual exercises. Almost all the students were unacquainted with bioinformatics tools, and since the PowerPoint lecture had to be completed in a 1.5 h lab session, the exercises were purposely designed to have structure, detailed instructions, and specific questions to be answered precisely.
Around half of the students liked the connectivity and coherency between exercises, while the rest preferred to have individually themed exercises that are not necessarily connected. Similarly, to our surprise, more than half of the students responded that they don't like to be spoon-fed, and they would like to find the solution to the problems with more flexibility and freedom. So, we suggest a more high-level tutorial worksheet for future exercises, so it could provide relevant and necessary information and at the same time encourage students to explore the tools and familiarize themselves with different options on their own. There is ‘no one size fits all’ while designing a bioinformatics course. So, probably a mix between the two might be a more ideal solution where some of the exercises are connected and related, while some others are independent.
Thus, the proposed module could be delivered in a loosely structured and open-ended way to stimulate resourcefulness and creative thinking skills. The differing competencies and the considerable diversity in the backgrounds of the course participants are also important considerations. In future, a ‘learner adaptable’ style of curriculum design can be followed, based on the students’ knowledge of the subject and their expectations of the course. For example, with smaller cohorts, the more desirable format would be to teach students to come up with their own hypotheses or study questions and to explore bioinformatic tools to address them.
These exercises can be modified in a number of ways, for example, to demonstrate the correspondence between a translated amino acid sequence and the nucleotide sequence of the gene that encodes it. Point mutations can be introduced into the cDNA sequence to demonstrate the concept of codon degeneracy: silent mutations illustrate a ‘codon change but no amino acid change’ phenomenon, whereas nonsynonymous mutations show how a nucleotide change encodes a different amino acid (Bali and Bebok 2015). As a follow-up, open-ended exercise, aligning the mutant and wildtype amino acid sequences (primary structures) can illustrate the subsequent effects of these mutations on the three-dimensional structure of the protein. By locating these mutations within the protein structure, students can also predict the altered functionalities and resulting clinical conditions. Students can then search the relevant primary literature, and assess the conservation of the mutated residues via phylogenetic analyses, to verify their predictions.
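The point-mutation exercise described above could be prototyped in a few lines of Python. The following is only a minimal sketch, not part of the module as delivered: the function names, the 0-based position convention, and the 12-nucleotide toy cDNA are all hypothetical, and the only biological input is the standard genetic code (NCBI translation table 1).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cdna):
    """Translate a cDNA string codon by codon; '*' marks a stop codon."""
    cdna = cdna.upper()
    usable = len(cdna) - len(cdna) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE[cdna[i:i + 3]] for i in range(0, usable, 3))


def classify_point_mutation(cdna, position, new_base):
    """Classify a single-base substitution at a 0-based position (hypothetical helper)."""
    cdna = cdna.upper()
    mutant = cdna[:position] + new_base.upper() + cdna[position + 1:]
    start = position - position % 3  # first base of the affected codon
    wt_codon, mut_codon = cdna[start:start + 3], mutant[start:start + 3]
    wt_aa, mut_aa = CODON_TABLE[wt_codon], CODON_TABLE[mut_codon]
    kind = "silent" if wt_aa == mut_aa else "nonsynonymous"
    return kind, wt_codon, mut_codon, wt_aa, mut_aa


toy_cdna = "ATGGAAGTGCTG"  # hypothetical sequence, not taken from the course material
print(translate(toy_cdna))
print(classify_point_mutation(toy_cdna, 5, "G"))  # third-base change, GAA -> GAG
print(classify_point_mutation(toy_cdna, 4, "T"))  # second-base change, GAA -> GTA
```

In this toy case, the third-position change leaves glutamate unchanged (a silent mutation), while the second-position change replaces it with valine (a nonsynonymous mutation), which is the contrast the exercise is meant to illustrate.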
Projects could also be developed that extensively mix bioinformatic analyses with practical wet labs, such as designing primers for the polymerase chain reaction to amplify the urease gene, and applying them in the amplification and genotyping of urease gene-containing plasmids for subsequent multiplication and protein isolation (Amtul et al. 2023).
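As a starting point for such a primer-design project, students could script the basic steps themselves. The sketch below is illustrative only and makes several assumptions: the template is a hypothetical toy sequence (not the urease gene), the primers are simply taken from the two ends of the template, and the melting temperature uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough melting temperature in degrees C using the Wallace rule."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def flanking_primers(template, length=18):
    """Pick a forward and a reverse primer from the ends of the template (hypothetical helper)."""
    template = template.upper()
    if len(template) < 2 * length:
        raise ValueError("template is too short for two non-overlapping primers")
    forward = template[:length]                       # same sense as the template
    reverse = reverse_complement(template[-length:])  # anneals to the opposite strand
    return forward, reverse


toy_template = "ATGAAACTGACCCGTGAAGGTTAA"  # hypothetical 24-nt sequence
fwd, rev = flanking_primers(toy_template, length=6)
print(fwd, wallace_tm(fwd))
print(rev, wallace_tm(rev))
```

Real primer design would also need to check specificity, self-complementarity, and matched melting temperatures, which is where dedicated tools and the accompanying wet lab come in.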
Likewise, building effective user interfaces and gaining familiarity with hardware in future exercises can improve students’ marketability and career prospects.
Conclusion
Overall, such hands-on, data-mining exercises equip learners with powerful bioinformatic tools and give them practice in performing complex computational tasks in a rapidly evolving area of analytical biochemistry relevant to health and medicinal research. The conceptual questions posed through the exercises teach students to use critical thinking skills for data analyses. These exercises also highlight the influence that proteomics and genomics are exerting on biochemistry as a discipline. The skills to search, describe, assess, and analyze data developed through these exercises will give students an impetus to continue their education as well as to pursue careers in the data sciences.
Besides students, these exercises can also empower course instructors who lack the required teaching expertise, computer literacy, or experience with programming languages, the command line, or applied bioinformatics, and who are hesitant to bring big data analyses into their teaching (Wood and Gebhardt 2013).
Supplementary Material
Acknowledgement/Funding
We thank the MMB students at the Department of Chemistry and Biochemistry, University of Windsor, who consented to the surveys and to the use of their data. Molecular graphics and analyses were performed with UCSF Chimera, developed by the Resource for Biocomputing, Visualization, and Informatics at the University of California, San Francisco, with support from NIH P41-GM103311. An eCampusOntario virtual learning strategy (VLS) grant (5218476) to Zareen Amtul funded this research.
Contributor Information
Zareen Amtul, Department of Chemistry and Biochemistry, University of Windsor, Windsor, ON N9B 3P4, Canada.
Forough Firoozbakht, School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada.
Iman Rezaeian, School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada.
Arham A Aziz, Sir Wilfrid Laurier Secondary School, London, ON N6C 4W7, Canada.
Padmini Gehlaut, Department of Chemistry and Biochemistry, University of Windsor, Windsor, ON N9B 3P4, Canada.
Availability of data and materials
Within the manuscript.
Disclosure
The authors declare that they have no conflict of interest.
Author Contribution
The manuscript was written through the contributions of all authors. All authors have given approval to the final version of the manuscript.
Conflict of interest
None declared.
References
- Acemoglu D, Autor D. Skills, tasks and technologies: implications for employment and earnings. Handbook of Labor Economics. 2011;4:1043–171.
- Altschul SF, Gish W, Miller W et al. Basic local alignment search tool. J Mol Biol. 1990;215:403–10.
- Amtul Z. Pedagogy of care: emerging from the crisis. Teaching Culturally and Linguistically Diverse International Students in Open or Online Learning Environments: A Research Symposium, 2021. https://scholar.uwindsor.ca/itos21/session1/session1/1/.
- Amtul Z. Creating a virtual bioinformatics lab module OER to augment protein structure function capstone courses. 2023; (Version 1.1). https://openlibrary-repo.ecampusontario.ca/xmlui/handle/123456789/1432.
- Amtul Z. Designing an in inquiry-based semester theme that integrates data science and bioinformatics methods. The Western Conference on Science Education. 2023. https://ir.lib.uwo.ca/wcse/WSCETwentyTwentyThree/fri-july-14/1/.
- Amtul Z. Active learning pedagogy: do we need to revisit? Interchange (revision). 2024:1–20.
- Amtul Z, Seifi M, Asif ES et al. Building Industry-Inspired Medical Biotechnology Investigative Laboratories to Enhance Experiential Capstone Courses. J Chem Educ. 2023;100:1486–93.
- Amtul Z, Vuu K, Lubrick M et al. Video-Based Bioinformatics Tutorials Developed as An Open Educational Resource to Improve Students’ Understanding and Practice In Data Science Analyses. J Chem Educ. 2024;1–21. https://pubs.acs.org/doi/10.1021/acs.jchemed.4c00250.
- Anderson L, Krathwohl D, Bloom B. A taxonomy for learning, teaching, and assessing: a revision of bloom's taxonomy of educational objectives. 2000. https://www.semanticscholar.org/paper/A-Taxonomy-for-Learning%2C-Teaching%2C-and-Assessing%3A-A-Anderson-Krathwohl/23eb5e20e7985fca5625548d2ee6d781a2861d41.
- Bali V, Bebok Z. Decoding mechanisms by which silent codon changes influence protein biogenesis and function. Int J Biochem Cell Biol. 2015;64:58.
- Biasi B, Ma S, Arellano-Bover J et al. The Education-Innovation Gap. 2022. https://www.nber.org/system/files/working_papers/w29853/revisions/w29853.rev0.pdf.
- Black GC, Stephan PE. Bioinformatics: recent trends in programs, placements and job opportunities. 2004.
- Black GC, Stephan PE. Bioinformatics training programs are hot but the labor market is not. Biochem Molecular Bio Educ. 2005;33:58–62.
- Board NS. Science and Engineering Indicators | NCSES | NSF. Arlington, VA, USA, 2016.
- Börner K, Scrivner O, Gallant M et al. Skill discrepancies between research, education, and jobs reveal the critical need to supply soft skills for the data economy. P Natl Acad Sci USA. 2018;115:12630–7.
- Desjardins J. How much data is generated each day? | World Economic Forum. World Economic Forum and Visual Capitalist. 2019. https://www.weforum.org/agenda/2019/04/how-much-data-is-generated-each-day-cf4bddf29f/.
- Engelhart MD, Furst EJ, Krathwohl DR. Taxonomy of Educational Objectives: The Classification of Educational Goals. Handbook I: The Cognitive Domain. 1956. https://docs.opendeved.net/lib/JQD8WS4P.
- Guex N, Peitsch MC. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis. 1997;18:2714–23.
- Hack C, Kendall G. Bioinformatics: current practice and future challenges for life science education. Biochem Molecular Bio Educ. 2005;33:82–85.
- Newsmantraa. Global bioinformatics market to show significant growth prospects, latest revenue, business outlook, advance technology and expansions 2023–2028. Digital Journal. 2023. https://www.digitaljournal.com/pr/news/global-bioinformatics-market-to-show-significant-growth-prospects-latest-revenue-business-outlook-advance-technology-and-expansions-2023-2028.
- Pettersen EF, Goddard TD, Huang CC et al. UCSF Chimera–a visualization system for exploratory research and analysis. J Comput Chem. 2004;25:1605–12.
- Rizvi SMD, Shakil S, Haneef M. A simple click by click protocol to perform docking: Autodock 4.2 made easy for non-bioinformaticians. EXCLI J. 2013;12:830–57.
- Rosenwald AG, Russell JS, Arora G. The genome solver website: A virtual space fostering high impact practices for undergraduate biology. J Microbiol Biol Educ. 2012;13:188–90.
- Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl Acids Res. 1994;22:4673–80.
- U.S. Bureau of Labor Statistics. Mathematicians and statisticians: occupational outlook handbook. Offi Occupat Statis Employ Project. 2024. https://www.bls.gov/ooh/math/mathematicians-and-statisticians.htm.
- Wood L, Gebhardt P. Bioinformatics goes to school-new avenues for teaching contemporary biology. PLoS Comput Biol. 2013;9:e1003089. https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1003089.
- Wu H, Raha O, Zhang J. Customizing bioinformatics graduate programs for diversified student backgrounds. Proceedings - Frontiers in Education Conference, FIE. 2016;2016-November. doi:10.1109/FIE.2016.7757506.