Abstract
Students completing undergraduate majors in chemistry are not typically required to undergo formal training in computer programming or coding. As a result, many chemistry students are graduating without skills in understanding, writing, or manipulating computer code. This skills gap places students at a disadvantage, considering the widespread and ever-increasing use of computers to acquire, analyze, and present data in chemical industry and research. We hypothesized the following: (1) we could introduce coding to the analytical chemistry curriculum in an accessible and discipline-focused manner and (2) tasks based on adapting existing code would be accessible even to novice coders. Presented here is an activity that teaches students to use R, a widely used programming language designed for data analysis and statistics, within the user-friendly RStudio integrated development environment. The activity uses peptide charge as a motivating bioanalytical chemistry topic. The origin and importance of peptide charge are discussed in the four modules that comprise the activity. Applications relevant to chromatography and mass spectrometry are discussed. Students complete tasks of increasing difficulty, with earlier modules supporting later ones. The activity has been taught to advanced undergraduate and first-semester graduate students. In all iterations, anonymous survey data collected using a Likert-scale questionnaire reflected that most students were not familiar with R or coding generally before completing the activity. Students reported finding the activity enjoyable, efficient, effective, and easy to use. The majority reported that they would use R/RStudio as a scientific tool in both chemistry and nonchemistry projects in the future. The activity is freely available at https://weaversd.github.io/R_with_peptides_Project/index.html.
Keywords: Upper-Division Undergraduate, Graduate Education/Research, Analytical Chemistry, Computer-Based Learning, Inquiry-Based/Discovery Learning, Bioanalytical Chemistry, Chromatography, Mass Spectrometry, Proteins/Peptides
I. INTRODUCTION
Background and Problem Identification
Programming skills are increasingly important for chemists working in academia and industry, sectors in which undergraduate chemistry majors may ultimately find employment. A search on an online professional networking platform yields many jobs seeking applicants with a chemistry degree and experience in scientific programming, with qualification sections including phrases like “strong background in programming” and “proficient in programming languages such as R, Matlab, [or] Python”. To train students in coding skills, chemical educators have developed diverse assignments and activities, many of which have been reported in this Journal, as summarized below. In one example, coding in Matlab was introduced through a hands-on activity in which students simulate X-ray photoelectron spectra, developing understanding of quantum chemistry, spectroscopy, and coding.1 Other activities centered on coding in Matlab have been reported.2-5 Many activities centering on the Python programming language, typically implemented using Jupyter notebooks, have been reported,6-12 with applications ranging from colorimetric monitoring of titration end points13 to visualizing NMR concepts.14 An activity that teaches students how to use R for data visualization and statistical analysis in analytical chemistry and quantitative analysis has been presented at the ACS National Meeting,15,16 although the lesson has not yet been reported in a publication (as of June 2022). Activities have been reported regarding teaching students of chemistry to use LabView,17 Maple,18 Mathcad,19 and the Unix terminal20 for discipline-specific tasks. The COVID-19 pandemic necessitated distance learning and led to the development of innovative remote approaches for teaching coding in Mathematica,21 Python programming for structural bioinformatics,22 and the use of spreadsheets and Jupyter notebooks to generate simulated HPLC data,23 among others.
To enrich the diversity of teaching tools available to help chemical educators support their students in learning to understand and use code, discipline-specific activities will be of value, especially if they are ready to implement and have the potential to be adapted to the instructor’s preferences. The activity reported here seeks to contribute to this robust and growing body of literature.
Context
The University of Notre Dame is a medium-sized private institution of higher education located in the midwestern United States, with an undergraduate enrollment of 8,874 and a graduate enrollment of 2,200 (Fall 2020). It has an undergraduate graduation rate of 95% and a gender distribution of 52% male, 48% female.
The computational programming skill activity was piloted over two semesters (Spring 2021 and Fall 2021) in a 300-level Analytical Chemistry undergraduate course and a first-semester Analytical Chemistry graduate course.
Research Goal and Questions
Inspired by our own use of computational tools in bioanalytical chemistry research and recognizing the utility of coding experience in industry, we hypothesized that we could introduce coding into the curriculum of an analytical chemistry course in an accessible and discipline-focused manner, and that skills in understanding and adapting existing code would be valuable to students. The goal was to design, develop, implement, and evaluate an open educational resource website containing a computational programming activity to be used as an asynchronous online teaching tool. We chose R as the computational programming language because it is free and open-source, making it consistent with our goal of creating an open educational resource that could be transferrable, scalable, practical, and essentially cost-free. R is also used in many physical, biomedical, and social science fields, suggesting that it may be of practical use to our students in their future studies or careers. A very similar activity could be developed in Matlab, which requires the purchase of a license, or in Python, which is also free and open-source. An advantage of R is the well-developed and self-contained IDE and the quality of packages available for data visualization. The pedagogical research questions we asked follow:
Was the computational programming language (R) easy to use, easy to learn, and likely to be used by students in future applications?
Was the computational programming activity enjoyable, efficient, and effective?
II. METHODOLOGY
Assignment Design
We designed an assignment in which students interact with a web-based module including multiple tabs: Resources, Set Up (instructions on downloading RStudio and performing basic operations), Background (information on proteomics and peptide charge), and steps 1–4. In the assignment, students are tasked with taking R scripts provided on the website and applying them to analytical chemistry questions. We developed the activity with the belief that, especially for novice learners, writing original code is not always necessary, whereas learning to modify existing scripts and make them suitable for the task at hand is an essential skill. The assignment therefore builds from executing existing code, through modifying and combining code, to writing original code. In all cases, the use of coding tools to solve chemical questions is emphasized.
The steps of the assignment are shown in Figure 1 and described briefly here. In step 1, students calculate the charge of a peptide by executing code that takes a peptide sequence (in one-letter abbreviations) and a pH value as arguments. The code identifies ionizable groups (the N- and C-termini and any acidic or basic side chains) and calculates the net charge from the input pH and the pKa values included in the code. The code is shown explicitly, along with a user-friendly description of what it does. Students are encouraged to read the code and identify the lines where calculations are executed to find fractional charges on acidic and basic groups. This step is a good entry point for novice coders, requiring only copying and executing existing code. It gets students thinking about the acid–base chemistry responsible for peptide charge at different values of pH. The code in step 1 is based on a publication reported in this Journal describing the use of spreadsheets to calculate the net charge of peptides and proteins as a function of pH.24
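As a rough illustration of the step 1 calculation, the following R sketch sums Henderson-Hasselbalch fractional charges over the termini and the ionizable side chains. It is not the code distributed with the activity: the function name peptide_charge, its arguments, and the pKa values are assumptions chosen for illustration, and real pKa tables differ between sources.

```r
# Assumed, approximate pKa values for illustration only
# (not taken from the activity's code).
pka_basic  <- c(Nterm = 9.0, K = 10.5, R = 12.5, H = 6.0)
pka_acidic <- c(Cterm = 2.0, D = 3.9, E = 4.1, C = 8.3, Y = 10.1)

# Net charge of a peptide (one-letter sequence) at a given pH.
peptide_charge <- function(sequence, pH) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  # Every peptide has one N-terminus and one C-terminus
  basic  <- c("Nterm", residues[residues %in% names(pka_basic)])
  acidic <- c("Cterm", residues[residues %in% names(pka_acidic)])
  # Fraction of each basic group that is protonated (+1)
  positive <- sum(1 / (1 + 10^(pH - pka_basic[basic])))
  # Fraction of each acidic group that is deprotonated (-1)
  negative <- sum(1 / (1 + 10^(pka_acidic[acidic] - pH)))
  positive - negative
}

print(peptide_charge("DIAK", pH = 7))
```

With these assumed pKa values, DIAK carries a net charge close to zero (slightly negative) at pH 7, because the two protonated basic groups nearly cancel the two deprotonated acidic groups.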
Figure 1. Overview of the activity. The activity involves four steps of increasing complexity. The learning objectives, both chemistry-related and coding-related, are made explicit to the students.
Step 2 involves applying the peptide charge calculation used in step 1 to a problem in anion exchange chromatography. The principles of ion exchange chromatography are described in words and simple illustrations. Students are given a list of five peptides and asked to predict which peptide(s) would elute in buffers of various pH values (10, 7, and 3). Completing this task requires that students apply the peptide charge function from step 1 to five new peptides in three different pH buffers. They apply chemical reasoning to decide what will elute and what will be retained. This step thus builds on the previous one in coding skills and chemistry understanding.
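The step 2 task can be sketched in R as a table of charges for several peptides in several buffers. Everything named here is an assumption for illustration: the five peptide sequences are hypothetical (they are not the peptides used in the activity), the pKa values are approximate, and the rule that a peptide is retained on an anion exchanger whenever its net charge is negative is a deliberate simplification.

```r
# Assumed, approximate pKa values for illustration only.
pka_basic  <- c(Nterm = 9.0, K = 10.5, R = 12.5, H = 6.0)
pka_acidic <- c(Cterm = 2.0, D = 3.9, E = 4.1, C = 8.3, Y = 10.1)

# Hypothetical helper: net charge of a peptide at a given pH.
peptide_charge <- function(sequence, pH) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  basic  <- c("Nterm", residues[residues %in% names(pka_basic)])
  acidic <- c("Cterm", residues[residues %in% names(pka_acidic)])
  sum(1 / (1 + 10^(pH - pka_basic[basic]))) -
    sum(1 / (1 + 10^(pka_acidic[acidic] - pH)))
}

# Hypothetical peptides and the three buffer pH values from the text
peptides <- c("DIAK", "KKRK", "DEED", "GAVL", "HHKR")
buffers  <- c(10, 7, 3)

# Rows are peptides, columns are buffers
charges <- sapply(buffers, function(pH) sapply(peptides, peptide_charge, pH = pH))
colnames(charges) <- paste0("pH", buffers)
print(round(charges, 2))

# Simplified rule: net negative charge means retained on the anion exchanger
retained <- charges < 0
print(retained)
```

Reading the logical table column by column shows which peptides would be expected to elute (FALSE) or stay bound (TRUE) in each buffer under this simplified rule.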
Step 3 is a two-part exercise in which complexity continues to build. In step 3a, students are given a list of 52 peptides (the output of a hypothetical proteomics experiment) and told that the research question is to determine the charge of these peptides at pH 7. Having completed steps 1 and 2, students know a function that returns peptide charge at a given pH, but executing this code for each peptide individually would be time-consuming. Students are guided through a process of automating these calculations. First, they are given code that imports peptide sequences and stores them as a list. Then, they are given code that performs a task related to the one of interest: calculating the number of amino acids in a peptide sequence. The script applies a “for” loop to a list of peptides, calculating and displaying the length of each. Students are instructed to execute this code and figure out how to modify it so that charge rather than length is calculated and displayed. Because this is the most challenging task in the assignment so far, students are given prompts to help them think through what the code does (“what are the inputs and outputs?”), what they want the code to do (“what inputs do we have, and what outputs do we expect?”), and what parts of the existing code to retain or to change. For each prompt, there is a hint hidden behind a pull-down menu that students can access if they want additional guidance. In step 3b, students see a brief description of MALDI-TOF MS, an important analytical application that depends on peptide mass and charge (as the m/z value). They are given code for two published R functions. One takes a peptide sequence and returns the mass of the peptide in Da. The second takes mass and charge values and returns m/z at a given pH. Students execute the code as given and then figure out how to assemble a function that takes a list of peptides and returns m/z as it would appear in a MALDI experiment. As before, students are guided through examining the code and given a hint to follow if they get stuck.
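The adaptation asked for in step 3 can be sketched as follows. This is an illustrative reconstruction, not the activity's code: the short inline vector stands in for the imported list of 52 peptides, the function names are hypothetical, the pKa values are approximate, and the m/z calculation assumes that each peptide appears in MALDI as a singly protonated ion, [M+H]+.

```r
# Assumed, approximate pKa values for illustration only.
pka_basic  <- c(Nterm = 9.0, K = 10.5, R = 12.5, H = 6.0)
pka_acidic <- c(Cterm = 2.0, D = 3.9, E = 4.1, C = 8.3, Y = 10.1)

peptide_charge <- function(sequence, pH) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  basic  <- c("Nterm", residues[residues %in% names(pka_basic)])
  acidic <- c("Cterm", residues[residues %in% names(pka_acidic)])
  sum(1 / (1 + 10^(pH - pka_basic[basic]))) -
    sum(1 / (1 + 10^(pka_acidic[acidic] - pH)))
}

peptides <- c("DIAK", "KKRK", "DEED")  # stand-in for the imported list

# Starting point: a loop that displays the length of each peptide
for (p in peptides) {
  cat(p, nchar(p), "\n")
}

# Adapted loop: same structure, but charge at pH 7 replaces length
charges <- numeric(length(peptides))
for (i in seq_along(peptides)) {
  charges[i] <- peptide_charge(peptides[i], pH = 7)
  cat(peptides[i], round(charges[i], 2), "\n")
}

# Monoisotopic residue masses in Da
residue_mass <- c(
  G = 57.02146, A = 71.03711, S = 87.03203, P = 97.05276, V = 99.06841,
  T = 101.04768, C = 103.00919, L = 113.08406, I = 113.08406, N = 114.04293,
  D = 115.02694, Q = 128.05858, K = 128.09496, E = 129.04259, M = 131.04049,
  H = 137.05891, F = 147.06841, R = 156.10111, Y = 163.06333, W = 186.07931
)

# Peptide mass: sum of residue masses plus one water for the free termini
peptide_mass <- function(sequence) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  sum(residue_mass[residues]) + 18.01056
}

# Assumption: MALDI ion is [M+H]+, so z = 1 and one proton mass is added
maldi_mz <- function(sequence) peptide_mass(sequence) + 1.00728

print(round(sapply(peptides, maldi_mz), 3))
```

The only change between the two loops is the function called on each peptide, which is the kind of small, targeted modification the step is designed to teach.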
Finally, step 4 invites students to write new code to solve three questions related to concepts discussed in steps 1–3. In step 4a, students write a function called “peptide_summary” that takes as inputs a peptide and pH and prints all the peptide attributes they have learned to calculate in the earlier steps (charge at given pH, mass, m/z at given pH, and expected m/z in MALDI-MS). Step 4b introduces graphing, asking students to plot charge vs pH for a peptide of their choice. An example plot, for peptide DIAK, is shown in Figure 2. The students are provided with pseudocode listing major tasks the code must execute and helpful functions for manipulating vectors. The challenge in step 4c is to write a function that calculates the isoelectric point (the pH at which a protein or peptide has a net charge of 0). Students are directed to a publication describing an isoelectric point calculator25 that uses an intuitive “guess and check” methodology to accomplish the goal.
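One way to realize a guess-and-check search for the isoelectric point in step 4c is bisection: guess the midpoint of a pH interval, check the sign of the net charge, and keep the half of the interval that still contains the zero crossing. The sketch below is an assumption-laden illustration rather than a reference solution; the function names, the pKa values, and the tolerance are all hypothetical choices.

```r
# Assumed, approximate pKa values for illustration only.
pka_basic  <- c(Nterm = 9.0, K = 10.5, R = 12.5, H = 6.0)
pka_acidic <- c(Cterm = 2.0, D = 3.9, E = 4.1, C = 8.3, Y = 10.1)

peptide_charge <- function(sequence, pH) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  basic  <- c("Nterm", residues[residues %in% names(pka_basic)])
  acidic <- c("Cterm", residues[residues %in% names(pka_acidic)])
  sum(1 / (1 + 10^(pH - pka_basic[basic]))) -
    sum(1 / (1 + 10^(pka_acidic[acidic] - pH)))
}

# Bisection search for the pH at which the net charge is zero.
# Net charge decreases as pH increases, so a positive charge means
# the guess is below the isoelectric point.
isoelectric_point <- function(sequence, low = 0, high = 14, tol = 1e-4) {
  while (high - low > tol) {
    mid <- (low + high) / 2
    if (peptide_charge(sequence, mid) > 0) {
      low <- mid    # guess too acidic: search the upper half
    } else {
      high <- mid   # guess too basic: search the lower half
    }
  }
  (low + high) / 2
}

pI <- isoelectric_point("DIAK")
print(round(pI, 2))
```

Because the interval halves on every pass, the search converges in under 20 iterations for the default tolerance, which keeps the logic easy for beginners to trace by hand.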
Figure 2. Charge vs pH plot generated in step 4. Students write code to produce similar plots for peptides of their choosing.
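A plot of this kind could be produced with a few lines of base R, as in the minimal sketch below. The charge function and pKa values are illustrative assumptions, so the resulting curve will not match the published figure exactly.

```r
# Assumed, approximate pKa values for illustration only.
pka_basic  <- c(Nterm = 9.0, K = 10.5, R = 12.5, H = 6.0)
pka_acidic <- c(Cterm = 2.0, D = 3.9, E = 4.1, C = 8.3, Y = 10.1)

peptide_charge <- function(sequence, pH) {
  residues <- strsplit(toupper(sequence), "")[[1]]
  basic  <- c("Nterm", residues[residues %in% names(pka_basic)])
  acidic <- c("Cterm", residues[residues %in% names(pka_acidic)])
  sum(1 / (1 + 10^(pH - pka_basic[basic]))) -
    sum(1 / (1 + 10^(pka_acidic[acidic] - pH)))
}

# Evaluate the charge on a grid of pH values
pH_values     <- seq(0, 14, by = 0.1)
charge_values <- sapply(pH_values, function(pH) peptide_charge("DIAK", pH))

# Line plot with a dashed reference line at zero net charge
plot(pH_values, charge_values, type = "l",
     xlab = "pH", ylab = "Net charge", main = "DIAK")
abline(h = 0, lty = 2)
```

The point where the curve crosses the dashed line is the isoelectric point, which connects this plotting task to the calculation in step 4c.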
III. IMPLEMENTATION WITH STUDENTS
Students were given online access to the activity at https://weaversd.github.io/R_with_peptides_Project/index.html. Two weeks before the assignment was due, the activity was introduced during class. The students were shown a brief PowerPoint presentation (SI Figure 1) providing motivation for the manipulation of large data sets. Genomics, transcriptomics, and proteomics were mentioned as areas of active research in the analytical sciences in which programming skills are useful, if not required. The presentation introduced students to R and RStudio, including a description of the four main windows of the RStudio integrated development environment (IDE): R Code Editor, R Console, Variables and Functions, and Files and Plots. Students were encouraged to bring a laptop computer to class, and the remaining time was available for students to start the activity by downloading R and RStudio, installing the “stringr” package, reading the Background Information section of the activity, and beginning work on step 1 (“Copy and paste R code to calculate the charge on a peptide”). Teaching staff were available for consultation to lower the barrier to entry and mitigate the anxiety of students unfamiliar with coding. In one iteration of the course (Spring 2021), the COVID-19 pandemic necessitated physical distancing in classrooms, so the introduction took place in a virtual video classroom. In the other iteration (Fall 2021), physical distancing was relaxed, and the introduction was done in person. The activity itself is a free and open asynchronous digital activity that can be completed by students at a time and location of their choosing. It could also be modified to suit the preferences of the instructor. The PowerPoint presentation, screenshots of the activity, instructions as given to students, and examples of finished work are available in the SI.
Students submitted their work electronically to the course’s online learning management system (hard-copy submission would be another option). Work was scored for completeness and effectiveness, and not for efficiency of the code’s operation. This grading system was based on the belief that, for beginners, obtaining functional code and understanding why it works are more important than writing optimized code.
IV. RESULTS AND DISCUSSION
Student Perception
During the first pilot in the undergraduate course, the activity was given to 12 students as an extra credit opportunity. Of the students, 75% (n = 9) completed the assignment, and of those, 89% (n = 8) returned the postassignment survey. The survey tool was approved by an IRB. Students reported minimal prior experience with R and RStudio, with 75% (n = 6) reporting “No Experience”, 12.5% (n = 1) self-identifying as “Beginner”, and 12.5% (n = 1) describing their experience as “Intermediate”. This group of undergraduate students had a wide range of prior experience with coding in general, with one student reporting “No Experience”, one student identifying as “Expert”, and the rest (n = 6) self-reporting as “Beginner”. Responding to a Likert-scale survey, students reported finding the activity effective in illustrating how coding in R can be used to solve problems relevant to analytical chemistry (4.75/5.00), efficient in its design (4.88/5.00), and enjoyable (4.63/5.00). Students reported finding RStudio easy to use (4.38/5.00) and easy to learn (4.50/5.00). Further, they reported that after completing the activity they were more likely to consider RStudio as a tool for problem-solving in analytical chemistry (4.13/5.00) and other areas of study (4.25/5.00). In response to the open-field prompt “In offering this activity to students in the future, I would recommend that…”, one student suggested: “include a page on the assignment webpage that shows the basic commands and what they do.” In response to this suggestion, we added three links to the Resources page to provide resources to students who were stuck or simply wanted to learn more. Another student offered: “I would recommend there be more usage of R in the course. Having just one exposure was great, but I would have liked to learn more through other assignments and be able to apply it myself.” In response to the prompt “Aspects of the activity that I liked were…”, one student said “I like how each of the assignments builds off of the previous one. Like the first one started off with finding the charge and then using that for electrophoresis and then for calculating m/z. It’s cool to see how important the basic calculation of the charges of amino acid is for many different analytical techniques.” A summary of the student survey responses is found in Figure 3. The survey questions and complete survey data for the undergraduate-level course are available in the SI.
Figure 3. Undergraduate student survey results. (A) Student scoring of R/RStudio after assignment completion. (B) Student scoring of the assignment itself. All scores were on a scale of 1–5, with 5 being the best and 1 being the worst. Average scores are reported (n = 8).
During the second pilot, the activity was given as a required assignment in a first-semester graduate Analytical Chemistry course with 7 students. All students completed the activity; 86% (n = 6) returned the postassignment survey. The same survey questions were used. Students reported their prior level of familiarity with R and RStudio as “No Experience” (50%), “Beginner” (33%), or “Intermediate” (17%). Students in this group had slightly more experience than the undergraduates with coding in general, with 50% self-reporting as a Beginner, 17% as Intermediate, and 33% as Advanced (no students self-identified as “No Experience” or as “Expert”). Students found the activity effective (5.00/5.00), efficient (4.83/5.00), and enjoyable (4.83/5.00). Students reported finding RStudio easy to use (4.50/5.00) and easy to learn (4.17/5.00). On the question “After completing this activity, I am more likely to consider RStudio as a tool for problem-solving in analytical chemistry”, most students (83%) responded with Agree or Somewhat Agree while 17% Disagreed. In response to the prompt “Aspects of the activity that I liked were…”, one student commented: “I liked the layout of the activity and the progression to having us eventually write our own code. I also liked how the activity applied to analytical chemistry, as I feel a lot of tutorials for coding are very generalized.” Another student wrote: “Adjusting existing code to your research problem is very helpful and also very close to how coding can be applied in real life since there are numerous sources with prewritten code for very specific or more generic problems.” A summary of the student survey responses is found in Figure 4. Complete survey data for the graduate-level course are available in SI.
Figure 4. Graduate student survey results. (A) Student scoring of R/RStudio after assignment completion. (B) Student scoring of the assignment itself. All scores were on a scale of 1–5, with 5 being the best and 1 being the worst. Average scores are reported (n = 6).
V. CONCLUSIONS AND INSTRUCTOR REFLECTION
On the basis of student engagement, success in completing the activity, and survey responses, we conclude that teaching simple coding in R using questions specific to analytical chemistry course content achieved our goal of engaging novice coders. Aspects of the activity that supported pedagogical success include the following: (1) incremental task difficulty, with more challenging steps building on easier ones; (2) explicit instructions with example code that students could emulate and modify; (3) tasks based on course-relevant topics (chromatography and mass spectrometry), enabling students to connect coding tasks to prior course-specific learning; (4) emphasis on applications in analytical chemistry in which coding skills might be deployed; and (5) a transition from an introductory session, in which teaching staff were available for help, to greater independence.
Some differences between the undergraduate and graduate populations are worth mentioning: the graduate students reported more experience with coding and indicated that they were less likely to use R for future tasks. We interpret these data (along with direct informal communication with students) to reflect that students favor the coding language(s) with which they are familiar.
In the future, we intend to expose students to coding in R earlier and more often. In both pilot studies, the activity stood alone, and students commented that an earlier introduction and more sustained exposure would have been beneficial. Appropriately designed activities spread over a semester-long course would reinforce students’ skills in finding/adapting existing code and in writing code de novo, while connecting coding to more course content than in the activity reported here. Coding activities directly pegged to concepts in statistics, sample preparation, calibration, equilibrium, electrochemistry, and spectroscopy (topics typically covered in the analytical curriculum) can be envisioned. These changes will be implemented in future courses.
The activity presented here could be used in an analytical chemistry course. With minor modification, it would fit well into a course in biochemistry, bioanalytical chemistry, separation science, bioinformatics, or ‘omics. The activity could be implemented as a stand-alone 1–2 day long module, as described here. It could be offered as an extra credit or enrichment activity, outside the regular curriculum, or it could support a larger set of activities to build coding skills among chemistry students more broadly.
Supplementary Material
The Supporting Information is available at https://pubs.acs.org/doi/10.1021/acs.jchemed.2c00395.
PowerPoint presentation, screenshots of the activity, activity instructions, examples of student work, and survey instrument and results (PDF, DOCX)
ACKNOWLEDGMENTS
The authors thank members of the Whelan research group who tested an early version of the activity. The activity benefitted from the input of attendees at the Midwestern Universities Analytical Chemistry Conference (MUACC) held in 2021 at The Ohio State University; this input and the efforts of the organizers are appreciated. The authors particularly wish to acknowledge and thank the students who completed this activity and provided valuable feedback. S.D.W. is a fellow of the Chemistry-Biochemistry-Biology Interface (CBBI) Program at the University of Notre Dame, supported by training grant T32GM075762 from the National Institute of General Medical Sciences. The authors acknowledge Julia Lowndes, who created the open-source template used in the creation of the assignment website (https://openscapes.org/approach-guide).
Footnotes
The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute of General Medical Sciences or the National Institutes of Health.
The authors declare no competing financial interest.
Contributor Information
Simon D. Weaver, Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States; Integrated Biomedical Sciences Graduate Program, University of Notre Dame, Notre Dame, Indiana 46556, United States
G. Alex Ambrose, Kaneb Center for Teaching & Learning, University of Notre Dame, Notre Dame, Indiana 46556, United States
Rebecca J. Whelan, Department of Chemistry and Biochemistry, University of Notre Dame, Notre Dame, Indiana 46556, United States; Department of Chemistry, University of Kansas, Lawrence, Kansas 66045, United States
REFERENCES
- (1) Fisher A. An Introduction to Coding with Matlab: Simulation of X-ray Photoelectron Spectroscopy by Employing Slater’s Rules. J. Chem. Educ. 2019, 96 (7), 1502–1505.
- (2) Marlowe J; Tsilomelekis G. Accessible and Interactive Learning of Spectroscopic Parameterization through Computer-Aided Training. J. Chem. Educ. 2020, 97 (12), 4527–4532.
- (3) Arrabal-Campos F; Cortes-Villena A; Fernandez I. Building “My First NMRviewer”: A Project Incorporating Coding and Programming Tasks in the Undergraduate Chemistry Curricula. J. Chem. Educ. 2017, 94 (9), 1372–1376.
- (4) Pierce K; Schale S; Le T; Larson J. An Advanced Analytical Chemistry Experiment Using Gas Chromatography-Mass Spectrometry, MATLAB, and Chemometrics To Predict Biodiesel Blend Percent Composition. J. Chem. Educ. 2011, 88 (6), 806–810.
- (5) Fisher A. Developing the Chemist’s Inner Coder: A MATLAB Tutorial on the Stochastic Simulation of a Pseudo-First-Order Reaction. J. Chem. Educ. 2020, 97 (5), 1476–1480.
- (6) Srnec M; Upadhyay S; Madura J. A Python Program for Solving Schrödinger’s Equation in Undergraduate Physical Chemistry. J. Chem. Educ. 2017, 94 (6), 813–815.
- (7) Kurniawan O; Koh L; Cheng J; Pee M. Helping Students Connect Interdisciplinary Concepts and Skills in Physical Chemistry and Introductory Computing: Solving Schrödinger’s Equation for the Hydrogen Atom. J. Chem. Educ. 2019, 96 (10), 2202–2207.
- (8) Green M; Chen X. Data Functionalization for Gas Chromatography in Python. J. Chem. Educ. 2020, 97 (4), 1172–1175.
- (9) Menke E. Series of Jupyter Notebooks Using Python for an Analytical Chemistry Course. J. Chem. Educ. 2020, 97 (10), 3899–3903.
- (10) Lafuente D; Cohen B; Fiorini G; Garcia A; Bringas M; Morzan E; Onna D. A Gentle Introduction to Machine Learning for Chemists: An Undergraduate Workshop Using Python Notebooks for Visualization, Data Processing, Analysis, and Modeling. J. Chem. Educ. 2021, 98 (9), 2892–2898.
- (11) Dickson-Karn N; Orosz S. Implementation of a Python Program to Simulate Sampling. J. Chem. Educ. 2021, 98 (10), 3251–3257.
- (12) De Haan D; Schafer J; Gillette E. Using a Modular Approach to Introduce Python Coding to Support Existing Course Learning Outcomes in a Lower Division Analytical Chemistry Course. J. Chem. Educ. 2021, 98 (10), 3245–3250.
- (13) Tan S; Naraharisetti P; Chin S; Lee L. Simple Visual-Aided Automated Titration Using the Python Programming Language. J. Chem. Educ. 2020, 97 (3), 850–854.
- (14) Sengupta I. Illustrating Elementary NMR Concepts through Simple Interactive Python Programs. J. Chem. Educ. 2021, 98 (5), 1673–1680.
- (15) Vesto J; Cass D; Fry J. Implementation of R for Teaching Quantitative Chemical Data Analysis. In American Chemical Society National Meeting; 2020.
- (16) Vesto J; Cass D; Fry J. Developing R coding resources for upper-division undergraduate coursework. In American Chemical Society National Meeting; 2020.
- (17) Jensen M. Using Web-Based Resources To Incorporate LabVIEW into an Instrumental Analysis Course. J. Chem. Educ. 2009, 86 (4), 525–527.
- (18) Montgomery J; Mazziotti D. Maple’s Quantum Chemistry Package in the Chemistry Classroom. J. Chem. Educ. 2020, 97 (10), 3658–3666.
- (19) Hoyer C; Kegerreis J. A Primer in Monte Carlo Integration Using Mathcad. J. Chem. Educ. 2013, 90 (9), 1186–1190.
- (20) Scalfani V. Using NCBI Entrez Direct (EDirect) for Small Molecule Chemical Information Searching in a Unix Terminal. J. Chem. Educ. 2021, 98 (12), 3904–3914.
- (21) Cahill ST; Bergstrom Mann PE; Worrall AF; Stewart MI. Remote Teaching of Programming in Mathematica: Lessons Learned. J. Chem. Educ. 2020, 97 (9), 3085–3089.
- (22) Engelberger F; Galaz-Davison P; Bravo G; Rivera M; Ramirez-Sarmiento C. Developing and Implementing Cloud-Based Tutorials That Combine Bioinformatics Software, Interactive Coding, and Visualization Exercises for Distance Learning on Structural Bioinformatics. J. Chem. Educ. 2021, 98 (5), 1801–1807.
- (23) Perri M. Online Data Generation in Quantitative Analysis: Excel Spreadsheets and an Online HPLC Simulator Using a Jupyter Notebook on the Chem Compute Web site. J. Chem. Educ. 2020, 97 (9), 2950–2954.
- (24) Sims P. Use of a Spreadsheet To Calculate the Net Charge of Peptides and Proteins as a Function of pH: An Alternative to Using “Canned” Programs To Estimate the Isoelectric Point of These Important Biomolecules. J. Chem. Educ. 2010, 87 (8), 803–808.
- (25) Kozlowski L. IPC - Isoelectric Point Calculator. Biology Direct 2016, 11, 55.
