Eur J Mass Spectrom (Chichester). 2025 May 15;31(3-4):73–78. doi: 10.1177/14690667251339718

Prime mass amino acids: A new numbers based classification of significance to mass spectrometry and protein biology

Kevin M Downard
PMCID: PMC12314211  PMID: 40370108

Abstract

The nominal masses of amino acid residues, calculated from their elemental compositions, are prime numbers far more often than expected by chance, and such residues appear to play an important role in the formation and biology of proteins. It is proposed therefore that consideration be given to classifying prime mass amino acids as such, beyond the more common, familiar definitions associated with the other physicochemical properties of amino acids including charged or non-charged, hydrophobic or hydrophilic, polar or non-polar, acidic or basic, aliphatic or aromatic. Greater focus could also be given to such residues during peptide and protein sequencing with mass spectrometry and the construction of structural maps, given their predominantly hydrophobic character and thus their role in protein folding and transmembrane domains. The use of prime numbers to define amino acids based on the sum of the atomic masses from their elemental compositions invokes other recent interest and observations whereby prime numbers were organised in a way that mirrors electrons arranged within the orbitals of an atom. The article links number theory with both the physical and biological sciences, and mass spectrometry, for the first time.

Keywords: Amino acid, mass, prime numbers, primes, proteins

Introduction

The practice of mass spectrometry is underpinned by physical phenomena and processes but subsequent analysis is dominated by an association with, and an interpretation of, numbers. These numbers are connected to the mass of matter, detected in the form of charged ions that are resolved and distributed across a spectrum.

The study of numbers is a central branch of pure mathematics 1 that dates back centuries. 2 Number theorists are particularly interested in the study of prime numbers, often just referred to as primes. By definition, all prime numbers, with the exception of the number 2, are odd since they are only divisible by the number one and themselves, without any remainder. Prime numbers have thus been associated with uniqueness, individualism, and creativity.

Prime numbers are no longer the exclusive domain of the mathematical sciences and have long received attention in relation to their prevalence in the natural world. Considered unique as they cannot be defined by other factors, there are now many applications of prime numbers and theorems that have a documented real world significance. 2

Previous work has sought to define amino acids by a unique prime number. 3 In molecular biology, the nucleotide bases of DNA or RNA can be defined by numbers rather than letters by making use of zero, one and the low primes two and three. Combinations of these “nucleotide numbers” (defined as A = 0, C = 1, U or T = 2, G = 3) comprise quartets that represent the genetic code. Each of the 64 codons is then defined by a function z = f (i,j,k) with i,j,k representing the first, second and third base of the codon respectively. Recognizing the redundancy in the genetic code, associated with bases at the third position, the amino acids were each assigned to an odd prime number according to values for the first two bases using a Diophantine equation.

In an alternate strategy, a complex prime numerical representation (CPNR) for the twenty common amino acids was conceived using the first twenty primes between 2 and 67. Each was uniquely assigned to one of the amino acids based on the number of codons that encode each residue and their relation to the differences between each of the prime numbers in this range. 4

Shcherbak reported 5,6 that amino acids can be divided into two groups based on the nucleotide triplets that encode them, and noted arithmetic regularities in the sum of the nominal masses of the backbones for free amino acids, and that for their side chains, in each group. Interestingly, these were found to be multiples of the prime integer 37. 6

Here, it is proposed that the amino acids be simply classified based solely upon their residue molecular masses. A residue mass is the mass of the free amino acid minus the mass of one molecule of water, reflecting the repeating segment (-NH–CH(R)–C(=O)- ; where R denotes the side chain group) that makes up a peptide or protein chain. Almost one half (45%) of the residue masses can be defined by prime numbers, and these residues are considered in terms of their physicochemical properties in the context of protein biology.

Elemental considerations and classifying amino acids based on mass

The monoisotopic (comprised of the lightest isotopes of each element) and nominal mass for each of the 20 amino acid residues that form part of a peptide or protein chain are listed in Table 1. These masses are summed together when the residues are combined into a peptide and protein sequence. Masses for terminal (N and C) groups are added, along with those for the charge-contributing atoms where their ionized counterparts are detected within a mass spectrometer (MS).
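The summation described above can be sketched in a few lines of Python. This is a minimal illustration only: the residue masses are those of Table 1, while the function name `peptide_mz` and the rounded constants for water (18.0106) and the proton (1.0073) are assumptions introduced here and do not appear in the article. It assumes an unmodified peptide whose charge is carried by added protons.

```python
# Monoisotopic residue masses (Da) as listed in Table 1.
RESIDUE_MASS = {
    "G": 57.0216, "A": 71.0371, "S": 87.0320, "P": 97.0528, "V": 99.0684,
    "T": 101.0477, "C": 103.0092, "I": 113.0841, "L": 113.0841,
    "D": 115.0270, "E": 129.0426, "M": 131.0405, "H": 137.0589,
    "F": 147.0684, "Y": 163.0633, "N": 114.0429, "Q": 128.0586,
    "K": 128.0950, "R": 156.1011, "W": 186.0793,
}
WATER = 18.0106   # assumed: H on the N-terminus plus OH on the C-terminus
PROTON = 1.0073   # assumed: mass of each charge-contributing proton


def peptide_mz(sequence: str, charge: int = 1) -> float:
    """Return the m/z of an unmodified peptide ion carrying `charge` protons."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    # Sum the residue masses, then add the terminal groups (one water).
    neutral = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    # Add one proton per charge and divide by the charge to obtain m/z.
    return (neutral + charge * PROTON) / charge
```

For the dipeptide Gly-Ala, `peptide_mz("GA")` evaluates to approximately 147.0766 with these rounded values.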

Table 1.

Amino acid integer residue masses and those which are prime.

| amino acid | symbol | elemental composition | monoisotopic mass | nominal mass a | odd / even | prime | hydrophobicity b |
|---|---|---|---|---|---|---|---|
| glycine | G | C2H3NO | 57.0216 | 57 | odd | x | −0.4 |
| alanine | A | C3H5NO | 71.0371 | 71 | odd | ✓ | 1.8 |
| serine | S | C3H5NO2 | 87.0320 | 87 | odd | x | −0.8 |
| proline | P | C5H7NO | 97.0528 | 97 | odd | ✓ | −1.6 |
| valine | V | C5H9NO | 99.0684 | 99 | odd | x | 4.2 |
| threonine | T | C4H7NO2 | 101.0477 | 101 | odd | ✓ | −0.7 |
| cysteine | C | C3H5NOS | 103.0092 | 103 | odd | ✓ | 2.5 |
| isoleucine | I | C6H11NO | 113.0841 | 113 | odd | ✓ | 4.5 |
| leucine | L | C6H11NO | 113.0841 | 113 | odd | ✓ | 4.2 |
| aspartic acid | D | C4H5NO3 | 115.0270 | 115 | odd | x | −3.5 |
| glutamic acid | E | C5H7NO3 | 129.0426 | 129 | odd | x | −3.5 |
| methionine | M | C5H9NOS | 131.0405 | 131 | odd | ✓ | 1.9 |
| histidine | H | C6H7N3O | 137.0589 | 137 | odd | ✓ | −3.2 |
| phenylalanine | F | C9H9NO | 147.0684 | 147 | odd | x | 2.8 |
| tyrosine | Y | C9H9NO2 | 163.0633 | 163 | odd | ✓ | −1.3 |
| asparagine a | N | C4H6N2O2 | 114.0429 | 114 | even | x | −3.5 |
| glutamine a | Q | C5H8N2O2 | 128.0586 | 128 | even | x | −3.5 |
| lysine a | K | C6H12N2O | 128.0950 | 128 | even | x | −3.9 |
| arginine a | R | C6H12N4O | 156.1011 | 156 | even | x | −4.5 |
| tryptophan a | W | C11H10N2O | 186.0793 | 186 | even | x | −0.9 |

a Residues with even nominal masses (shaded in the original table).

b According to the values of reference 7.

These residue masses are used extensively in mass spectrometry to interpret peptide and protein MS data and to decipher tandem (MS/MS) mass spectra in peptide 7 and protein sequencing. 8 In such experiments, peptides or proteins are cleaved along the backbone to give rise to fragment ions which differ by amino acid residue masses. The integer portion of this mass is of primary importance. Once fragment masses are found to differ by one of these integers, the fractional mass serves as confirmatory evidence, to a degree subject to the mass accuracy achieved in the experiment. The mass differences allow these fragments to be assembled in order across the spectrum to enable the sequence to be read. A peptide's fractional mass alone can also be of value in the correct assignment of its amino acid composition 9 to aid the sequencing process.
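The two-step reading of a fragment ladder, integer mass first and fractional mass as confirmation, might be sketched as follows. This is a simplified illustration rather than a sequencing algorithm: the function name `read_ladder`, the default tolerance of 0.01 Da, and the assumption that all fragments belong to a single, complete, singly charged ion series are introduced here for the example.

```python
# Monoisotopic residue masses (Da) as listed in Table 1.
RESIDUE_MASS = {
    "G": 57.0216, "A": 71.0371, "S": 87.0320, "P": 97.0528, "V": 99.0684,
    "T": 101.0477, "C": 103.0092, "I": 113.0841, "L": 113.0841,
    "D": 115.0270, "E": 129.0426, "M": 131.0405, "H": 137.0589,
    "F": 147.0684, "Y": 163.0633, "N": 114.0429, "Q": 128.0586,
    "K": 128.0950, "R": 156.1011, "W": 186.0793,
}


def read_ladder(fragment_masses, tol=0.01):
    """Assign residues to the mass differences between consecutive fragments.

    `tol` (Da) is a hypothetical mass accuracy; ambiguous assignments are
    joined with "/" and unmatched differences are reported as "?".
    """
    ordered = sorted(fragment_masses)
    calls = []
    for lighter, heavier in zip(ordered, ordered[1:]):
        diff = heavier - lighter
        # Step 1: shortlist residues whose integer (nominal) mass matches.
        by_integer = [aa for aa, mass in RESIDUE_MASS.items()
                      if round(mass) == round(diff)]
        # Step 2: use the fractional mass as confirmatory evidence.
        confirmed = [aa for aa in by_integer
                     if abs(RESIDUE_MASS[aa] - diff) <= tol]
        calls.append("/".join(confirmed) if confirmed else "?")
    return calls
```

With the illustrative input `[58.0289, 129.0660, 257.1610]`, the two differences are assigned to alanine and lysine; glutamine shares the nominal mass of 128 with lysine but is rejected at the fractional step. Isoleucine and leucine have identical compositions and remain indistinguishable by mass, so a difference of 113.0841 is reported as `I/L`.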

Noteworthy is that for all but five residues, the amino acids have odd integer nominal masses. In a random number distribution, one would expect half (or 10) of the residues to be odd and the other half even. Those with even masses, namely asparagine, glutamine, lysine, arginine and tryptophan, all contain one or two free amine (-NH2) groups. Since the atomic integer masses for the dominant isotopes of carbon, nitrogen, oxygen and sulphur are all even (12, 14, 16 and 32 respectively), and all the even mass residues contain an even number of hydrogen atoms (6, 8, 10 or 12), their nominal mass is also even (Table 1).

The odd integer residue masses for the remaining 15 amino acids (Table 1) are exclusively associated with the odd number of hydrogen atoms they contain; either 3, 5, 7, 9, or 11. Among the amino acids with odd integer masses, the majority of residues (9 of 15) have residue masses that are also primes. These correspond to amino acids alanine, proline, threonine, cysteine, isoleucine, leucine, methionine, histidine and tyrosine. These nine residues contain either 5, 7, 9 or 11 hydrogen atoms.
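The parity argument and the list of prime mass residues can be checked directly from the elemental compositions in Table 1. The sketch below is illustrative: the helper names `parse_formula`, `nominal_mass` and `is_prime` are assumptions introduced here, and the nominal atomic masses are the integer values quoted in the text plus 1 for hydrogen.

```python
import math
import re

# Nominal (integer) masses of the dominant isotopes.
NOMINAL = {"C": 12, "H": 1, "N": 14, "O": 16, "S": 32}

# Residue elemental compositions from Table 1.
COMPOSITION = {
    "G": "C2H3NO", "A": "C3H5NO", "S": "C3H5NO2", "P": "C5H7NO",
    "V": "C5H9NO", "T": "C4H7NO2", "C": "C3H5NOS", "I": "C6H11NO",
    "L": "C6H11NO", "D": "C4H5NO3", "E": "C5H7NO3", "M": "C5H9NOS",
    "H": "C6H7N3O", "F": "C9H9NO", "Y": "C9H9NO2", "N": "C4H6N2O2",
    "Q": "C5H8N2O2", "K": "C6H12N2O", "R": "C6H12N4O", "W": "C11H10N2O",
}


def parse_formula(formula):
    """Split a formula such as 'C3H5NOS' into {'C': 3, 'H': 5, ...}."""
    return {element: int(count or 1)
            for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula)}


def nominal_mass(formula):
    """Sum the integer atomic masses over all atoms in the formula."""
    return sum(NOMINAL[el] * n for el, n in parse_formula(formula).items())


def is_prime(n):
    """Trial division; sufficient for the small integers considered here."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


# Residues with odd nominal mass, and the subset whose mass is prime.
odd_mass = [aa for aa, f in COMPOSITION.items() if nominal_mass(f) % 2 == 1]
prime_mass = [aa for aa in odd_mass if is_prime(nominal_mass(COMPOSITION[aa]))]

# Because C, N, O and S contribute even integers, the parity of the
# nominal mass equals the parity of the hydrogen count.
parity_follows_hydrogen = all(
    nominal_mass(f) % 2 == parse_formula(f)["H"] % 2
    for f in COMPOSITION.values()
)
```

Here `odd_mass` contains 15 residues and `prime_mass` the nine residues A, P, T, C, I, L, M, H and Y, in agreement with the text.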

The sum of the total number of hydrogen atoms across these prime mass residues is a prime number; namely 71. The sum of the number of carbon atoms in these residues is also a prime (47), and the same is true for the atoms of nitrogen and oxygen (both 11), and sulphur (2). The combined prime mass residues thus have an elemental composition of C47H71N11O11S2 (Table 2).

Table 2.

Prime and non-prime mass amino acid residues grouped in order of increasing nominal mass.

| prime mass amino acid | symbol | elemental composition | prime nominal mass | non-prime mass amino acid | symbol | elemental composition | non-prime nominal mass |
|---|---|---|---|---|---|---|---|
| alanine | A | C3H5NO | 71 | glycine | G | C2H3NO | 57 |
| proline | P | C5H7NO | 97 | serine | S | C3H5NO2 | 87 |
| threonine | T | C4H7NO2 | 101 | valine | V | C5H9NO | 99 |
| cysteine | C | C3H5NOS | 103 | asparagine | N | C4H6N2O2 | 114 |
| isoleucine | I | C6H11NO | 113 | aspartic acid | D | C4H5NO3 | 115 |
| leucine | L | C6H11NO | 113 | glutamine | Q | C5H8N2O2 | 128 |
| methionine | M | C5H9NOS | 131 | lysine | K | C6H12N2O | 128 |
| histidine | H | C6H7N3O | 137 | glutamic acid | E | C5H7NO3 | 129 |
| tyrosine | Y | C9H9NO2 | 163 | phenylalanine | F | C9H9NO | 147 |
| sum | | C47H71N11O11S2 | 1029 a | arginine | R | C6H12N4O | 156 |
| | | | | tryptophan | W | C11H10N2O | 186 |
| | | | | sum | | C60H86N18O18 | 1346 |

a Mass sum is not prime.

This is not the case for the non-prime odd mass residues which have a sum total of 38 hydrogen, 28 carbon, and 6 nitrogen atoms respectively. Only the total number of oxygen atoms is prime (11). When combined with the non-prime, even integer mass residues, the combined residues then have an elemental composition of C60H86N18O18 (Table 2). Here the number of atoms across all elements is even and non-prime, as is the total mass of 1346.
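The atom totals quoted for the two groups can be reproduced by summing the compositions listed in Table 2. The following sketch is illustrative only; the helper names `atom_counts` and `total_composition` are assumptions introduced here.

```python
import re
from collections import Counter

# Residue compositions grouped as in Table 2.
PRIME_RESIDUES = {
    "A": "C3H5NO", "P": "C5H7NO", "T": "C4H7NO2", "C": "C3H5NOS",
    "I": "C6H11NO", "L": "C6H11NO", "M": "C5H9NOS", "H": "C6H7N3O",
    "Y": "C9H9NO2",
}
NON_PRIME_RESIDUES = {
    "G": "C2H3NO", "S": "C3H5NO2", "V": "C5H9NO", "N": "C4H6N2O2",
    "D": "C4H5NO3", "Q": "C5H8N2O2", "K": "C6H12N2O", "E": "C5H7NO3",
    "F": "C9H9NO", "R": "C6H12N4O", "W": "C11H10N2O",
}


def atom_counts(formula):
    """Count the atoms of each element in a formula such as 'C4H7NO2'."""
    return Counter({element: int(count or 1)
                    for element, count
                    in re.findall(r"([A-Z][a-z]?)(\d*)", formula)})


def total_composition(formulas):
    """Add up the atom counts over a collection of formulas."""
    total = Counter()
    for formula in formulas:
        total += atom_counts(formula)
    return dict(total)


prime_total = total_composition(PRIME_RESIDUES.values())
non_prime_total = total_composition(NON_PRIME_RESIDUES.values())

# The six non-prime residues with odd nominal mass, considered separately.
odd_non_prime_total = total_composition(
    NON_PRIME_RESIDUES[aa] for aa in "GSVDEF"
)
```

The totals obtained are C47H71N11O11S2 for the prime mass residues, C60H86N18O18 for all non-prime mass residues, and C28H38N6O11 for the non-prime residues of odd mass.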

Prime and non-prime mass amino acids; a new classification dimension

The observations above have led to a new classification and grouping of amino acids based on their prime integer residue mass where the elemental composition of each, and in combination, is shown in Table 2. These findings arose from a separate study that only considered their fractional mass to aid the assignment of a peptide's amino acid composition. 9

There are a total of 26 prime numbers between the integer values 57 and 186, the residue masses for glycine and tryptophan respectively, so that 20% of all 130 natural numbers in this range are primes. Yet a considerably larger share of the twenty amino acids found in nature (9 out of 20, or 45%) have residue masses that are primes. Thus prime numbers appear among the masses of amino acid residues over twice (× 2.25) as often as expected by chance. If one considers all possible nominal masses in this integer mass range (57 to 186) that are produced from chemical compositions CxHyNzOmSn between that for glycine (C2H3NO) and that for tryptophan (C11H10N2O), where the number of hydrogen atoms is odd, 20 out of 49, or just over 40%, are prime. 10 These observations suggest that prime integer mass residues may hold some particular importance in biology, hitherto unreported.
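The background rate of primes and the resulting ratio can be reproduced with a short calculation. The sketch below only restates the arithmetic of the paragraph above and is not a statistical test; the variable names are assumptions introduced for illustration.

```python
import math
from fractions import Fraction


def is_prime(n):
    """Trial division; adequate for the small range considered here."""
    return n >= 2 and all(n % d for d in range(2, math.isqrt(n) + 1))


LOW, HIGH = 57, 186  # nominal residue masses of glycine and tryptophan

# Primes among the natural numbers from 57 to 186 inclusive.
primes_in_range = [n for n in range(LOW, HIGH + 1) if is_prime(n)]

# Fraction of integers in the range that are prime (the chance rate).
background = Fraction(len(primes_in_range), HIGH - LOW + 1)

# Fraction of the twenty residues with a prime nominal mass.
observed = Fraction(9, 20)

# How many times more often primes occur among residue masses.
enrichment = observed / background
```

The list holds 26 primes, `background` reduces to 1/5 (20%), and `enrichment` evaluates to 9/4, that is 2.25.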

Prime mass residue prevalence, properties and their role in protein folding and biosynthesis

All aliphatic residues, bar valine, have prime integer masses. Both of the sulphur-containing aliphatic residues cysteine and methionine are also primes. Proline, the only residue with an aliphatic cyclic side chain coupled back to the nitrogen of the backbone, also has a prime integer mass as does histidine that also contains a cyclic side chain. Tyrosine too, with a cyclic phenol ring as its side chain, has a prime integer mass.

All acidic (-CO2H containing) and basic residues with free amine (NH2) groups, including tryptophan, have non-prime masses. The simplest amino acid, glycine, without any side chain at all, is non-prime.

Interestingly, all of the prime mass residues described above tend to be hydrophobic (Table 1) with positive or low negative hydrophobicity values with respect to their measured properties within proteins. 11 Such residues play a vital role in the formation of proteins by excluding water from the core of the structure. The least hydrophobic residues of this set, histidine and proline, are essential to protein biosynthesis and protein folding. 12 Cysteine, another prime mass residue, also performs a vital function in protein folding by means of bridging like residues. Hydrophobic residues are more important than hydrophilic ones to protein structural stability 13 and have even been proposed to play a major role in the development of foldamers 14 in a protein-first model of biology. 15

Within the human proteome, the prime residue leucine (9.8%) is the most frequent. 16 Leucine is among the eight amino acids that are each encoded by four or more codons (leucine itself by six), four of which are prime mass residues (A, L, P, T). Since there are 61 sense codons for only 20 amino acids, most amino acids are encoded by more than one codon. Furthermore, synonymous codons for the same amino acid do not appear with equal frequencies in genomes, a phenomenon termed codon usage bias. It has been reported that the codon usage within highly expressed genes has been selected during the evolution of species to maintain the efficiency of global protein translation. 17
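The codon counts referred to above follow from the standard genetic code, which is not reproduced in this article. As a hedged illustration, the sketch below builds the standard table in its conventional compact form (bases ordered T, C, A, G, with T in place of U) and counts the codons per amino acid; the variable names are assumptions introduced here.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of sense codons that encode each amino acid.
codons_per_aa = Counter(aa for aa in CODON_TABLE.values() if aa != "*")
sense_codons = sum(codons_per_aa.values())

# Amino acids encoded by four or more codons.
four_or_more = sorted(aa for aa, n in codons_per_aa.items() if n >= 4)

# Prime mass residues (Table 2) within that group.
PRIME_MASS = set("APTCILMHY")
prime_four_or_more = sorted(set(four_or_more) & PRIME_MASS)
```

The count gives 61 sense codons. Eight amino acids (A, G, L, P, R, S, T, V) have four or more codons, of which L, R and S have six, and the prime mass members of this group are A, L, P and T.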

When the percentage of each residue across all known proteins of the human proteome (Uniprot release May 2024) 16 is plotted against their codon frequency, the values are most often aligned for the prime versus non-prime residues (Figure 1). All prime (circled) residue values lie along the x = y intersecting line, while the acidic and basic non-prime residues are the most diverged from this line. In other words, prime mass residues are associated with genes that, overall, exhibit neither a marked increase nor decrease in translation efficiency, and thus maintain their importance during protein expression irrespective of other evolutionary events.

Figure 1.

Plot of percentage of each amino acid residue across all known proteins of the human proteome versus their codon frequency.

Five of the nine prime mass residues (histidine, isoleucine, leucine, methionine, and threonine) are among only nine amino acids in total that are not synthesized by mammals and are therefore essential to our diet. 18 It is thought that this loss of function occurred early in the evolution of metazoa when animals saved energy by avoiding the need to synthesize them internally. Further, children do not express phenylalanine hydroxylase, an enzyme required to synthesize tyrosine from phenylalanine, making the prime residue tyrosine another essential amino acid in early human life. Thus six of the nine prime mass residues, or two thirds, are essential amino acids that are vital for the body to maintain a healthy nitrogen balance.

Conclusions

Overall, one can argue that, from both a molecular and biological perspective, the importance of prime mass amino acid residues could be acknowledged by their classification as such. Amino acid residues are commonly described as charged or non-charged, hydrophobic or hydrophilic, polar or non-polar, acidic or basic, aliphatic or aromatic. Most of these assignments vary according to the measurement scale used; proline, for instance, has been considered to be hydrophobic or hydrophilic by different measures. An equally valid molecular description, evoking the concept of mass, is prime or non-prime (Figure 2).

Figure 2.

Venn diagram showing relative position of prime mass residues among all others based upon their physicochemical properties.

From the standpoint of tandem mass spectra, 7,8 mass differences among fragment ions that are prime could receive greater attention and focus in terms of both deciphering peptide and protein sequences, and translating the sequence information onto a protein structural map. The propensity of prime mass residues to occupy hydrophobic domains may assist with identifying transmembrane and protein interaction domains.

The use of prime numbers to define amino acids based on the sum of the atomic masses from their elemental compositions invokes other recent interest and observations that demonstrate that prime numbers can be organised in a way that mirrors electrons arranged in orbitals within an atom. 19 Finally, the author argues that the question posed in a recently published article 20 should be answered confidently in the affirmative.

Acknowledgements

The author thanks Elisabeth Gasteiger of the Swiss Institute of Bioinformatics for providing the proteome data 16 used to compile Figure 1.

Footnotes

ORCID iD: Kevin M Downard https://orcid.org/0000-0003-0904-4141

Author contributions: The named author conducted all work described in the manuscript.

Data

All data used in the study is available and open to all readers in the public domain.

The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Funding: The author(s) received no financial support for the research, authorship, and/or publication of this article.

References

1. Hardy GH, Wright EM. An introduction to the theory of numbers. 6th Edition. Oxford, UK: Oxford University Press, 2008.
2. Curtis M, Tularam GA. The importance of numbers and the need to study primes: the prime questions. J Math Stats 2011; 7: 262–269.
3. Yan JF, Yan AK, Yan BC. Prime numbers and the amino acid code: analogy in coding properties. J Theor Biol 1991; 151: 333–341.
4. Chen D, Wang J, Yan M, et al. A complex prime numerical representation of amino acids for protein function comparison. J Comp Biol 2016; 23: 669–677.
5. Shcherbak VI. Twenty canonical amino acids of the genetic code: the arithmetical regularities, part I. J Theor Biol 1993; 162: 399–401.
6. Shcherbak VI. Sixty-four triplets and 20 canonical amino acids of the genetic code: the arithmetical regularities. Part II. J Theor Biol 1994; 166: 475–471.
7. Steen H, Mann M. The abc's (and xyz's) of peptide sequencing. Nature Rev Mol Cell Biol 2004; 5: 699–711.
8. Medzihradszky KF, Chalkley RJ. Lessons in de novo peptide sequencing by tandem mass spectrometry. Mass Spec Rev 2015; 34: 43–63.
9. Downard KM, Cody RB. Amino acid composition determination from the fractional mass of peptides. J Mass Spectrom 2024; 59: e5089.
10. Rozenski J. Elemental Composition Calculator v1.0, http://rna.rega.kuleuven.be/masspec/elcomp.htm (1999).
11. Wimley W, White S. Experimentally determined hydrophobicity scale for proteins at membrane interfaces. Nat Struct Mol Biol 1996; 3: 842–848.
12. Sun Y, Li C, Wang J, et al. The effect of histidine behaviours on the structural properties of Aβ(1–42) peptide in protonation stage one, two, and three. Phys Chem Chem Phys 2023; 25: 18346–18353.
13. Pace CN, Fu H, Fryar KL, et al. Contribution of hydrophobic interactions to protein stability. J Mol Biol 2011; 403: 514–528.
14. Guseva E, Zuckermann RN, Dill KA. Foldamer hypothesis for the growth and sequence differentiation of prebiotic polymers. Proc Natl Acad Sci USA 2017; 114: E7460.
15. Kocher C, Dill KA. Origins of life: first came evolutionary dynamics. QRB Discov 2023; 4: e4.
16. Based on data for all human (Homo sapiens) proteins of the UniProtKB database (release May 2024).
17. Frumkin I, Lajoie MJ, Gregg CJ, et al. Codon usage of highly expressed genes affects proteome-wide translation efficiency. Proc Natl Acad Sci USA 2018; 115: E4940–E4949.
18. Wu G. Amino acids, biochemistry and nutrition. 2nd Ed. Boca Raton, FL U.S.A.: CRC Press, 2011.
19. Folming M. Prime numbers and atomic structure: A pattern revealed. Quantastic J 2024. https://medium.com/the-quantastic-journal/prime-numbers-pattern-77f96921c85b.
20. Loconsole M, Regolin L. Are prime numbers special? Insights from the life sciences. Biol Direct 2022; 17: 11.

Articles from European Journal of Mass Spectrometry (Chichester, England) are provided here courtesy of SAGE Publications
