Author manuscript; available in PMC: 2021 Jul 1.
Published in final edited form as: Forensic Sci Int Genet. 2020 May 22;47:102309. doi: 10.1016/j.fsigen.2020.102309

Age-Related Changes in Hair Shaft Protein Profiling and Genetically Variant Peptides

Tempest J Plott a,b,1, Noreen Karim b,1, Blythe P Durbin-Johnson c, Dionne P Swift d, R Scott Youngquist d, Michelle Salemi e, Brett S Phinney e, David M Rocke c, Michael G Davis d, Glendon J Parker a,b,1, Robert H Rice a,b,1
PMCID: PMC7388652  NIHMSID: NIHMS1599685  PMID: 32485593

Abstract

Recent reports highlight possible improvements in individual identification using proteomic information from human hair evidence. These reports have stimulated investigation of parameters that affect the utility of proteomic information. In addition to variables already studied relating to processing technique and anatomic origin of hair shafts, an important variable is hair ageing. Present work focuses on the effect of age on protein profiling and analysis of genetically variant peptides (GVPs). Hair protein profiles may be affected by developmental and physiological changes with age of the donor, exposure to different environmental conditions and intrinsic processes, including during storage. First, to explore whether general trends were evident in the population at different ages, hair samples were analyzed from groups of different subjects in their 20’s, 40’s and 60’s. No significant differences were seen as a function of age, but consistent differences were evident between European American and African American hair profiles. Second, samples collected from single individuals at different ages were analyzed. Mostly, these showed few protein expression level differences over periods of 10 years or less, but samples from subjects at 44 and 65 year intervals were distinctly different in profile. The results indicate that use of protein profiling for personal identification, if practical, would be limited to decadal time intervals. Moreover, batch effects were clearly evident in samples processed by different staff. To investigate the contribution of storage (at room temperature) in affecting the outcomes, the same proteomic digests were analyzed for GVPs. In samples stored over 10 years, GVPs were reduced in number in parallel with the yield of identified proteins and unique peptides. However, a very different picture emerged with respect to personal identification. Numbers of GVPs sufficed to distinguish individuals despite the age differences of the samples. 
As a practical matter, three hair samples per person provided nearly the maximal number of GVPs obtained from 5 or 6 samples. The random match probability (where the log increased in proportion to the number of GVPs) reached as high as 1 in 10⁸. The data indicate that GVP results are dependent on the single nucleotide polymorphism profile of the donor genome, where environmental/processing factors affect only the yield, and thus are consistent despite the ages of the donors and samples and batchwise effects in processing. This conclusion is critical for application to casework where the samples may be in storage for long periods and used to match samples recently collected.

Keywords: Proteomic profiling, genetically variant peptides, human hair, ageing, forensic investigation

Introduction

Protein profiling (comparison of relative protein expression levels) and proteomic genotyping (inferring single nucleotide polymorphisms in the genome using the proteome) for human hair comparison and individual identification have shown promise as potential tools for forensic investigation. For example, large inter-individual differences in protein profile are evident in hair shafts (Laatsch et al, 2014). Studies using human twins (Wu et al, 2017) support the conclusion reached using inbred mouse strains (Rice et al, 2012) that differences in profile have primarily a genetic basis. Corneocyte proteins of the hair shaft (Wu et al, 2017), epidermis (Borja et al, 2019) and appendages provide an even more direct connection to genotype in their reflection of individual allelic differences in the genome. Thus, detection of genetically variant peptides (GVPs) containing single amino acid polymorphisms (SAPs) that could be matched to single nucleotide polymorphisms (SNPs) in the coding region of the genome provides a more discriminating way to infer the genotype and even ancestry of the donor (Parker et al, 2016).

From a forensic perspective, limitations on the use of samples for such identifications are important to know. For example, recent findings show that the hair shaft is equally useful for profiling or GVP analysis regardless of its state of pigmentation (Parker et al, 2019) or anatomic site of origin (Chu et al, 2019; Milan et al, 2019), although GVP analysis can offer much greater discrimination. A property that remains to be examined is the reproducibility of such samples with age of donor or period of storage. This issue is pertinent because the protein content of samples may change with the age of the donor at collection, and casework samples are often in storage for many years. Thus, investigators are likely to compare samples from individuals at different ages and originating many years apart.

First, to determine whether global changes in hair are evident with age, present work compares protein profiles in samples from groups of individuals of different age. Samples collected at roughly the same time are compared from American females in their 20’s, 40’s and 60’s from European and African backgrounds, also permitting investigation of the role of ethnic origin. Second, to examine changes in hair from individuals over time, samples were compared in protein profile and GVP content from 9 subjects at age intervals of 4 to 65 years. The results of both studies are presented and reconciled.

MATERIALS AND METHODS

Sample collection

For analysis of samples from different age groups, hair was collected by a commercial supplier from 30 African Americans (10 each of ages 20, 40, 60) and 40 European Americans (20 of age 20 and 10 each of ages 40 and 60), all female (Cohort 1). Samples are referred to as “African” or “European” for simplicity. One sample from each donor was analyzed. To find the effect of age on individuals, a second set of samples was analyzed that had been collected at different times (and stored at room temperature) from nine individuals (A-E in Cohort 2 and F-I in Cohort 3; three females and six males in total), each analyzed in sets of 2-6 replicates (Table S1). According to the donors, the hair was not chemically treated (dyed, bleached, straightened). These samples were collected with informed consent approved by the University of California Davis Institutional Review Board (protocol 896494) and processed within a year.

Sample processing for protein isolation and mass spectrometry

In each case, aliquots of 4 mg were processed essentially as previously described (Laatsch et al, 2014) except for using 0.05 M ammonium bicarbonate instead of 0.1 M sodium phosphate buffer during reduction and alkylation. Each cohort of samples was processed at a different time by a different investigator. Hair protein digests from the age groups and from individuals were randomized and analyzed by LC-MS/MS on a Thermo Scientific Q Exactive Plus Orbitrap mass spectrometer essentially as previously described (Wu et al, 2017).

Database searching and proteomic profiling based on weighted spectral counts and statistical analysis

Data files generated for the samples of age groups (Cohort 1) and the individuals A-E (Cohort 2) were analyzed using X!Tandem (2016.10.15.2) to search a Uniprot human database with an appended database of common human contaminants and an appended identical but reversed (decoy) peptide database for estimating false discovery rates. The proteomics data are available in the MassIVE repository as #MSV000085030, Proteome Exchange #PXD017771 (https://massive.ucsd.edu/ProteoSAFe/dataset.jsp?task=4a43733eab0c45a0a78a7afc7ad4f685). Also, the data from Cohorts 2 and 3 have been deposited to the ProteomeXchange Consortium via the PRIDE (Perez-Riverol et al, 2019) partner repository with the dataset identifier PXD016169. Scaffold (version 4.8.2) was used to validate peptide and protein identifications. Accepted protein identifications contained at least 2 identified peptides. False discovery rates were estimated as 0.1% and 2.9% for peptides and proteins, respectively. The MS results were analyzed as weighted spectral counts (with clusters containing shared peptides) after removal of entries not genuinely present judging by their exclusive peptides. Differential protein abundance analyses were conducted using the limma-voom Bioconductor pipeline, originally developed for analysis of RNA-Seq data and applied here to weighted spectral counts (Ritchie et al, 2015). Standard errors of estimates were adjusted for correlation between replicates from the same sample; subject was included as a fixed effect in all models. The R code is provided in supplemental files.
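The role of the reversed (decoy) database can be illustrated with the simplest form of target-decoy estimation. This is a minimal sketch only: the function name, score scale, and toy matches are hypothetical, and the validation software used here applies its own probabilistic models rather than necessarily this plain ratio.

```python
def decoy_fdr(matches, threshold):
    """Estimate FDR as (decoy hits) / (target hits) among matches
    scoring at or above the threshold.

    matches: iterable of (score, is_decoy) pairs; higher score = better match.
    """
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    return decoys / targets if targets else 0.0


# Hypothetical peptide-spectrum matches (score, hit came from the decoy database?)
toy_matches = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, True),
    (7.5, False), (6.0, True), (5.5, False),
]

loose = decoy_fdr(toy_matches, threshold=7.0)   # 1 decoy among 4 target hits
strict = decoy_fdr(toy_matches, threshold=8.0)  # no decoys above the cutoff
print(loose, strict)
```

Raising the score threshold lowers the estimated false discovery rate at the cost of fewer accepted identifications.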

Protein profiling using PEAKS

Label-free quantitation was performed on the LC-MS/MS datasets of individuals A-I (Cohorts 2 and 3) using PEAKS Studio 10.0 (Bioinformatics Solutions Inc., Waterloo, ON, Canada) to obtain their protein profiles (Zhang et al, 2012). Two to six samples for each age from all nine individuals, amounting to a total of 67 datasets, were analyzed against a validated UNIPROT human reference proteome (uniprot-proteome_UP000005640_Human). Default settings of the algorithm were employed except that the precursor mass error tolerance and fragment ion mass error tolerance were set to 10 ppm and 0.04 Da, respectively. Cysteine carbamidomethylation (+57 Da) was set as a fixed post-translational modification, while deamidation of glutamines and asparagines (+0.98 Da), oxidation of histidines, tryptophans, and methionines (+15.99 Da), dioxidation of methionines (+29.99 Da), pyroglutamation at glutamines (−17.02 Da) and glutamates (−18.01 Da), and acetylation (+42.01 Da) and formylation (+27.99 Da) of N-termini and lysines were variable modifications. The resulting datasets, filtered with a 1% false discovery rate, were analyzed using the Q-module function of PEAKS Studio, and a heat map was generated by label-free quantitation for proteins with at least a 2-fold difference in level among the groups and a significance of 13 (p value = 0.05; −10log(0.05) = 13.01). Due to batch effects identified by comparing profiles of the most recent samples of Cohorts 2 and 3 (Figure S1), a collective comparison of the profiles of individuals A-I was not performed.
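The significance threshold quoted above is a p-value expressed on a −10·log10 scale. The following sketch shows the conversion and the combined fold-change and significance filter. The function names are hypothetical, and treating the fold change symmetrically (a 2-fold increase or decrease) is an assumption rather than a documented software setting.

```python
import math


def significance_score(p_value):
    """Convert a p-value to the -10*log10(p) significance scale."""
    return -10.0 * math.log10(p_value)


def passes_filter(fold_change, p_value, min_fold=2.0, min_score=13.0):
    """True when the abundance ratio differs by at least min_fold in either
    direction and the significance score reaches min_score."""
    ratio = max(fold_change, 1.0 / fold_change)  # treat 0.5x like 2x
    return ratio >= min_fold and significance_score(p_value) >= min_score


print(round(significance_score(0.05), 2))  # threshold used in the text
print(passes_filter(2.5, 0.01))   # large change, significant
print(passes_filter(1.5, 0.001))  # significant but below 2-fold
```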

GVP analysis

The data files of the nine individuals (A-I) sampled at different ages were searched to generate GVP profiles to determine whether the individuals could be distinguished from each other by this criterion. For GVP analysis, raw data files were submitted to X!Tandem peptide spectra matching algorithm (Global Proteome Machine Fury, X!Tandem Alanine 149 (2016.10.15.2)) after conversion to MzML format by MSConvertGUI (Proteowizard 2.1 http://proteowizard.sourceforge.net). Default search parameters of the algorithm were used except that the virus and prokaryote reference libraries were excluded and point mutations were included in the search. Protein and peptide log(e) scores of −1, and fragment and parent mass error of 20 ppm and 100 ppm, respectively, were used. The files generated by X!Tandem (.XML, thegpm.org) were used to obtain the peptide data, which was then provided to/pasted into GVP Finder (Goecker et al, 2019). From the list of putative GVPs, unique tryptic peptides carrying log(e) scores of < −2 were used for GVP profiling if they displayed no other genetic or chemical modifications (except N/Q deamidation, methionine oxidation, cysteine carboxymethylation and N-terminal acetylation) and, if corresponding to a minor allele, with no major fragmentation masses corresponding to the reference alleles. The GVPs observed in the current study were not validated by DNA sequencing. However, the previously observed rate of false positive identifications of 1.5-2% (Borja et al, 2019; Parker et al, 2016) using the employed method provides high confidence in the GVP profiles. The mass spectrometry proteomics data from Cohorts 2 and 3 have been deposited to the ProteomeXchange Consortium via the PRIDE (Perez-Riverol et al, 2019) partner repository.

Random match probability calculation

Random match probabilities (RMPs) were calculated for the GVP profile of each sample using the genotype frequencies of the identified loci from the 1000 Genomes Project Consortium (2015). As all the studied subjects in Cohorts 2 and 3 were of European origin, only European genotype frequencies were used for estimation of RMP. For the calculation, each SNP was treated as independent except the multiple GVPs/alleles from one gene that were treated as one locus. The frequency for the allele combination was then used to estimate the RMPs. The product rule was applied to calculate the RMP for each specific GVP profile (Parker et al, 2016).
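The product rule described above can be sketched in a few lines. The gene names and frequencies below are hypothetical placeholders, not values from this study. Keying the profile by gene reflects the stated convention that multiple GVPs from one gene count as a single locus, whose frequency is that of the observed allele combination.

```python
from math import prod
from typing import Dict


def random_match_probability(locus_frequencies: Dict[str, float]) -> float:
    """Product rule: multiply the population genotype frequency of every
    locus, treating loci (genes) as independent of one another."""
    return prod(locus_frequencies.values())


# Hypothetical European genotype frequencies, one entry per gene (locus).
profile = {"GENE_A": 0.50, "GENE_B": 0.20, "GENE_C": 0.10}

rmp = random_match_probability(profile)
print(f"RMP is about 1 in {1 / rmp:.0f}")
```

Each additional detected locus multiplies in another frequency below 1, so a profile with more GVPs, or with rarer genotypes, gives a smaller RMP.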

Hierarchical clustering

For statistical analysis, all the GVPs detected in the biological replicates were collated. GVPs detected in one or more replicates were given the same weight. All the detections were assigned the value “1”, and those that were not detected in the samples were assigned the value “0”. GVPs that were either detected or not detected throughout the samples (and thus were without probative value) were excluded from the analysis. Agglomerative hierarchical clustering with complete linkage was performed based on the Euclidean distance data for the samples, and a dendrogram for the clustering was plotted using the hclust function of R (Version 3.6.2) (Milan et al, 2019).
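The clustering steps above (binary coding, removal of uninformative GVPs, Euclidean distance, complete linkage) can be illustrated without any statistics package. The study itself used the hclust function of R. The pure-Python sketch below is only an illustration of the same procedure on hypothetical data, with made-up sample labels and detection vectors.

```python
from itertools import combinations
from math import dist


def drop_uninformative(profiles):
    """Remove GVP columns that are detected in every sample or in none."""
    samples = list(profiles)
    n_gvps = len(profiles[samples[0]])
    keep = [j for j in range(n_gvps)
            if len({profiles[s][j] for s in samples}) > 1]
    return {s: [profiles[s][j] for j in keep] for s in samples}


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage on Euclidean distance.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples."""
    def linkage(pair):
        a, b = pair
        # Complete linkage: distance between the two farthest members.
        return max(dist(profiles[x], profiles[y]) for x in a for y in b)

    clusters = [(s,) for s in profiles]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical binary GVP detections (1 = detected, 0 = not detected).
toy_profiles = {
    "A0":  [1, 1, 0, 0, 1, 1],
    "A11": [1, 1, 0, 0, 0, 1],
    "B0":  [1, 0, 1, 1, 0, 1],
    "B65": [1, 0, 1, 0, 0, 1],
}

merges = complete_linkage(drop_uninformative(toy_profiles))
for a, b, d in merges:
    print(a, b, round(d, 3))
```

In this toy case the two samples from each hypothetical individual merge first, and the two individuals are joined only at the final, larger distance.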

RESULTS

Hair proteome comparison among age groups

To study the effect of age and ethnicity on the hair proteome, hair samples from European-American and African Americans of three age groups (20s, 40s, 60s) were studied. The data were analyzed against the Uniprot human database using X!Tandem (2016.10.15.2) and peptide and protein identifications were validated using Scaffold (version 4.8.2). The weighted spectral counts of 241 proteins were used for analyzing pairwise differences in protein profile. As illustrated in Table 1, significant pair-wise differences were not detected in different age groups within each ethnic category or within the ethnic groups of combined ages. However, some significant differences between samples from African-American and European-American subjects were discernable (Figure 1). Proteins higher in the African samples included TYRP1 (Tyrosinase Related Protein 1) and GPNMB (Glycoprotein Nonmetastatic Melanoma Protein B), which participate in melanin biosynthesis (Kobayashi et al, 1998; Zhang et al, 2012), and are a reflection of the higher melanin content in samples from the African-American cohort. In addition, certain keratins (i.e., KRTs 1, 2, 5, 9, 10, 24) were among the proteins higher in level in the African samples. Two proteins involved in membrane lipid metabolism, PLD3 (Gonzalez et al, 2018) and LPCAT3 (Rong et al, 2015), were higher in the European hair samples. As the cuticle cells are bounded by a protein membrane surrounded by lipids (Dias, 2015), the higher number of cuticle layers in the European compared to African samples could contribute to the differences in level of these hair proteins in the two populations. Other proteins higher in the European samples are involved in autophagy (HSP90AA1, ATG9b), ribosomal function (RPS2, EEF1D), and calcium binding (CALML5). The overall data obtained from Cohort 1 identified no consistent proteomic differences in hair shafts as a function of age in the range of 20 to 60 years. 
Likewise, the lack of overall proteomic differences precludes the possibility of global changes in GVP profile as a function of age. Importantly, however, the data do not exclude the possibility that age-related changes in protein abundance are not detected due to compensating individual variation over time.

Table 1.

Pairwise comparisons of differentially expressed proteins by age and ethnic origin.*

A        A20’s   A40’s
A40’s    0
A60’s    0       0

B        E20’s   E40’s
E40’s    0
E60’s    0       0

C        20’s    40’s
40’s     0
60’s     0       0

D        A20’s   A40’s   A60’s
E20’s    8       -       -
E40’s    -       6       -
E60’s    -       -       2

E        All A
All E    19

* Ethnic groups are indicated by African (A) and European (E) and age groups by 20’s, 40’s and 60’s. The numbers in the table indicate the number of proteins with significant differences in expression level.

Figure 1.


Proteins differing in hair samples from African and European subjects. Shown are the ratios of relative amounts of proteins that differed significantly, judging by weighted spectral counts, between the samples collected from African and European subjects.

Proteomic profile comparisons at different ages in given individuals based on weighted spectral counts

Because a lack of differences in the hair proteome as a function of age in unrelated individuals could be attributed to compensating individual variation, a complementary analysis was also conducted on recent hair samples and those that had been stored over 4 to 65 years from 9 individuals (Supplementary Table S1). Two different groups of subjects (Individuals A-E in Cohort 2 and Individuals F to I in Cohort 3) were analyzed. For the first longitudinal study, proteomic datasets from hair shafts from 5 individuals were processed, and significant differences in pair-wise protein abundances among a total of 211 proteins were tabulated. As shown in Table 2, data from three subjects (A, D, and E) showed few protein differences (0-6) with age in two-way comparisons over periods of 4-11 years. Samples from one subject (C) showed few differences (5-7) over a span of 6 years, but a substantial number (27) over 11 years. One subject (B) showed a substantial number of differences (32) over a span of 65 years. As shown in Figure 2, the protein profiles from a single subject at different ages were much closer in distance than the profiles among different individuals. The data in Table 2 indicated that subjects D and E could be readily distinguished from all the other subjects, but some subject combinations would be more difficult (e.g., A0 or A6 versus C6 or C11). Also the subjects B and C had high levels of internal differences, but these were consistent with longer time frames, a 65 year storage time for subject B and an 11 year difference for subject C. Storage time of the hair sample may have contributed to these differences in protein profiling, although physiological changes due to subject aging cannot be excluded.

Table 2.

Pairwise comparison of proteins significantly different in expression level (weighted spectral counts) in two-way comparisons.*

      A6    A11   B0    B65   C0    C6    C11   D0    D5    E0    E4
A0    [2]   [0]   34    4     64    7     7     23    22    206   132
A6    -     [6]   13    17    30    2     6     7     11    227   131
A11   -     -     30    15    56    6     11    26    23    168   103
B0    -     -     -     [32]  26    17    35    14    16    147   120
B65   -     -     -     -     88    23    9     24    26    196   132
C0    -     -     -     -     -     [5]   [27]  54    42    99    105
C6    -     -     -     -     -     -     [7]   10    9     35    28
C11   -     -     -     -     -     -     -     38    28    168   93
D0    -     -     -     -     -     -     -     -     [1]   135   118
D5    -     -     -     -     -     -     -     -     -     204   127
E0    -     -     -     -     -     -     -     -     -     -     [3]

* Subjects are identified by letter and years since the first collection (0). Comparisons within the same individual from different years are enclosed in brackets. The numbers in the table indicate the number of differentially expressed proteins.

Figure 2.


Distances in protein expression levels between samples from single individuals and between subjects. Box plots of Euclidean distances between samples, based on weighted spectral counts. The solid line on each box indicates the median, the lower and upper box edges indicate the 25th and 75th percentiles, respectively, and the lower and upper whiskers indicate the smallest and largest observations lying within 1.5 interquartile ranges of the box edges, respectively.

Proteomic profile comparisons at different ages among individuals based on heatmaps

An additional batch of hair samples (Cohort 3) was processed to expand the number of longitudinal samples. The resulting proteomic profiles were processed bioinformatically to obtain label-free quantitation and subsequent heat maps using the Q-module in the PEAKS software package (version 10.0) (Zhang et al, 2012). The samples were divided into two groups, new (recent samples) and old (collected 7 or more years before present), based on the time since collection. As can be seen in Figure 3A, when protein profiles were filtered based on a 2-fold change and p-value of 0.05, little difference was seen in the proteomes of older and recent samples when compared collectively. Only 3 protein differences were detected, one of which, KRTAP7-1, is a structural protein and one, SEC23B, is involved in endosomal transport and was significantly increased in pigmented hair (Parker et al, 2019). The low number of significant differences, again, could be attributed to the higher variation in proteomic profiles from individual to individual that could mask statistically significant effects. Another analysis was therefore conducted on the most extreme case, individual I, with a 44 year gap in subject age. Samples from this individual showed 54 proteins that had a 2-fold change in abundance (p=0.05) (Figure 3B), with fifty proteins higher in level in the recent samples compared to the older ones. These included proteins reported to be concentrated in the cuticle (S100A3, KRT40, KRT82, KRTAP16-1, 24-1, and 3-2) among other hair KRTs and KRTAPs (http://www.proteinatlas.org; Moll et al, 2008; Uhlén et al, 2015). The higher amounts of cuticle-concentrated proteins in the recent samples could reflect the loss of cuticle in the older samples (Thibaut et al, 2010). Four of the proteins were higher in level in the older samples, including SYNE2 (cytoskeletal protein), AKAP9 (scaffolding protein), and GFAP (an intermediate filament protein) (http://www.proteinatlas.org). 
A similar analysis from individuals F, G, and H showed considerably fewer proteomic changes over a period of 7 years with 2, 13, and 4 proteins respectively, differing among the stored and recent samples.

Figure 3.


Heatmap showing differences in the proteomic composition of the newly and previously collected samples of (A) Cohort 3 (individuals F-I), and (B) individual I at two time points with a difference of 44 years. The numbers after the hyphens in the sample names represent the storage time of the samples.

Genetically variant peptide analysis

To determine the effect of potential sample degradation with storage, GVPs in each sample were first identified and evaluated. The total number of unique peptides was also measured in each proteomic dataset. Sample storage/age was not seen to affect the average number of identified unique peptides in the samples over periods of <10 years (Figure 4A). However, decreases of ~38, 27, and 33% of the unique peptides, relative to their corresponding recent samples (stored <1 year), were observed in the samples B, C and I over storage periods of 65, 11 and 44 years, respectively (Figure 4A and Table S1). These results are consistent with the previous observations of a reduction in the complexity of proteomes over long periods of time, leading to a loss/degradation of certain proteins (Thibaut et al, 2010; Parker et al, 2016). By contrast, the samples from individual A did not show significant alterations in the amounts of detected proteins or unique peptides over a period of 11 years. The samples from individual E at both ages provided very low numbers of identified unique peptides (≈1200) and proteins (≈300) compared to the average numbers observed in the other samples (≈3000 and ≈600, respectively) (Table S1), an example of a substantial individual effect.

Figure 4: Unique peptides (A) and GVPs (B) in samples from individuals at different ages.


The lines of different color show values (averages and standard deviations) for individuals at the ages indicated. Significantly lower values in the unique peptides were observed in the stored samples of individuals B, C and I marked by asterisks. Periods of storage are indicated by the time span between points for given subjects.

Genetically variant peptide profiles were identified for each individual (A-I) in the longitudinal study with 2 to 6 biological replicates. Overall, 237 different GVPs at 127 loci were identified with 67 ± 18 GVPs per sample (Table S2). A straightforward relationship could not be made between the age of the sample and the number of GVPs observed except for the individuals B, C, and I (Figure 4B). The numbers of GVPs decreased 1.48 fold from 57.6 ± 8.5 to 36.6 ± 7 (p=0.03) in individual B, 1.5 fold from 63.3 ± 10.5 to 40.3 ± 14 (p=0.015) for individual C, and 2.1 fold from 63.6 ± 6 to 33 ± 3 (p=0.007), for individual I with storage over periods of 65, 11 and 44 years, respectively. However, the number of GVPs detected was seen to be proportional to the number of identified unique peptides in the samples (R=0.86, Figure 5A) as also observed by others (Catlin et al, 2019). GVP detections, when compared with the number of replicates used for each sample, showed that three biological replicates provide enough information to cover 97% of the GVPs, and adding more replicates is hardly more effective (Figure S2).
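The replicate comparison can be pictured as cumulative coverage: the share of all GVPs found for a sample set that has been seen after each added replicate. The sketch below uses hypothetical GVP labels. It adds replicates in one fixed order, which is an assumption and may differ from how Figure S2 was computed.

```python
def cumulative_coverage(replicates):
    """Fraction of all GVPs (union over every replicate) that has been
    observed after each successive replicate is added."""
    all_gvps = set().union(*replicates)
    seen = set()
    coverage = []
    for gvps in replicates:
        seen |= set(gvps)
        coverage.append(len(seen) / len(all_gvps))
    return coverage


# Hypothetical GVP detections in four biological replicates of one sample.
toy_replicates = [
    {"g1", "g2", "g3", "g4", "g5", "g6"},
    {"g1", "g2", "g3", "g7", "g8"},
    {"g2", "g4", "g8", "g9"},
    {"g1", "g5", "g9", "g10"},
]

print(cumulative_coverage(toy_replicates))
```

A curve that flattens after the third replicate, as reported above, means later replicates mostly re-detect GVPs already seen.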

Figure 5.


The number of GVPs vs (A) the unique peptides identified in each sample and (B) calculated random match probabilities. The graph shows that the higher the number of unique peptides identified in a sample, the higher will be the number of GVPs observed (p value = 0.0001) and the higher the random match probabilities calculated (p value = 0.003).

Random match probability

To calculate the random match probability (RMP) at each age, SNP profiles were inferred for each of the samples from their respective GVP profiles. The genotype frequencies from the 1000 Genomes Project for the inferred SNPs were used to calculate the RMPs. The calculation employed the product rule with complete independence between GVPs in different genes and complete dependence with GVPs from the same gene. The calculated random match probabilities ranged from 1 in 73 (for sample E1) to 1 in 185 million (for sample A3). The log of the RMP was found to be proportional to the number of GVPs detected (Figure 5B) with rare SNPs considerably increasing the RMPs.
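The relationship between the log of the RMP and the number of GVPs follows from the product rule. As a sketch in notation that is not used in the paper, let f_i be the genotype frequency at locus i and n the number of independent loci inferred from the detected GVPs:

```latex
\mathrm{RMP} = \prod_{i=1}^{n} f_i
\quad\Longrightarrow\quad
\log_{10}\mathrm{RMP} = \sum_{i=1}^{n} \log_{10} f_i
```

Every f_i is below 1, so each locus adds one negative term and the log changes roughly linearly with n. A rare genotype (small f_i) contributes a larger term than a common one. On this scale, the two extremes reported above, 1 in 73 and 1 in 185 million, correspond to log10 values of about −1.9 and −8.3.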

Hierarchical clustering

Proteomic changes observed over 4-7 years were modest. However, more substantial changes over time were observed proteomically in the older samples from 44 and 65 year intervals. This was true for both total numbers of identified proteins (Table S1) and total unique peptide levels (Figure 4A, Table S1). Significant changes were also observed due to batch effects between the second and third cohort of longitudinal samples. A central question of this study was whether these changes also affected the profile of GVP-based inferred SNP genotypes. Therefore, GVP profiles of the individuals at different ages were also compared side by side. Samples from the same individuals were found to carry a large proportion of GVPs common at all ages with some unique GVPs (Figure S3). For the GVP profiles generated for individuals A-I, every GVP detection was assigned a value 1 and a non-detection a value 0 to create a binary data file for calculating Euclidean distances and from them to plot an agglomerative hierarchical clustering dendrogram. As seen in Figure 6, samples collected at different time points from the same individuals were clustered together, although distances among subjects varied. This includes the samples that had the longest storage periods and greatest level of changes, individuals B and I. It also includes samples from different cohorts of longitudinal samples, individuals A to E and F to I, despite recognizable batch effects (Figure S1). This indicates that the GVP-inferred profiles of SNP alleles were more dependent on individual genotypes than changes occurring as a result of storage with proteome degradation and batch effects.

Figure 6.


Hierarchical clustering dendrogram of all the samples from individual subjects. Based on the Euclidean distances among the samples, the clustering shows that GVP profiles can distinguish individuals despite differences in hair collection and storage times.

DISCUSSION

Previous work has shown that inbred mouse strains can be distinguished by their hair protein profiles (Rice et al, 2012). Subsequently, human individuals were also shown to be distinguishable in this way (Laatsch et al, 2014). Studies of monozygotic twins indicate that the basis for such differences is largely genetic (Wu et al, 2017). That the twin profiles were not found to diverge with age would be consistent with a lack of effect of age or changes with age in the same direction within twin pairs. Present results support the latter alternative. Inasmuch as the different hair shaft layers (e.g., cuticle) have different protein profiles from the rest of the shaft (Laatsch et al, 2014), also reported for sheep wool (Koehn et al, 2010), changing proportions of the layers over time as diameters change could result in altered profiles. Hair shaft diameters reportedly change with age, decreasing in the elderly (Robbins et al, 2012; Kim et al, 2013). This finding is consistent with a report that the relative content of mRNAs encoding keratins and keratin associated proteins in hair follicles also changes with age (Giesen et al, 2011). The basis for chronological ageing is multifactorial, but includes accumulation of oxidative damage from ambient oxidants, ultraviolet radiation, copper content (Marsh et al, 2014) and air pollution (De Vecchi et al, 2019).

Present results indicate a lack of consistent population-wide changes, but some changes are evident for individuals. This finding supports possible usefulness of hair shaft protein profiling in distinguishing among individuals over short time periods, but it highlights a dependence on a short interval between sample collections, a clear limitation. Finding a substantially larger difference in subject C after 11 years compared to 5 or 6 years (27 versus 5 or 7) could be rationalized by a drift in profile. Comparing hair samples from individuals collected at greater than 40 year intervals, as for subjects B and I, reveals a large drift. Such changes could result from effects of normal ageing on hair follicle function/gene expression and profile modifications due to exposure to different physicochemical factors during storage. Therefore, proteomic profiling alone would not likely provide sufficient information to distinguish individuals from each other on a large scale. Moreover, batch effects from processing the samples at different times could confound use of a database of proteomic profiles for individual identification.

GVP analysis, on the other hand, was found to be a powerful tool to identify the source of the hair sample in each of the nine subjects studied despite the samples being stored even for periods >40 years. GVP analysis permits calculation of random match probabilities, providing a statistical basis for confidence in the results. The older samples of the individuals B and I, although deficient in proteins and peptides detected, provided GVP profiles with RMPs of 1 in nearly 1000 and 500, respectively. This capability is of particular interest for old and cold cases, where hair is present as evidence and nuclear DNA is not available. The relation between the number of unique peptides, GVPs, and the calculated RMPs testifies to the value of optimizing sample processing procedures and ongoing efforts to maximize their yields in problematic samples (e.g., from individual E).

The observation of lower unique peptide and protein yields with longer storage is consistent with loss of cuticle in older hair samples (Thibaut et al, 2010; Solazzo et al, 2013). This phenomenon could also rationalize the higher proportion in the recent samples of KRTAPs found in the present study. A factor of potential importance is the chemical modification of samples during long term storage. Deamidation, which has been linked with ageing of hairs (Robinson and Robinson, 2004; Adav et al, 2018), was higher in samples stored over a period of at least 10 years (R=0.97) (Figure S4). Other common chemical modifications were not consistent in their direction of change. Nevertheless, this observation raises the prospect in general of chemical modifications, some of which could depend on storage conditions. An important area for future investigation is the impact on protein profiles, and especially on GVP yield, of treatments individuals may use to reduce environmental damage, and common chemical treatments that are known to induce considerable damage and to reduce protein yields (Marsh et al, 2015).

Conclusion

The present study highlights that hair, although very resilient, can undergo developmental and environmental changes over decades, resulting in drift in the protein profile and thus in intra-individual variation. Therefore, proteomic profiling alone has limitations for human identification. GVP profiles, in contrast, were seen to be more robust over periods as long as 65 years. The stored hair samples, despite losing a fraction of unique peptides and proteins, were sufficient to provide high RMPs. These findings promise to be highly valuable in resolving routine and even old cases where hair samples are available for investigation.

Supplementary Material

MMC 1

Highlights.

  • Hair protein profiles of European-Americans and African-Americans differed.

  • Individual hair protein profiles were considerably different after >40 years.

  • Genetically variant peptides yielded large random match probabilities at any age.

  • Genetically variant peptides appear applicable to casework regardless of sample age.

Footnotes

Proteomics repository files

The proteomics data are available on the MassIVE repository (https://massive.ucsd.edu) under MassIVE # MSV000085030 (reviewer password “age-related”) and ProteomeXchange # PXD017771.

The mass spectrometry proteomics data from Cohorts 2 and 3 have been deposited to the ProteomeXchange Consortium via the PRIDE (Perez-Riverol et al, 2019) partner repository with the dataset identifier PXD016169.

REFERENCES

  1. 1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, Korbel JO, Marchini JL, McCarthy S, McVean GA, Abecasis GR (2015) A global reference for human genetic variation. Nature 526:68–74
  2. Adav SS, Subbaiaih RS, Kerk SK, Lee AY, Lai HY, Ng KW, Sze SK, Schmidtchen A (2018) Studies on the proteome of human hair-identification of histones and deamidated keratins. Scientific Reports 8(1):1599.
  3. Borja T, Karim N, Goecker Z, Salemi M, Phinney BS, Naeem M, Rice RH, Parker GJ (2019) Proteomic genotyping of fingermark donors with genetically variant peptides. Foren Sci Int: Genet 42:21–30
  4. Catlin LA, Chou RM, Goecker ZC, Mullins LA, Silva DS, Spurbeck RR, Parker GJ, Bartling CM (2019) Demonstration of a mitochondrial DNA-compatible workflow for genetically variant peptide identification from human hair samples. Foren Sci Int: Genet 43:102148.
  5. Chu F, Mason KE, Anex DS, Jones AD, Hart BR (2019) Hair proteome variation at different body locations on genetically variant peptide detection for protein-based human identification. Scientific Reports 9(1):7641.
  6. De Vecchi R, da Silveira Carvalho Ripper J, Roy D, Breton L, Alexandre Germano Marciano AG, de Souza PMB, de Paula Correa M (2019) Using wearable devices for assessing the impacts of hair exposome in Brazil. Scientific Reports 9(1):13357.
  7. Dias MF (2015) Hair cosmetics: an overview. Int J Trichology 7:2–15
  8. Goecker ZC, Wills BM, Salemi SR, Phinney BS, Rice RH, Walsh S, Parker GJ (2019) Biogeographic classification of European and African hair using genetically variant peptides. 30th Annual International Symposium on Human Identification Poster #64
  9. Gonzalez AC, Schweizer M, Jagdmann S, Bernreuther C, Reinheckel T, Saftig P, Damme M (2018) Unconventional trafficking of mammalian phospholipase D3 to lysosomes. Cell Reports 22:1040–1053
  10. Kim SN, Lee SY, Choi MH, Joo KM, Kim SH, Koh JS, Park WS (2013) Characteristic features of ageing in Korean women’s hair and scalp. Br J Dermatol 168:1215–1223
  11. Kobayashi T, Imokawa G, Bennett DC, Hearing VJ (1998) Tyrosinase stabilization by Tyrp1 (the brown locus protein). J Biol Chem 273:31801–31805
  12. Koehn H, Clerens S, Deb-Choudhury S, Morton J, Dyer JM, Plowman JE (2010) The proteome of the wool cuticle. J Proteome Res 9:2920–2928
  13. Laatsch CN, Durbin-Johnson BP, Rocke DM, Mukwana S, Newland AB, Flagler MJ, Davis MG, Eigenheer RA, Phinney BS, Rice RH (2014) Human hair shaft proteomic profiling: individual differences, site specificity and cuticle analysis. PeerJ 2:e506.
  14. Marsh JM, Iveson R, Flagler MJ, Davis MG, Newland AB, Greis KD, Sun Y, Chaudhary T, Aistrup ER (2014) Role of copper in photochemical damage to hair. Int J Cosmetic Sci 36:32–38
  15. Marsh JM, Davis MG, Flagler MJ, Sun Y, Chaudhary T, Mamak M, McComb DW, Williams REA, Greis KD, Rubio L, Coderch L (2015) Advanced hair damage model from ultraviolet radiation in the presence of copper. Int J Cosmetic Sci 37:532–541
  16. Milan J, Wu P-W, Salemi M, Durbin-Johnson B, Rocke DM, Phinney BS, Rice RH, Parker GJ (2019) Comparison of protein expression levels and proteomically-inferred genotypes using human hair from different body sites. Foren Sci Int: Genet 41:19–23
  17. Moll R, Divo M, Langbein L (2008) The human keratins: biology and pathology. Histochem Cell Biol 129:705–733
  18. Parker G, Goecker Z, Franklin R, Durbin-Johnson B, Milan J, Karim N, De Leon C, Matzoll A, Borja T, Rice B (2019) Proteomic genotyping: using mass spectrometry to infer SNP genotypes in a forensic context. For Sci Intl: Genet Suppl Ser 7:664–666
  19. Parker GJ, Leppert T, Anex DS, Hilmer JK, Matsunami N, Baird L, Stevens J, Parsawar K, Durbin-Johnson BP, Rocke DM, Nelson C, Fairbanks DJ, Wilson AS, Rice RH, Woodward SR, Bothner B, Hart H, Leppert M (2016) Demonstration of protein-based human identification using the hair shaft proteome. PLoS One 11(9):e0160653.
  20. Perez-Riverol Y, Csordas A, Bai J, Bernal-Llinares M, Hewapathirana S, Kundu D, Inuganti A, Griss J, Mayer G, Eisenacher M, Perez E, Uszkoreit J, Pfeuffer J, Sachsenberg T, Yilmaz S, Tiwary S, Cox J, Audain E, Walzer M, Jarnuczak AF, Ternent T, Brazma A, Vizcaino JA (2019) The PRIDE database and related tools and resources in 2019: improving support for quantification data. Nucl Acids Res 47(D1):D442–D450
  21. Rice RH, Bradshaw KM, Durbin-Johnson BP, Rocke DM, Eigenheer RA, Phinney BS, Sundberg JP (2012) Differentiating inbred mouse strains from each other and those with single gene mutations using hair proteomics. PLoS One 7:e51956.
  22. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK (2015) limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucl Acids Res 43(7):e47.
  23. Robbins C, Mirmirani P, Messenger AG, Birch MP, Youngquist RS, Tamura M, Filloon T, Luo F, Dawson TLJ (2012) What women want - quantifying the perception of hair amount: an analysis of hair diameter and density changes with age in Caucasian women. Br J Dermatol 167:324–332
  24. Robinson NE, Robinson AB (2004) Amide molecular clocks in drosophila proteins: potential regulators of aging and other processes. Mech Ageing Dev 125:259–267
  25. Rong X, Wang B, Dunham MM, Hedde PN, Wong JS, Gratton E, Young SG, Ford DA, Tontonoz P (2015) Lpcat3-dependent production of arachidonoyl phospholipids is a key determinant of triglyceride secretion. Elife 4:e06557.
  26. Solazzo C, Dyer JM, Clerens S, Plowman J, Peacock EE, Collins MJ (2013) Proteomic evaluation of the biodegradation of wool fabrics in experimental burials. Int Biodeterior Biodegrad 80:48–59
  27. Thibaut S, De Becker E, Bernard BA, Huart M, Fiat F, Baghdadli N, Luengo GS, Leroy F, Angevin P, Kermoal AM, Muller S (2010) Chronological ageing of human hair keratin fibres. Int J Cosmetic Sci 32:422–434
  28. Uhlén M, Fagerberg L, Hallstrom BM, Lindskog C, Oksvold P, Mardinoglu A, Sivertsson A, Kampf C, Sjostedt E, Asplund A, Olsson I, Edlund K, Lundberg E, Navani S, Al-Khalili Szigyarto C, Odeberg J, Djureinovic D, Takanen JO, Hober S, Alm T, Edqvist P-H, Berling H, Tegel H, Mulder J, Rockberg J, Nilsson P, Schwenk JM, Hamsten M, von Feilitzen K, Forsberg M, Persson L, Johansson F, Zwahlen M, von Heijne G, Nielsen J, Ponten F (2015) Tissue-based map of the human proteome. Science 347:394 (1260419)
  29. Wu P-W, Mason KE, Durbin-Johnson BP, Salemi M, Phinney BS, Rocke DM, Parker GJ, Rice RH (2017) Proteomic analysis of hair shafts from monozygotic twins: Expression profiles and genetically variant peptides. Proteomics 17:13–14, 1600462
  30. Zhang J, Xin L, Shan B, Chen W, Xie M, Yuen D, Zhang W, Zhang Z, Lajoie GA, Ma B (2012) PEAKS DB: de novo sequencing assisted database search for sensitive and accurate peptide identification. Mol Cell Proteomics 11(4):M111.010587.
  31. Zhang P, Liu W, Zhu C, Yuan X, Li D, Gu W, Ma H, Xie X, Gao T (2012) Silencing of GPNMB by siRNA inhibits the formation of melanosomes in melanocytes in a MITF-independent fashion. PLoS One 7(8):e42955.
