The application of multiple linear regression methods to FTIR spectra of fingernails for predicting gender and age of human subjects

L Mihaly Cozmuta

doi:10.1016/j.heliyon.2025.e42815

. 2025 Feb 19;11(4):e42815. doi: 10.1016/j.heliyon.2025.e42815

PMCID: PMC11891726 PMID: 40066026

Abstract

The paper explores the accuracy of gender and age prediction of human subjects based on the chemometric analysis of FTIR spectra of fingernails. Baseline correction and scaling to the 0–1 range were applied to FTIR spectra of fingernails from 123 subjects, and the wavenumbers at which absorbance values showed a statistically significant correlation with gender and age were identified. Prediction accuracy was analyzed using multiple linear regression, forward stepwise regression, backward stepwise regression, principal component regression, and partial least squares regression. For gender prediction, the principal component regression model proved the most accurate, with 8 extracted components giving a correct classification rate of 93.50 % (86.00 % for women, 98.63 % for men). The predictive power of the model showed that, for new subjects, gender could be classified correctly in 91.06 % of cases (84.00 % for women, 95.89 % for men). For age prediction, the optimal backward stepwise regression model with 6 statistically significant predictors showed an average error of 11.39 % (13.70 % for women, 9.81 % for men). The predictive power of the model was 12.13 % (11.73 % for women, 10.35 % for men). Age prediction was less accurate, with an error of at most 10.00 % achieved for only 65.04 % of subjects.

Keywords: Fingernails, FTIR, Linear regression models, Cross-validation, Gender prediction, Age prediction

Highlights

• Gender and age prediction of subjects from fingernails FTIR spectra were studied.
• Five multiple linear regression models were used to process the FTIR spectra.
• The principal component regression model was the most accurate for gender prediction.
• The backward stepwise regression model was the most accurate for age prediction.
• The age prediction was less accurate than gender prediction.

1. Introduction

Fourier-transform infrared (FTIR) absorption spectrometry is a fast and non-destructive method used in a wide range of qualitative and quantitative analyses, including the identification of the composition and structure of proteins, nucleic acids, lipids, carbohydrates and other organic compounds. Proteins play an essential role in living organisms and show enormous diversity in both their constituent amino acids and the spatial orientation of their polypeptide chains [1]. The structure of proteins is closely correlated with their functional role, and any deviation from normality can be a warning signal carrying information about the disturbing factor. Coupling FTIR spectra with statistical analysis techniques makes it possible to highlight changes in the composition and structure of biomolecules, with applications in many fields. In the food industry, the technique can be used to identify protein sources [2] and to detect adulteration [3] and degradation processes during storage [4]. In medicine, it can be used to identify conditions such as autoimmune diseases [5], Alzheimer's disease [6], osteoporosis [7], rheumatoid arthritis [8], hypoalbuminemia [9], myocardial infarction [10], cartilage denaturation [11], and diabetes [12]. Studies on screening for different types of cancer using FTIR have also been reported in the literature [13–18]. In sports analysis, FTIR can be used for health monitoring and doping prevention [19] or for detecting drug abuse [20]. Nails are a continuously growing keratinized matrix that incorporates information related to genetic inheritance [21], and changes in their structure are correlated with different medical conditions. Coopman et al. [22] found higher concentrations of glycated protein in the fingernails of diabetics than in those of non-diabetics.
An amide II band, a peak at around 468 cm⁻¹, and alkyl thiolated structures were identified in the protein structure of nails from diabetic patients, whereas the nails of non-diabetic subjects lacked the amide II structure [23]. Psoriasis produces a significant decrease in α-helix and an increase in β-sheet and random coil components of the fingernail protein, accompanied by the degradation of the disulfide bridges of cystine to thiol groups [24]. Decreased α-helix content and increased β-sheet content were observed in the secondary structure of fingernail proteins of chronic fatigue syndrome patients [25]. The mineral composition of nails is also influenced by disease and living environment. Sihota et al. [26] and Nasli-Esfahani et al. [27] reported higher amounts of Mn and lower contents of Zn, Mg, Cu, Se and Cr in the nails of diabetic patients. The consumption or abuse of drugs [28] and exposure of the human body to inorganic pollutants (Hg, Cd, Pb or Bi) affect their levels in nails [29,30]. Biometric authentication using fingernail plates has also been attempted successfully [31]. Since nails decompose much more slowly than soft tissues, they are particularly important in forensic analysis [32]. Relatively recent studies have attempted to determine the gender and age of subjects from fingernail samples. In terms of age prediction, Fokias et al. [33] showed that chronological age can be assessed through DNA methylation patterns in nails. Accurate classification and prediction of gender and age in a forensic context has been achieved by chemometric analysis of ATR-FTIR spectra of fingernails [34–36] and by Raman spectroscopy [37].

This study investigates the accuracy of gender and age prediction of human subjects based on the chemometric analysis of FTIR spectra of their fingernails. Selecting the most accurate of the five multiple linear regression models used to process the fingernail FTIR spectra extends the approaches reported in the literature for gender prediction. A further novel element of the paper is its insight into the models and its exploration of the performance and limitations of multiple linear regression models in predicting the gender and age of human subjects.

2. Materials and methods

2.1. Samples collection and preparation

Fingernail samples were collected from 123 healthy volunteers, 50 females and 73 males, aged 21 to 85 years. The study was approved by the Ethics Committee of the Technical University of Cluj-Napoca, Romania (474/08.12.2022) and complies with the principles of the Declaration of Helsinki as revised in 2003. Informed consent was sought and obtained from each volunteer before enrollment in the study. The datasets collected and analyzed during the study are not publicly available due to ethical considerations. Fingernail clippings free of nail polish, onychomycosis and visible physical damage were collected from the enrolled subjects in Eppendorf tubes, cleaned with acetone to remove dirt and organic residues, washed with ultrapure water, and dried at room temperature (25 °C) for 24 h. They were stored in a sterile environment at 4 °C for later analysis. The experiments were conducted within one month of sample collection. At least three fingernail samples from different fingers were collected from each volunteer. To analyze data variability, nails from each finger of each hand were collected from 5 randomly selected subjects.

2.2. Instrumental parameters

The fingernail samples were analyzed on a PerkinElmer FTIR Spectrophotometer BX2 (USA), a single-beam scanning instrument with a KBr beamsplitter, DTGS detector, MIR source, OPD velocity of 0.3 cm/s and signal-to-noise ratio of 15000/1, equipped with a Pike Miracle ATR diamond crystal, over the wavenumber range of 4000–600 cm⁻¹. Each sample was placed directly on the crystal under the same pressure and scanned 50 times at a resolution of 4 cm⁻¹; the average spectrum was recorded automatically. Before each measurement, the crystal was cleaned with acetone and tissue paper. The background, with no sample on the crystal, was collected first and then subtracted from all measured spectra. Spectrum v5.3.1 software (PerkinElmer 2009–23384, USA) was used to control the equipment and manage the measurements.

2.3. Pre-treatment of the data

Two wavenumber ranges, 3600–2800 cm⁻¹ and 1800–1000 cm⁻¹, were considered, for which the absorbance values at the limits of the intervals were at their minimum. To reduce the background noise caused by radiation scattering, so that the remaining absorbance variations are due mostly to the chemical characteristics of the samples, the FTIR spectra were first processed by baseline correction and normalization to the 0–1 range. In the regression models, a single spectrum was considered for each person, obtained as the arithmetic mean of the spectra of that person's analyzed samples.
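The pre-treatment described above can be illustrated with a minimal Python sketch. The text does not state which baseline algorithm was used, so the sketch assumes a straight-line baseline drawn through the two end points of a range (plausible here, since the ranges were chosen to have minimum absorbance at their limits), followed by min-max scaling to 0–1. The function name `baseline_and_scale` and the example values are hypothetical.

```python
def baseline_and_scale(absorbance):
    """Subtract a straight line through the end points, then scale to 0-1.

    Assumption: a linear end-point baseline; the paper does not name
    the baseline method actually applied.
    """
    n = len(absorbance)
    first, last = absorbance[0], absorbance[-1]
    # Linear baseline joining the first and last points of the range.
    corrected = [
        a - (first + (last - first) * i / (n - 1))
        for i, a in enumerate(absorbance)
    ]
    lo, hi = min(corrected), max(corrected)
    if hi == lo:
        raise ValueError("flat spectrum cannot be scaled")
    # Min-max normalization over the 0-1 range.
    return [(c - lo) / (hi - lo) for c in corrected]


# Made-up absorbances for five wavenumbers of one range.
spectrum = [0.10, 0.30, 0.70, 0.40, 0.20]
scaled = baseline_and_scale(spectrum)
```

After this step every spectrum spans the same 0–1 scale, which is what makes spectra from different nails and persons directly comparable.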

2.4. Statistical analysis

Data processing was done using Microsoft Office Excel (365). The multiple linear regression models were applied using Statistica 7.0 software (StatSoft Inc., Tulsa, USA).

2.4.1. Data variability

To assess the differences among a given set of FTIR spectra, data variability was calculated as follows: i) for each wavenumber, the arithmetic mean and the standard deviation of the absorbances across all spectra were calculated; ii) over all wavenumbers, the arithmetic mean of the absorbance means and the arithmetic mean of the standard deviations were computed; iii) variability was expressed as the ratio, in percent, of the mean of the standard deviations to the mean of the absorbance means. In the preliminary phase, data variability, as a measure of homogeneity in the composition of individual samples, was analyzed in three situations: (i) for the 5 subjects from whom fingernails were collected from both hands, 5 FTIR spectra were recorded per fingernail sample by changing the position of the nail on the crystal, and the variability of the samples was calculated from them; (ii) for each finger, the individual spectra were averaged, and for each person the variability of the person was calculated by comparing these mean spectra; this highlights the differences that may occur between nail samples from different fingers of the same person; (iii) total variability was calculated from the differences between the average spectra of each of the 123 persons. The ratio of total variability to the average variability of individuals is an indicative measure of the possibility of differentiating a particular person: the higher this ratio, the greater the possibility of differentiating the gender and age of a particular person.
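Steps i) to iii) of the variability calculation can be sketched in a few lines of Python. This is an illustration only: the function name `variability_percent` and the toy spectra are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def variability_percent(spectra):
    """Variability (%) of a set of spectra sampled at the same wavenumbers.

    spectra: list of equal-length absorbance lists, one per spectrum.
    """
    # One tuple of absorbances per wavenumber.
    columns = list(zip(*spectra))
    # Step i: mean and standard deviation at each wavenumber.
    col_means = [mean(c) for c in columns]
    col_sds = [stdev(c) for c in columns]  # assumption: n - 1 denominator
    # Steps ii and iii: ratio of the averaged SDs to the averaged means.
    return 100.0 * mean(col_sds) / mean(col_means)


# Two made-up spectra at three wavenumbers.
toy = [[0.2, 0.5, 0.9], [0.4, 0.5, 1.1]]
v = variability_percent(toy)
```

The same function applies to all three situations; only the set of spectra passed in changes (repeat scans of one nail, mean spectra of one person's fingers, or mean spectra of all persons).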

2.4.2. Regression methods used in data analysis

The aim of the study was to analyze the possibility of determining the gender and age of a subject based on infrared absorption spectra obtained from fingernails samples. Based on the FTIR spectra, initially processed by baseline application and normalization, wavenumbers for which the absorbance values show statistically significant correlation coefficients with the gender and age of the subjects were identified. The main multiple linear regression models were used (described below), the gender and age of subjects being considered the dependent variable, while the absorbance values corresponding to previously determined wavenumbers were taken as independent (predictive) variables. The processing of experimental data aimed to determine the best prediction model. For this, the following models were tested: i) Multiple linear regression (MLR), ii) Forward stepwise regression (FSR), iii) Backward stepwise regression (BSR), iv) Principal component regression (PCR) and v) Partial least squares regression (PLS).

i) Multiple linear regression (MLR) treats the dependent variable as a linear combination of the independent variables. The model coefficients are determined by the method of least squares, that is, by minimizing the sum of squared errors between the experimental values of the dependent variable and the values obtained from the model. The limitations of the model are the assumed linear relationship between variables, the assumed independence of the predictive variables and, when the number of independent variables is high, the need to choose the optimal number that ensures the minimum prediction error. In case of significant departure from linearity (which can be evidenced by error analysis), nonlinear regression models are recommended. The major problem of MLR is that it assumes the predictive variables are independent. In practice these variables are correlated with each other to some degree, a condition termed multicollinearity. Multicollinearity can be evaluated by means of parameters such as tolerance or semi-partial correlation coefficients, which show the extent to which a given predictive variable can be expressed as a linear combination of the other predictive variables. A low tolerance indicates that the variable is largely a combination of the other variables, so it should not be introduced into the regression model. Each independent variable in the model explains a certain percentage of the variability of the dependent variable but also introduces its own error. If a predictive variable with a low tolerance is introduced into the regression model, it will not significantly increase the explained proportion of the variability of the dependent variable; instead, it will increase the model error by introducing its own error.
For this reason, MLR is not suitable under high multicollinearity, which decreases the predictive power of the model. When many independent variables are available, it is preferable to eliminate those that have very little influence on prediction. Two cases can be encountered: underfitting (a model with too few independent variables, which does not adequately describe the variation of the dependent variable) and overfitting (a model with too many independent variables, which adds noise to the variation of the dependent variable and thereby decreases the predictive power of the model). The optimum lies between the two. In MLR, the statistical significance of the coefficients of the predictive variables can be tested. It is recommended to keep in the model the variables (with high tolerance) that have statistically significant coefficients. The higher the ratio of the number of statistically significant independent variables to the total number of independent variables, the closer the predictive power of the regression model comes to the regression model error. Stepwise regression models are used when the number of independent variables is high. The goal is to determine the best regression model (with the best predictive power) that contains a minimum number of statistically significant independent variables. The stepwise methods come in two variants: forward stepwise regression (FSR) and backward stepwise regression (BSR). For each of them, the values of the entry and elimination criteria for variables are set at the initial stage. These criteria can be a given level of statistical probability or a given value of the Fisher criterion, and they quantify how significant the contribution of an independent variable to the regression equation must be for it to be added to or removed from the model.
ii) In forward stepwise regression (FSR), the number of variables in the model is increased successively. In the first step, all simple linear regression models between the dependent variable and each of the independent variables are analyzed, and the variable that best satisfies the predetermined entry criterion is retained in the model. In the second step, each of the remaining independent variables is added in turn to the previously established model, and the second-best independent variable according to the entry criterion is introduced. The algorithm proceeds in the same way in the following steps until the best regression model is obtained. After each step in which a variable is introduced, the contribution of every variable in the model is tested, and any variable whose contribution falls below the threshold of the elimination criterion is removed. A variable removed from the model in one step becomes a candidate for entry in subsequent steps.
iii) Backward stepwise regression (BSR) proceeds similarly, but in the opposite direction. It starts from the general linear regression model that includes all independent variables (MLR). The number of predictors is reduced in successive steps by comparing the effect of each variable in the model with the threshold of the elimination criterion; independent variables whose contribution is lower than this threshold are removed from the model. In stepwise regression methods, the values of the entry and elimination criteria should be chosen so that each independent variable is forced into the model at least once. In the presence of multicollinearity, the stepwise regression models FSR and BSR are superior to MLR.
iv) Principal component regression (PCR). Principal component analysis (PCA) is a mathematical method that converts initial predictors affected by multicollinearity into mutually uncorrelated principal components. The goal of PCA is to reduce the number of predictors while preserving the covariance structure of the predictors. Because the physical significance of the initial predictors is known whereas that of the principal components is lost, the principal components are also called independent latent variables. Each principal component is a linear combination of the initial predictors. The principal components are ordered by the share of the predictors' variance they explain: the first explains the most, the second less, and so on, with the last explaining the least. Building the multiple linear regression model on principal components (PCR) has the advantage of including fewer predictors than MLR and of leaving out principal components with a small (insignificant) influence, which would increase prediction errors. The optimal number of principal components to include in the regression model can be determined from the statistical significance of the effect of their introduction, from the evaluation of the predictive power of the regression model, or from the Kaiser or Cattell criteria. Adding principal components with a statistically significant effect increases the predictive power of the model, whereas including components with an insignificant effect decreases it. PCR is superior to MLR when the initial predictors show high multicollinearity.
v) Partial least squares regression (PLS) is similar to PCR in that it uses latent variables that are linear combinations of the initial predictors. The difference lies in how they are determined. While PCA determines these variables so that they explain as much as possible of the predictors' variance without considering the dependent variable, PLS determines the latent variables so as to capture both the maximum variance of the initial predictors and their covariance with the dependent variable. The purpose of the model is to use a minimum number of predictors to obtain the highest performance of the regression model and thus the best predictive power. From this point of view, PLS is superior to PCR.
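The notion of tolerance used in item i) can be made concrete with a small Python sketch. Tolerance of a predictor is commonly defined as 1 − R², where R² comes from regressing that predictor on all the others; with only two predictors this reduces to 1 − r², with r the Pearson correlation between them. The sketch covers that two-predictor case only, and the function names and absorbance values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def tolerance_two_predictors(x1, x2):
    """Tolerance 1 - r^2; valid only when the model has two predictors."""
    return 1.0 - pearson(x1, x2) ** 2


# Made-up absorbances at three wavenumbers for five subjects.
a1 = [0.10, 0.20, 0.30, 0.40, 0.50]
a2 = [0.21, 0.39, 0.61, 0.80, 1.01]  # almost exactly 2 * a1
a3 = [0.50, 0.10, 0.40, 0.20, 0.30]  # weakly related to a1

tol_collinear = tolerance_two_predictors(a1, a2)
tol_independent = tolerance_two_predictors(a1, a3)
```

Here `tol_collinear` is close to 0, flagging `a2` as nearly redundant given `a1`, while `tol_independent` is close to 1. Neighbouring wavenumbers in an FTIR band typically behave like the first pair, which is the motivation for the stepwise, PCR and PLS variants.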

2.4.3. Cross-validation method

The quality of a linear regression model can be highlighted by some parameters, among which: Pearson correlation coefficient (r), determination coefficient (R²), standard error of estimate (SEE) or statistical probability level determined based on ANOVA analysis. It is also possible to test the statistical significance of the effect each predictor in the model has on the dependent variable by testing the associated regression coefficients. In all cases, the aim is to obtain the best model describing the interdependencies between variables and ensuring the smallest distances between experimental values and modeled values of the dependent variable. However, in many cases, the predictive power of the model is more important than its quality. Predictive power is a measure of the error in determining the dependent variable when applying the model to a set of predictor values that were not used in establishing the model. Predictive power is the error likely to be obtained when estimating the unknown dependent variable based on the model and a new set of predictive variables. There are several possibilities to determine the predictive power of a regression model, all based on the leave-one-out cross validation method. 
It involves the following steps: i) from the n data sets, remove the first set and fit the regression model to the remaining n − 1 sets; ii) using the coefficients of this reduced model and the predictor values of the removed set, calculate the value of the dependent variable, and take the residual (error) as the difference between the observed value of the dependent variable and the value obtained from the reduced model; iii) repeat the procedure, removing each data set in turn; iv) assess predictive power from the differences between the observed values of the dependent variable and the values modeled by cross-validation (reduced models), and model quality from the differences between the observed values and the values modeled with all data (full model). In general, when the number of predictors is high, for all the multiple linear regression approaches presented above, adding statistically significant variables increases both the quality of the model and its predictive power. When the model is enlarged with statistically insignificant predictors, the fit improves but the predictive power diminishes. The optimal number of predictors corresponds to the maximum predictive power.
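The leave-one-out procedure can be sketched in plain Python as follows. This is a minimal illustration rather than the software workflow used in the study: the least-squares fit goes through the normal equations with a small Gaussian-elimination solver, all function names are hypothetical, and the six data rows are made up (two absorbance predictors, gender coded 0/1).

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(xs, ys):
    """Least-squares coefficients [b0, b1, ...] from the normal equations."""
    rows = [[1.0] + list(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def predict(coef, x):
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))


def sse_model_and_loocv(xs, ys):
    """Return (SSE of the full model, SSE from leave-one-out residuals)."""
    full = fit(xs, ys)
    sse_m = sum((y - predict(full, x)) ** 2 for x, y in zip(xs, ys))
    sse_cv = 0.0
    for i in range(len(ys)):
        # Refit without subject i, then predict the left-out subject.
        coef = fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        sse_cv += (ys[i] - predict(coef, xs[i])) ** 2
    return sse_m, sse_cv


xs = [[0.1, 0.9], [0.2, 0.7], [0.4, 0.8], [0.5, 0.3], [0.7, 0.4], [0.9, 0.1]]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
sse_m, sse_cv = sse_model_and_loocv(xs, ys)
```

For ordinary least squares the cross-validated error is never smaller than the error of the full model, so the gap between `sse_m` and `sse_cv` indicates how much of the apparent fit does not carry over to new subjects.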

3. Results and discussion

3.1. Spectral features

The spectral domains in the collected FTIR spectra (3600–2800 cm⁻¹ and 1800–1000 cm⁻¹) cover the main elements of nail composition: peptides and proteins (keratin), lipids (mainly cholesterol), phosphate-carrying compounds and water. Table 1 displays the assignment of the main vibration bands characteristic of the FTIR spectra of the analyzed fingernail samples [15–17,34–36,38,39].

Table 1.

The main spectral peaks and their band assignment.

Wavenumber (cm⁻¹)	Band assignment
3286	O-H stretching
3263	NH₂ stretching modes in peptides and proteins
2918–2855	C-H symmetric stretching (CH₂ and CH₃ anti symmetric and symmetric stretching modes) in lipids and proteins
1673–1671	C=C stretching modes in cholesterol
1635	C=O stretch and small contribution from NH bend, amide I band
1536–1527	Amide II, C=O stretching coupled with C-N stretching and bending deformation of N-H
1465–1334	antisymmetric and symmetric deformation modes of CH₃ and CH₂ in cholesterol
1455	CH₂, CH₃ asymmetric bending modes in lipids and proteins
1412	C-H deformation in CH₃ symmetric mode in amino acids
1382–1116	wagging and rocking modes of CH₂ groups in cholesterol
1300–1285	Amide III, N-H in plane bending, O=C-N bend, C-N stretch
1246	Amide III band, C-N stretching vibrations
1177	C-O stretch
1170–1080	PO₂⁻ symmetric stretching of nucleic acids
1127	Ribose C-O band
1078	C-C random conformation
1045	C=O absorption
1034	C-O amide I band, C-C, C-N

3.2. Data variability

Table 2 shows the values of sample variability, and Table 3 presents the variability of persons and the total variability of the data. Sample variability ranges from 1.82 % to 5.35 %, and person variability from 3.48 % to 6.57 %. For the same person, there are no statistically significant differences between the FTIR spectra from the left and right hands. These data show that variation in the composition of each sample produces an average difference of 3.98 %, while changing the finger from which the nail sample was collected, for the same person, causes an average variation of 5.37 %. The variability calculated for the mean spectra of the 123 people was 22.31 %. These values were obtained from spectra pre-processed by baseline correction and normalization to the 0–1 range. With raw spectra the values were higher: 14.30 % for sample variability, 16.51 % for person variability, and 41.71 % for total variability. In that case, the ratio of total variability to subject variability (R) was 2.53, lower than the 4.16 obtained for the processed spectra. These values indicate that the gender and age of a particular subject can be classified more accurately from processed FTIR spectra. The decrease in subject data variability obtained with processed rather than raw spectra is exemplified in Fig. 1 and Fig. 2, which show the average spectra of the five fingers of the left hand of the same person, in raw and processed form (after baseline correction and normalization). Data variability falls from 13.13 % for the raw spectra to 4.28 % for the processed spectra.

Table 2.

Variability of samples - average values for fingernail samples collected from both hands.

Subject	Hand	Number of collected spectra 5/fingernail sample	Raw spectra variability	Processed spectra variability
1	left	5 × 5	13.13	4.28
2		5 × 5	9.36	3.23
3		5 × 5	24.64	4.75
4		5 × 5	10.72	2.62
5		5 × 5	9.10	4.90
1	right	5 × 5	24.09	3.97
2		5 × 5	8.76	3.88
3		5 × 5	14.10	5.35
4		5 × 5	12.83	5.02
5		5 × 5	16.25	1.82
average	left		13.39	3.96
average	right		15.21	4.01
min	both		8.76	1.82
max			24.64	5.35
average			14.30	3.98

Table 3.

Values of subjects’ variability and total data variability.

Subject	Number of spectra 1/fingernail	Raw spectra variability	Processed spectra variability
1	10	9.40	6.57
2	10	13.95	6.49
3	10	20.08	5.90
4	10	27.06	3.48
5	10	12.09	4.41
min	50	9.40	3.48
max	50	27.06	6.57
average	50	16.51	5.37
total	123	41.71	22.31
R		2.53	4.16

Fig. 1. Average raw spectra of all fingernails from the left hand of the same subject; variability of raw spectra 13.13 %.

Fig. 2. Average processed spectra of all fingernails from the left hand of the same subject; variability of processed spectra 4.28 %.

3.3. Identification of independent variables correlated with gender and age of subjects

In order to identify the wavenumbers for which the absorbance values are related to the gender of the subjects, the next steps were followed: i) all values of the correlation coefficients between the gender of the subjects (quantified with 0 for women and 1 for men) and the absorbance values of the spectra for all wavenumbers of the considered domains were calculated; ii) each value of the correlation coefficient was subjected to the Student's statistical significance test; iii) from the variation diagrams of the absolute values of the correlation coefficients depending on the wavenumber, 27 wavenumbers corresponding to the optimal points (maximum or inflection) were selected (Fig. 3a and b). The same procedure was applied in the case of correlations between absorbance values and age of subjects, and 24 wavenumbers corresponding to optimal points were identified (Fig. 3c and d). In the regression models tested, absorbance values corresponding to the 27 and 24 wavenumbers, respectively, were considered as independent variables.
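Steps i) and ii) of this selection can be sketched in Python as below. The sketch covers only the correlation and significance screen; picking the maxima or inflection points of the correlation curve (step iii) is not reproduced. The function names, the toy data and the way the critical value is supplied are assumptions: `t_crit` must be taken from a Student's t table for n − 2 degrees of freedom, and the value 2.776 used in the example is the two-sided 5 % critical value for 4 degrees of freedom.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """Student's t for H0: no correlation, with n - 2 degrees of freedom."""
    return r * sqrt((n - 2) / (1.0 - r * r))


def significant_wavenumbers(wavenumbers, columns, target, t_crit):
    """Keep wavenumbers whose absorbance correlates significantly with target.

    columns[k] holds the absorbances of all subjects at wavenumbers[k].
    """
    n = len(target)
    keep = []
    for wn, column in zip(wavenumbers, columns):
        r = pearson(column, target)
        if abs(t_statistic(r, n)) > t_crit:
            keep.append((wn, round(r, 3)))
    return keep


# Made-up data: six subjects, gender coded 0 for women and 1 for men.
gender = [0, 0, 0, 1, 1, 1]
columns = [
    [0.50, 0.52, 0.49, 0.71, 0.69, 0.73],  # absorbance at 1635 cm^-1
    [0.30, 0.45, 0.33, 0.41, 0.29, 0.40],  # absorbance at 1246 cm^-1
]
selected = significant_wavenumbers([1635, 1246], columns, gender, t_crit=2.776)
```

In this toy case only the first wavenumber passes the screen. The same function can be run with age as `target` to mirror the second selection.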

Fig. 3. Variations in absolute values of correlation coefficients with wavenumbers. a) correlation with the gender of subjects (range 800–1800 cm⁻¹); b) correlation with the gender of subjects (range 2800–3600 cm⁻¹); c) correlation with the age of subjects (range 800–1800 cm⁻¹); d) correlation with the age of subjects (range 2800–3600 cm⁻¹).

3.4. Results of regression models

In the linear regression models, the gender or age of the subject was taken as the dependent variable (y), and the absorbance values at the 27 or 24 previously established wavenumbers, respectively, as the independent variables (x). To identify the best models for modeling (or predicting) the gender and age of the subjects from whom the nail samples originated, the 5 regression methods presented in Section 2.4.2 were tested and are discussed below.

a)
The results obtained in the case of gender prediction of subjects by applying the regression models MLR, FSR, BSR, PCR and PLS are presented in Table 4.

Table 4.

Results obtained by fitting experimental data on the considered regression models.

	Gender	v1	v2	v3	v4	v5	v6	v7	v8	v9	v10
	Regression model	MLR	MLR	MLR	FSR	BSR	PCR	PCR	PLS	PLS	PLS
1	nx	27	7	3	11	13	26	8	26	7	6
2	nx(s)	7	3	3	11	13	8	8	7	6	6
3	nx(n)	20	4	0	0	0	18	0	19	1	0
4	F	9.51	20.00	20.93	21.03	18.62	9.93	26.90	9.96	30.73	34.44
5	p	6.10E-17	2.45E-17	5.90E-11	2.14E-22	1.05E-22	2.14E-17	5.54E-23	1.90E-17	1.33E-23	1.26E-23
6	DW	1.73	1.29	0.77	1.51	1.65	1.73	1.49	1.73	1.56	1.53
7	nout	3	5	5	6	5	3	2	3	4	7
8	nout[%]	2.44	4.07	4.07	4.88	4.07	2.44	1.63	2.44	3.25	5.69
9	n(m)	123	123	123	123	123	123	123	123	123	123
10	R(m)	0.86	0.74	0.59	0.82	0.84	0.85	0.81	0.85	0.81	0.80
11	R^2(m)	0.73	0.55	0.35	0.68	0.71	0.73	0.65	0.73	0.65	0.64
12	AdjR^2(m)	0.65	0.52	0.33	0.64	0.67	0.66	0.623	0.66	0.63	0.62
13	SSE(m)	8.02	13.38	19.43	9.62	8.69	8.05	10.28	8.02	10.34	10.67
14	SEE(m)	0.29	0.34	0.40	0.29	0.28	0.29	0.30	0.29	0.30	0.30
15	F(m)	47	39	37	44	46	47	43	47	44	42
16	M(m)	70	70	63	70	69	71	72	70	70	68
17	T(m)	117	109	100	114	115	118	115	117	114	110
18	F(m)[%]	94.00	78.00	74.00	88.00	92.00	94.00	86.00	94.00	88.00	84.00
19	M(m)[%]	95.89	95.89	86.30	95.89	94.52	97.26	98.63	95.89	95.89	93.15
20	T(m)[%]	95.12	88.62	81.30	92.68	93.50	95.93	93.50	95.12	92.68	89.43

leave-one-out cross validation

21	R(cv)	0.70	0.68	0.56	0.75	0.77	0.70	0.77	0.70	0.76	0.77
22	R^2(cv)	0.49	0.46	0.31	0.56	0.60	0.50	0.59	0.49	0.58	0.59
23	AdjR^2(cv)	0.34	0.43	0.29	0.52	0.55	0.35	0.56	0.36	0.56	0.56
24	SSE(cv)	17.16	16.34	20.52	13.60	12.36	16.97	12.40	16.81	12.51	12.44
25	SEE(cv)	0.43	0.38	0.42	0.35	0.34	0.42	0.33	0.42	0.33	0.33
26	F(cv)	41	38	37	42	44	42	42	41	43	41
27	M(cv)	65	69	63	67	67	65	70	66	68	68
28	T(cv)	106	107	100	109	111	107	112	107	111	109
29	F(cv)[%]	82.00	76.00	74.00	84.00	88.00	84.00	84.00	82.00	86.00	82.00
30	M(cv)[%]	89.04	94.52	86.30	91.78	91.78	89.04	95.89	90.41	93.15	93.15
31	T(cv)[%]	86.18	86.99	81.30	88.62	90.24	86.99	91.06	86.99	90.24	88.62
32	RS	0.68	0.91	0.97	0.84	0.84	0.69	0.91	0.69	0.91	0.93

In all analyzed cases, the following parameters were specified: 1) the number of independent variables of the model - nx; 2) the number of statistically significant independent variables - nx(s); 3) the number of non-significant independent variables - nx(n); 4) the Fisher factor from the ANOVA analysis - F; 5) the level of statistical probability obtained in the ANOVA analysis - p; 6) the Durbin-Watson coefficient associated with error autocorrelation - DW; 7) the number of values considered outliers on the basis of standardized residual analysis - nout; 8) the percentage of data considered outliers - nout[%]; 9) the number of datasets used in the construction of the regression models - n(m); 10) the Pearson correlation coefficients - R(m); 11) the determination coefficients (squares of the correlation coefficients) - R^2(m); 12) the adjusted determination coefficients - Adj R^2(m); 13) the sum of squared errors - SSE(m); 14) the standard error of estimate - SEE(m); 15–17) the number of female, male, and total subjects for whom the regression model correctly identifies gender - F(m), M(m), T(m); and 18–20) the corresponding percentages of accurate predictions - F(m)[%], M(m)[%], T(m)[%].
For the validation of the regression models (leave-one-out cross validation), Table 4 additionally displays: 21) the Pearson correlation coefficients - R(cv); 22) the determination coefficients - R^2(cv); 23) the adjusted determination coefficients - Adj R^2(cv); 24) the sum of squared errors - SSE(cv); 25) the standard error of estimate - SEE(cv); 26–28) the number of female, male, and total subjects for whom the validation of the model correctly identifies gender - F(cv), M(cv), T(cv); 29–31) the corresponding percentages of accurate predictions - F(cv)[%], M(cv)[%], T(cv)[%]; and 32) the ratio of SEE(m) to SEE(cv) - RS.
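
This section does not spell out the formulas behind the Adj R^2, SEE and RS rows. As a hedged check, the standard textbook definitions (an assumption here, with n subjects and nx predictors plus an intercept) reproduce the tabulated v1 values when fed the rounded R^2 and SSE entries of Table 4. The function names are illustrative only.

```python
import math

def adjusted_r2(r2, n, n_predictors):
    """Adjusted R^2 for a model with an intercept (standard definition)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

def standard_error_of_estimate(sse, n, n_predictors):
    """SEE = sqrt(SSE / residual degrees of freedom)."""
    return math.sqrt(sse / (n - n_predictors - 1))

# Table 4, variant v1 (MLR, 27 predictors, 123 subjects).
# Inputs are the rounded table entries, so results match only to table precision.
n, nx = 123, 27
adj_r2_m = adjusted_r2(0.73, n, nx)                    # row 12
see_m = standard_error_of_estimate(8.02, n, nx)        # row 14
see_cv = standard_error_of_estimate(17.16, n, nx)      # row 25
rs = see_m / see_cv                                    # row 32

print(round(adj_r2_m, 2), round(see_m, 2), round(rs, 2))
```

Under these assumed definitions, RS comes out as the ratio of the two standard errors of estimate, which equals the square root of SSE(m)/SSE(cv) when both use the same degrees of freedom.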

In the first variant (v1), an MLR model based on all 27 independent variables, the statistical analysis showed a gender prediction capacity of 95.12 % for the model and 86.18 % at validation. The lower percentage at validation is attributed to the fact that only 7 of the 27 independent variables considered are statistically significant.
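
The gap between model and validation performance can be illustrated with a minimal leave-one-out sketch. It uses synthetic stand-in data rather than the paper's spectra, and it assumes that gender is coded 0/1 and that a predicted value of at least 0.5 is assigned to class 1; neither the coding nor the threshold is stated in this section.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ intercept + X."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_ols(coef, X):
    """Apply intercept and slopes to new rows."""
    return coef[0] + X @ coef[1:]

def loo_predictions(X, y):
    """Predict each sample from a model fitted on all the other samples."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        coef = fit_ols(X[mask], y[mask])
        preds[i] = predict_ols(coef, X[i:i + 1])[0]
    return preds

# Synthetic data (hypothetical): 123 samples, 27 predictors,
# of which only the first 7 carry signal; class coded 0/1 (assumed coding).
rng = np.random.default_rng(0)
X = rng.normal(size=(123, 27))
score = X[:, :7] @ rng.normal(size=7)
y = (score + rng.normal(scale=1.0, size=123) > 0).astype(float)

fitted = predict_ols(fit_ols(X, y), X)
cv = loo_predictions(X, y)

sse_model = float(np.sum((y - fitted) ** 2))
sse_cv = float(np.sum((y - cv) ** 2))
acc_model = float(np.mean((fitted >= 0.5) == (y == 1)))  # assumed 0.5 cut-off
acc_cv = float(np.mean((cv >= 0.5) == (y == 1)))
print(sse_model, sse_cv, acc_model, acc_cv)
```

For ordinary least squares the leave-one-out sum of squared errors can never be smaller than the fitted one, so the comparison of SSE(m) with SSE(cv) always points in the same direction; only the size of the gap is informative.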

In the second variant (v2), only the 7 variables identified as statistically significant in the first model were retained. Although the model prediction dropped to 88.62 %, the percentage at validation was 86.99 %, slightly higher than in the previous case, in which all 27 predictors were considered. Analysis of the data showed that, owing to multicollinearity, only 3 of the 7 variables in this model are statistically significant.

In the third variant (v3), the MLR model was built only on the 3 significant predictors obtained in the previous case. All 3 variables remained significant, but the model achieved a lower gender prediction of 81.30 %. This is a case of underfitting: the number of predictors is too small to explain the variation of the dependent variable. Comparing the three models shows that, as the proportion of significant predictors in the model increases, the standard error of estimate of the model approaches the value at validation, and their ratio (RS) increases from 0.68 to 0.97.

The fourth (v4) and fifth (v5) variants contain the results obtained when the data for the initial 27 predictors were fitted by the FSR and BSR methods, respectively. The optimal FSR variant was a regression model containing 11 statistically significant predictors, with a prediction capacity of 92.68 % for the model and 88.62 % at validation. For BSR, the optimal model contains 13 significant predictors, resulting in a prediction of 93.50 % for the model and 90.24 % at validation.
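
Backward stepwise selection can be sketched as follows. This is an illustration on synthetic data, not the author's procedure: the removal criterion used here is a fixed cut-off of |t| < 2.0, a rough stand-in for a 5 % significance test, and all names are hypothetical.

```python
import numpy as np

def t_statistics(X, y):
    """t statistics of the slope coefficients in y ~ intercept + X."""
    n, p = X.shape
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (n - p - 1)          # residual variance
    cov = sigma2 * np.linalg.inv(A.T @ A)         # coefficient covariance
    return coef[1:] / np.sqrt(np.diag(cov)[1:])   # skip the intercept

def backward_elimination(X, y, t_crit=2.0):
    """Drop the weakest predictor until every remaining |t| >= t_crit."""
    keep = list(range(X.shape[1]))
    while keep:
        t = np.abs(t_statistics(X[:, keep], y))
        weakest = int(np.argmin(t))
        if t[weakest] >= t_crit:
            break
        keep.pop(weakest)
    return keep

# Synthetic data (hypothetical): only columns 0 and 3 influence y.
rng = np.random.default_rng(1)
X = rng.normal(size=(123, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.5, size=123)
selected = backward_elimination(X, y)
print(selected)
```

Forward selection works in the opposite direction, starting from an empty model and adding the predictor that improves the fit most at each step.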

The seventh variant (v7) is the multiple linear regression model built on the first 8 principal components obtained by PCA. All of these components have a statistically significant influence on the model. The model prediction is 93.50 % and the prediction at validation is 91.06 %, higher than the 86.99 % obtained with the extended PCR model that considers 26 predictors (v6).
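
Principal component regression as described here (PCA on the predictors, then linear regression on the first few component scores) can be sketched with NumPy alone. The data are synthetic, the mean-centering step is an assumption, and `pcr_fit_predict` is a hypothetical helper, not a function from the paper's software.

```python
import numpy as np

def pcr_fit_predict(X_train, y_train, X_new, n_components):
    """Regress y on the first n_components PCA scores of X, then predict X_new."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean                                # centering (assumed)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt = loadings
    loadings = Vt[:n_components].T
    scores = Xc @ loadings
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    new_scores = (X_new - mean) @ loadings
    return coef[0] + new_scores @ coef[1:]

# Synthetic data (hypothetical): 123 samples, 27 predictors.
rng = np.random.default_rng(2)
X = rng.normal(size=(123, 27))
y = X[:, :5] @ np.ones(5) + rng.normal(scale=0.5, size=123)

pred_8 = pcr_fit_predict(X, y, X, n_components=8)
pred_all = pcr_fit_predict(X, y, X, n_components=27)

# Reference: ordinary least squares on the original predictors.
A_full = np.column_stack([np.ones(len(X)), X])
ols_pred = A_full @ np.linalg.lstsq(A_full, y, rcond=None)[0]
print(np.allclose(pred_all, ols_pred))
```

Keeping every component reproduces the ordinary MLR fit, so any gain at validation comes from discarding the later, low-variance components.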

The eighth variant (v8) displays the results of the PLS analysis for the extended model containing 26 extracted components (predictors), of which the first 7 are statistically significant. In this case, a predictive power of 86.99 % is obtained at validation. If only the first 7 components are retained in the PLS model (v9), the predictive power at validation increases to 90.24 %, although the predictive power of the model decreases from 95.12 % to 92.68 %. Analysis of the data showed that only 6 of the 7 predictors considered are significant. If only these 6 significant predictors are retained in the PLS model (v10), the predictive power decreases both for the model (89.43 %) and at validation (88.62 %), owing to underfitting.
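
For readers unfamiliar with PLS, the following is a minimal single-response (PLS1) sketch using the NIPALS iteration. It is a generic textbook formulation run on synthetic data, offered under the assumption that a comparable algorithm underlies the PLS variants in Table 4; the paper's actual software and settings are not given in this section.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 via NIPALS; returns (intercept, coefficients) for the original X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk = X - x_mean
    yc = y - y_mean
    p = X.shape[1]
    W = np.zeros((p, n_components))   # weight vectors
    P = np.zeros((p, n_components))   # X loadings
    q = np.zeros(n_components)        # y loadings
    for a in range(n_components):
        w = Xk.T @ yc
        w /= np.linalg.norm(w)
        t = Xk @ w                    # component scores
        tt = t @ t
        P[:, a] = Xk.T @ t / tt
        q[a] = yc @ t / tt
        W[:, a] = w
        Xk = Xk - np.outer(t, P[:, a])  # deflate X before the next component
    beta = W @ np.linalg.solve(P.T @ W, q)
    return y_mean - x_mean @ beta, beta

# Synthetic data (hypothetical): 123 samples, 5 predictors.
rng = np.random.default_rng(3)
X = rng.normal(size=(123, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=123)

b0_2, b_2 = pls1_fit(X, y, n_components=2)
b0_5, b_5 = pls1_fit(X, y, n_components=5)
sse_2 = float(np.sum((y - (b0_2 + X @ b_2)) ** 2))
sse_5 = float(np.sum((y - (b0_5 + X @ b_5)) ** 2))
print(sse_2, sse_5)
```

Unlike PCR, each PLS component is chosen using the response as well as the predictors, which is why a small number of components can already capture most of the predictive signal.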

After analyzing all 10 variants, the PCR model including the first 8 principal components extracted from the initial independent variables (v7) was selected as the most accurate for gender prediction from FTIR spectra of fingernails. The model correctly identifies gender for 86.00 % of women, lower than the 98.63 % obtained for men, with an overall prediction of 93.50 %. At cross-validation, the corresponding values are 84.00 % for women, 95.89 % for men, and 91.06 % for all 123 subjects. Lower prediction for women was also reported by Sharma et al. [34], who obtained correct classification from FTIR spectra in 87.5 % of cases for women and 91 % for men, in a group of volunteers consisting of 50 women and 50 men.
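
The percentages quoted for v7 follow directly from the counts in Table 4. The short check below assumes group sizes of 50 women and 73 men; these are inferred from the table (for example, 47 correct at 94.00 % and 70 correct at 95.89 % in v1) and are not stated in this section.

```python
def accuracy_percent(correct, total):
    """Share of correctly classified subjects, in percent, to two decimals."""
    return round(100.0 * correct / total, 2)

# Inferred group sizes (assumption derived from Table 4 counts and percentages).
N_WOMEN, N_MEN = 50, 73

model_counts = {"F": 43, "M": 72}  # Table 4, v7, rows 15-16
cv_counts = {"F": 42, "M": 70}     # Table 4, v7, rows 26-27

def summarize(counts):
    """Return (women %, men %, total %) for a dict of correct counts."""
    total_correct = counts["F"] + counts["M"]
    return (
        accuracy_percent(counts["F"], N_WOMEN),
        accuracy_percent(counts["M"], N_MEN),
        accuracy_percent(total_correct, N_WOMEN + N_MEN),
    )

model_pct = summarize(model_counts)
cv_pct = summarize(cv_counts)
print(model_pct, cv_pct)
```

In absolute terms, the step from model to validation costs one additional misclassified woman and two additional misclassified men.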

Fig. 4 shows the variance of the dependent variable explained by the components extracted by PCA, and Fig. 5 displays the scatter plot of the PC1, PC2 and PC3 component values for the female and male samples. The first three components explain 90.68 % of the variability of the dependent variable. Figs. 6 and 7 plot the values predicted by the model and the values predicted at cross-validation, respectively, against the actual values. Fig. 8 plots the values predicted at validation against the values predicted by the regression model. The two are linearly related, with a squared correlation coefficient of 0.99, which indicates that the predictive power of the model for new subjects is very close to that of the fitted model.

b)
The results obtained in the age prediction of subjects by applying the regression models MLR, FSR, BSR, PCR and PLS are shown in Table 5.

Fig. 5. Scatter plot of the PC1, PC2 and PC3 component values for the female and male samples: red dots, female; blue dots, male.

Fig. 6. The variation of predicted values against the observed values. PCR model (8 components).

Fig. 7. The variation of cross-validated predicted values against the observed values. PCR model (8 components).

Fig. 8. The variation of cross-validated predicted values against the predicted values. PCR model (8 components).

Table 5.

Results obtained by fitting the experimental data on the considered regression models.

	Age	v1	v2	v3	v4	v5	v6	v7	v8
	Regression model	MLR	MLR	FSR	BSR	PCR	PCR	PLS	PLS
1	nx	24	7	4	6	23	6	23	8
2	nx(s)	7	7	4	6	6	6	8	8
3	nx(n)	17	0	0	0	17	0	15	0
4	F	12.53	32.30	47.44	45.86	10.43	39.83	13.21	33.10
5	p	2.26E-20	2.11E-24	1.02E-23	2.14E-28	1.65E-17	5.47E-26	6.12E-21	2.24E-26
6	DW	1.35	1.35	1.52	1.54	1.46	1.50	1.58	1.54
7	nout	6	7	6	8	6	9	7	9
8	nout[%]	4.88	5.69	4.88	6.50	4.88	7.32	5.69	7.32
9	n(m)	123	123	123	123	123	123	123	123
10	R(m)	0.87	0.81	0.79	0.84	0.84	0.82	0.87	0.84
11	R^2(m)	0.75	0.66	0.62	0.70	0.71	0.67	0.75	0.70
12	AdjR^2(m)	0.69	0.64	0.60	0.69	0.64	0.66	0.70	0.68
13	SSE(m)	5594	7674	8728	6750	6648	7438	5594	6850
14	SEE(m)	7.56	8.17	8.60	7.63	8.20	8.01	7.52	7.75
15	RE(%)(m)	10.04	12.44	12.80	11.39	11.31	11.85	10.04	11.46

leave-one-out cross validation

16	R(cv)	0.53	0.70	0.76	0.82	0.50	0.79	0.55	0.80
17	R^2(cv)	0.28	0.49	0.58	0.67	0.25	0.62	0.30	0.64
18	AdjR^2(cv)	0.11	0.46	0.57	0.65	0.08	0.61	0.14	0.61
19	SSE(cv)	23677	11987	9609	7572	25341	8626	22418	8367
20	SEE(cv)	15.54	10.21	9.02	8.08	16.00	8.62	15.05	8.57
21	RE(%)(cv)	14.68	13.87	13.47	12.13	16.57	12.74	14.47	12.72
22	RS	0.49	0.80	0.95	0.94	0.51	0.93	0.50	0.91

For all analyzed cases, two additional parameters were included: (i) RE(%)(m), the arithmetic mean of the absolute errors (expressed as a percentage) between the modeled age and the actual age of the subjects, and (ii) RE(%)(cv), the same parameter computed from the ages obtained at leave-one-out cross validation. In Table 5, the first two cases (v1 and v2) consider the MLR model. In the first case (v1), analysis of the data showed that only 7 of the 24 predictors are statistically significant. The model showed an average error of 10.04 % (RE(%)(m)) and of 14.68 % at validation (RE(%)(cv)). Retaining only the significant predictors (v2) increased the mean error of the model to 12.44 %, but decreased the mean error at validation to 13.87 %.
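
One plausible reading of RE(%) is the mean absolute error relative to the actual age, in percent. The sketch below implements that reading on made-up ages; the exact normalization used by the author is an assumption, and the function name is hypothetical.

```python
def mean_relative_error_percent(actual_ages, predicted_ages):
    """Mean of |predicted - actual| / actual, expressed in percent."""
    errors = [
        abs(pred - act) / act * 100.0
        for act, pred in zip(actual_ages, predicted_ages)
    ]
    return sum(errors) / len(errors)

# Hypothetical ages in years (not data from the study).
actual = [20, 40, 50]
predicted = [22, 36, 50]

re_percent = mean_relative_error_percent(actual, predicted)
print(round(re_percent, 2))
```

Applying the same function to the fitted ages and to the leave-one-out ages would give the RE(%)(m) and RE(%)(cv) rows, respectively, under this reading.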

Among the stepwise regression methods (FSR, variant v3, and BSR, variant v4), the BSR method using 6 statistically significant predictors generated the best results. Although the average model error was 11.39 % (higher than in v1), the mean prediction error obtained at validation was the lowest, at 12.13 %. The PCR and PLS methods did not give better results. Of these, the PLS model using 8 extracted components comes closest to the optimal variant, with an average error at validation of 12.72 %.

For the optimal model obtained (v4), Fig. 9 shows a linear dependence between the modeled age values and the age values resulting from the validation of the regression model. The squared correlation coefficient between the two was 0.9970, which shows that the performance in predictions for new subjects is close to the performance of the baseline regression model. For the same model, Fig. 10 shows the dependence between the accuracy of the age estimate (mean error) and the number of cases analyzed (expressed as a percentage), both for the regression model and for its validation. The data showed that the age could be determined with an error of up to 5 % for 34.96 % of subjects, and with an error of at most 10 % for 65.04 % of subjects. For 95.12 % of subjects, the age error was at most 50 %.
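
The kind of curve summarized for Fig. 10 can be computed as the share of subjects whose error stays within a given threshold. The sketch below uses hypothetical per-subject errors; the final lines only check that the three reported shares correspond to whole numbers of the 123 subjects.

```python
def share_within(errors_percent, threshold):
    """Percentage of subjects whose error does not exceed the threshold."""
    hits = sum(1 for err in errors_percent if err <= threshold)
    return 100.0 * hits / len(errors_percent)

# Hypothetical per-subject relative errors in percent (not data from the study).
errors = [2.0, 4.5, 7.0, 9.9, 12.0, 30.0, 55.0, 3.0]
curve = {thr: share_within(errors, thr) for thr in (5, 10, 50)}

# Reported shares converted back to subject counts out of 123.
implied_counts = [round(pct / 100 * 123) for pct in (34.96, 65.04, 95.12)]
print(curve, implied_counts)
```

Read this way, the reported shares correspond to 43, 80 and 117 of the 123 subjects.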

Fig. 10. The correlation between the accuracy of estimating the age of subjects (mean error) and the number of cases analyzed.

4. Conclusions

The aim of the paper was to explore the fast prediction of the gender and age of a subject based on the FTIR spectra of fingernail clippings and chemometric analysis. Two spectral ranges, 3600–2800 and 1800–1000 cm⁻¹, were selected from the raw spectra and initially processed by applying baseline correction and scaling to the 0–1 range. The results obtained are comparable to those of Yadav et al. [35], who differentiated sex and age based on partial least-squares analysis (PLS-R) and PCA. The machine learning algorithm based on an artificial neural network (ANN) used by Mitu et al. [36] gave the best performance in differentiating sex from FTIR spectra of nail samples, but has the disadvantage that the mathematical apparatus and the connections inside the black box represented by the ANN are not known.

The chemometric analysis considered 5 linear regression methods: Multiple linear regression (MLR), Forward stepwise regression (FSR), Backward stepwise regression (BSR), Principal component regression (PCR) and Partial least squares regression (PLS). The PCR model was found optimal for gender prediction, with 8 statistically significant extracted components allowing an accurate prediction in 93.50 % of cases (86.00 % for women and 98.63 % for men). The predictive power of the model showed that, for new subjects, gender can be predicted in 91.06 % of cases (84.00 % for women and 95.89 % for men). The BSR model proved optimal for age prediction, with 6 statistically significant predictors resulting in an average error of 11.39 % (13.70 % for women and 9.81 % for men). The predictive error of the model was 12.13 % (11.73 % for women and 10.35 % for men). For 65.04 % of subjects, the age could be predicted with an error of at most 10.00 %. Since the maximum error of age prediction was 50 % for 95.12 % of subjects, we can conclude that age prediction performs worse than gender prediction.

Nevertheless, the prediction of the gender and age of an unknown subject based on FTIR spectra of fingernail clippings and chemometric analysis can be a useful tool, especially in the field of forensic investigations.

Consent for volunteers

Informed written consent was obtained from all volunteers. The study was approved by the Ethics Committee of the Technical University of Cluj-Napoca, Romania (474/08.12.2022) and complies with the principles of the Declaration of Helsinki as revised in 2003.

Declaration of competing interest

The author declares that he is not aware of any competing financial interests or personal relationships that could appear to influence the work reported in this study.

References

1. Bunaciu A., Serban F., Hassan Y.A.E. Evaluation of the protein secondary structures using Fourier transform infrared spectroscopy. Gazi Univ. J. Sci. 2014;27(1):637–664.
2. Carbonaro M., Nucara A. Secondary structure of food proteins by Fourier transform spectroscopy in the mid-infrared region. Amino Acids. 2010;38:679–690. doi: 10.1007/s00726-009-0274-3.
3. Nunes K.M., Andrade M.V.O., Santos Filho A.M.P., Lasmar M.C., Sena M.M. Detection and characterisation of frauds in bovine meat in natura by non-meat ingredient additions using data fusion of chemical parameters and ATR-FTIR spectroscopy. Food Chem. 2016;205:14–22. doi: 10.1016/j.foodchem.2016.02.158.
4. Kirschner C., Ofstad R., Skarpeid H.J., Host V., Kohler A. Monitoring of denaturation processes in aged beef loin by Fourier Transform Infrared microspectroscopy. J. Agric. Food Chem. 2004;52:3920–3929. doi: 10.1021/jf0306136.
5. Alkhuder K. Fourier-transform infrared spectroscopy: a universal optical sensing technique with auspicious application prospects in the diagnosis and management of autoimmune diseases. Photodiagnosis Photodyn. Ther. 2023;42 doi: 10.1016/j.pdpdt.2023.103606.
6. Carmona P., Molina M., Calero M., Bermejo-Pareja F., Martínez-Martín P., Alvarez I., Toledano A. Infrared spectroscopic analysis of mononuclear leukocytes in peripheral blood from Alzheimer's disease patients. Anal. Bioanal. Chem. 2012;402:2015–2021. doi: 10.1007/s00216-011-5669-9.
7. Kourkoumelis N., Zhang X., Lin Z., Wang J. Fourier transform infrared spectroscopy of bone tissue: bone quality assessment in preclinical and clinical applications of osteoporosis and fragility fracture. Clin. Rev. Bone Miner. Metabol. 2019;17:24–39. doi: 10.1007/s12018-018-9255-y.
8. Lechowicz L., Chrapek M., Gaweda J., Urbaniak M., Konieczna I. Use of Fourier-transform infrared spectroscopy in the diagnosis of rheumatoid arthritis: a pilot study. Mol. Biol. Rep. 2016;43:1321–1326. doi: 10.1007/s11033-016-4079-7.
9. Muehrcke C.R. The fingernails in Chronic hypoalbuminaemia, a new physical sign. Br. Med. J. 1956;9:1327–1328. doi: 10.1136/bmj.1.4979.1327.
10. Zheng N., Yang T., Liang M., Zhang H., Li L., Sunnassee A., Liu L. Characterization of protein in old myocardial infarction by FTIR micro-spectroscopy. Huazhong Univ. Sci. Technol. [Med. Sci.] 2010;30:546–550. doi: 10.1007/s11596-010-0466-9.
11. Youn J.-I., Milner T.E. Evaluation of photothermal effects in cartilage using FT-IR spectroscopy. Laser Med. Sci. 2008;23:229–235. doi: 10.1007/s10103-007-0464-8.
12. Scott S.A., Renaud D.E., Krishnasamy S., Meriç P., Buduneli N., Çetinkalp Ş., Liu K.-Z. Diabetes-related molecular signatures in infrared spectra of human saliva. Diabetol. Metab. Syndr. 2010;2:48. doi: 10.1186/1758-5996-2-48.
13. Plotnikova L.V., Kobelevab M.O., Borisova E.V., Garifullinc A.D., Povolotskaya A.V., Voloshinc S.V., Polyanichko A.M. Infrared spectroscopy of blood serum from patients with multiple myeloma. Cell Tissue Biol. 2019;13(2):130–135. doi: 10.1134/S1990519X19020093.
14. Tolstorozhev G.B., Skornyakov I.V., Butra V.A. Infrared spectra of thyroid tumor tissues. J. Appl. Spectrosc. 2010;77(3):427–431. doi: 10.1007/s10812-010-9349-x.
15. Al-Muslet N.A., Ali E.E. Spectroscopic analysis of bladder cancer tissues using Fourier Transform infrared spectroscopy. J. Appl. Spectrosc. 2012;79(1):139–142. doi: 10.1007/s10812-012-9575-5.
16. Khanmohammadi M., Garmarudi A.B., Samani S., Ghasemi K., Ashuri A. Application of linear discriminant analysis and attenuated total reflectance Fourier transform infrared microspectroscopy for diagnosis of colon cancer. Pathol. Oncol. Res. 2011;17:435–441. doi: 10.1007/s12253-010-9326-y.
17. Mehrotra R., Tyagi G., Jangir D.K., Dawar R., Gupta N. Analysis of ovarian tumor pathology by Fourier transform infrared spectroscopy. J. Ovarian Res. 2010;3:27. doi: 10.1186/1757-2215-3-27.
18. Bel’skaya L.V. Use of IR spectroscopy in cancer diagnosis. A review. J. Appl. Spectrosc. 2019;86(2):187–205. doi: 10.1007/s10812-019-00800-w.
19. Petibois C., Déléris G., Cazorla G. Perspectives in the utilisation of Fourier-transform infrared spectroscopy of serum in sports medicine. Sports Med. 2000;29(6):387–396. doi: 10.2165/00007256-200029060-00002.
20. Lemos P.N., Anderson R.A., Robertson J.R. Nail analysis for drugs of abuse: extraction and determination of cannabis in fingernails by RIA and GC-MS. J. Anal. Toxicol. 1999;23:147–152. doi: 10.1093/jat/23.3.147.
21. Chouhan P., Saini T.R. Designing a test for nail safety evaluation to select nail-friendly permeation enhancers. Indian J. Pharmaceut. Sci. 2018;80(4):694–701. doi: 10.4172/pharmaceutical-sciences.1000409.
22. Coopman R., van de Vyver T., Kishabongo A.S., Katchunga P., van Aken E.H., Cikomola J., Monteyne T., Speeckaert M.M., Delanghe J.R. Glycation in human fingernail clippings using ATR-FTIR spectrometry, a new marker for the diagnosis and monitoring of diabetes mellitus. Clin. Biochem. 2017;50:62–67. doi: 10.1016/j.clinbiochem.2016.09.001.
23. Farhan K.M., Sastry T.P., Mandal A.B. Comparative study on secondary structural changes in diabetic and non-diabetic human fingernail specimen by using FTIR spectra. Clin. Chim. Acta. 2011;412:386–389. doi: 10.1016/j.cca.2010.11.016.
24. Coroaba A., Pinteala T., Chiriac A., Chiriac A.E., Simionescu B.C., Pinteala M. Degradation mechanism induced by psoriasis in human fingernails: a different approach. J. Invest. Dermatol. 2016;136:311–313. doi: 10.1038/JID.2015.387.
25. Sakudo A., Kuratsune H., Kato Y.H., Ikuta K. Secondary structural changes of proteins in fingernails of chronic fatigue syndrome patients from Fourier-transform infrared spectra. Clinic. Chim. Acta. 2009;402:75–78. doi: 10.1016/j.cca.2008.12.020.
26. Sihota P., Yadav R.N., Dhiman V., Bhadada S.K., V M., Kumar N. Investigation of diabetic patient's fingernail quality to monitor type 2 diabetes induced tissue damage. Sci. Rep. 2019;9:3193. doi: 10.1038/s41598-019-39951-3.
27. Ensieh N.-E., Faridbod F., Larijani B., Ganjali M.R., Norouzi P. Trace element analysis of hair, nail, serum and urine of diabetes mellitus patients by inductively coupled plasma atomic emission spectroscopy. Iran. J. Diabetes & Lipid Disord. 2011;10:1–9.
28. Cappelle D., Yegles M., Neels H., van Nuijs A.L.N., de Doncker M., Maudens K., Covaci A., Crunelle C.L. Nail analysis for the detection of drugs of abuse and pharmaceuticals: a review. Forensic Toxicol. 2015;33:12–36. doi: 10.1007/s11419-014-0258-1.
29. Rodushkin I., Axelsson M.D. Application of double focusing sector field ICP-MS for multielemental characterization of human hair and nails. Part II. A study of the inhabitants of northern Sweden. Sci. Total Environ. 2000;262:21–36. doi: 10.1016/s0048-9697(00)00531-3.
30. Wongsasuluk P., Chotpantarat S., Siriwong W., Robson M. Using hair and fingernails in binary logistic regression for bio-monitoring of heavy metals/metalloid in groundwater in intensively agricultural areas, Thailand. Environ. Res. 2018;162:106–118. doi: 10.1016/j.envres.2017.11.024.
31. Kumar A., Garg S., Hanmandlu M. Biometric authentication using fingernail plates. Expert Syst. Appl. 2014;41:373–386. doi: 10.1109/ICETETS.2016.7603054.
32. Parmar P., Rathod G.B. Forensic Onychology: an essential entity against crime. Indian Acad. Forensic Med. 2012;34:4.
33. Fokias K., Dierckx L., van de Voorde W., Bekaert B. Age determination through DNA methylation patterns in fingernails and toenails. Forensic Sci. Int. Genet. 2023;4 doi: 10.1016/j.fsigen.2023.102846.
34. Sharma A., Verma R., Kumar R., Chauhan R., Sharma V. Chemometric analysis of ATR-FTIR spectra of fingernail clippings for classification and prediction of sex in forensic context. Microchem. J. 2020;159 doi: 10.1016/j.microc.2020.105504.
35. Yadav A., Nimi C., Bhatia D., Rani N., Singh R. Estimation of age and sex from fingernail clippings by using ATR-FTIR spectroscopy coupled with chemometric interpretation. Int. J. Leg. Med. 2024;138:2401–2410. doi: 10.1007/s00414-024-03275-3.
36. Mitu B., Trojan V., Halamkova L. Sex determination of human nails based on attenuated total Reflection Fourier Transform Infrared Spectroscopy in forensic context. Sensors. 2023;23:9412. doi: 10.3390/s23239412.
37. Widjaja E., Lim G.H., An A. A novel method for human gender classification using Raman spectroscopy of fingernail clippings. Analyst. 2008;133:493–498. doi: 10.1039/B712389B.
38. Mitu B., Cerda M., Hrib R., Troian V., Halamkova L. Attenuated total reflection Fourier Transform Infrared Spectroscopy for forensic screening of long-term alcohol consumption from human nails. ACS Omega. 2023;8(24):22203–22210. doi: 10.1021/acsomega.3c02579.
39. Romano E., Cataldo P.G., Iramain M.A., Castillo M.V., Manzur M.E., Brandan S.A. Identification of cholesterol in different media by using the FT-IR, FT-Raman and UV–visible spectra combined with DFT calculations. J. Mol. Liq. 2024;403 doi: 10.1016/j.molliq.2024.124879.

[bib1] 1.Bunaciu A., Serban F., Hassan Y.A.E. Evaluation of the protein secondary structures using Fourier transform infrared spectroscopy. Gazi Univ. J. Sci. 2014;27(1):637–664. [Google Scholar]

[bib2] 2.Carbonaro M., Nucara A. Secondary structure of food proteins by Fourier transform spectroscopy in the mid-infrared region. AminoAcids. 2010;38:679–690. doi: 10.1007/s00726-009-0274-3. [DOI] [PubMed] [Google Scholar]

[bib3] 3.Nunes K.M., Andrade M.V.O., Santos Filho A.M.P., Lasmar M.C., Sena M.M. Detection and characterisation of frauds in bovine meat in natura by non-meat ingredient additions using data fusion of chemical parameters and ATR-FTIR spectroscopy. Food Chem. 2016;205:14–22. doi: 10.1016/j.foodchem.2016.02.158. [DOI] [PubMed] [Google Scholar]

[bib4] 4.Kirschner C., Ofstad R., Skarpeid H.J., Host V., Kohler A. Monitoring of denaturation processes in aged beef loin by Fourier Transform Infrared microspectroscopy. J. Agric. Food Chem. 2004;52:3920–3929. doi: 10.1021/jf0306136. [DOI] [PubMed] [Google Scholar]

[bib5] 5.Alkhuder K. Fourier-transform infrared spectroscopy: a universal optical sensing technique with auspicious application prospects in the diagnosis and management of autoimmune diseases. Photodiagnosis Photodyn. Ther. 2023;42 doi: 10.1016/j.pdpdt.2023.103606. [DOI] [PubMed] [Google Scholar]

[bib6] 6.Carmona P., Molina M., Calero M., Bermejo-Pareja F., Martínez-Martín P., Alvarez I., Toledano A. Infrared spectroscopic analysis of mononuclear leukocytes in peripheral blood from Alzheimer's disease patients. Anal. Bioanal. Chem. 2012;402:2015–2021. doi: 10.1007/s00216-011-5669-9. [DOI] [PubMed] [Google Scholar]

[bib7] 7.Kourkoumelis N., Zhang X., Lin Z., Wang J. Fourier transform infrared spectroscopy of bone tissue: bone quality assessment in preclinical and clinical applications of osteoporosis and fragility fracture. Clin. Rev. Bone Miner. Metabol. 2019;17:24–39. doi: 10.1007/s12018-018-9255-y. [DOI] [Google Scholar]

[bib8] 8.Lechowicz L., Chrapek M., Gaweda J., Urbaniak M., Konieczna I. Use of Fourier-transform infrared spectroscopy in the diagnosis of rheumatoid arthritis: a pilot study. Mol. Biol. Rep. 2016;43:1321–1326. doi: 10.1007/s11033-016-4079-7. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib9] 9.Muehrcke C.R. The fingernails in Chronic hypoalbuminaemia, a new physical sign. Br. Med. J. 1956;9:1327–1328. doi: 10.1136/bmj.1.4979.1327. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib10] 10.Zheng N., Yang T., Liang M., Zhang H., Li L., Sunnassee A., Liu L. Characterization of protein in old myocardial infarction by FTIR micro-spectroscopy. Huazhong Univ. Sci. Technol. [Med. Sci.] 2010;30:546–550. doi: 10.1007/s11596-010-0466-9. [DOI] [PubMed] [Google Scholar]

[bib11] 11.Youn J.-I., Milner T.E. Evaluation of photothermal effects in cartilage using FT-IR spectroscopy. Laser Med. Sci. 2008;23:229–235. doi: 10.1007/s10103-007-0464-8. [DOI] [PubMed] [Google Scholar]

[bib12] 12.Scott S.A., Renaud D.E., Krishnasamy S., Meriç P., Buduneli N., Çetinkalp Ş., Liu K.-Z. Diabetes-related molecular signatures in infrared spectra of human saliva. Diabetol. Metab. Syndr. 2010;2:48. doi: 10.1186/1758-5996-2-48. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib13] 13.Plotnikova L.V., Kobelevab M.O., Borisova E.V., Garifullinc A.D., Povolotskaya A.V., Voloshinc S.V., Polyanichko A.M. Infrared spectroscopy of blood serum from patients with multiple myeloma. Cell Tissue Biol. 2019;13(2):130–135. doi: 10.1134/S1990519X19020093. [DOI] [Google Scholar]

[bib14] 14.Tolstorozhev G.B., Skornyakov I.V., Butra V.A. Infrared spectra of thyroid tumor tissues. J. Appl. Spectrosc. 2010;77(3):427–431. doi: 10.1007/s10812-010-9349-x. [DOI] [Google Scholar]

[bib15] 15.Al-Muslet N.A., Ali E.E. Spectroscopic analysis of bladder cancer tissues using Fourier Transform infrared spectroscopy. J. Appl. Spectrosc. 2012;79(1):139–142. doi: 10.1007/s10812-012-9575-5. [DOI] [Google Scholar]

[bib16] 16.Khanmohammadi M., Garmarudi A.B., Samani S., Ghasemi K., Ashuri A. Application of linear discriminant analysis and attenuated total reflectance Fourier transform infrared microspectroscopy for diagnosis of colon cancer. Pathol. Oncol. Res. 2011;17:435–441. doi: 10.1007/s12253-010-9326-y. [DOI] [PubMed] [Google Scholar]

[bib17] 17.Mehrotra R., Tyagi G., Jangir D.K., Dawar R., Gupta N. Analysis of ovarian tumor pathology by Fourier transform infrared spectroscopy. J. Ovarian Res. 2010;3:27. doi: 10.1186/1757-2215-3-27. [DOI] [PMC free article] [PubMed] [Google Scholar]

[bib18] 18.Bel’skaya L.V. Use of IR spectroscopy in cancer diagnosis. A review. J. Appl. Spectrosc. 2019;86(2):187–205. doi: 10.1007/s10812-019-00800-w. [DOI] [Google Scholar]

[bib19] 19.Petibois C., Déléris G., Cazorla G. Perspectives in the utilisation of Fourier-transform infrared spectroscopy of serum in sports medicine. Sports Med. 2000;29(6):387–396. doi: 10.2165/00007256-200029060-00002. [DOI] [PubMed] [Google Scholar]

[bib20] 20.Lemos P.N., Anderson R.A., Robertson J.R. Nail analysis for drugs of abuse: extraction and determination of cannabis in fingernails by RIA and GC-MS. J. Anal. Toxicol. 1999;23:147–152. doi: 10.1093/jat/23.3.147. [DOI] [PubMed] [Google Scholar]

[bib21] 21.Chouhan P., Saini T.R. Designing a test for nail safety evaluation to select nail-friendly permeation enhancers. Indian J. Pharmaceut. Sci. 2018;80(4):694–701. doi: 10.4172/pharmaceutical-sciences.1000409. [DOI] [Google Scholar]

[bib22] 22.Coopman R., van de Vyver T., Kishabongo A.S., Katchunga P., van Aken E.H., Cikomola J., Monteyne T., Speeckaert M.M., Delanghe J.R. Glycation in human fingernail clippings using ATR-FTIR spectrometry, a new marker for the diagnosis and monitoring of diabetes mellitus. Clin. Biochem. 2017;50:62–67. doi: 10.1016/j.clinbiochem.2016.09.001. [DOI] [PubMed] [Google Scholar]

[bib23] 23.Farhan K.M., Sastry T.P., Mandal A.B. Comparative study on secondary structural changes in diabetic and non-diabetic human fingernail specimen by using FTIR spectra. Clin. Chim. Acta. 2011;412:386–389. doi: 10.1016/j.cca.2010.11.016. [DOI] [PubMed] [Google Scholar]

[bib24] 24.Coroaba A., Pinteala T., Chiriac A., Chiriac A.E., Simionescu B.C., Pinteala M. Degradation mechanism induced by psoriasis in human fingernails: a different approach. J. Invest. Dermatol. 2016;136:311–313. doi: 10.1038/JID.2015.387. [DOI] [PubMed] [Google Scholar]

[bib25] 25.Sakudo A., Kuratsune H., Kato Y.H., Ikuta K. Secondary structural changes of proteins in fingernails of chronic fatigue syndrome patients from Fourier-transform infrared spectra. Clinic. Chim. Acta. 2009;402:75–78. doi: 10.1016/j.cca.2008.12.020. [DOI] [PubMed] [Google Scholar]

26. Sihota P., Yadav R.N., Dhiman V., Bhadada S.K., V M., Kumar N. Investigation of diabetic patient's fingernail quality to monitor type 2 diabetes induced tissue damage. Sci. Rep. 2019;9:3193. doi: 10.1038/s41598-019-39951-3.

27. Ensieh N.-E., Faridbod F., Larijani B., Ganjali M.R., Norouzi P. Trace element analysis of hair, nail, serum and urine of diabetes mellitus patients by inductively coupled plasma atomic emission spectroscopy. Iran. J. Diabetes & Lipid Disord. 2011;10:1–9.

28. Cappelle D., Yegles M., Neels H., van Nuijs A.L.N., de Doncker M., Maudens K., Covaci A., Crunelle C.L. Nail analysis for the detection of drugs of abuse and pharmaceuticals: a review. Forensic Toxicol. 2015;33:12–36. doi: 10.1007/s11419-014-0258-1.

29. Rodushkin I., Axelsson M.D. Application of double focusing sector field ICP-MS for multielemental characterization of human hair and nails. Part II. A study of the inhabitants of northern Sweden. Sci. Total Environ. 2000;262:21–36. doi: 10.1016/s0048-9697(00)00531-3.

30. Wongsasuluk P., Chotpantarat S., Siriwong W., Robson M. Using hair and fingernails in binary logistic regression for bio-monitoring of heavy metals/metalloid in groundwater in intensively agricultural areas, Thailand. Environ. Res. 2018;162:106–118. doi: 10.1016/j.envres.2017.11.024.

31. Kumar A., Garg S., Hanmandlu M. Biometric authentication using fingernail plates. Expert Syst. Appl. 2014;41:373–386. doi: 10.1109/ICETETS.2016.7603054.

32. Parmar P., Rathod G.B. Forensic Onychology: an essential entity against crime. Indian Acad. Forensic Med. 2012;34:4.

33. Fokias K., Dierckx L., van de Voorde W., Bekaert B. Age determination through DNA methylation patterns in fingernails and toenails. Forensic Sci. Int. Genet. 2023;4 doi: 10.1016/j.fsigen.2023.102846.

34. Sharma A., Verma R., Kumar R., Chauhan R., Sharma V. Chemometric analysis of ATR-FTIR spectra of fingernail clippings for classification and prediction of sex in forensic context. Microchem. J. 2020;159 doi: 10.1016/j.microc.2020.105504.

35. Yadav A., Nimi C., Bhatia D., Rani N., Singh R. Estimation of age and sex from fingernail clippings by using ATR-FTIR spectroscopy coupled with chemometric interpretation. Int. J. Leg. Med. 2024;138:2401–2410. doi: 10.1007/s00414-024-03275-3.

36. Mitu B., Trojan V., Halamkova L. Sex determination of human nails based on attenuated total reflection Fourier transform infrared spectroscopy in forensic context. Sensors. 2023;23:9412. doi: 10.3390/s23239412.

37. Widjaja E., Lim G.H., An A. A novel method for human gender classification using Raman spectroscopy of fingernail clippings. Analyst. 2008;133:493–498. doi: 10.1039/B712389B.

38. Mitu B., Cerda M., Hrib R., Trojan V., Halamkova L. Attenuated total reflection Fourier transform infrared spectroscopy for forensic screening of long-term alcohol consumption from human nails. ACS Omega. 2023;8(24):22203–22210. doi: 10.1021/acsomega.3c02579.

39. Romano E., Cataldo P.G., Iramain M.A., Castillo M.V., Manzur M.E., Brandan S.A. Identification of cholesterol in different media by using the FT-IR, FT-Raman and UV–visible spectra combined with DFT calculations. J. Mol. Liq. 2024;403 doi: 10.1016/j.molliq.2024.124879.
