ACS Omega. 2021 Jul 22;6(30):19665–19674. doi: 10.1021/acsomega.1c02317

Method Superior to Traditional Spectral Identification: FT-NIR Two-Dimensional Correlation Spectroscopy Combined with Deep Learning to Identify the Shelf Life of Fresh Phlebopus portentosus

Li Wang, Jieqing Li, Tao Li, Honggao Liu*, Yuanzhong Wang*
PMCID: PMC8340397  PMID: 34368554

Abstract

The taste of fresh mushrooms is always appealing. Phlebopus portentosus is the only porcini that can be cultivated artificially in the world, with a daily output of up to 2 tons and a large sales market. Fresh mushrooms are very susceptible to microbial attacks when stored at 0–2 °C for more than 5 days. Therefore, the freshness of P. portentosus must be evaluated during its refrigeration to ensure food safety. According to their freshness, the samples were divided into three categories, namely, category I (1–2 days, 0–48 h, recommended for consumption), category II (3–4 days, 48–96 h, recommended for consumption), and category III (5–6 days, 96–144 h, not recommended). In our study, a fast and reliable shelf life identification method was established through Fourier transform near-infrared (FT-NIR) spectroscopy combined with a machine learning method. Deep learning (DL) is a new focus in the field of food research, so we established a deep learning classification model, traditional support-vector machine (SVM), partial least-squares discriminant analysis (PLS-DA), and an extreme learning machine (ELM) model to identify the shelf life of P. portentosus. The results showed that FT-NIR two-dimensional correlation spectroscopy (2DCOS) combined with the deep learning model was more suitable for the identification of fresh mushroom shelf life and the model had the best robustness. In conclusion, FT-NIR combined with machine learning had the advantages of being nondestructive, fast, and highly accurate in identifying the shelf life of P. portentosus. This method may become a promising rapid analysis tool, which can quickly identify the shelf life of fresh edible mushrooms.

1. Introduction

Edible mushrooms are a huge biological group with a wide variety of species. Because of their rich flavor and nutrients, they have high medicinal and edible value.1 They are an important source of human food and a valuable environmental biological resource.2 Phlebopus portentosus (Berk. and Broome) Boedijn, belonging to the order Boletales, is a popular edible mushroom in China and Thailand.3 P. portentosus is widely distributed in Asia, America, Oceania, and other tropical regions.4,5 It is not an ectomycorrhizal fungus but a saprophytic fungus and is cultivated by saprophytic methods.6 So far, P. portentosus is the only species in Boletales that can be artificially cultivated worldwide.5 However, in most cases, the mycelium of P. portentosus forms fungus–insect galls with mealworms in the soil. The number of galls has a positive effect on its growth; that is, the more galls, the higher the biomass.7 P. portentosus fruiting bodies are rich in protein, crude fat, polysaccharides, amino acids, and mineral elements. Because of its antioxidant, antiproliferative, and neuroprotective properties, it can be used as a medicine or functional food.8 At present, the cultivation industry of P. portentosus is well developed and can produce two tons per day.

Fresh mushrooms contain water, mineral salts, vitamins, and various enzymes, including amylase, laccase, catalase, cellulase, and peroxidase. The activity of these enzymes causes the mushroom’s nutrients to continue to decline during the storage period.9 Besides, fresh mushrooms are extremely perishable under normal temperature conditions. Their shelf life is usually 1–2 days at room temperature and about 3–5 days at 0–2 °C. After being stored at 0–2 °C for 5 days, mushrooms are extremely vulnerable to microorganisms.10,11 Moreover, with the increase of storage time, the nutrients (protein, fat, carbohydrates) and aroma components (1-octen-3-ol, 1-octen-3-one) of porcini decrease significantly.12,13

Because the shelf life of fresh mushrooms is very short, evaluating their freshness is important. Ensuring that mushrooms sold in the market are fresh is a prerequisite for protecting the health of consumers. In addition, quickly identifying the freshness of mushrooms helps to assess their market value and avoid food waste. Moisture is an influential factor in the deterioration of fresh mushrooms. To prevent the growth of microorganisms in fresh mushrooms, it is necessary to minimize the water content.14,15 However, the taste of mushrooms changes after dehydration, which affects the consumer experience. Conventional freshness detection methods include microbiological methods and physical and chemical index detection methods.16–19 However, these methods require sample preprocessing, and the detection process is complicated and time-consuming. In addition, fingerprint analysis20–22 and liquid chromatography–mass spectrometry (LC–MS) combined with the chemometrics23 or metabolomics24 approaches commonly used in the food and pharmaceutical fields also require a lot of time and reagents. None of these methods can meet the need for real-time monitoring of the freshness of edible mushrooms. In recent years, Fourier transform near-infrared (FT-NIR) spectroscopy has gained favor as a fast, nondestructive, and low-cost detection method. Its distinguishing features are that the sample does not require complicated preprocessing, the acquisition speed is fast, and the resolution is high. So far, FT-NIR as a nondestructive testing method has played an important role in the fields of food, Chinese medicine, and the chemical industry.25,26 In this study, we used FT-NIR to detect the freshness of P. portentosus samples.

In addition, it is necessary to establish a mathematical model to better understand the variation of absorbance with time during cold storage. In this study, partial least-squares discriminant analysis (PLS-DA), support-vector machine (SVM), extreme learning machine (ELM), and deep learning (DL) models were applied to the experimental data, and the storage time of each sample was identified according to its spectral changes. Among them, DL is a new focus in the field of machine learning research. It has a strong learning ability, which allows it to capture complex features, and it can prevent overfitting.27 Conventional DL models recognize objects from the morphological features in their images, but the shelf life of a mushroom cannot be identified from a photograph. It seems more reliable to identify the shelf life of mushrooms by collecting chemical information from the samples through FT-NIR. However, the possibility of using FT-NIR combined with machine learning to distinguish fresh boletus samples from old samples has not been explored. Therefore, the main purpose of this study is to distinguish samples with different refrigeration times, using FT-NIR combined with chemometrics as a detection tool to distinguish aged samples from fresh samples. All samples were stored in a refrigerator at 0 °C, and absorbance was measured every 6 h for a total of 6 days. To facilitate the distinction, we divided the samples into three categories, namely, category I (1–2 days, 0–48 h, recommended for consumption), category II (3–4 days, 48–96 h, recommended for consumption), and category III (5–6 days, 96–144 h, not recommended).

2. Results and Discussion

2.1. Original Spectral Features

Near-infrared spectroscopy reflects the overtone (frequency-doubling) and combination-band information of the vibrations of groups such as O–H, N–H, and C–H in the sample, whose frequencies fall within the near-infrared region.28 A combination band arises from the combination of two (or several) different vibrations, such as a stretching vibration and a bending vibration. Theoretically, its frequency is the sum of the frequencies of the combined vibrations. An overtone is a vibration at an integer multiple of the fundamental frequency. Figure 1A shows the raw FT-NIR spectra for different shelf lives. These spectra clearly have similar characteristic peaks, but their absorbances differ. Some characteristic peaks of the FT-NIR spectrum have been described in the literature.29 The band at 5100–5200 cm⁻¹ is attributed to N–H and C=O groups and to the combination band of water molecules. The band near 6944 cm⁻¹ represents the first overtone of the stretching vibration of water molecules.30 As shown in Figure 1A, the near-infrared spectra of the samples within 1–6 days are very similar, but the differences in absorbance are greater near 6944 and 5154 cm⁻¹. This shows that the FT-NIR spectrum can directly reflect the change of the water content in the sample, from which the freshness of the sample can be inferred indirectly. To further determine how the sample spectrum changes with storage time, we analyzed the average spectra of categories I, II, and III (Figure 1B). The absorbance value of category I was between those of categories II and III, which may be due to the loss of water from the fruit body cells and the increase of intercellular water after storage for 1–2 days. Therefore, FT-NIR could easily capture water molecules, which increased the absorbance value. The low absorbance value of category III is presumably due to the decrease of the moisture content in the fruit body after storage for 4 days. Therefore, it is feasible to identify the shelf life of P. portentosus by FT-NIR spectroscopy. In addition, according to principal component analysis (PCA), the first two principal components represent 99.5% of the sample information (Figure S2). Visually, however, the first two principal components cannot distinguish P. portentosus in different storage periods. PCA alone cannot identify the shelf life of P. portentosus, and a more reliable machine learning model is needed.31 Because it is difficult to identify the shelf life from the original spectrum of P. portentosus, a supervised pattern-recognition method is required.
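For readers who think in wavelengths rather than wavenumbers, the two water-related band positions named in this section can be converted with λ (nm) = 10⁷ / ν (cm⁻¹). The sketch below is illustrative only; the function name is hypothetical and not part of the study.

```python
def wavenumber_to_nm(wavenumber_cm1: float) -> float:
    """Convert a wavenumber in cm^-1 to a wavelength in nm (lambda = 1e7 / nu)."""
    return 1e7 / wavenumber_cm1


# Band positions discussed in the text.
for band in (6944, 5154):
    print(f"{band} cm^-1 is about {wavenumber_to_nm(band):.0f} nm")
```

The two bands therefore sit near 1440 and 1940 nm.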

Figure 1. Samples: (A) raw spectrum and (B) average spectrum.

2.2. Different Models Established Using FT-NIR

2.2.1. SVM Model

The SVM model depends on two parameters, namely, the penalty factor (c) and the kernel parameter (g).32 A grid search algorithm was used to find the optimal combination of c and g. The parameter c represents the tolerance of the SVM model to outliers.33 The higher the c, the less error is tolerated and the more prone the model is to overfitting. The smaller the c, the more prone the model is to underfitting. The parameter g is the γ value of the radial basis function kernel, which determines the distribution of the data set after it is mapped to the high-dimensional feature space and affects the training speed of the model.34 The larger the g, the fewer the support vectors and the faster the model training. Conversely, the smaller the g, the slower the model training. The relevant parameters of the SVM model are shown in Table 1. Compared with the raw data, the accuracies of the training set and the test set improved significantly after first-derivative (FD) and second-derivative (SD) preprocessing (Figure S1). The SD-processed model had the best performance, with best c = 32 and best g = 4.8828 × 10⁻⁴. The accuracies of the training set and the test set were 96.03 and 100%, respectively. However, the literature indicates that the model fits well when c is in the range 2⁻² to 2⁴ and g is in the range 2⁻⁴ to 2⁴.35 A high value of c means that the model is at risk of overfitting. Therefore, because its c value is high, the SVM model based on FT-NIR is not sufficient to identify the shelf life of P. portentosus.
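The overfitting argument rests on comparing the fitted parameters with the quoted literature ranges (c from 2^-2 to 2^4, g from 2^-4 to 2^4). The following sketch makes that comparison explicit on a log2 scale. The values are copied from Table 1; the constant and function names are hypothetical.

```python
import math

# Ranges quoted in the text as giving a well-fitted model, as exponents of 2.
C_EXPONENT_RANGE = (-2, 4)  # c between 2^-2 and 2^4
G_EXPONENT_RANGE = (-4, 4)  # g between 2^-4 and 2^4

# Best (c, g) pairs copied from Table 1.
BEST_PARAMS = {
    "raw": (1.31072e5, 1.7263e-4),
    "FD": (9.26819e4, 3.0518e-5),
    "SD": (32.0, 4.8828e-4),
}


def within(value, exponent_range):
    """True if log2(value) lies inside the closed exponent range."""
    low, high = exponent_range
    return low <= math.log2(value) <= high


for name, (c, g) in BEST_PARAMS.items():
    print(
        f"{name}: log2(c) = {math.log2(c):.1f}, log2(g) = {math.log2(g):.1f}, "
        f"c in range: {within(c, C_EXPONENT_RANGE)}, "
        f"g in range: {within(g, G_EXPONENT_RANGE)}"
    )
```

For the SD model, c = 32 = 2^5 lies one power of two above the quoted upper bound, while the raw and FD models exceed it by a much wider margin.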

Table 1. Results of the SVM Model for Identifying the Shelf Life of P. portentosus Based on Different Preprocessing Methods.

| data matrices | best c | best g | accuracy of training set (%) | accuracy of test set (%) |
| --- | --- | --- | --- | --- |
| raw | 1.31072 × 10⁵ | 1.7263 × 10⁻⁴ | 78.17 | 80.95 |
| FD | 9.26819 × 10⁴ | 3.0518 × 10⁻⁵ | 97.61 | 94.05 |
| SD | 32 | 4.8828 × 10⁻⁴ | 96.03 | 100 |

2.2.2. PLS-DA Model

For the PLS-DA model, R² describes the ability of the model to fit the data, and an R² close to 1 is one of the necessary conditions for the model to be robust. Q² indicates the ability of the model to predict new data, and a larger Q² (Q² > 0.5) indicates that the model has good predictive performance.36 In addition, a sound model should have a low root-mean-square error of cross-validation (RMSECV), root-mean-square error of estimation (RMSEE), and root-mean-square error of prediction (RMSEP). The best number of latent variables (LVs) was selected according to the highest Q² and the lowest RMSECV, and the corresponding PLS-DA model was established. Choosing the number of LVs at the smallest RMSECV retains as many LVs as possible without overfitting. Similar to the results of SVM, the model performance after FD and SD preprocessing was significantly better than with the raw data. Among them, the model after SD preprocessing had the best performance. Q², RMSECV, accuracy of the training set, and accuracy of the test set were 0.769, 0.2314, 99.21%, and 97.62%, respectively (Table 2). To further verify the fit of the PLS-DA model, we performed a permutation test with 200 iterations on the SD-preprocessed model. As shown in Figure 2, the R² and Q² values of the original model (right) are higher than those of the permuted models (left). The X-axis represents the similarity between the permuted data (simulated values) and the original model, and the Y-axis represents the values of R² and Q². The rightmost points are the real values, and the points to the left are the simulated values. The criteria for judging the validity of the permutation test are as follows: all blue Q² values to the left are lower than the original point to the right, and the blue regression line of the Q² points intersects the vertical axis (on the left) at or below zero. The results show that the PLS-DA model is suitable for identifying the shelf life of P. portentosus and does not overfit.
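The figures of merit used for PLS-DA can be sketched with their standard definitions: R2 = 1 - RSS/TSS on fitted values, Q2 with the same formula on cross-validated predictions, and RMSE as the square root of the mean squared error. The source does not spell out its formulas, so these definitions, the function names, and the toy numbers below are assumptions for illustration only.

```python
import math


def r_squared(y_true, y_pred):
    """1 - RSS/TSS. With cross-validated predictions this gives Q2."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    tss = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - rss / tss


def rmse(y_true, y_pred):
    """Root-mean-square error between reference and predicted values."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Toy dummy-coded class membership (1 = in class, 0 = not) with made-up predictions.
y = [0.0, 0.0, 1.0, 1.0]
fitted = [0.1, 0.2, 0.8, 0.9]           # predictions on the data used for fitting
cross_validated = [0.2, 0.3, 0.6, 0.8]  # predictions for held-out samples

print("R2 =", round(r_squared(y, fitted), 3))
print("Q2 =", round(r_squared(y, cross_validated), 3))
print("RMSE (fit) =", round(rmse(y, fitted), 4))
```

A Q2 that stays close to R2, as in the SD row of Table 2, suggests that the model generalizes rather than memorizes.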

Table 2. Results of the PLS-DA Model for Identifying the Shelf Life of P. portentosus Based on Different Preprocessing Methods.

| data matrices | R² | Q² | LVs | RMSECV | RMSEE | RMSEP | accuracy of training set (%) | accuracy of test set (%) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| raw | 0.506 | 0.419 | 14 | 0.3647 | 0.3412 | 0.3533 | 82.14 | 79.76 |
| FD | 0.81 | 0.671 | 15 | 0.2956 | 0.2374 | 0.2685 | 99.21 | 91.67 |
| SD | 0.905 | 0.769 | 10 | 0.2314 | 0.1476 | 0.2262 | 99.21 | 97.62 |

Figure 2. Permutation test results of the PLS-DA model of the SD data set.

2.2.3. ELM Model

ELM has the advantages of high learning efficiency and strong generalization ability in many fields.37 Compared with SVM, ELM has a shorter learning time and is less likely to fall into a local optimum, and its performance is slightly better or equivalent. The sigmoid (sig) function was used as the activation function of the hidden layer neurons, which is convenient for binary classification problems.38 The number of hidden layer neurons was set to 71. As shown in Table 3 and Figure S3, the model established from the spectral matrix after SD preprocessing gave the best test set accuracy, and the accuracies of the training set and the test set were 96.83 and 93.10%, respectively.

Table 3. Results of the ELM Model for Identifying the Shelf Life of P. portentosus Based on Different Preprocessing Methods.

| data matrices | hidden neurons | accuracy of training set (%) | accuracy of test set (%) |
| --- | --- | --- | --- |
| raw | 71 | 86.11 | 70.11 |
| FD | 71 | 98.81 | 90.80 |
| SD | 71 | 96.83 | 93.10 |

2.2.4. DL Model

Based on the synchronous and asynchronous two-dimensional correlation spectroscopy (2DCOS) images, a deep learning model was established to identify the shelf life of P. portentosus. A 12-layer residual network was constructed. The weight decay coefficient λ of the model was 0.0001, and the learning rate was 0.01. The discriminative performance of the model was judged by the accuracy and the loss value. The loss function plays a vital role in deep learning.39 The model converges by minimizing the loss function, which reduces the error of the predicted values.40 Figure 3 shows representative synchronous, asynchronous, and integration 2DCOS (i2DCOS) images, from which it can be clearly seen that the synchronous 2DCOS images of fresh P. portentosus differ significantly at the 6944 and 5154 cm⁻¹ bands. Figure 4A,B shows the recognition results of synchronous 2DCOS. At epoch 28, the accuracy of both the training set and the test set was 100% and the loss value was 0.012. Although the accuracy of both sets was already 100% at epoch 5, the loss value was still very high. Therefore, the model achieved its best performance at epoch 28. Figure 4C,D shows the accuracy of the models based on asynchronous and i2DCOS images. For the asynchronous images, the training set accuracy was 100%, but the test set accuracy did not exceed 60%. For the i2DCOS images, the training set accuracy was 100%, but the test set accuracy did not exceed 80%. Because of their low test set accuracy, these two models cannot be used to identify the shelf life of P. portentosus. Therefore, the deep learning model based on synchronous 2DCOS images is the best. An external validation set was used to further verify the robustness of the DL model. As shown in Table S1, the external validation images based on the synchronous spectra were correctly classified, which shows that 2DCOS combined with DL is a reliable way to identify the shelf life of P. portentosus.

Figure 3. Synchronous, asynchronous, and i2DCOS images of P. portentosus with different shelf lives: (a–c) category I; (d–f) category II; and (g–i) category III.

Figure 4. Results of the deep learning model of P. portentosus. (A) Model accuracy based on synchronization; (B) model loss value based on synchronization; (C) model accuracy based on asynchronization; and (D) model accuracy based on i2DCOS.

However, it is not enough to evaluate a model based on accuracy alone. In this study, SVM, PLS-DA, ELM, and DL models were established to identify the shelf life of P. portentosus, and their performance was evaluated by the following parameters: sensitivity (Sen), specificity (Spe), and efficiency (Eff). The confusion matrix (Table S1) and parameter performance (Table 4) of the four models show that the DL model had the best results. The accuracy of the DL model based on the original spectrum was 100%, and the Sen, Spe, and Eff values were all 1.000. A loss value of 0.012 means that the model has converged well and is an effective tool for identifying the shelf life of P. portentosus. In the PLS-DA model, SD effectively improves the signal sensitivity, and the robustness of the model is significantly increased.41 Although the accuracy of the PLS-DA model test set is 97.62% and the Eff is above 0.97, misclassification of category III (not recommended for consumption) cannot be tolerated. Similarly, the ELM model is not suitable for identifying the shelf life of P. portentosus. A model should be evaluated from several aspects. The average Eff of the SVM model (SD) is as high as 1.000, but the high parameter c indicates a risk of overfitting. In general, 2DCOS combined with DL was an effective identification model for predicting the shelf life of P. portentosus. It had the advantages of high accuracy and good robustness and was an effective and fast identification tool.

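Table 4. Comparison of Different Model Parameters.

| methods | parameter | raw I | raw II | raw III | FD I | FD II | FD III | SD I | SD II | SD III |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SVM | Sen | 0.893 | 0.714 | 0.821 | 0.964 | 0.929 | 0.929 | 1.000 | 1.000 | 1.000 |
| SVM | Spe | 0.893 | 0.911 | 0.911 | 0.929 | 0.982 | 1.000 | 1.000 | 1.000 | 1.000 |
| SVM | Eff | 0.893 | 0.807 | 0.865 | 0.946 | 0.955 | 0.964 | 1.000 | 1.000 | 1.000 |
| PLS-DA | Sen | 0.750 | 0.786 | 0.857 | 0.929 | 0.900 | 0.857 | 0.964 | 1.000 | 0.964 |
| PLS-DA | Spe | 0.911 | 0.857 | 0.929 | 0.946 | 0.981 | 0.982 | 0.982 | 1.000 | 0.982 |
| PLS-DA | Eff | 0.826 | 0.821 | 0.892 | 0.937 | 0.940 | 0.918 | 0.973 | 1.000 | 0.973 |
| ELM | Sen | 0.724 | 0.690 | 0.759 | 1.000 | 0.828 | 0.897 | 0.931 | 0.931 | 0.931 |
| ELM | Spe | 0.828 | 0.793 | 0.966 | 0.931 | 0.948 | 0.983 | 0.983 | 0.948 | 0.966 |
| ELM | Eff | 0.774 | 0.740 | 0.856 | 0.965 | 0.886 | 0.939 | 0.957 | 0.940 | 0.948 |
| DL | Sen | 1.000 | 1.000 | 1.000 | | | | | | |
| DL | Spe | 1.000 | 1.000 | 1.000 | | | | | | |
| DL | Eff | 1.000 | 1.000 | 1.000 | | | | | | |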

3. Conclusions

Fresh mushrooms are highly perishable, and food safety problems caused by eating inedible ones should be avoided. Therefore, ensuring the edible safety of mushrooms is very important for consumers. In this study, FT-NIR spectroscopy was used for the first time to identify the shelf life of P. portentosus. The near-infrared spectra collected by the FT-NIR spectrometer were used to distinguish fresh samples from aged samples through the established mathematical models. Four machine learning methods, namely, SVM, PLS-DA, ELM, and DL, were used to study and compare the classification performance. The results show that the combination of 2DCOS and deep learning is an effective and accurate method to distinguish the shelf life of P. portentosus. This method does not require complicated sample preparation and data preprocessing, and it is nondestructive, fast, and cheaper than traditional analysis. Therefore, it provides a good choice for quickly identifying the shelf life of P. portentosus in the edible mushroom market. This method may be applied to other edible mushrooms or foods with a short shelf life.

4. Materials and Methods

4.1. Sample Collection

We obtained 14 samples sold in the “Mushuihua” wild mushroom trading market in Yunnan, China. The seller confirmed that the samples had been freshly picked on the same day. Internal transcribed spacer (ITS) sequencing confirmed that the samples were P. portentosus (97.85%, MN962555.1). The samples were wiped clean with a dry cotton cloth to remove surface debris. After that, the fresh mushrooms were packed in polyethylene bags and stored at 0 °C. FT-NIR spectra were collected from the surface of the boletus caps every 6 h from day 0 to day 6, and each measurement was repeated three times.

4.2. FT-NIR Spectral Acquisition

An FT-NIR spectrometer (Thermo Scientific Inc., Antaris II) was used to obtain FT-NIR spectra of fresh samples. The wavenumber range of the spectrometer was 10 000–4000 cm⁻¹, the resolution was 8 cm⁻¹, and the signal was scanned 64 times. Each scan took about 1 min. The background was subtracted before testing the sample. Three random positions on the sample cap were selected to measure the FT-NIR spectrum every 6 h, and the average value was taken. The spectrometer was warmed up for 2 h before measurements.

4.3. Spectral Data Preprocessing

Before establishing the mathematical model, it is necessary to preprocess the original spectrum. The reason for preprocessing is that the near-infrared spectrum data contains signals such as baseline drift and noise. In addition, there are spectral line shifts caused by factors such as sample size and environment. Preprocessing the spectral data can reduce interference information and enhance the predictive ability of the model. Although the original spectral signals of the samples are very similar, the derivative can highlight subtle differences between similar signal curves and eliminate baseline drift and vertical displacement.42,43

In this study, we used FD and SD preprocessing to obtain better model performance. The first-derivative (FD) spectrum is the slope at each point of the original spectrum. It peaks where the original spectrum has its maximum slope, and it crosses zero where the original has peaks. The second-derivative (SD) spectrum is a measure of the curvature at each point of the original spectrum.44 Both FD and SD can effectively resolve overlapping peaks and improve sensitivity and resolution, but they reduce the signal-to-noise ratio. The original FT-NIR spectral data were imported into OMNIC software (Version 8.2, Thermo Fisher Scientific Inc., Waltham, MA) with absorbance as the ordinate. Then, the spectra were converted into a matrix data set with SIMCA-P+ 13.0 (Umetrics). In the matrix data set (m × n), m represents the number of samples and n represents the number of wavenumber variables. The original FT-NIR spectral data form a 336 × 1557 matrix, which is reduced to 336 × 1543 after FD or SD preprocessing.
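The behavior of the derivative spectra described here can be checked on a synthetic band. The sketch below uses plain central finite differences, which is an assumption: the source does not state which derivative algorithm the software applied, and the Gaussian test band and function names are hypothetical.

```python
import math


def first_derivative(y, dx=1.0):
    """Central-difference slope; one point is lost at each end."""
    return [(y[i + 1] - y[i - 1]) / (2 * dx) for i in range(1, len(y) - 1)]


def second_derivative(y, dx=1.0):
    """Central-difference curvature; one point is lost at each end."""
    return [(y[i + 1] - 2 * y[i] + y[i - 1]) / dx ** 2 for i in range(1, len(y) - 1)]


# Synthetic absorption band centered at x = 0, plus a copy with a baseline offset.
x = list(range(-10, 11))
band = [math.exp(-((xi / 4) ** 2)) for xi in x]
band_offset = [value + 0.3 for value in band]

fd = first_derivative(band)
fd_offset = first_derivative(band_offset)
sd = second_derivative(band)

center = x.index(0) - 1  # index of x = 0 after one point is dropped at the start
print("FD at band maximum:", fd[center])
print("largest FD change caused by offset:", max(abs(a - b) for a, b in zip(fd, fd_offset)))
print("SD is most negative at the band maximum:", sd.index(min(sd)) == center)
```

The first derivative is zero at the band maximum and is unaffected by the constant offset, which is why derivative preprocessing removes baseline shifts. Edge points are lost in the process, in the same way that the data matrix in this study narrows after preprocessing.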

4.4. Two-Dimensional Correlation Spectroscopy (2DCOS) Image

2DCOS maps the one-dimensional spectrum in a two-dimensional space, which improves the original spectral resolution. It reflects the subtle changes in the spectrum when the sample is disturbed. For the DL model, the original FT-NIR spectrum must be converted into a 2DCOS image for the next analysis. Through complex correlation analysis, 2D synchronous correlation spectrum Φ (v1, v2) and 2D asynchronous correlation spectrum Ψ (v1, v2) were generated. The dynamic spectrum intensity of the variable v is represented by S, t is the external disturbance, and m is the spectrum measured at m steps with equal intervals between the disturbance t.

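$$S(v) = \begin{bmatrix} S(v, t_1) \\ S(v, t_2) \\ \vdots \\ S(v, t_m) \end{bmatrix} \tag{1}$$

$$\Phi(v_1, v_2) = \frac{1}{m - 1}\, S(v_1)^{\mathrm{T}} \cdot S(v_2) \tag{2}$$

$$\Psi(v_1, v_2) = \frac{1}{m - 1}\, S(v_1)^{\mathrm{T}} \cdot N \cdot S(v_2) \tag{3}$$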

where N is the Hilbert–Noda matrix, defined as follows

$$N_{jk} = \begin{cases} 0, & j = k \\ \dfrac{1}{\pi (k - j)}, & j \neq k \end{cases} \tag{4}$$

The product of the intensities of the synchronous and asynchronous 2DCOS spectra produced by the same sample is defined as the i2DCOS intensity I (v1, v2) between the variables v1 and v2

$$I(v_1, v_2) = \Phi(v_1, v_2) \cdot \Psi(v_1, v_2) \tag{5}$$

The synchronous, asynchronous, and i2DCOS spectra of all samples are obtained by formulas 2, 3, and 5, respectively.
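A minimal sketch of the generalized 2DCOS calculation is given below, assuming the usual formulation: the synchronous map is the scaled inner product of the dynamic spectra, the asynchronous map inserts the Hilbert–Noda matrix between them, and the i2DCOS map is their element-wise product. Mean-centering to obtain the dynamic spectra, the toy data, and all function names are assumptions rather than details taken from the study.

```python
import math


def hilbert_noda(m):
    """m x m Hilbert-Noda matrix: 0 on the diagonal, 1/(pi*(k-j)) elsewhere."""
    return [[0.0 if j == k else 1.0 / (math.pi * (k - j)) for k in range(m)]
            for j in range(m)]


def two_d_correlation(spectra):
    """spectra: m rows (perturbation steps), each with n intensities.

    Returns the synchronous, asynchronous, and integrated n x n maps.
    """
    m, n = len(spectra), len(spectra[0])
    # Dynamic spectra: subtract the mean spectrum (assumed reference).
    mean = [sum(row[i] for row in spectra) / m for i in range(n)]
    dyn = [[row[i] - mean[i] for i in range(n)] for row in spectra]
    noda = hilbert_noda(m)
    # Apply the Hilbert-Noda matrix along the perturbation axis.
    transformed = [[sum(noda[j][k] * dyn[k][b] for k in range(m)) for b in range(n)]
                   for j in range(m)]
    scale = 1.0 / (m - 1)
    sync = [[scale * sum(dyn[j][a] * dyn[j][b] for j in range(m)) for b in range(n)]
            for a in range(n)]
    asyn = [[scale * sum(dyn[j][a] * transformed[j][b] for j in range(m)) for b in range(n)]
            for a in range(n)]
    integrated = [[sync[a][b] * asyn[a][b] for b in range(n)] for a in range(n)]
    return sync, asyn, integrated


# Toy data: 4 perturbation steps, 3 spectral variables.
toy = [[1.0, 0.5, 0.2], [1.2, 0.6, 0.1], [1.5, 0.8, 0.1], [1.9, 0.9, 0.3]]
sync, asyn, integrated = two_d_correlation(toy)
print("synchronous diagonal:", [round(sync[i][i], 4) for i in range(3)])
```

The synchronous map is symmetric and the asynchronous map is antisymmetric with a zero diagonal, which gives a quick sanity check for any implementation before the maps are rendered as images.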

4.5. Data Analysis

Supervised learning is one of the common methods of machine learning.45 Compared with unsupervised learning, it has three obvious characteristics: (1) it has a clear goal, (2) it requires labeled training data, and (3) the model performance is easy to evaluate. Because supervised learning requires the samples to be divided into training and test sets, the samples of each storage period were divided into a 75% training set and a 25% test set by the Kennard–Stone algorithm.46 The first step in supervised learning is to choose a suitable model. SVM, PLS-DA, ELM, and DL are four powerful pattern-recognition techniques. They play an important role in the fields of food and Chinese herbal medicines.47,48 SVM can identify subtle differences in complex data sets and provide a reliable classification model.34,49 PLS-DA has strong applicability and is a multidimensional reduction tool.50 The main advantage of ELM is that, compared with traditional learning algorithms for neural networks, especially single hidden layer feedforward neural networks (SLFNs), it is faster while maintaining learning accuracy.36 DL technology based on visual feature recognition is a research hotspot in the fields of food, herbal medicine, and medical diagnosis. It overcomes the vanishing or exploding gradient problem of earlier deep neural networks and has good convergence and accuracy.51
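The Kennard–Stone split mentioned above can be sketched as follows: start from the two most distant samples, then repeatedly add the sample whose nearest already-selected neighbor is farthest away. This is a generic illustration with Euclidean distance and one-dimensional toy points; the function name and data are hypothetical, and in the study the split was applied within each storage period.

```python
import math


def kennard_stone(samples, n_select):
    """Return indices of n_select (>= 2) samples chosen by Kennard-Stone."""
    n = len(samples)
    dist = [[math.dist(a, b) for b in samples] for a in samples]
    # Start with the pair of samples that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist[pair[0]][pair[1]],
    )
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_select:
        # Add the sample whose nearest selected neighbor is farthest away.
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected


# Toy "spectra" with a single variable each; 75% go to the training set.
points = [[0.0], [1.0], [2.0], [3.0], [10.0], [4.0], [5.0], [9.0]]
train_idx = kennard_stone(points, n_select=int(0.75 * len(points)))
test_idx = [i for i in range(len(points)) if i not in train_idx]
print("training indices:", train_idx)
print("test indices:", test_idx)
```

Because the extremes are chosen first, the training set spans the full range of the data and the held-out samples fall inside that range.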

SVM is a two-class classification model. The basic SVM is defined as the linear classifier with the maximum margin in the feature space, and its learning strategy is to maximize the margin (d, as shown in Figure 5A). SVM has a wide range of applications in classification problems because of its easy-to-understand classification principle and good model robustness.52 For a nonlinear data set, SVM maps the data to a higher-dimensional space, but operating directly in a high-dimensional space is more difficult. A kernel function simplifies this process by replacing the inner product in the high-dimensional space with an operation in the original data space.53,54 In this manner, nonlinear data sets can be classified. A radial basis function kernel was used to establish the SVM model, which was built in MATLAB (R2017a version, MathWorks).
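The kernel idea can be illustrated with the radial basis function itself. The sketch assumes the common parameterization K(x, y) = exp(-g * ||x - y||^2), where g plays the role of the kernel parameter tuned by grid search; the source does not state the exact form, and the function name and toy vectors are hypothetical.

```python
import math


def rbf_kernel(x, y, g):
    """RBF kernel value computed directly in the original data space."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-g * squared_distance)


spectrum_a = [0.50, 0.62, 0.71]
spectrum_b = [0.52, 0.60, 0.75]

for g in (2 ** -11, 2 ** -4, 2 ** 4):
    print(f"g = {g:.6f}: K(a, b) = {rbf_kernel(spectrum_a, spectrum_b, g):.6f}")
```

Identical inputs always give K = 1, and a larger g makes the similarity decay faster with distance, so each training sample influences a smaller neighborhood.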

Figure 5. (A) SVM classification principle; (B) ELM neural network; and (C) DL neural network.

PLS-DA uses partial least-squares regression to establish a model while reducing the dimensionality of the data and then performs a discriminant analysis on the regression results. PLS-DA can reduce the effect of multicollinearity among the variables and improve the model results.48,55 In this study, the spectral data of P. portentosus was used as the independent variable X and the freshness of P. portentosus was used as the dependent variable Y. The values of Y correspond to different categories. PLS-DA extracts the characteristic vectors of the spectral data matrix X and the category matrix Y at the same time, which strengthens the role of mushroom category information in spectral analysis. This method extracts the spectral information with the highest correlation with the freshness of mushrooms, thereby maximizing the difference among the extracted spectra of different categories. PLS-DA was implemented in SIMCA-P+ 14.1 (Umetrics, Sweden).

ELM is a single hidden layer forward neural network learning method that characterizes the mapping relationship between spectral attributes and classification labels by establishing a multilayer neuron connection structure.56,57 This method only needs to set the number of hidden layer network nodes before training; the connection weights of the input layer and the hidden layer can be set randomly, and there is no need to adjust after setting. In addition, the connection weight β between the hidden layer and the output layer does not need to be adjusted iteratively but is determined at one time by solving equations. Through the above methods, the generalization performance of the ELM model is good, which greatly improves the running speed.38 The network structure of the ELM model is shown in Figure 5B. Wi represents the input weight, β represents the output weight, xi represents the i-th data, and tm represents the mark corresponding to the m-th data example. The number of hidden layer nodes of the ELM is L. The ELM model is built using MATLAB (R2017a version, MathWorks).

Unlike ELM, a DL model usually has more than five hidden layers, making it a complex model with a deep structure (Figure 5C). The DL model can extract more information because of its deeper hidden layers. Deep learning extracts features automatically to represent the data, which avoids errors caused by manual feature extraction and realizes an end-to-end model. Residual Network (ResNet) is a type of convolutional neural network (CNN). ResNet has an obvious advantage among deep learning network structures because it overcomes the problem of vanishing or exploding gradients.39,51 As shown in Figure 6, the input x passes through the convolutional layers to obtain the feature-transformed output F(x), which is added to the input x to obtain the final output H(x) = F(x) + x. When the feature represented by x is mature enough, F(x) becomes 0 and x simply passes through the identity mapping route, which makes the model easier to train. The ResNet model uses the ReLU function for nonlinear activation, and global average pooling (GAP) avoids the excessively large number of parameters of a fully connected layer, to obtain the best model.
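The residual idea H(x) = F(x) + x and global average pooling can be sketched without any deep learning framework. The example below replaces the convolutional layers with small dense layers purely for illustration; it is not the 12-layer network used in the study, and all names and weights are hypothetical.

```python
def relu(vector):
    """Element-wise ReLU activation."""
    return [max(0.0, value) for value in vector]


def linear(weights, vector):
    """Matrix-vector product standing in for a convolutional layer."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def residual_block(x, w1, w2):
    """H(x) = F(x) + x with F(x) = W2 . relu(W1 . x)."""
    fx = linear(w2, relu(linear(w1, x)))
    return [f + xi for f, xi in zip(fx, x)]


def global_average_pool(feature_maps):
    """Reduce each 2D feature map to its mean value."""
    return [sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
            for fmap in feature_maps]


x = [0.2, -0.4, 1.0]
zero = [[0.0] * 3 for _ in range(3)]
small = [[0.1 if i == j else 0.0 for j in range(3)] for i in range(3)]

print("F(x) = 0 gives the identity:", residual_block(x, zero, zero))
print("small F(x) nudges the input:", residual_block(x, small, small))
print("GAP:", global_average_pool([[[1.0, 2.0], [3.0, 4.0]], [[0.0, 0.0], [0.0, 8.0]]]))
```

When the learned transformation F contributes nothing, the block reduces to the identity mapping, so adding more blocks cannot make the representation worse. Global average pooling turns each feature map into a single number, which replaces a large fully connected layer.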

Figure 6. Schematic diagram of the ResNet module.

We divided the 2DCOS images into a 60% training set, a 30% test set, and a 10% external validation set for the deep learning model. The role of external validation is to check whether the deep learning model is overfitting.

Model performance was evaluated by sensitivity (Sen), specificity (Spe), and efficiency (Eff). These three parameters are calculated from the numbers of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). TP and TN are the correctly identified positive and negative samples, respectively. FP are negative samples misidentified as positive, and FN are positive samples misidentified as negative. Sen shows the ability to correctly identify samples belonging to the class under consideration, and Spe shows the ability to reject samples of other categories. Eff combines Sen and Spe into a single parameter.58
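The three parameters can be computed per class from a confusion matrix, as in the sketch below. Sen = TP/(TP + FN) and Spe = TN/(TN + FP) are the standard definitions; the text only says that Eff combines the two, so taking Eff as their geometric mean (a common convention in the class-modeling literature) is an assumption here, and the confusion matrix in the usage note is made-up data.

```python
import math


def class_metrics(confusion, k):
    """Sen, Spe, and Eff for class k of a square confusion matrix.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                        # class k, predicted as other
    fp = sum(confusion[r][k] for r in range(n)) - tp   # other, predicted as class k
    tn = sum(map(sum, confusion)) - tp - fn - fp
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    eff = math.sqrt(sen * spe)  # assumed: geometric mean of Sen and Spe
    return sen, spe, eff
```

With the hypothetical three-class matrix `[[9, 1, 0], [2, 8, 0], [0, 0, 10]]`, calling `class_metrics(matrix, 0)` treats category I as the positive class and the other two categories together as the negative class.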


Supporting Information Available

The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acsomega.1c02317.

  • Results of SVM, ELM, and PCA and the confusion matrix (PDF)

This work was supported by the National Natural Science Foundation of China (Grant Number 32060570) and the Joint Special Project of Agricultural Fundamental Research of Yunnan Province (Grant Number 2018FG001-033).

The authors declare no competing financial interest.

Notes

Informed consent was obtained from all individual participants included in this article.

Supplementary Material

ao1c02317_si_001.pdf (557.4KB, pdf)

References

  1. Wang X.; Zhang J.; Wu L.; Zhao Y.; Li T.; Li J.; Wang Y.; Liu H. A mini-review of chemical composition and nutritional value of edible wild-grown mushroom from China. Food Chem. 2014, 151, 279–285. doi: 10.1016/j.foodchem.2013.11.062
  2. Ezekiel C. N.; Sulyok M.; Frisvad J. C.; Somorin Y. M.; Warth B.; Houbraken J.; Samson R. A.; Krska R.; Odebode A. C. Fungal and mycotoxin assessment of dried edible mushroom in Nigeria. Int. J. Food Microbiol. 2013, 162, 231–236. doi: 10.1016/j.ijfoodmicro.2013.01.025
  3. Sanmee R.; Dell B.; Lumyong P.; Izumori K.; Lumyong S. Nutritive value of popular wild edible mushrooms from northern Thailand. Food Chem. 2003, 82, 527–532. doi: 10.1016/S0308-8146(02)00595-2
  4. Segedin B. P. An annotated checklist of Agarics and Boleti recorded from New Zealand. N. Z. J. Bot. 1987, 25, 185–215. doi: 10.1080/0028825X.1987.10410067
  5. Yang R. H.; Bao D. P.; Guo T.; Li Y.; Ji G. Y.; Ji K. P.; Tan Q. Bacterial Profiling and Dynamic Succession Analysis of Phlebopus portentosus Casing Soil Using MiSeq Sequencing. Front. Microbiol. 2019, 10, 1927. doi: 10.3389/fmicb.2019.01927
  6. Ji K.; Cao Y.; Zhang C.; He M.; Liu J.; Wang W.; Wang Y. Cultivation of Phlebopus portentosus in southern China. Mycol. Prog. 2011, 10, 293–300. doi: 10.1007/s11557-010-0700-7
  7. Zhang C.; He M.; Cao Y.; Liu J.; Gao F.; Wang W.; Ji K.; Shao S.; Wang Y. Fungus-insect gall of Phlebopus portentosus. Mycologia 2015, 107, 12–20. doi: 10.3852/13-267
  8. Sun Z.; Hu M.; Sun Z.; Zhu N.; Yang J.; Ma G.; Xu X. Pyrrole Alkaloids from the Edible Mushroom Phlebopus portentosus with Their Bioactive Activities. Molecules 2018, 23, 1198. doi: 10.3390/molecules23051198
  9. Aphichart K.; Songchan P.; Prakitsin S.; Jittra P.; Polkit S. Antioxidation and antiproliferation properties of polysaccharide-protein complex extracted from Phaeogyroporus portentosus (Berk. Broome) McNabb. Afr. J. Microbiol. Res. 2013, 7, 1668–1680. doi: 10.5897/AJMR12.345
  10. Eissa H. A. A. Effect of chitosan coating on shelf life and quality of fresh-cut mushroom. J. Food Qual. 2007, 30, 623–645. doi: 10.1111/j.1745-4557.2007.00147.x
  11. Singh P.; Langowski H. C.; Wani A. A.; Saengerlaub S. Recent advances in extending the shelf life of fresh Agaricus mushrooms: a review. J. Sci. Food Agric. 2010, 90, 1393–1402. doi: 10.1002/jsfa.3971
  12. Baskar C.; Nesakumar N.; Kesavan S.; Balaguru Rayappan J. B.; Alwarappan S. ATR-FTIR as a versatile analytical tool for the rapid determination of storage life of fresh Agaricus bisporus via its moisture content. Postharvest Biol. Technol. 2019, 154, 159–168. doi: 10.1016/j.postharvbio.2019.05.006
  13. Fernandes Â.; Barreira J. C. M.; Günaydi T.; Alkan H.; Antonio A. L.; Oliveira M. B. P. P.; Martins A.; Ferreira I. C. F. R. Effect of gamma irradiation and extended storage on selected chemical constituents and antioxidant activities of sliced mushroom. Food Control 2017, 72, 328–337. doi: 10.1016/j.foodcont.2016.04.044
  14. Jaworska G.; Bernaś E. The effect of preliminary processing and period of storage on the quality of frozen Boletus edulis (Bull: Fr.) mushrooms. Food Chem. 2009, 113, 936–943. doi: 10.1016/j.foodchem.2008.08.023
  15. Supakarn S.; Theerakulpisut S.; Artnaseaw A. Equilibrium moisture content and thin layer drying model of shiitake mushrooms using a vacuum heat-pump dryer. Chiang Mai Univ. J. Nat. Sci. 2018, 17, 1–12. doi: 10.12982/CMUJNS.2018.0001
  16. Huang S.; Xiong Y.; Zou Y.; Dong Q.; Ding F.; Liu X.; Li H. A novel colorimetric indicator based on agar incorporated with Arnebia euchroma root extracts for monitoring fish freshness. Food Hydrocolloids 2019, 90, 198–205. doi: 10.1016/j.foodhyd.2018.12.009
  17. Qiao L.; Tang X.; Dong J. A feasibility quantification study of total volatile basic nitrogen (TVB-N) content in duck meat for freshness evaluation. Food Chem. 2017, 237, 1179–1185. doi: 10.1016/j.foodchem.2017.06.031
  18. Chen Q.; Yi T.; Tang Y.; Wong L. L.; Huang X.; Zhao Z.; Chen H. Comparative authentication of three ‘snow lotus’ herbs by macroscopic and microscopic features. Microsc. Res. Tech. 2014, 77, 631–641. doi: 10.1002/jemt.22381
  19. Tang Y.; He X.; Quanlan C.; Lanlan F.; Jianye Z.; Zhongzhen Z.; Dong L.; Zhitao L.; Yi T.; Chen H. A mixed microscopic method for differentiating seven species of “Bixie”-related Chinese Materia Medica. Microsc. Res. Tech. 2014, 77, 57–70. doi: 10.1002/jemt.22313
  20. Xue Y.; Zhu L.; Yi T. Fingerprint analysis of Resina draconis by ultra-performance liquid chromatography. Chem. Cent. J. 2017, 11, 67. doi: 10.1186/s13065-017-0299-8
  21. Yu S.; Zhu L.; Xiao Z.; Shen J.; Li J.; Lai H.; Li J.; Chen H.; Zhao Z.; Yi T. Rapid Fingerprint Analysis of Flos Carthami by Ultra-Performance Liquid Chromatography and Similarity Evaluation. J. Chromatogr. Sci. 2016, 54, 1619–1624. doi: 10.1093/chromsci/bmw115
  22. Fang J.; Zhu L.; Yi T.; Zhang J.; Yi L.; Liang Z.; Xia L.; Feng J.; Xu J.; Tang Y.; et al. Fingerprint analysis of processed Rhizoma Chuanxiong by high-performance liquid chromatography coupled with diode array detection. Chin. Med. 2015, 10, 2. doi: 10.1186/s13020-015-0031-3
  23. Chen Q.; Zhu L.; Tang Y.; Kwan H.; Zhao Z.; Chen H.; Yi T. Comparative evaluation of chemical profiles of three representative ‘snow lotus’ herbs by UPLC-DAD-QTOF-MS combined with principal component and hierarchical cluster analyses. Drug Test. Anal. 2017, 9, 1105–1115. doi: 10.1002/dta.2123
  24. Zhu L.; Liang Z.; Yi T.; Ma Y.; Zhao Z.; Guo B.; Zhang J.; Chen H. Comparison of chemical profiles between the root and aerial parts from three Bupleurum species based on a UHPLC-QTOF-MS metabolomics approach. BMC Complementary Altern. Med. 2017, 17, 305. doi: 10.1186/s12906-017-1816-y
  25. Walsh K. B.; Blasco J.; Zude-Sasse M.; Sun X. Visible-NIR ‘point’ spectroscopy in postharvest fruit and vegetable assessment: The science behind three decades of commercial use. Postharvest Biol. Technol. 2020, 168, 111246. doi: 10.1016/j.postharvbio.2020.111246
  26. Zhao C.; Qiao X.; Shao Q.; Hassan M.; Ma Z. Evolution of the Lignin Chemical Structure during the Bioethanol Production Process and Its Inhibition to Enzymatic Hydrolysis. Energy Fuels 2020, 34, 5938–5947. doi: 10.1021/acs.energyfuels.0c00293
  27. Ghasemi F.; Mehridehnavi A.; Perez-Garrido A.; Perez-Sanchez H. Neural network and deep-learning algorithms used in QSAR studies: merits and drawbacks. Drug Discovery Today 2018, 23, 1784–1790. doi: 10.1016/j.drudis.2018.06.016
  28. Chmielarz M.; Sampels S.; Blomqvist J.; Brandenburg J.; Wende F.; Sandgren M.; Passoth V. FT-NIR: a tool for rapid intracellular lipid quantification in oleaginous yeasts. Biotechnol. Biofuels 2019, 12, 169. doi: 10.1186/s13068-019-1513-9
  29. Wang Y.; Zuo Z.; Huang H.; Wang Y. Original plant traceability of Dendrobium species using multi-spectroscopy fusion and mathematical models. R. Soc. Open Sci. 2019, 6, 190399. doi: 10.1098/rsos.190399
  30. Takeuchi M.; Martra G.; Coluccia S.; Anpo M. Evaluation of the Adsorption States of H2O on Oxide Surfaces by Vibrational Absorption: Near- and Mid-Infrared Spectroscopy. J. Near Infrared Spectrosc. 2009, 17, 373–384. doi: 10.1255/jnirs.843
  31. Wang Y.; Li J.; Liu H.; Wang Y. Attenuated Total Reflection-Fourier Transform Infrared Spectroscopy (ATR-FTIR) Combined with Chemometrics Methods for the Classification of Lingzhi Species. Molecules 2019, 24, 2210. doi: 10.3390/molecules24122210
  32. Li Y.; Zhang J.; Li T.; Liu H.; Li J.; Wang Y. Geographical traceability of wild Boletus edulis based on data fusion of FT-MIR and ICP-AES coupled with data mining methods (SVM). Spectrochim. Acta, Part A 2017, 177, 20–27. doi: 10.1016/j.saa.2017.01.029
  33. Wu X.; Zhang Q.; Wang Y. Traceability of wild Paris polyphylla Smith var. yunnanensis based on data fusion strategy of FT-MIR and UV–Vis combined with SVM and random forest. Spectrochim. Acta, Part A 2018, 205, 479–488. doi: 10.1016/j.saa.2018.07.067
  34. Devos O.; Downey G.; Duponchel L. Simultaneous data pre-processing and SVM classification model selection based on a parallel genetic algorithm applied to spectroscopic data of olive oils. Food Chem. 2014, 148, 124–130. doi: 10.1016/j.foodchem.2013.10.020
  35. Hu Y.; Li J.; Fan M.; Wang Y. Identify different species in yunnan wild edible bolete by infrared spectrum based on support vector machine. Food Sci. 2021, 42, 248–256. doi: 10.7506/spkx1002-6630-20191016-151
  36. Liu L.; Zuo Z.; Wang Y.; Xu F. A fast multi-source information fusion strategy based on FTIR spectroscopy for geographical authentication of wild Gentiana rigescens. Microchem. J. 2020, 159, 105360. doi: 10.1016/j.microc.2020.105360
  37. Li Q.; Huang Y.; Zhang J.; Min S. A fast determination of insecticide deltamethrin by spectral data fusion of UV–vis and NIR based on extreme learning machine. Spectrochim. Acta, Part A 2021, 247, 119119. doi: 10.1016/j.saa.2020.119119
  38. Yu H.; Sun X.; Wang J. A Dynamic ELM with Balanced Variance and Bias for Long-Term Online Prediction. Neural Process. Lett. 2019, 49, 1257–1271. doi: 10.1007/s11063-018-9865-x
  39. Yue J.; Huang H.; Wang Y. A practical method superior to traditional spectral identification: Two-dimensional correlation spectroscopy combined with deep learning to identify Paris species. Microchem. J. 2021, 160, 105731. doi: 10.1016/j.microc.2020.105731
  40. Dong J.; Zhang J.; Zuo Z.; Wang Y. Deep learning for species identification of bolete mushrooms with two-dimensional correlation spectral (2DCOS) images. Spectrochim. Acta, Part A 2021, 249, 119211. doi: 10.1016/j.saa.2020.119211
  41. Oliveri P.; Malegori C.; Simonetti R.; Casale M. The impact of signal pre-processing on the final interpretation of analytical outcomes - A tutorial. Anal. Chim. Acta 2019, 1058, 9–17. doi: 10.1016/j.aca.2018.10.055
  42. Hassoun A.; Shumilina E.; Di Donato F.; Foschi M.; Simal-Gandara J.; Biancolillo A. Emerging Techniques for Differentiation of Fresh and Frozen-Thawed Seafoods: Highlighting the Potential of Spectroscopic Techniques. Molecules 2020, 25, 4472. doi: 10.3390/molecules25194472
  43. Oliveri P.; Malegori C.; Simonetti R.; Casale M. The impact of signal pre-processing on the final interpretation of analytical outcomes - A tutorial. Anal. Chim. Acta 2019, 1058, 9–17. doi: 10.1016/j.aca.2018.10.055
  44. Næs T.; Isaksson T.; Fearn T.; Davies T. A. A User Friendly Guide to Multivariate Calibration and Classification; NIR Publications: Chichester, 2002.
  45. Wang Q.; Huang H.; Wang Y. FTIR and UV spectra for the prediction of triterpene acids in Macrohyporia cocos. Microchem. J. 2020, 158, 105167. doi: 10.1016/j.microc.2020.105167
  46. Berrueta L. A.; Alonso-Salces R. M.; Héberger K. Supervised pattern recognition in food analysis. J. Chromatogr. A 2007, 1158, 196–214. doi: 10.1016/j.chroma.2007.05.024
  47. Rajer-Kanduč K.; Zupan J.; Majcen N. Separation of data on the training and test set for modelling: a case study for modelling of five colour properties of a white pigment. Chemom. Intell. Lab. Syst. 2003, 65, 221–229. doi: 10.1016/S0169-7439(02)00110-7
  48. Liu T.; Jiang H.; Chen Q. Qualitative identification of rice actual storage period using olfactory visualization technique combined with chemometrics analysis. Microchem. J. 2020, 159, 105339. doi: 10.1016/j.microc.2020.105339
  49. Wang Y.; Li J.; Liu H.; Wang Y. Attenuated Total Reflection-Fourier Transform Infrared Spectroscopy (ATR-FTIR) Combined with Chemometrics Methods for the Classification of Lingzhi Species. Molecules 2019, 24, 2210. doi: 10.3390/molecules24122210
  50. Huang S.; Cai N.; Pacheco P. P.; Narrandes S.; Wang Y.; Xu W. Applications of Support Vector Machine (SVM) Learning in Cancer Genomics. Cancer Genomics Proteomics 2018, 15, 41–51. doi: 10.21873/cgp.20063
  51. Litjens G.; Kooi T.; Bejnordi B. E.; Setio A.; Ciompi F.; Ghafoorian M.; van der Laak J.; van Ginneken B.; Sanchez C. I. A survey on deep learning in medical image analysis. Med. Image Anal. 2017, 42, 60–88. doi: 10.1016/j.media.2017.07.005
  52. Rezaei-Ravari M.; Eftekhari M.; Saberi-Movahed F. Regularizing extreme learning machine by dual locally linear embedding manifold learning for training multi-label neural network classifiers. Eng. Appl. Artif. Intell. 2021, 97, 104062. doi: 10.1016/j.engappai.2020.104062
  53. Sujay Raghavendra N.; Deka P. C. Support vector machine applications in the field of hydrology: A review. Appl. Soft Comput. 2014, 19, 372–386. doi: 10.1016/j.asoc.2014.02.002
  54. Li Y.; Wang Y. Synergistic strategy for the geographical traceability of wild Boletus tomentipes by means of data fusion analysis. Microchem. J. 2018, 140, 38–46. doi: 10.1016/j.microc.2018.04.001
  55. Jain P.; Garibaldi J. M.; Hirst J. D. Supervised machine learning algorithms for protein structure classification. Comput. Biol. Chem. 2009, 33, 216–223. doi: 10.1016/j.compbiolchem.2009.04.004
  56. Górski Ł.; Sordo W.; Ciepiela F.; Kubiak W. W.; Jakubowska M. Voltammetric classification of ciders with PLS-DA. Talanta 2016, 146, 231–236. doi: 10.1016/j.talanta.2015.08.027
  57. Huang G.; Zhu Q.; Siew C. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501. doi: 10.1016/j.neucom.2005.12.126
  58. Oliveri P.; Downey G. Multivariate class modeling for the verification of food-authenticity claims. TrAC, Trends Anal. Chem. 2012, 35, 74–86. doi: 10.1016/j.trac.2012.02.005


Articles from ACS Omega are provided here courtesy of American Chemical Society
