. 2020 Oct 15;18:2920–2930. doi: 10.1016/j.csbj.2020.10.006

Table 1.

Short description and characteristics of the methods for pre-processing.

| Pre-processing step | Method | Description | Characteristic |
|---|---|---|---|
| Denoise | Kernel smoothing | Smooths the spectra based on a normal kernel function. | Parameter-free. |
| | Savitzky-Golay differentiation | Estimates the derivative by consecutively fitting window-wise subsets of adjoining data points with a polynomial of custom (user-chosen) degree using linear least squares. | Parameter-free; can be used for both baseline correction and smoothing/noise reduction. |
| Baseline removal | MPLS | Finds a rough background based on a penalized least squares function. | Relatively time-consuming; competitive results; insensitive to the parameters. |
| | SNV | Transposes and then auto-scales the data. | Parameter-free; scales the data. |
| | MSC | Each input spectrum is regressed against a reference (e.g. the mean spectrum) and the results are used to correct the input spectrum. | Reference dependent; scales the data. |
| Cosmic ray removal | Sharp spike detection | Detects spikes which are significantly narrower than the peaks in the spectrum. | Insensitive to relatively wide spikes; threshold dependent. |
| | Abnormal spike detection | A series of replicate spectra are compared. A spike is detected and removed since the probability of a spike occurring at the same point in multiple spectra is considered low. | Time-consuming since multiple spectra must be compared. |
| | Image curvature correction | Optimizes optical systems by comparing spectra from different rows of pixels on the detector. | User intervention needed for implementation; parameter based; time-consuming. |
| | Mapping based technique | The abnormal spikes are detected by comparing the neighboring spectra from the map. | A relatively large number of pixels is needed for accurate detection. |
| Scaling method | Normalization by a peak (e.g. maximal peak) | Divides every row (spectrum) by the value at the selected peak of that row (e.g. maximal peak). $X_{n,:}^{\mathrm{scaled}} = \dfrac{X(n,:)}{X(n,\mathrm{peak})}$ | Emphasizes the variation of the Raman bands against the selected peak. |
| | Auto-scaling | Subtracts the mean of that row and then divides by its standard deviation. $X_{n,:}^{\mathrm{scaled}} = \dfrac{X(n,:) - \overline{X(n,:)}}{\operatorname{std}(X(n,:))}$ | The shape of the spectra may be lost; reduces the variation in the objects and gathers the objects towards the center. |
| | Row normalization (length/area) | Divides every row/object by the length (Manhattan distance) or area (Euclidean distance) of that row. $X_{n,:}^{\mathrm{scaled}} = \dfrac{X(n,:)}{\operatorname{sum}(X(n,:))}$ (length); $X_{n,:}^{\mathrm{scaled}} = \dfrac{X(n,:)}{\sqrt{\operatorname{sum}(X(n,:)^2)}}$ (area) | The variation of objects is reduced. |
| | Column normalization (length/area) | Divides every column/variable by the length (Manhattan distance) or area (Euclidean distance) of that column. $X_{:,w}^{\mathrm{scaled}} = \dfrac{X(:,w)}{\operatorname{sum}(X(:,w))}$ (length); $X_{:,w}^{\mathrm{scaled}} = \dfrac{X(:,w)}{\sqrt{\operatorname{sum}(X(:,w)^2)}}$ (area) | The shape of the spectra may be lost; reduces the variation from variables. |
| | Mean-center | Subtracts the mean of each row from all of its elements. $X_{n,:}^{\mathrm{scaled}} = X(n,:) - \overline{X(n,:)}$ | Reduces the deviation of the data from its center; gathers the objects towards the center. |

n/w represents the nth/wth row/column of the spectral matrix X for scaling. All the X blocks in the paper are arranged so that objects are stored in different rows and variables are stored in different columns.
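
The scaling entries of Table 1 can be illustrated with a short NumPy sketch. It is not the authors' implementation. It assumes the layout stated in the footnote (one spectrum per row, one variable per column), and the function names are hypothetical. Two further assumptions are made: the standard deviation uses the sample form (`ddof=1`), and the "area" variant is taken to be the Euclidean norm, i.e. the square root of the sum of squares. MSC is sketched in its common form, fitting each spectrum to the reference with a straight line and then removing the fitted offset and slope.

```python
import numpy as np

def normalize_by_peak(X, peak=None):
    """Divide each row by its value at column `peak` (row maximum if None)."""
    X = np.asarray(X, dtype=float)
    ref = X.max(axis=1, keepdims=True) if peak is None else X[:, [peak]]
    return X / ref

def auto_scale_rows(X):
    """Subtract each row's mean and divide by its standard deviation.
    ddof=1 (sample standard deviation) is an assumption, not stated in the table.
    Row-wise auto-scaling is also the usual definition of SNV."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=1, keepdims=True)
    return centered / X.std(axis=1, ddof=1, keepdims=True)

def row_normalize(X, mode="length"):
    """Divide each row by its sum ("length") or Euclidean norm ("area")."""
    X = np.asarray(X, dtype=float)
    if mode == "length":
        denom = X.sum(axis=1, keepdims=True)
    elif mode == "area":
        # Assumption: "area" means the square root of the sum of squares.
        denom = np.sqrt((X ** 2).sum(axis=1, keepdims=True))
    else:
        raise ValueError("mode must be 'length' or 'area'")
    return X / denom

def column_normalize(X, mode="length"):
    """Same as row_normalize, applied to each column (variable)."""
    return row_normalize(np.asarray(X, dtype=float).T, mode).T

def mean_center_rows(X):
    """Subtract each row's mean from all elements of that row."""
    X = np.asarray(X, dtype=float)
    return X - X.mean(axis=1, keepdims=True)

def msc(X, reference=None):
    """Multiplicative scatter correction against a reference spectrum
    (the mean spectrum if none is given)."""
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, row in enumerate(X):
        slope, intercept = np.polyfit(ref, row, 1)  # row ~ slope * ref + intercept
        corrected[i] = (row - intercept) / slope
    return corrected

if __name__ == "__main__":
    # Two toy "spectra" with three variables each (made-up numbers).
    X = np.array([[1.0, 2.0, 4.0],
                  [2.0, 2.0, 6.0]])
    print(normalize_by_peak(X))
    print(auto_scale_rows(X))
    print(row_normalize(X, "area"))
    print(column_normalize(X, "length"))
    print(mean_center_rows(X))
    print(msc(X))
```

Keeping `keepdims=True` in the row statistics lets NumPy broadcast the per-row mean, sum, or norm across all columns, which mirrors the `X(n,:)` notation used in the table.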