Author manuscript; available in PMC: 2021 Feb 19.
Published in final edited form as: Metabolomics. 2020 Oct 21;16(11):117. doi: 10.1007/s11306-020-01738-3

Table 2.

Names, descriptions, formulas, and interpretations of the 11 peak integration quality metrics used in our study, categorized into the sets they were obtained/adapted from

Name | Description | Formula^a | Interpretation^b
M7 (Eshghi et al.)
Apex-Boundary Ratio (ABR) | Uses boundary-over-apex intensity ratio to assess completeness of integration | $ABR = \frac{\max(I_1, I_N)}{I_A}$ | LV ⇒ HQ; HV ⇒ LQ; Range: (0, 1]
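Elution Shift (ES) | Assesses retention time shift of samples by comparing the time position of the peak apex | $ES = \frac{\operatorname{abs}(t_{A,s} - \operatorname{med}(t_{A,1}, \ldots, t_{A,S}))}{PB}$, where $PB = \operatorname{avg}(t_{N,1}, \ldots, t_{N,S}) - \operatorname{avg}(t_{1,1}, \ldots, t_{1,S})$ | LV ⇒ HQ; HV ⇒ LQ
FWHM2Base (F2B) | Assesses separation of peaks by measuring peak width at half-max vs. peak width at base | $F2B = \frac{t_{HA,2} - t_{HA,1}}{t_N - t_1}$; if $t_{HA,2}$ does not exist, $F2B = 0$ | HV ⇒ HQ; LV ⇒ LQ; Range: (0, 1]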
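Jaggedness (J) | Captures shape quality by calculating the number of changes in direction over the length of the intensity vector | $D = \operatorname{diff}([I_1, \ldots, I_N])$; $D' = \begin{cases} \operatorname{sign}(d), & d \in D, \ \text{if } d > ff \cdot I_A \\ 0, & d \in D, \ \text{if } d < ff \cdot I_A \end{cases}$; $J = \frac{\operatorname{sum}(D)}{N}$ | LV ⇒ HQ; HV ⇒ LQ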
Modality (M) | Measures the first unexpected change in direction of intensity to detect splitting and integration of multiple peaks | $M = \frac{maxDip}{I_A}$, where $maxDip = I_{lr} - I_{ff}$ | LV ⇒ HQ; HV ⇒ LQ
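RT Consistency (RTC) | Assesses retention time alignment of samples by comparing the time at the center index of the time vector | $RTC = \frac{\operatorname{abs}(\operatorname{avg}(ct_1, \ldots, ct_S) - ct_s)}{\operatorname{avg}(ct_1, \ldots, ct_S)}$, where $ct_s = t_{N,s} - (t_{N,s} - t_{1,s})/2$ | LV ⇒ HQ; HV ⇒ LQ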
Symmetry (SY) | Measures correlation between left and right halves of a peak | $SY = \operatorname{cor}([I_1, \ldots, I_{N/2}], [I_{N/2}, \ldots, I_N])$ | HV ⇒ HQ; LV ⇒ LQ; Range: [−1, 1]
M4 (Zhang et al.)
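Gaussian Similarity (GS) | Measures similarity of a peak to a Gaussian-fitted curve | $GaussianSimilarity = \frac{C \cdot G}{\lVert C \rVert \, \lVert G \rVert}$, where $C = \operatorname{std}([I_1, \ldots, I_N])$, $G = \operatorname{std}([GI_1, \ldots, GI_N])$, and $GI_i$ is the intensity of the Gaussian-fitted curve at position $i$ | HV ⇒ HQ; LV ⇒ LQ; Range: [0, 1]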
Sharpness (SH) | Captures steepness of a peak by summing the ratio of the difference between neighboring points and the point within the pair expected to have the lower value | $SH = \sum_{i=2}^{A} \frac{I_i - I_{i-1}}{I_{i-1}} + \sum_{i=A}^{N-1} \frac{I_i - I_{i+1}}{I_{i+1}}$ | HV ⇒ HQ; LV ⇒ LQ
Triangle Peak Area Similarity Ratio (TPASR) | Estimates shape quality by comparing peak area to the area of the triangle formed by the apex and boundaries | $TPASR = \frac{\operatorname{abs}(tr_{area} - pk_{area})}{tr_{area}}$, where $tr_{area} = 0.5 \cdot N \cdot I_A$ and $pk_{area} = \sum_{i=1}^{N} I_i$ | LV ⇒ HQ; HV ⇒ LQ
Zig-Zag Index (ZZ) | Captures shape quality by measuring the normalized variance between a point and its immediate neighbor on either side | $ZZ = \frac{\sum_{n=2}^{N-1} (2 I_n - I_{n-1} - I_{n+1})^2}{N \cdot EPI^2}$, where $EPI = I_A - \operatorname{avg}(I_1 + I_2 + I_{N-1} + I_N)$ | LV ⇒ HQ; HV ⇒ LQ
^a The variables used in the formulae of the metrics are defined as follows. $I_i$ is the value of the intensity vector of a peak at position $i$, $i \in \{1, 2, \ldots, N\}$. $t_i$ is the value of the retention time vector of a peak at position $i$, $i \in \{1, 2, \ldots, N\}$. $I_A$ and $t_A$ are the maximum intensity (at position $A \in \{1, 2, \ldots, N\}$) and the retention time at the corresponding position, respectively; $t_{HA}$ is the retention time at the position of half the maximum intensity. For metrics that require information from multiple samples (e.g. Elution Shift), the second index denotes the sample of interest $s \in \{1, 2, \ldots, S\}$, where $S$ is the total number of samples. The formulae also use the following functions: avg() stands for average, std() stands for standard normalization, med() stands for median, sign() returns the sign of a real number, diff() returns the contiguous pairwise differences between values in a sequence, and abs() stands for absolute value. Each metric is calculated for every individual sample within a peak, and the overall metric value for the peak is the mean of the sample-level values. For more information on each of these sets of metrics, please refer to the original publications (Eshghi et al. 2018 and Zhang et al. 2014) or our MetaClean package.
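The two multi-sample metrics and the per-peak averaging described in this footnote can be sketched roughly as follows. This is a minimal Python illustration, not the MetaClean implementation: the function names and the `(times, intensities)` sample layout are hypothetical, and the differences inside ES, PB, RTC, and ct_s are read as subtractions.

```python
from statistics import mean, median

def apex_time(times, intensities):
    # t_A: retention time at the position A of maximum intensity.
    a = max(range(len(intensities)), key=intensities.__getitem__)
    return times[a]

def elution_shift(samples):
    # samples: one (times, intensities) pair per sample s = 1..S (hypothetical layout).
    apex_times = [apex_time(t, i) for t, i in samples]
    # PB: average right boundary time minus average left boundary time.
    pb = mean(t[-1] for t, _ in samples) - mean(t[0] for t, _ in samples)
    med = median(apex_times)
    # One ES value per sample: abs(t_A,s - med(t_A,1..S)) / PB.
    return [abs(ta - med) / pb for ta in apex_times]

def rt_consistency(samples):
    # ct_s = t_N,s - (t_N,s - t_1,s) / 2, the center of each sample's time window.
    centers = [t[-1] - (t[-1] - t[0]) / 2 for t, _ in samples]
    avg_center = mean(centers)
    # One RTC value per sample: abs(avg(ct) - ct_s) / avg(ct).
    return [abs(avg_center - c) / avg_center for c in centers]

def peak_level(sample_values):
    # Overall metric value for the peak: mean of the sample-level values.
    return mean(sample_values)

# Three made-up samples; the second elutes one time unit later than the others.
shape = [1.0, 4.0, 9.0, 4.0, 1.0]
samples = [
    ([10.0, 10.5, 11.0, 11.5, 12.0], shape),
    ([11.0, 11.5, 12.0, 12.5, 13.0], shape),
    ([10.0, 10.5, 11.0, 11.5, 12.0], shape),
]
es_per_sample = elution_shift(samples)
rtc_per_sample = rt_consistency(samples)
es_peak = peak_level(es_per_sample)
rtc_peak = peak_level(rtc_per_sample)
```

In this toy input only the shifted sample receives a non-zero ES, which is in line with the table's reading that low values indicate high quality.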

^b This column indicates how the value of a metric reflects the quality of a peak. The abbreviations LV and HV stand for “Low Value” and “High Value” respectively, and LQ and HQ stand for “Low Quality” and “High Quality” respectively. Ranges are specified for the metrics that are bounded; otherwise, the range is (−∞, ∞).
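As a rough illustration of how four of the single-sample formulas in Table 2 might be evaluated, and how their values relate to the LV/HV interpretations, consider the Python sketch below. It is not the MetaClean code. The function names are hypothetical, intensities are assumed to be strictly positive, the differences in the formulas are read as subtractions, and the avg() term in EPI is assumed to be the mean of the four boundary intensities.

```python
def apex_boundary_ratio(I):
    # ABR = max(I_1, I_N) / I_A
    return max(I[0], I[-1]) / max(I)

def tpasr(I):
    # TPASR = abs(tr_area - pk_area) / tr_area
    tr_area = 0.5 * len(I) * max(I)   # tr_area = 0.5 * N * I_A
    pk_area = sum(I)                  # pk_area = sum of I_i for i = 1..N
    return abs(tr_area - pk_area) / tr_area

def sharpness(I):
    # SH = sum_{i=2..A} (I_i - I_{i-1}) / I_{i-1} + sum_{i=A..N-1} (I_i - I_{i+1}) / I_{i+1}
    a = I.index(max(I))  # 0-based apex position (A - 1 in the table's 1-based indexing)
    left = sum((I[i] - I[i - 1]) / I[i - 1] for i in range(1, a + 1))
    right = sum((I[i] - I[i + 1]) / I[i + 1] for i in range(a, len(I) - 1))
    return left + right

def zigzag_index(I):
    # ZZ = sum_{n=2..N-1} (2 I_n - I_{n-1} - I_{n+1})^2 / (N * EPI^2)
    n = len(I)
    # Assumption: EPI subtracts the mean of the two points at each boundary from I_A.
    epi = max(I) - (I[0] + I[1] + I[-2] + I[-1]) / 4
    num = sum((2 * I[k] - I[k - 1] - I[k + 1]) ** 2 for k in range(1, n - 1))
    return num / (n * epi ** 2)

# Made-up intensity vectors: a clean peak and a noisy one.
smooth = [1.0, 4.0, 9.0, 4.0, 1.0]
jagged = [1.0, 6.0, 3.0, 9.0, 2.0, 7.0, 1.0]

for name, peak in (("smooth", smooth), ("jagged", jagged)):
    print(name,
          round(apex_boundary_ratio(peak), 3),
          round(tpasr(peak), 3),
          round(sharpness(peak), 3),
          round(zigzag_index(peak), 3))
```

On these two vectors the noisy peak gets the larger Zig-Zag Index, which agrees with the table's LV ⇒ HQ reading for ZZ.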