Author manuscript; available in PMC: 2021 Feb 19.

Published in final edited form as: Metabolomics. 2020 Oct 21;16(11):117. doi: 10.1007/s11306-020-01738-3

Table 2.

Names, descriptions, formulas, and interpretations of the 11 peak integration quality metrics used in our study, categorized into the sets they were obtained/adapted from

Name	Description	Formula^a	Interpretation^b
M7 (Eshghi et al.)
Apex-Boundary Ratio (ABR)	Uses boundary-over-apex intensity ratio to assess completeness of integration	$ABR = \frac{\max(I_{1}, I_{N})}{I_{A}}$	LV ⇒ HQ HV ⇒ LQ Range: (0,1]
Elution Shift (ES)	Assesses retention time shift of samples by comparing time-position of peak apex	$ES = \frac{\mathrm{abs}(t_{A,s} - \mathrm{med}(t_{A,1}, \dots, t_{A,S}))}{PB}$, $PB = \mathrm{avg}(t_{N,1}, \dots, t_{N,S}) - \mathrm{avg}(t_{1,1}, \dots, t_{1,S})$	LV ⇒ HQ HV ⇒ LQ
FWHM2Base (F2B)	Assesses separation of peaks by measuring peak-width at half-max vs. peak-width at base	$F2B = \frac{t_{HA,2} - t_{HA,1}}{t_{N} - t_{1}}$; if $t_{HA,2}$ does not exist, $F2B = 0$	HV ⇒ HQ LV ⇒ LQ Range: (0,1]
Jaggedness (J)	Captures shape quality by calculating the number of changes in direction over length of intensity vector	$D = \mathrm{diff}([I_{1}, \dots, I_{N}])$, $D' = \begin{cases} \mathrm{sign}(d), & d \in D, \text{ if } d > ff \cdot I_{A} \\ 0, & d \in D, \text{ if } d < ff \cdot I_{A} \end{cases}$, $J = \frac{\mathrm{sum}(D')}{N}$	LV ⇒ HQ HV ⇒ LQ
Modality (M)	Measures the first unexpected change in direction of intensity to detect splitting and integration of multiple peaks	$M = \frac{\mathit{maxDip}}{I_{A}}$, $\mathit{maxDip} = I_{lr} - I_{ff}$	LV ⇒ HQ HV ⇒ LQ
RT Consistency (RTC)	Assesses retention time alignment of samples by comparing the time at the center index of the time vector	$RTC = \frac{\mathrm{abs}(\mathrm{avg}(ct_{1}, \dots, ct_{S}) - ct_{s})}{\mathrm{avg}(ct_{1}, \dots, ct_{S})}$, $ct_{s} = t_{N,s} - \frac{t_{N,s} - t_{1,s}}{2}$	LV ⇒ HQ HV ⇒ LQ
Symmetry (SY)	Measures correlation between left and right halves of a peak	$SY = \mathrm{cor}([I_{1}, \dots, I_{\frac{N}{2}}], [I_{\frac{N}{2}}, \dots, I_{N}])$	HV ⇒ HQ LV ⇒ LQ Range: [−1,1]
M4 (Zhang et al.)
Gaussian Similarity (GS)	Measures similarity of a peak to Gaussian-fitted curve	$\mathit{GaussianSimilarity} = \frac{C \cdot G}{\| C \| \cdot \| G \|}$, $C = \mathrm{std}([I_{1}, \dots, I_{N}])$, $G = \mathrm{std}([GI_{1}, \dots, GI_{N}])$, where $GI_{i}$ is the intensity of the Gaussian-fitted curve at position $i$	HV ⇒ HQ LV ⇒ LQ Range: [0, 1]
Sharpness (SH)	Captures steepness of a peak by summing the ratio of the difference between neighboring points and the point within the pair expected to have the lower value	$SH = \sum_{i = 2}^{A} \frac{I_{i} - I_{i - 1}}{I_{i - 1}} + \sum_{i = A}^{N - 1} \frac{I_{i} - I_{i + 1}}{I_{i + 1}}$	HV ⇒ HQ LV ⇒ LQ
Triangle Peak Area Similarity Ratio (TPASR)	Estimates shape quality by comparing peak area to area of triangle formed by the apex and boundaries	$TPASR = \frac{\mathrm{abs}(tr\_area - pk\_area)}{tr\_area}$, $tr\_area = 0.5 \cdot N \cdot I_{A}$, $pk\_area = \sum_{i = 1}^{N} I_{i}$	LV ⇒ HQ HV ⇒ LQ
Zig-Zag Index (ZZ)	Captures shape quality by measuring the normalized variance between a point and its immediate neighbor on either side	$ZZ = \frac{\sum_{n = 2}^{N - 1} {(2 I_{n} - I_{n - 1} - I_{n + 1})}^{2}}{N \cdot EPI^{2}}$, $EPI = I_{A} - \mathrm{avg}(I_{1}, I_{2}, I_{N - 1}, I_{N})$	LV ⇒ HQ HV ⇒ LQ

^a The variables used in the formulae of the metrics are defined as follows: I_i represents the value within the intensity vector of a peak at position i, i = {1,2,…,N}. t_i represents the value within the retention time vector of a peak at position i, i = {1,2,…,N}. I_A and t_A represent the value of the maximum intensity (at position A = {1,2,…,N}) and the retention time at the corresponding position, respectively; also, t_HA represents the retention time at the position of half the maximum intensity. For metrics that require information from multiple samples (e.g. Elution Shift), the second index represents the sample of interest s = {1,2,…,S}, where S is the total number of samples. The formulae also make use of the following functions: avg() stands for average, std() stands for standard normalization, med() stands for median, sign() returns the sign of a real number, diff() returns the contiguous pairwise differences between values in a sequence, and abs() stands for absolute value. Each metric is calculated for every individual sample within a peak, and the overall metric value for the peak is calculated as the mean of the sample-level values. For more information on each of these sets of metrics, please refer to the original publications (Eshghi et al. 2018 and Zhang et al. 2014), or our MetaClean package.
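To make the notation above concrete, the following is a minimal Python sketch of four of the tabulated metrics (ABR, TPASR, ZZ and ES) and of the averaging of sample-level values into a peak-level value. It is not the MetaClean implementation: the function names, the representation of a sample as a `(times, intensities)` pair, and the toy data are assumptions made purely for illustration, and the formulas are transcribed as they appear in the table.

```python
from statistics import mean, median


def apex_boundary_ratio(intensity):
    """ABR = max(I_1, I_N) / I_A."""
    return max(intensity[0], intensity[-1]) / max(intensity)


def tpasr(intensity):
    """TPASR = abs(tr_area - pk_area) / tr_area, with tr_area = 0.5 * N * I_A."""
    tr_area = 0.5 * len(intensity) * max(intensity)
    pk_area = sum(intensity)
    return abs(tr_area - pk_area) / tr_area


def zigzag_index(intensity):
    """ZZ = sum_{n=2}^{N-1} (2 I_n - I_{n-1} - I_{n+1})^2 / (N * EPI^2)."""
    n = len(intensity)
    # EPI: apex intensity minus the average of the two points at each boundary.
    epi = max(intensity) - mean(
        [intensity[0], intensity[1], intensity[-2], intensity[-1]]
    )
    numerator = sum(
        (2 * intensity[i] - intensity[i - 1] - intensity[i + 1]) ** 2
        for i in range(1, n - 1)
    )
    return numerator / (n * epi ** 2)


def elution_shift(samples):
    """ES_s = abs(t_{A,s} - med(t_{A,1..S})) / PB, one value per sample.

    `samples` is a list of (times, intensities) pairs (an assumed layout).
    """
    apex_times = [t[i.index(max(i))] for t, i in samples]
    # PB: average end time minus average start time across samples.
    peak_base = mean(t[-1] for t, _ in samples) - mean(t[0] for t, _ in samples)
    med_apex = median(apex_times)
    return [abs(t_a - med_apex) / peak_base for t_a in apex_times]


def peak_level(sample_values):
    """Overall metric value for a peak: mean of the sample-level values."""
    return mean(sample_values)


# Toy data (hypothetical): three samples sharing one retention time grid.
times = [10.0, 11.0, 12.0, 13.0, 14.0]
samples = [
    (times, [1.0, 4.0, 10.0, 4.0, 1.0]),
    (times, [2.0, 5.0, 10.0, 5.0, 2.0]),
    (times, [1.0, 2.0, 5.0, 10.0, 3.0]),
]

abr_peak = peak_level(apex_boundary_ratio(i) for _, i in samples)
es_peak = peak_level(elution_shift(samples))
```

In this toy example the third sample has its apex one time unit later than the median apex, so it is the only sample with a non-zero Elution Shift.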

^b In this column, which describes how the value of a metric relates to the quality of a peak, the abbreviations LV and HV stand for “Low Value” and “High Value”, respectively, and LQ and HQ stand for “Low Quality” and “High Quality”, respectively. Ranges are also specified for the metrics that are bounded; otherwise, the range is (−∞, ∞).
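The Interpretation column can also be written down as a small lookup table. The Python sketch below is illustrative only: the dictionary names and helper functions are hypothetical, and the directions and ranges are simply those listed in the table (for F2B and ES, the direction is taken from the LV ⇒ LQ and LV ⇒ HQ entries, respectively).

```python
import math

# Which end of the scale indicates high quality (HQ) for each metric.
# "low": LV => HQ and HV => LQ; "high": HV => HQ and LV => LQ.
HIGH_QUALITY_WHEN = {
    "ABR": "low", "ES": "low", "F2B": "high", "J": "low", "M": "low",
    "RTC": "low", "SY": "high",                                  # M7
    "GS": "high", "SH": "high", "TPASR": "low", "ZZ": "low",     # M4
}

# Stated ranges as (lower, upper, lower_inclusive, upper_inclusive).
# Metrics without an entry are treated as unbounded, i.e. (-inf, inf).
BOUNDS = {
    "ABR": (0.0, 1.0, False, True),   # (0, 1]
    "F2B": (0.0, 1.0, False, True),   # (0, 1]
    "SY": (-1.0, 1.0, True, True),    # [-1, 1]
    "GS": (0.0, 1.0, True, True),     # [0, 1]
}


def in_stated_range(metric, value):
    """Return True if `value` lies in the range the table gives for `metric`."""
    lo, hi, lo_inc, hi_inc = BOUNDS.get(metric, (-math.inf, math.inf, False, False))
    above = value >= lo if lo_inc else value > lo
    below = value <= hi if hi_inc else value < hi
    return above and below


def indicates_higher_quality(metric, a, b):
    """Return True if value `a` indicates higher quality than value `b`."""
    return a < b if HIGH_QUALITY_WHEN[metric] == "low" else a > b
```

For example, `indicates_higher_quality("ABR", 0.1, 0.3)` is true because a low boundary-to-apex ratio indicates high quality, whereas for Sharpness the larger value is the better one.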