Summary of the data at each step of the workflow. Row 1: (Step
1) Inter-dataset distances for matched features in the (RT, MZ, log10FI) domains. Black dots are unique matches, blue circles
are matches in clusters, and orange dots are matches outside the log10FI threshold limits. Row 2: (Step 2a) Black dots are the
same as in Row 1; red circles are the expected values at the (RT, MZ,
log10FI) of the reference feature in the match. Row 3:
(Step 2b) Residuals of the expected values. Row 4: (Step 2b,
continued) Normalized residuals, obtained by dividing the residuals by
their threshold value of median + 3 × MAD. Row 5: (Step 2d) After defining weights W = [1, 1, 0.2] (Step 2c, not shown), penalization scores are
obtained and used to color the same plots as in Row 1 (RT and MZ),
as well as the comparison of log10FI between target and reference.
Penalization scores are used (Step 2e, not shown) to select the best
match in clusters with multiple matches. Row 6: (Step 3) Tightening
of the thresholds used to define poor matches, using the method “scores”
with a threshold limit of median + 3 × MAD. Previously discarded
matches (part of clusters) are shown in blue, poor matches in red, and good
matches in black.
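
Steps 2b to 2e can be illustrated with a minimal Python sketch. It is not the workflow's actual implementation, and several details are assumptions that the caption does not specify: the threshold is taken on the absolute residuals of each dimension, the MAD is the unscaled median absolute deviation, the penalization score is the Euclidean norm of the weighted normalized residuals, and the lowest score wins within a cluster. All function names are hypothetical.

```python
import numpy as np

def mad(x):
    """Unscaled median absolute deviation (assumed definition of MAD)."""
    x = np.asarray(x, dtype=float)
    return np.median(np.abs(x - np.median(x)))

def threshold(x, n_mad=3.0):
    """Threshold point at median + n_mad * MAD."""
    x = np.asarray(x, dtype=float)
    return np.median(x) + n_mad * mad(x)

def normalize_residuals(residuals, n_mad=3.0):
    """Divide each column (RT, MZ, log10FI) by its own threshold point.

    Assumption: the threshold is computed on absolute residuals.
    """
    residuals = np.asarray(residuals, dtype=float)
    abs_res = np.abs(residuals)
    thr = np.array([threshold(abs_res[:, j], n_mad)
                    for j in range(residuals.shape[1])])
    return residuals / thr

def penalization_scores(norm_residuals, weights=(1.0, 1.0, 0.2)):
    """Assumed score: Euclidean norm of the weighted normalized residuals."""
    w = np.asarray(weights, dtype=float)
    return np.sqrt(np.sum((w * np.asarray(norm_residuals)) ** 2, axis=1))

def best_match_in_cluster(scores, cluster_ids):
    """Return {cluster id: row index of the lowest-scoring match}."""
    best = {}
    for i, (s, c) in enumerate(zip(scores, cluster_ids)):
        if c not in best or s < scores[best[c]]:
            best[c] = i
    return best

# Toy residuals for four matches; columns are (RT, MZ, log10FI).
residuals = np.array([
    [0.1, 0.001, 0.2],
    [-0.2, 0.002, -0.1],
    [0.3, -0.003, 0.4],
    [1.0, 0.010, 0.0],
])
scores = penalization_scores(normalize_residuals(residuals))
best = best_match_in_cluster(scores, cluster_ids=[0, 0, 1, 1])
```

With W = [1, 1, 0.2], a deviation in log10FI contributes far less to the score than an equally large normalized deviation in RT or MZ, so intensity mainly acts as a tie-breaker between matches in the same cluster.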