interlab: A Python Module for Analyzing Interlaboratory Comparison Data

David A Sheen

2019 Mar 15;124:1–2. doi: 10.6028/jres.124.006

PMCID: PMC7339679 PMID: 34877156

1. Summary

interlab was developed as a software tool to perform consensus analysis on spectral data from interlaboratory studies. It is designed to estimate the spread in the spectral data and to identify possible outliers among both spectral populations and facilities in the study. Use of this code allows researchers to identify laboratories producing data closest to the consensus values, thereby ensuring that untargeted studies are using the most precise data available to them. The software was originally developed for analyzing nuclear magnetic resonance spectroscopic data [1, 2] but can be applied to nearly any array data, including Raman or Fourier-transform infrared spectroscopy and gas or liquid chromatography. Details on the implementation of the code can be found in Ref. [1].

The input for the code consists of sample labels that identify the physical objects measured in the interlaboratory study, facility labels that identify the facility of origin of each measurement, and the data themselves. It is the user’s responsibility to format the data and metadata so that the code can read them. In addition, the user must specify the distance function that will be used to compare the spectra and the statistical distribution to which these distances will be fit. The user must also ensure that the data are suitable for the chosen distance function. For example, if the distance function is valid only for probability mass functions, then each sample’s data must be sum-normalized and everywhere nonnegative. One example is given in the example notebook, https://pages.nist.gov/interlab_py/analysis_demo.html.
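As an illustration of the sum-normalization requirement, the following minimal sketch prepares spectra for a distance function that is valid only for probability mass functions. The helper name `to_probability_mass` and the example values are hypothetical and are not part of interlab; only NumPy is assumed.

```python
import numpy as np

def to_probability_mass(spectra):
    """Sum-normalize each row so it can be treated as a probability mass function.

    Hypothetical helper for illustration; not part of the interlab package.
    """
    spectra = np.asarray(spectra, dtype=float)
    if np.any(spectra < 0):
        raise ValueError("spectra must be nonnegative everywhere")
    totals = spectra.sum(axis=1, keepdims=True)
    if np.any(totals == 0):
        raise ValueError("each spectrum must have a nonzero sum")
    return spectra / totals

# Made-up intensities: one row per measured spectrum.
raw = np.array([
    [4.0, 10.0, 6.0],
    [2.0, 5.0, 3.0],
    [1.0, 1.0, 2.0],
])
pmf = to_probability_mass(raw)
```

After normalization each row sums to one, and the first two rows, which differ only by an overall scale factor, become identical.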

Given the input data, the code will perform the following tasks:

Calculates the interspectral distances based on the user-selected distance function;
Fits the user-selected distribution function to the distance data and calculates the corresponding probability scores (for a normal distribution, this is the Z-score);
Identifies outliers within each spectral population;
Conducts a principal components analysis on the probability scores (e.g., Z-scores) and computes the projected statistical distance;
Uses the projected statistical distance to determine the data set outliers.
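The five tasks above can be sketched end to end on synthetic data. This is a rough illustration under stated assumptions, not interlab's own API or the exact procedure of Ref. [1]: the Euclidean metric, the use of each facility's mean distance to the others, the lognormal fit, the cutoff of 2, and the definition of the projected distance as the norm of the two leading principal component scores are all choices made here for the example. A plain NumPy singular value decomposition stands in for a scikit-learn PCA.

```python
import numpy as np
from scipy import stats
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)

# Synthetic stand-in data: 8 facilities x 3 samples x 50 spectral points.
n_fac, n_samp, n_pts = 8, 3, 50
base = rng.random((n_samp, n_pts)) + 0.5
data = base[None, :, :] + 0.01 * rng.standard_normal((n_fac, n_samp, n_pts))
data[5] += 0.3 * rng.random((n_samp, n_pts))  # facility 5 is deliberately biased

def lognormal_zscores(values):
    """Fit a lognormal (a normal on the logs) and return the Z-scores."""
    logs = np.log(values)
    mu, sigma = stats.norm.fit(logs)
    return (logs - mu) / sigma

def sample_zscores(spectra, metric="euclidean"):
    # Task 1: pairwise distances between the facilities' spectra for one sample.
    dist = squareform(pdist(spectra, metric=metric))
    # Assumption: summarize each facility by its mean distance to the others.
    mean_dist = dist.sum(axis=1) / (dist.shape[0] - 1)
    # Task 2: fit the chosen distribution and convert distances to Z-scores.
    return lognormal_zscores(mean_dist)

# Z-score matrix: one row per facility, one column per sample.
z = np.column_stack([sample_zscores(data[:, j, :]) for j in range(n_samp)])

# Task 3: flag outliers within each spectral population.
threshold = 2.0  # illustrative cutoff, not a recommended value
spectral_outliers = z > threshold

# Task 4: PCA on the Z-scores via SVD of the column-centered matrix,
# then the distance of each facility from the origin in the projected space.
zc = z - z.mean(axis=0)
_, _, vt = np.linalg.svd(zc, full_matrices=False)
scores = zc @ vt[:2].T
proj_dist = np.linalg.norm(scores, axis=1)

# Task 5: score the projected distances and flag data set (facility) outliers.
facility_outliers = lognormal_zscores(proj_dist) > threshold
print(np.flatnonzero(facility_outliers))
```

In this synthetic case only the deliberately biased facility is flagged. With real data, the choice of distance function, distribution, and cutoff should follow the guidance in Ref. [1] and the example notebook.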

The software cannot be used out of the box. Users must create an interface to their own software, and that interface will be specific to the user’s application. The example notebook, https://pages.nist.gov/interlab_py/analysis_demo.html, demonstrates one such possible interface.
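As a hedged sketch of what such an interface might look like, the snippet below groups flat measurement records into per-sample arrays with matching facility labels. The record layout, the function name `group_by_sample`, and the label values are hypothetical. The input structure that interlab actually expects is described in its documentation and example notebook.

```python
from collections import defaultdict

import numpy as np

def group_by_sample(records):
    """Group (sample, facility, spectrum) records into per-sample arrays.

    Hypothetical interface helper for illustration; not part of interlab.
    Returns {sample_label: (facility_labels, 2-D array of spectra)}.
    """
    grouped = defaultdict(lambda: {"facilities": [], "spectra": []})
    for sample, facility, spectrum in records:
        grouped[sample]["facilities"].append(facility)
        grouped[sample]["spectra"].append(np.asarray(spectrum, dtype=float))
    return {
        sample: (entry["facilities"], np.vstack(entry["spectra"]))
        for sample, entry in grouped.items()
    }

# Made-up records, as they might come out of a user's own file reader.
records = [
    ("S1", "LabA", [0.2, 0.5, 0.3]),
    ("S1", "LabB", [0.1, 0.6, 0.3]),
    ("S2", "LabA", [0.4, 0.4, 0.2]),
    ("S2", "LabB", [0.5, 0.3, 0.2]),
]
study = group_by_sample(records)
facilities, spectra = study["S1"]
```

Keeping the facility labels in the same order as the rows of each array makes it straightforward to map any later outlier flag back to the facility that produced the spectrum.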

2. Software Specifications

NIST Operating Unit(s)	Material Measurement Laboratory
Category	Uncertainty analysis
Targeted Users	PIs for large interlaboratory studies
Operating System(s)	Cross-platform, where Python, NumPy, and SciPy are installed
Programming Language	Python 3.6
Inputs/Outputs	This software is not usable out-of-the-box. The user must write additional code to interface their model with this software.
Documentation	https://pages.nist.gov/interlab_py
Accessibility	N/A small-scale research tool
Disclaimer	https://www.nist.gov/director/licensing

3. Methods for Validation

The code largely consists of scipy [3] and scikit-learn [4] functions, which are already tested. The outputs of each stage of the code are analyzed and validated in the example notebook, https://pages.nist.gov/interlab_py/analysis_demo.html.

Biography

About the author: David Sheen is a Physicist in the Chemical Sciences Division at NIST. His work is on uncertainty analysis in complex systems including high-temperature chemical kinetics and metabolomics. The National Institute of Standards and Technology is an agency of the U.S. Department of Commerce.

4. References

[1] Sheen DA, Rocha WFC, Lippa KA, Bearden DW (2017) A scoring metric for multivariate data for reproducibility analysis using chemometric methods. Chemometrics and Intelligent Laboratory Systems 162:10–20. https://doi.org/10.1016/j.chemolab.2016.12.010
[2] Rocha WFC, Sheen DA, Bearden DW (2018) Classification of samples from NMR-based metabolomics using principal components analysis and partial least squares with uncertainty estimation. Analytical and Bioanalytical Chemistry 410(24):6305–6319. https://doi.org/10.1007/s00216-018-1240-2
[3] Jones E, Oliphant T, Peterson P (2001) SciPy: Open source scientific tools for Python. Available at http://www.scipy.org/
[4] Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, Blondel M, Prettenhofer P, Weiss R, Dubourg V, Vanderplas J, Passos A, Cournapeau D, Brucher M, Perrot M, Duchesnay E (2011) Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12:2825–2830.
