Journal of Mass Spectrometry and Advances in the Clinical Lab
Editorial. 2022 Mar 17;24:41–42. doi: 10.1016/j.jmsacl.2022.03.001

Clinical Pathology and the Data Science revolution

Dustin R Bunch a,b, Daniel T Holmes c,d
PMCID: PMC8942826  PMID: 35340694

Clinical laboratory medicine has always been replete with data, yet relatively little effort has been applied from within our discipline to leverage it for clinical, operational, and financial insights. Although the concept of “Data Science” as a scientific discipline is not new [1], the development, maturation, and democratization of open-source Data Science tools over the past decade have made it possible for any determined laboratorian to incorporate them into clinical practice. Data Science is a multidisciplinary field incorporating aspects of computer science, mathematics, statistics, and predictive analytics for the purpose of extracting actionable insights from large datasets produced by any business or scientific sector. In healthcare, Data Science can be leveraged for everything from diagnostic decision support to workforce analysis to the automation of repetitive data entry tasks.

The nomenclature used in Data Science can be confusing because several frequently arising concepts overlap: data analytics, big data, artificial intelligence (AI), and machine learning (ML). This overlap makes precise definitions challenging.

Data Science itself has been defined by Kelleher and Tierney as “a set of principles, problem definitions, algorithms, and processes for extracting nonobvious and useful patterns from large data sets” [2] while Donoho defines it as “The science of learning from data; it studies the methods involved in the analysis and processing of data and proposes technology to improve methods in an evidence-based manner” [3].

Data Analytics can be considered a necessary subset of Data Science directed at data cleansing, data merging, descriptive statistical analysis, and data visualization for the purpose of drawing meaningful insights. Data Analytics is more often focused on business intelligence than on scientific inference per se.
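The cleansing, merging, and descriptive-statistics steps named above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the specimen identifiers, analyte values, field names, and the functions `clean`, `merge`, and `describe` are all hypothetical, and dropping censored results such as "<0.5" is a simplification that a real analysis might handle differently.

```python
from statistics import mean, median, stdev

# Hypothetical raw results, as they might be exported from a laboratory system.
raw_results = [
    {"specimen_id": "S001", "analyte": "testosterone", "value": " 12.4 "},
    {"specimen_id": "S002", "analyte": "Testosterone", "value": "15.1"},
    {"specimen_id": "S003", "analyte": "testosterone", "value": ""},
    {"specimen_id": "S004", "analyte": "TESTOSTERONE", "value": "9.8"},
    {"specimen_id": "S005", "analyte": "testosterone", "value": "<0.5"},
]

# Hypothetical demographics from a second source, keyed by specimen ID.
demographics = {
    "S001": {"age_years": 34},
    "S002": {"age_years": 41},
    "S004": {"age_years": 29},
    "S005": {"age_years": 52},
}


def clean(records):
    """Cleansing: normalize analyte names and keep only numeric values."""
    cleaned = []
    for rec in records:
        try:
            value = float(rec["value"].strip())
        except ValueError:
            continue  # blank or censored result (simplification: dropped)
        cleaned.append({**rec, "analyte": rec["analyte"].strip().lower(), "value": value})
    return cleaned


def merge(records, lookup):
    """Merging: attach demographics to each result that has a match."""
    return [
        {**rec, **lookup[rec["specimen_id"]]}
        for rec in records
        if rec["specimen_id"] in lookup
    ]


def describe(values):
    """Descriptive statistics for a list of numeric results."""
    return {"n": len(values), "mean": mean(values), "median": median(values), "sd": stdev(values)}


merged = merge(clean(raw_results), demographics)
summary = describe([rec["value"] for rec in merged])
print(summary)
```

In practice a laboratorian would more likely reach for a data-frame library in R or Python for these steps; the standard-library version is used here only to keep the sketch self-contained.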

AI is a term coined in the 1950s that has become very broad in its meaning. Fundamentally, AI is any computational process that creates the appearance of human intelligence. One can therefore leverage Data Science tools to build components of any AI system. ML is a subset of AI wherein the computational algorithm is able to learn from experience (that is, through the provision of new example cases) without being explicitly programmed to do so [4]. Lee and Durant give a simplified beginner’s tutorial of ML applied to mass spectrometry data, with code provided in both Python and R to suit the reader [5]. As with AI, the development of any machine learner will involve a large component of Data Science.
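The idea of learning from new example cases, rather than from explicitly programmed rules, can be illustrated with a one-nearest-neighbour classifier. This is a minimal sketch and not the method used in the tutorial by Lee and Durant; the feature values (a retention time and an ion ratio), the labels, and the function name `predict` are hypothetical.

```python
import math


def predict(training_examples, features):
    """Return the label of the training example closest to `features`."""
    nearest = min(training_examples, key=lambda ex: math.dist(ex[0], features))
    return nearest[1]


# Hypothetical labelled peaks: (retention time in minutes, ion ratio) -> label.
training = [
    ((2.10, 0.50), "acceptable"),
    ((2.12, 0.52), "acceptable"),
]

query = (2.45, 0.81)

# With only "acceptable" examples seen so far, that is the only possible answer.
before = predict(training, query)

# Providing one new example case changes the behaviour; no rule was rewritten.
training.append(((2.40, 0.80), "interference"))
after = predict(training, query)

print(before, after)
```

The prediction for the same query changes only because the set of examples changed, which is the sense in which the algorithm learns from experience.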

Big Data refers, broadly, to data sets that are large and unwieldy and that can be characterized by the three Vs of volume, velocity, and variety [6], [7]. Volume indicates that the amount of data generated overwhelms traditional software tools, so that specific strategies (e.g. distributed computing) are required to perform the computational tasks [8]. Velocity indicates that the data are generated and accumulate rapidly. Variety, in the healthcare context, is easy to understand: numerical results, narrative notes and reports in the clinical chart, images from scanned external reports, radiology, anatomical pathology, and sequencing data from genomic studies. Clinical laboratory data fulfill this definition of Big Data, with sources including laboratory information systems, the electronic health record, analytical instruments, and other ancillary systems. These data are frequently non-standardized and unstructured in their formats, requiring data cleansing and merging (“wrangling”) prior to analysis.

The Data Science tasks intrinsic to laboratory medicine have created a need for the clinical laboratorian to adopt new tools, including programming languages (R, Python, Julia), literate programming tools (Markdown), web-app development tools (Shiny, Dash), and deployment strategies designed for reliability and long-term stability (e.g. use of cloud infrastructure and containers). These tools allow software development to fill gaps where no commercial solutions exist. Method validation, QC/QA, data automation, instrument interfacing, automated reports, and dashboard creation are typical targets for application development in clinical laboratory medicine. Examples can be found in Haymond’s article on creating dashboards for business intelligence [9] and Geistanger’s automated workflow for stability data [10].

Every area of the healthcare system is increasingly affected by Data Science, and as more data are generated and stored, so grows the need to leverage them for clinical and operational insights. The scientific leadership of Mass Spectrometry and Advances in the Clinical Laboratory (MSACL.org) recognized the need for Data Science education nearly a decade ago and began promoting it as a discipline in clinical laboratory medicine through short courses, seminars, and now a special issue in JMSACL. Many of the clinical laboratory Data Science thought-leaders have either taken or contributed to MSACL short courses and have supported this special issue through articles or peer review. Because this is an emerging area of research and operational interest, gathering the insights and experiences of the laboratory community seems essential to the development of the next generation of clinical laboratory scientists. The goal of this Data Science Special Issue is to draw on the laboratory community’s knowledge to showcase the wide variety of Data Science applications deployed in laboratory medicine.

We have gathered articles that span a large swath of Data Science applications in the clinical laboratory. These range from the aforementioned article by Haymond on creating business intelligence dashboards [9], automated specimen stability statistics by Geistanger et al. [10], reproducible manuscripts in R Markdown [11], and workflows for continuous [12] and indirect reference intervals [13], to applications in mass spectrometry, including mass spectrometry imaging by Shedlock et al. [14] and Balluff et al. [15], MS quality metrics by Wilkes et al. [16] and Pablo et al. [17], and MS ML by Lee et al. [5].

We hope that this issue serves to inspire others to engage with Data Science in their workplaces.



Respectfully,

Dustin Bunch and Dan Holmes

References

  • 1.C. Hayashi What is data science? Fundamental concepts and a heuristic example. Vol. Tokyo: Springer Japan, 1998. pp. 40-51.
  • 2.J.D. Kelleher, B. Tierney, Data Science. Cambridge, Massachusetts: The MIT Press; 2018. xi, p. 264.
  • 3.Donoho D. 50 years of data science. J. Comput. Graph. Stat. 2017;26:745–766. doi: 10.1080/10618600.2017.1384734.
  • 4.Rashidi H.H., Tran N.K., Betts E.V., Howell L.P., Green R. Artificial intelligence and machine learning in pathology: The present landscape of supervised methods. Acad. Pathol. 2019;6:2374289519873088. doi: 10.1177/2374289519873088.
  • 5.Lee E.S., Durant T.J.S. Supervised machine learning in the mass spectrometry laboratory: A tutorial. J. Mass Spectrom. Adv. Clin. Lab. 2022;23:1–6. doi: 10.1016/j.jmsacl.2021.12.001.
  • 6.Dash S., Shakyawar S.K., Sharma M., Kaushik S. Big data in healthcare: Management, analysis and future prospects. J. Big Data. 2019;6:54. doi: 10.1186/s40537-019-0217-0.
  • 7.Tolan N.V., Parnas M.L., Baudhuin L.M., Cervinski M.A., Chan A.S., Holmes D.T., Horowitz G., Klee E.W., Kumar R.B., Master S.R. “Big data” in laboratory medicine. Clin. Chem. 2015;61:1433–1440. doi: 10.1373/clinchem.2015.248591.
  • 8.Zaharia M., Xin R.S., Wendell P., Das T., Armbrust M., Dave A., Meng X., Rosen J., Venkataraman S., Franklin M.J., Ghodsi A., Gonzalez J., Shenker S., Stoica I. Apache Spark: A unified engine for big data processing. Commun. ACM. 2016;59:56–65. doi: 10.1145/2934664.
  • 9.Haymond S. Create laboratory business intelligence dashboards for free using R: A tutorial using the flexdashboard package. J. Mass Spectrom. Adv. Clin. Lab. 2022;23:39–43. doi: 10.1016/j.jmsacl.2021.12.002.
  • 10.Geistanger A., Braese K., Laubender R. Automated data analytics workflow for stability experiments based on regression analysis. J. Mass Spectrom. Adv. Clin. Lab. 2022;24:5–14. doi: 10.1016/j.jmsacl.2022.01.001.
  • 11.Holmes D.T., Mobini M., McCudden C.R. Reproducible manuscript preparation with RMarkdown application to JMSACL and other Elsevier journals. J. Mass Spectrom. Adv. Clin. Lab. 2021;22:8–16. doi: 10.1016/j.jmsacl.2021.09.002.
  • 12.Holmes D.T., van der Gugten J.G., Jung B., McCudden C.R. Continuous reference intervals for pediatric testosterone, sex hormone binding globulin and free testosterone using quantile regression. J. Mass Spectrom. Adv. Clin. Lab. 2021;22:64–70. doi: 10.1016/j.jmsacl.2021.10.005.
  • 13.Bunch D.R. Indirect reference intervals using an R pipeline. J. Mass Spectrom. Adv. Clin. Lab. 2022;24:22–30. doi: 10.1016/j.jmsacl.2022.02.004.
  • 14.Shedlock C.J., Stumpo K.A. Data parsing in mass spectrometry imaging using R Studio and Cardinal: A tutorial. J. Mass Spectrom. Adv. Clin. Lab. 2022;23:58–70. doi: 10.1016/j.jmsacl.2021.12.007.
  • 15.Balluff B., Heeren R.M.A., Race A.M. An overview of image registration for aligning mass spectrometry imaging with clinically relevant imaging modalities. J. Mass Spectrom. Adv. Clin. Lab. 2022;23:26–38. doi: 10.1016/j.jmsacl.2021.12.006.
  • 16.Wilkes E.H., Whitlock M.J., Williams E.L. A data-driven approach for the detection of internal standard outliers in targeted LC-MS/MS assays. J. Mass Spectrom. Adv. Clin. Lab. 2021;20:42–47. doi: 10.1016/j.jmsacl.2021.06.001.
  • 17.Pablo A., Hoofnagle A.N., Mathias P.C. Listening to your mass spectrometer: An open-source toolkit to visualize mass spectrometer data. J. Mass Spectrom. Adv. Clin. Lab. 2022;23:44–49. doi: 10.1016/j.jmsacl.2021.12.003.

Articles from Journal of Mass Spectrometry and Advances in the Clinical Lab are provided here courtesy of Elsevier
