Detection of impurities in m-cresol purple with Soft Independent Modeling of Class Analogy for the quality control of spectrophotometric pH measurements in seawater

Michael Fong; Yuichiro Takeshita; Regina Easley; Jason Waters

doi:10.1016/j.marchem.2024.104362

Author manuscript; available in PMC: 2025 Feb 1.

Published in final edited form as: Mar Chem. 2024 Feb;259:10.1016/j.marchem.2024.104362. doi: 10.1016/j.marchem.2024.104362

PMCID: PMC10895926 NIHMSID: NIHMS1968317 PMID: 38414838

Abstract

Accurate spectrophotometric pH measurements in seawater are critical to documenting long-term changes in ocean acidity and carbon chemistry, and for calibration of autonomous pH sensors. The recent development of purified indicator dyes greatly improved the accuracy of spectrophotometric pH measurements by removing interfering impurities that cause biases in pH that can grow over the seawater pH range to >0.01 above pH 8. However, some batches of purified indicators still contain significant residual impurities that lead to unacceptably large biases in pH for oceanic and estuarine climate quality measurements. While high-performance liquid chromatography (HPLC) is the standard method for verifying dye purity, alternative approaches that are simple to implement and require less specialized equipment are desirable. We developed a model to detect impurities in the pH indicator m-cresol purple (mCP) using a variant of the classification technique Soft Independent Modeling of Class Analogy (SIMCA). The classification model was trained with pure mCP spectra (350 nm to 750 nm at 1 nm resolution) at pH 12 and tested on independent samples of unpurified and purified mCP with varying levels of impurities (determined by HPLC) and measured on two different spectrophotometers. All the dyes identified as pure by the SIMCA model were sufficiently low in residual impurities that their apparent biases in pH were < 0.002 in buffered artificial seawater solutions at a salinity of 35 and over a pH range of 7.2 to 8.2. Other methods that can also detect residual impurities relevant to climate quality measurements include estimating the impurity absorption at 434 nm and assessing the apparent pH biases relative to a reference purified dye in buffered solutions or natural seawater. Laboratories that produce and distribute purified mCP should apply the SIMCA method or other suitable methods to verify that residual impurities do not significantly bias pH measurements. 
To apply the SIMCA method, users should download the data and model developed in this work and measure a small number of instrument standardization and model validation samples. This method represents a key step in the development of a measurement quality framework necessary to attain the uncertainty goals articulated by the Global Ocean Acidification Observing Network (GOA-ON) for climate quality measurements (i.e., ±0.003 in pH).

Keywords: spectrophotometric pH, seawater pH, SIMCA, m-cresol purple

Introduction

Seawater pH measurements are an increasingly important component of ocean carbon observations. As an indicator of acidity, pH provides a direct quantification of the overall change in ocean pH, which can be separated into contributions attributable to natural variability and the uptake of anthropogenic carbon dioxide (CO₂) from the atmosphere (Byrne et al., 2010). Furthermore, pH can provide additional information about the ocean CO₂ system when a second inorganic carbon parameter (e.g., dissolved inorganic carbon, total alkalinity, or partial pressure of CO₂) is measured (Byrne et al., 2010; Clayton et al., 1995). Although direct measurements of seawater pH have historically been sparse and mostly limited to shipboard measurements (e.g., Olsen et al., 2020), the number of seawater pH measurements has increased by orders of magnitude in recent years particularly due to the development of autonomous pH sensors (see examples in Johnson et al., 2016; Okazaki et al., 2017), which are being implemented on an international global profiling float array (Johnson et al., 2017; Matsumoto et al., 2022; Maurer et al., 2021).

While seawater pH measurement technologies are diverse, their calibration depends on spectrophotometric pH measurements with the indicator dye m-cresol purple, considered the benchmark measurement method for seawater pH (Dickson, 2010; Dickson et al., 2007). The development of the spectrophotometric pH method with colorimetric indicator dyes that began in the late 1980s (Byrne and Breland, 1989; Clayton and Byrne, 1993; Zhang and Byrne, 1996) made highly precise seawater pH measurements (with a repeatability standard deviation of ≈ 0.0004 in pH) possible and eventually routine on open ocean repeat hydrography cruises (Talley et al., 2016). These measurements underpin long-term observations of changes in ocean pH and carbon chemistry over multi-decadal timescales (Byrne et al., 2010) and are also used to validate autonomous pH sensor data (Carter et al., 2018; Williams et al., 2016).

Although the spectrophotometric method has the potential to offer high quality seawater pH measurements meeting the Global Ocean Acidification Observing Network’s (GOA-ON) proposed uncertainty goals for detecting long-term climate-related changes in ocean carbon chemistry (±0.003 in pH; Newton et al., 2014), its combined standard uncertainty is likely far larger than its uncertainty due to repeatability effects (Bockmon and Dickson, 2015). One of the largest sources of uncertainty in spectrophotometric pH measurements is from impurities in commercially available indicator dyes (Liu et al., 2011). These impurities vary in amount and nature between different manufacturers and between different lots from a single manufacturer. As some of the impurities absorb light strongly at one of the wavelengths used for spectrophotometric pH determination, they may result in errors in pH as large as 0.02 (Yao et al., 2007).

Since this discovery, purified dyes have been developed and produced by several laboratories (DeGrandpre et al., 2014; Liu et al., 2011; Patsavas et al., 2013; Rivaro et al., 2021), which has greatly improved the accuracy and consistency of spectrophotometric pH measurements. A recent study showed that pH measurements consistent to within 0.0012 of the pH measured with a reference dye over the seawater pH range (pH 7.2 to 8.2) can be attained with adequately purified dyes (for six out of eight batches of dyes tested). However, some batches of purified dyes were found to contain significant amounts of residual impurities leading to pH errors as large as −0.008, thus highlighting the need for better quality control of the dye purification process (Takeshita et al., 2021).

High performance liquid chromatography (HPLC) is currently the standard method for verifying the purity of the indicator dyes and has been demonstrated to be effective at detecting residual impurities after purification (Takeshita et al., 2021). However, as HPLC requires expensive and specialized equipment, alternative approaches may be desirable. Douglas and Byrne (2017a) developed a simple spectrophotometric method for estimating the impurity absorption at 434 nm (${}_{434}A_{\mathrm{imp}}$) from measurements in high pH solutions where the absorbance due to the indicator dye is minimal at that wavelength. Although this method was originally developed as an impurity correction when using unpurified indicators, Takeshita et al. (2021) and Woosley (2021) suggest that the approach may be used to verify dye purity, where a pure dye should have an ${}_{434}A_{\mathrm{imp}}$ that is statistically indistinguishable from zero. However, there are outstanding questions regarding which molar absorption coefficient ratio (e.g., DeGrandpre et al., 2014; Liu et al., 2011) is appropriate for calculating ${}_{434}A_{\mathrm{imp}}$ for dye purity assessments, and the approach may not always successfully distinguish pure from impure dyes (Takeshita et al., 2021; Woosley, 2021). More research is therefore needed to evaluate and validate this approach.

Spectrophotometric pH measurements will greatly benefit from a sensitive and accessible method for verifying dye purity. In this work, we developed a model to detect impurities in mCP using a variant of Soft Independent Modeling of Class Analogy (SIMCA), a classification technique based on Principal Components Analysis (PCA) (Wold, 1976; Zontov et al., 2017). SIMCA has been applied to problems such as the detection of adulterants in food and drugs and shown to perform more reliably than discriminant analysis techniques for authentication problems (Oliveri and Downey, 2012; Rodionova et al., 2016b). We applied this approach to the acceptance testing of purified mCP by constructing a SIMCA model from a training dataset consisting of the UV-visible spectra of pure mCP samples (verified by HPLC) in high pH solutions and testing the model on independent samples of mCP with varying degrees of purity and measured on two different spectrophotometers. Furthermore, we determined the apparent pH biases of test dye samples (relative to a reference pure dye) in buffered artificial seawater solutions to demonstrate the sensitivity of SIMCA to the levels of residual impurities relevant to climate quality pH measurements.

This approach offers a simple and effective alternative to HPLC for verifying the purity of mCP and can be easily implemented by laboratories that purify mCP and those that routinely make spectrophotometric pH measurements. The data and code for implementing the SIMCA model are provided in the Supplementary Information.

Background

Spectrophotometric pH measurements

The principles of spectrophotometric pH determination have been described in previous studies (Byrne and Breland, 1989; Clayton and Byrne, 1993; Zhang and Byrne, 1996) and are summarized in this section. The pH of a solution containing a pH-sensitive indicator dye is determined by estimating the relative proportions of the indicator’s acid (HI⁻) and base (I²⁻) forms, typically from the ratio of the absorbances at the two wavelengths corresponding to the absorbance maxima of the two dye forms (i.e., R = A₅₇₈/A₄₃₄ for mCP). Liu et al. (2011) used an equation of the following form:

\mathrm{pH} = \mathrm{p}\left(K_{a}(\mathrm{HI}^{-}) \cdot e_{2}\right) + \log\left(\frac{R - e_{1}}{1 - R \frac{e_{3}}{e_{2}}}\right)

(1),

where e₁, e₂, and e₃ are ratios of the molar absorption coefficients (ε) of the two dye forms as defined in Eq. 2, and K_a(HI⁻) is the acid dissociation constant of mCP.

e_{1} = \frac{{}_{578}ε_{\mathrm{HI}}}{{}_{434}ε_{\mathrm{HI}}}, \quad e_{2} = \frac{{}_{578}ε_{\mathrm{I}}}{{}_{434}ε_{\mathrm{HI}}}, \quad e_{3} = \frac{{}_{434}ε_{\mathrm{I}}}{{}_{434}ε_{\mathrm{HI}}}

(2).

In Eq. 1, Liu et al. (2011) determined the constants e₁, e₃/e₂, and p(K_a(HI⁻)·e₂) over a range of temperatures (5 °C to 35 °C) and salinities (20 to 40) for purified mCP. These characterizations were extended to salinity 0 by Douglas and Byrne (2017b) and Müller and Rehder (2018) and to hypersaline media at sub-zero temperatures by Loucaides et al. (2017). The constants introduced in this section as well as other symbols and acronyms used in this manuscript are summarized in the glossary of terms and symbols.

Previous studies have found that dye impurities absorb more strongly at 434 nm than at 578 nm (Liu et al., 2011; Takeshita et al., 2021; Yao et al., 2007). Thus, when using an unpurified dye, the apparent pH calculated from Eq. 1 and the purified dye constants will be lower than the pH obtained with a purified dye, and this discrepancy will grow larger at high pH as the absorbance of the dye at 434 nm decreases.
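The direction and pH dependence of this bias can be illustrated with a minimal Python sketch of Eq. 1. The constants `E1`, `E3_OVER_E2`, and `PK_E2` below are placeholders chosen only for illustration (they are not the Liu et al. (2011) values), and the absorbances and the helper `apparent_bias` are hypothetical.

```python
import math

def ph_from_ratio(R, e1, e3_over_e2, pK_e2):
    """Eq. 1: pH = p(K_a * e2) + log10((R - e1) / (1 - R * e3/e2))."""
    return pK_e2 + math.log10((R - e1) / (1.0 - R * e3_over_e2))

# Placeholder constants for illustration only (NOT the published values).
E1, E3_OVER_E2, PK_E2 = 0.006, 0.07, 7.65

def apparent_bias(a434, a578, a434_imp):
    """Apparent pH with extra impurity absorbance at 434 nm minus the impurity-free pH."""
    pure = ph_from_ratio(a578 / a434, E1, E3_OVER_E2, PK_E2)
    impure = ph_from_ratio(a578 / (a434 + a434_imp), E1, E3_OVER_E2, PK_E2)
    return impure - pure

# Same hypothetical impurity absorbance at a lower and a higher pH.
# At the higher pH the dye itself absorbs less at 434 nm.
bias_low_ph = apparent_bias(a434=0.60, a578=0.30, a434_imp=0.002)
bias_high_ph = apparent_bias(a434=0.25, a578=0.90, a434_imp=0.002)
```

With these made-up inputs both biases are negative, and the bias at the higher pH is larger in magnitude, which is the qualitative behavior described above.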

Estimating the impurity absorption at 434 nm

At pH 12, nearly all the dye is in the basic form (I²⁻), which has minimum absorption at 434 nm. Thus, the signal-to-background ratio for estimating the impurity absorption is optimal at this wavelength. Douglas and Byrne (2017a) derived the following equation for estimating ${}_{434}A_{\mathrm{imp}}$:

{}_{434}A_{\mathrm{imp}} = \left(1 - \frac{e_{3}}{e_{2}} R_{\mathrm{NaOH}}\right) {}_{434}A_{\mathrm{NaOH}}

(3),

where R_NaOH is the absorbance ratio of an impure dye solution measured at pH 12 (typically a solution with a NaOH amount content of 0.01 mol kg⁻¹ solution), ${}_{434}A_{\mathrm{NaOH}}$ is the absorbance at 434 nm in the same solution, and e₃/e₂ is the reference value for one of the molar absorption coefficient ratios of purified mCP, which is also determined at pH 12 (Liu et al., 2011).
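As a minimal sketch, Eq. 3 can be evaluated in Python as follows. The function name and the numbers used in the usage note are hypothetical, and the value of e₃/e₂ must be supplied by the user from a reference characterization of purified mCP.

```python
def impurity_absorbance_434(a434_naoh, a578_naoh, e3_over_e2):
    """Eq. 3: impurity absorbance at 434 nm from a dye spectrum measured at pH 12."""
    r_naoh = a578_naoh / a434_naoh          # absorbance ratio R_NaOH = A578 / A434
    return (1.0 - e3_over_e2 * r_naoh) * a434_naoh
```

Algebraically, the expression reduces to A₄₃₄ minus (e₃/e₂)·A₅₇₈. For an illustrative e₃/e₂ of 0.05, a spectrum with A₅₇₈ = 1.0 and A₄₃₄ = 0.05 gives zero, while the same spectrum with A₄₃₄ = 0.053 gives 0.003.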

Detection of impurities with SIMCA

This study used the data-driven version of SIMCA (DD-SIMCA) developed by Pomerantsev and Rodionova (2014) and implemented in a MATLAB program by Zontov et al. (2017). The SIMCA method as applied to the acceptance testing of a sample requires only pure samples for training the model and defining the acceptance boundaries. The procedure has three steps, the full details of which are in Pomerantsev (2008) and Pomerantsev and Rodionova (2014).

In the first step, a PCA model is developed from a training dataset consisting of measurements of the pure sample spectra. PCA (Pearson, 1901) decomposes the data matrix into orthogonal basis vectors (i.e., the eigenvectors, principal components, or “loadings”) and “scores” (position of the data along each eigenvector). The training dataset A is decomposed into

A = T V^{T} + E

(4)

where A is an I × J matrix (e.g., I spectra measured at J wavelengths), T is the I × N scores matrix, V is the J × N loadings matrix (so that V^T is N × J), E is the I × J residuals matrix, and N is the number of principal components, which must be optimized.

For each training sample i, the results of the PCA decomposition are used to calculate the scores distance (h_i), representing the deviation of the sample within the model space, and orthogonal distance (v_i), representing the deviation from the model space:

h_{i} = \sum_{n = 1}^{N} \frac{t(i, n)^{2}}{λ_{n}}, \quad v_{i} = \sum_{j = 1}^{J} e(i, j)^{2}

(5),

where t(i,n) is the score for the ith sample in the nth column of T, e(i,j) is the residual for the ith sample in the jth column of E, and $λ_{n}$ is the nth eigenvalue of the variance-covariance matrix $A^{T} A$.
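The decomposition in Eq. 4 and the two distances in Eq. 5 can be sketched with NumPy as shown below. This is an illustrative sketch and not the DD-SIMCA MATLAB implementation. Mean-centering the spectra before the decomposition is an assumption of this sketch, and the function name `pca_distances` is hypothetical.

```python
import numpy as np

def pca_distances(A, n_components):
    """Scores distance h and orthogonal distance v for each row (spectrum) of A."""
    A = np.asarray(A, dtype=float)
    Ac = A - A.mean(axis=0)                      # assumption: column mean-centering
    _, s, Vt = np.linalg.svd(Ac, full_matrices=False)
    V = Vt[:n_components].T                      # J x N loadings
    T = Ac @ V                                   # I x N scores
    E = Ac - T @ V.T                             # I x J residuals
    lam = s[:n_components] ** 2                  # leading eigenvalues of Ac^T Ac
    h = np.sum(T**2 / lam, axis=1)               # Eq. 5, scores distance
    v = np.sum(E**2, axis=1)                     # Eq. 5, orthogonal distance
    return h, v
```

One check on the indexing is that, with this scaling, the scores distances of the training samples sum to the number of principal components N.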

The total distance c of the sample is

c = \mathrm{DF}_{h} \frac{h}{h_{0}} + \mathrm{DF}_{v} \frac{v}{v_{0}}

(6),

where DF_h and DF_v are degrees of freedom and h₀ and v₀ are scaling factors. These parameters are unknown and are estimated from the training dataset (hence “data-driven” SIMCA), modeling the scores and orthogonal distances with a chi-squared distribution (Pomerantsev, 2008; Pomerantsev and Rodionova, 2014).

The third and final step is to determine the acceptance area for the pure samples for a given Type I error α (i.e., the probability of incorrectly rejecting a pure sample). The acceptance area is the region where the total distance of the sample is less than or equal to the critical distance c_crit for a defined α (typically 0.05). In other words,

c \leq c_{\mathrm{crit}}(α)

(7),

where c_crit is the (1-α) quantile of the chi-squared distribution with DF_h + DF_v degrees of freedom.
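Eqs. 6 and 7 can be sketched in Python as below, given the scores and orthogonal distances of the training samples. The method-of-moments estimator used here (scaling factor equal to the mean, degrees of freedom equal to 2·mean²/variance, rounded to an integer) is one simple way to fit a scaled chi-squared distribution. Whether it matches the estimator used in this study is an assumption, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import chi2

def fit_acceptance(h, v, alpha=0.05):
    """Estimate h0, v0, DF_h, DF_v, and c_crit from training-set distances."""
    h, v = np.asarray(h, dtype=float), np.asarray(v, dtype=float)
    h0, v0 = h.mean(), v.mean()
    # Method of moments for a scaled chi-squared variable: DF = 2 * mean^2 / variance
    df_h = max(1, int(round(float(2.0 * h0**2 / h.var(ddof=1)))))
    df_v = max(1, int(round(float(2.0 * v0**2 / v.var(ddof=1)))))
    c_crit = float(chi2.ppf(1.0 - alpha, df_h + df_v))   # (1 - alpha) quantile
    return {"h0": h0, "v0": v0, "df_h": df_h, "df_v": df_v, "c_crit": c_crit}

def is_accepted(h, v, params):
    """Eq. 7: accept when the total distance c (Eq. 6) does not exceed c_crit."""
    c = (params["df_h"] * np.asarray(h) / params["h0"]
         + params["df_v"] * np.asarray(v) / params["v0"])
    return c <= params["c_crit"]
```

New samples would be projected onto the training-set loadings to obtain their h and v before calling `is_accepted`.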

To optimize the model, the number of principal components is varied and the Type I and Type II errors (i.e., probability of incorrectly accepting an adulterated sample) are evaluated on an independent test dataset consisting of pure and adulterated samples. The model with the optimal number of principal components is selected by either minimizing the Type I error only or by considering the best trade-off between the Type I and Type II errors, depending on the application (Rodionova et al., 2016a). Both Type I and Type II errors were optimized in the SIMCA model for mCP, as incorrectly accepting a dye with significant residual impurities presents a greater risk than incorrectly rejecting a pure sample while it is still desirable to minimize the rate of false rejections of a pure sample.

Once the optimized model and acceptance areas are constructed, the model can be applied to classify new samples. The classification results are typically visualized in an acceptance plot like in Fig. 1 where samples that fall within the region bound by the green curve are in the acceptance area. In Fig. 1, the x-axis and y-axis represent the scores and orthogonal distances, respectively, transformed onto a logarithmic scale for better visualization.

Fig. 1. — Acceptance plot showing the classification of mCP samples in the test dataset (see Table 1) with the optimized SIMCA model (with 5 principal components). Each point represents the classification result for a single measurement of that dye sample. The x and y axes can be interpreted as the scores and orthogonal distances, respectively. Measurements falling within the acceptance region bounded by the green curve (defining critical distances for a Type I error of 5 %) are classified as pure dyes.

In this study, the SIMCA model was trained with the UV-visible spectra (350 nm to 750 nm at 1 nm resolution) of pure mCP samples in a solution of sodium chloride (NaCl) and sodium hydroxide (NaOH) with a pH of 12 and an ionic strength of 0.7 mol kg⁻¹ solution. Once optimized, the model can be applied to classify new mCP samples based on their spectra measured in the same type of solution. Most impurities have substantially different spectra from mCP at pH 12 (see the examples in Liu et al., 2011 and Fig. 2 inset), and thus, are most easily detected at this pH. The SIMCA approach is similar to that of Douglas and Byrne (2017a), but uses information from the full spectrum rather than at a single wavelength.

Fig. 2. — Example chromatogram for the Alfa Aesar mCP at a detection wavelength of 434 nm. The percent area of the mCP peak (retention time = 9.5 min) was 98.82 % for this injection. Six minor peaks (labeled) associated with impurities were identified. The solid blue curve in the inset shows the UV-visible spectrum for the most abundant impurity (Peak 2), and the dashed purple curve shows the spectrum for mCP at pH 12.

Instrument standardization methods

Applying a SIMCA model developed on one spectrophotometer to classify samples measured on a different spectrophotometer requires a correction for differences in the instrumental response. Instrument standardization methods map the response of a secondary instrument to match the response of the primary instrument. The standardization typically requires that a subset of the samples used to develop the calibration model on the primary instrument be measured on the secondary instrument to determine the transformation function between the two instruments. In this study, the efficacy of the Direct Standardization (DS) and Piecewise Direct Standardization (PDS) methods (Wang et al., 1991) was tested as a preprocessing step when using SIMCA to classify samples measured on a secondary spectrophotometer. In this section, a brief introduction to the two instrument standardization methods is provided.

In DS, the full spectrum on the secondary instrument is used to fit each individual point on the spectrum measured on the primary instrument. The responses on both instruments are related to each other by

{\bar{R}}_{1} = {\bar{R}}_{2} F

(8),

where ${\bar{R}}_{1}$ is the response matrix for a subset of the calibration samples measured on the primary instrument, ${\bar{R}}_{2}$ is the response matrix for the same samples measured on the secondary instrument, and F is the transformation matrix (dimensioned wavelength by wavelength). F is calculated from

F = {\bar{R}}_{2}^{+} {\bar{R}}_{1}

(9),

where ${\bar{R}}_{2}^{+}$ is the pseudoinverse of ${\bar{R}}_{2}$.
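Eqs. 8 and 9 translate directly into a few lines of NumPy. The sketch below assumes that rows are samples and columns are wavelengths, and the function names are hypothetical.

```python
import numpy as np

def direct_standardization(R1, R2):
    """Eq. 9: F = pinv(R2) @ R1, where rows are the same samples on each instrument."""
    return np.linalg.pinv(np.asarray(R2, dtype=float)) @ np.asarray(R1, dtype=float)

def to_primary(spectra_secondary, F):
    """Eq. 8: map spectra measured on the secondary instrument to the primary response."""
    return np.asarray(spectra_secondary, dtype=float) @ F
```

When there are fewer standardization samples than wavelengths and the rows of the secondary response matrix are linearly independent, the transformed standardization spectra reproduce the primary spectra exactly. Agreement on the standardization samples alone therefore says little about how well the transfer generalizes.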

PDS is a variant of DS that fits the response at each wavelength on the primary instrument to the responses over small wavelength windows on the secondary instrument. The transformation matrix F consists of local regression models determined from Principal Component Regression (PCR) or Partial Least Squares (PLS). Because the rank of the response matrix for selected windows on the secondary instrument is smaller than the rank of the response matrix for full spectra, it is possible to use a smaller number of subset samples compared to DS, with some successful applications of PDS using as few as three instrument standardization samples (Wang et al., 1991).

The subset of samples selected for standardization must span the instrumental responses for the full range of samples measured on both instruments. The subset samples to be measured on the secondary instrument are typically selected based on their multivariate leverage. The sample with the greatest deviation from the mean of the calibration samples is selected as the first sample, and the rest of the samples are then orthogonalized to the first sample and the procedure repeated until the desired number of subset samples is obtained. The full derivation of the DS and PDS methods and details of the subset selection method are described in Wang et al. (1991).
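The selection procedure described above can be sketched as follows. This is a simplified reading of the text (largest deviation from the mean, then orthogonalization of the remaining samples) rather than a reproduction of the exact leverage calculation in Wang et al. (1991), and the function name is hypothetical.

```python
import numpy as np

def select_subset(X, k):
    """Pick k row indices: farthest from the mean first, then orthogonalize and repeat."""
    Xc = np.asarray(X, dtype=float)
    Xc = Xc - Xc.mean(axis=0)
    available = np.ones(len(Xc), dtype=bool)
    chosen = []
    for _ in range(k):
        norms = np.einsum("ij,ij->i", Xc, Xc)        # squared length of each row
        idx = int(np.argmax(np.where(available, norms, -np.inf)))
        chosen.append(idx)
        available[idx] = False
        u = Xc[idx] / np.linalg.norm(Xc[idx])        # direction of the selected sample
        Xc = Xc - np.outer(Xc @ u, u)                # remove that direction from all rows
    return chosen
```

For the four two-dimensional points (0, 0), (10, 0), (0, 5), and (1, 1), the sketch selects the second point first and the third point next.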

Methods

mCP samples and purity analysis

Nine lots of mCP powder (listed in Table 1) were obtained from six sources (Alchem Laboratories Corporation, Ricca Chemical Company, Acros Organics, Alfa Aesar, Loba Chemie, and the University of South Florida). The dyes from the University of South Florida (FB4, Flash 1, and USF in Table 1) and Loba Chemie were the molecular acid form of mCP. All the other dyes were the sodium salt form of mCP. Except for the dyes from Acros, Ricca, and Alfa Aesar, all other dyes were purified following the procedures in Patsavas et al. (2013).

Table 1.

Information on the mCP dye samples (percent purity at 434 nm from HPLC, number of dye solutions prepared, number of spectra measured) included in the (a) training and (b) test datasets for the SIMCA model. The classification results with the SIMCA model are given as the percentage of measurements accepted as a pure dye.

(a) Training Set (n = 78 spectra)
Dye Lot	% Purity^a	No. solns	No. spectra	% Accepted^b
AL883-67	99.95	3	37	97.3
FB4	100	1	12	100
USF	100	1	29	93.1

(b) Test Set (n = 93 spectra)
Dye Lot	% Purity^a	No. solns	No. spectra	% Accepted^b
Acros A0313094	98.65	2	12	0
Ricca 2404675	98.75	1	12	0
Alfa Aesar N07A013	98.84	1	12	0
AL883-41	99.1	2	6	0
Loba Chemie MCP/1/16004	99.5	2	6	0
AL883-22	99.86	3	18	83.3
AL883-67t	99.95	7	21	95.2
Flash 1	100	1	6	83.3

^a From HPLC.

^b Percentage of measurements accepted as a pure dye by the optimized SIMCA model in Figure 1.

The purity of each dye was determined by HPLC analysis. Chromatographic separation was performed on the dyes using a Dionex Ultimate 3000 LC system with a photodiode array detector and a linear gradient method that varied the composition of the mobile phase from 75 % Solvent A (95 % ammonium formate/formic acid buffer and 5 % methanol) and 25 % Solvent B (5 % ammonium formate/formic acid buffer and 95 % methanol) to 100 % Solvent B over 24 minutes at a constant flow rate of 0.3 mL min⁻¹. The ammonium formate/formic acid buffer had a total formate amount concentration of 5 mmol L⁻¹ and a pH of ≈ 3.5. The HPLC column was a Restek Ultra Aqueous C18 column (3.0 μm particle size, 2.1 mm inner diameter, 100 mm length), which was kept at 25 °C in a thermostatted oven during analysis. Dye solutions (with an amount content of ≈ 0.6 mmol kg⁻¹ solution mCP) were prepared for the HPLC analysis by dissolving the mCP in water or Solvent A and filtering with 0.45 μm nylon filters. The injection volume was 15 μL. The purity of the sample was determined from the percent area of the mCP peak on the chromatogram at a detection wavelength of 434 nm (Fig. 2). The percent purity values determined by HPLC for each lot of dye are given in Table 1. While these estimates do not represent the true organic purity of mCP, they reflect the optical purity at one of the critical wavelengths for spectrophotometric pH determination where impurities are known to interfere strongly.

Preparation of solutions for spectrophotometric measurements

mCP stock solutions (with an mCP amount content of ≈ 2 mmol kg⁻¹ solution) were prepared for spectrophotometric analyses by dissolving the appropriate amount of mCP powder in a background of ≈ 0.7 mol kg⁻¹ solution of NaCl and adjusting the pH of the dye solutions with HCl or NaOH solution to between approximately pH 7.21 and 8.33, measured with a glass electrode calibrated with Orion Ross buffer solutions (pH 7.00 ± 0.01 and pH 10.01 ± 0.02 at 25 °C, as per the manufacturer’s claimed acceptance ranges). For the dyes in the molecular acid form, the dye solutions were first sonicated for 1 hour in the NaCl background solution with ≈ 0.025 mol kg⁻¹ solution amount content of NaOH to facilitate dissolution before the final pH adjustment. The dye solutions were prepared in a NaCl background to minimize changes in ionic strength when adding the dye to the samples (also in a NaCl background) during the spectrophotometric analysis. Replicate stock solutions were prepared for some of the dyes (see Table 1). AL883-67t and AL883-67 were the same lot of material, with AL883-67t being a portion of the material used in a separate study to characterize the homogeneity and AL883-67 the leftover material from the bottling of AL883-67t. The dye solution labeled “USF” was from one of the USF dye lots (Flash 1 or FB4) and was prepared as a reference solution for the study of the homogeneity of AL883-67t. However, the exact lot used to prepare this solution was not recorded. The USF and AL883-67t dye solutions were not prepared in a NaCl background. All dye solutions were stored at ambient laboratory temperatures and shielded from light.

High pH solutions (pH 12) were prepared for measurements of the UV-visible spectra of the dyes by adding a standardized NaOH solution (Fisher Scientific sodium hydroxide solution, certified 0.995 mol L⁻¹ to 1.005 mol L⁻¹ solution, Lot 215879) to a NaCl background solution to attain a final NaOH amount content of 0.01 mol kg⁻¹ solution and an ionic strength of 0.7 mol kg⁻¹ solution. These high pH solutions were prepared and used for the spectrophotometric analyses within 24 hours.

Buffered artificial seawater solutions ranging from pH ≈ 7.2 to ≈ 8.2 (covering the pH range of seawater) with a salinity of 35 were prepared for assessing the pH biases of the dyes relative to the reference dye. Four buffers (4 L each) with different Tris/Tris-HCl ratios (to obtain evenly spaced pH values from 7.2 to 8.2) and a constant total Tris molality of 0.08 mol kg⁻¹ H₂O were prepared in an artificial seawater background following the procedures described in Pratt (2014).

Spectrophotometric measurements

An Agilent Cary 100 spectrophotometer was used to measure the UV-visible spectra of the mCP solutions and to make spectrophotometric pH measurements. The photometric and wavelength accuracy of the Agilent Cary 100 spectrophotometer were assessed from measurements of National Institute of Standards and Technology (NIST) Standard Reference Material SRM 930d and SRM 1930d Glass Filters for Spectrophotometry and NIST SRM 2034 Holmium Oxide Solution Wavelength Standard to verify that the instrument met the performance requirements for spectrophotometric pH measurements of seawater samples. The spectrophotometer was operated at a spectral bandwidth of 2 nm, initially in double-beam mode with a 10 cm water-jacketed Suprasil quartz flow-through cell filled with 18.2 MΩ-cm water in the reference beam and a cell of identical material and construction filled with the sample solution in the sample beam. However, one of the cells became unavailable partway through the study, so the spectrophotometer was operated in single-beam mode for the remainder of the study (starting in July 2022).

The cell temperatures were maintained at approximately 25 °C by a Lauda Brinkman RM6 recirculating water bath connected to the water-jacketed 10 cm cell and measured with Omega Model HSRTD-3-100-A-240-E Resistance Temperature Detectors (RTD) placed in the inflow and outflow of the jacketed portion of the cell. The reported measurement temperature was taken as the average of the two temperatures. The RTDs were calibrated against a Hart Scientific 5628 Platinum Resistance Thermometer (PRT), which was calibrated against secondary PRTs maintained by the NIST Calibration Group.

The Apollo SciTech spectrophotometric seawater pH analyzer system (Model AS-pH1) was used to automate the dispensing of the sample solution and dye. The dye addition was adjusted to achieve absorbances within the range of 0.2 to 1.2 at 434 nm and 578 nm. The background solution (sample without dye) and sample solution with dye were both equilibrated for 90 seconds in the cell to reach the desired temperature. The background spectrum was measured once and the sample spectrum three times. The background-subtracted sample spectra were then averaged and adjusted for the baseline shift by subtracting the average of the absorbances from 725 nm to 735 nm from the absorbances at each wavelength, following the approach of Carter et al. (2013). This measurement process was used for measurements of the dye spectra in the pH 12 solutions as well as for the pH measurements in Tris buffers.
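The spectral processing described above (background subtraction, averaging of the replicate scans, and baseline adjustment with the 725 nm to 735 nm window) might be sketched as follows. The array layout and the function name are assumptions made for illustration.

```python
import numpy as np

def baseline_corrected_spectrum(wavelengths, sample_scans, background):
    """Average background-subtracted scans, then subtract the mean 725-735 nm absorbance.

    wavelengths:  shape (n_wavelengths,), in nm
    sample_scans: shape (n_scans, n_wavelengths), sample solution with dye
    background:   shape (n_wavelengths,), sample solution without dye
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    scans = np.asarray(sample_scans, dtype=float) - np.asarray(background, dtype=float)
    mean_spectrum = scans.mean(axis=0)
    window = (wavelengths >= 725) & (wavelengths <= 735)
    return mean_spectrum - mean_spectrum[window].mean()
```

A constant offset shared by all wavelengths is removed completely by this adjustment, since the dye is assumed not to absorb in the baseline window.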

Because of the strong temperature sensitivity of the Tris pH (DelValls and Dickson, 1998), precautions to ensure complete thermal equilibration in the Tris buffers prior to the pH measurements were taken by keeping the Tris buffers in a water bath overnight at the target temperature and during the pH measurements. However, despite these precautions, the length of tubing from the bottle to the sample dispensing syringe may have resulted in heat loss during the transfer of the sample into the cell, and thus our measurements may have needed a longer equilibration time. The effects of measurement temperature variability on the Tris buffer measurements are discussed in the Results. The Tris buffer measurements were conducted at 25.04 °C ± 0.007 °C (mean ± std. dev.) for the pH ≈ 8.2 buffer and at 25.12 °C ± 0.019 °C for the other three buffers. The combined standard uncertainty of the temperature, which includes the uncertainty of the mean and the calibration uncertainties for the two temperature probes, was estimated to be 0.011 °C for the pH ≈ 8.2 buffer measurements and 0.012 °C for the other buffers. Measurements of the mCP spectra in NaOH solutions were conducted at 25.02 °C ± 0.038 °C (mean ± std. dev.), with a combined standard uncertainty of 0.012 °C.

Evaluating apparent pH biases

The differences in pH measured with each dye relative to the reference USF dye solution were evaluated in Tris-buffered artificial seawater solutions to estimate the apparent pH bias due to dye impurities. The constants of Liu et al. (2011) were used to calculate pH from Eq. 1. For each lot of dye listed in Table 1, one of the replicates was chosen for the apparent pH bias assessment. The pH of each dye solution was measured once in each of the four buffers, while the reference dye solution was typically measured three times at the beginning of the run and then once after every three measurements with the other dyes for a total of 9 to 11 reference measurements per run. Erroneous measurements were identified and repeated. The apparent pH biases were estimated from the difference between the pH measured with the sample dye and the average of the reference dye pH measurements.

Δ pH = {pH}_{samp} - {\bar{pH}}_{ref}

(10),

The expanded uncertainty $U_{Δ pH}$ (representing the 95 % coverage probability) of the differences was calculated from Eq. 11 with the appropriate coverage factor k calculated from the Welch-Satterthwaite equation (JCGM, 2008).

U_{Δ pH} = k \sqrt{u_{{pH}_{samp}}^{2} + u_{{\bar{pH}}_{ref}}^{2}}

(11)

$u_{{\bar{pH}}_{ref}}$ is the standard uncertainty of the reference dye pH, estimated from the standard deviation of the mean. $u_{{pH}_{samp}}$ is the standard uncertainty of the sample dye pH values, assumed to have the same uncertainty as the reference dye pH measurements. As pH_samp represents a single measurement, $u_{{pH}_{samp}}$ was estimated from the standard deviation of the reference dye pH measurements rather than the standard deviation of the mean.

The expanded uncertainties ranged from 0.0008 to 0.0014 (k = 2.24 to 2.18 with 9.8 to 11.8 effective degrees of freedom), and thus only ΔpH values larger than these values are significantly different from zero at 95 % coverage probability. The differences and the expanded uncertainties are plotted in Fig. 3.
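The calculation in Eqs. 10 and 11 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names are hypothetical, and it assumes that the sample and reference uncertainty components each carry n − 1 degrees of freedom, where n is the number of reference measurements. The coverage factor is taken as the two-sided Student's t quantile at the Welch-Satterthwaite effective degrees of freedom, evaluated numerically so that only the standard library is needed.

```python
import math
import statistics


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_cdf(x, nu, steps=2000):
    """CDF for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return 0.5 + total * h / 3


def t_quantile(p, nu):
    """Quantile for p > 0.5 by bisection; nu may be non-integer."""
    lo, hi = 0.0, 50.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, nu) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def delta_ph_uncertainty(ph_samp, ph_ref, coverage=0.95):
    """Return (delta_pH, U, k, nu_eff) for one sample measurement."""
    n = len(ph_ref)
    s = statistics.stdev(ph_ref)
    u_samp = s                 # single measurement: standard deviation
    u_ref = s / math.sqrt(n)   # mean of n: standard deviation of the mean
    nu = n - 1                 # assumed degrees of freedom for both terms
    u_c2 = u_samp ** 2 + u_ref ** 2
    # Welch-Satterthwaite effective degrees of freedom
    nu_eff = u_c2 ** 2 / (u_samp ** 4 / nu + u_ref ** 4 / nu)
    k = t_quantile(1 - (1 - coverage) / 2, nu_eff)
    delta = ph_samp - statistics.fmean(ph_ref)
    return delta, k * math.sqrt(u_c2), k, nu_eff
```

Under these assumptions the effective degrees of freedom depend only on n: 9 and 11 reference measurements give about 9.8 and 11.8, respectively, which is consistent with the range reported above.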

Fig. 3. — Differences in pH relative to the reference USF dye solution (ΔpH) measured in buffered artificial seawater solutions (approximate pH range: 7.2 to 8.2) with seven lots of purified mCP and three lots of unpurified mCP. The error bars represent the expanded uncertainty (at 95 % coverage probability) of the difference. ΔpH values that fall within the dashed red lines are within the GOA-ON tolerance limit for *climate quality* pH measurements (±0.003 in pH). Note that the y-axis for the unpurified dyes is different.

Evaluation of ₄₃₄A_imp

₄₃₄A_imp was calculated for each dye from the measurements at pH 12 to evaluate the effectiveness of this approach for detecting impurities. Because there have been questions as to which value of e₃/e₂ should be used in Eq. 3, ₄₃₄A_imp was calculated using the constants of Liu et al. (2011), DeGrandpre et al. (2014), and the value calculated from repeated measurements of the reference USF dye over the course of this study (n = 29 measurements, e₃/e₂ = 1/R at pH 12). ₄₃₄A_imp values were calculated from spectra normalized by the isosbestic absorbance (the absorbance at 488.1 nm, calculated as the weighted average of the absorbances at 488 nm and 489 nm as in Carter et al. (2013)) so that the values represent the ₄₃₄A_imp at an approximately constant dye amount content.
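The isosbestic normalization can be sketched as follows. This is an illustration only: the function names and the dictionary layout are hypothetical, and the weights are assumed to be those of a linear interpolation between the 488 nm and 489 nm readings (0.9 and 0.1 for 488.1 nm), which may differ in detail from the weighting used by Carter et al. (2013).

```python
def isosbestic_absorbance(a488, a489, wavelength=488.1):
    """Absorbance at the isosbestic wavelength by linear interpolation."""
    w = wavelength - 488.0  # fractional distance from 488 nm (assumed weight)
    return (1 - w) * a488 + w * a489


def normalize_by_isosbestic(spectrum):
    """Scale a spectrum so that its isosbestic absorbance equals 1.

    `spectrum` maps integer wavelengths in nm to absorbances and must
    contain entries for 488 and 489.
    """
    a_iso = isosbestic_absorbance(spectrum[488], spectrum[489])
    return {wl: a / a_iso for wl, a in spectrum.items()}
```

Two spectra of the same dye at different dye amounts should give nearly identical normalized values, so that ₄₃₄A_imp values become comparable between solutions.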

For a dye to be considered impure, it should have a positive ₄₃₄A_imp value with an uncertainty defining a two-sided interval that does not overlap with zero at a defined coverage probability. For the ₄₃₄A_imp values calculated with literature values of e₃/e₂, the 95 % confidence interval of the mean ₄₃₄A_imp was calculated for each dye, which represents the precision of the mean ₄₃₄A_imp estimated from repeated measurements of the dye over the course of the study. However, any biases in the literature values of e₃/e₂ will necessarily affect the interpretation of the calculated ₄₃₄A_imp values (see later discussion in Results).

For the ₄₃₄A_imp values calculated with the e₃/e₂ determined in this work, the expanded uncertainty of the mean $U({}_{434}\bar{A}_{imp})$ at the 95 % coverage probability was calculated by propagating the uncertainties of the means of e₃/e₂, ₅₇₈A_NaOH, and ₄₃₄A_NaOH in Eq. 3 and multiplying the combined standard uncertainty by the appropriate coverage factor (k).

U({}_{434}\bar{A}_{imp}) = k \sqrt{\left(\frac{\partial\,{}_{434}A_{imp}}{\partial (e_{3}/e_{2})}\right)^{2} u_{\overline{e_{3}/e_{2}}}^{2} + \left(\frac{\partial\,{}_{434}A_{imp}}{\partial\,{}_{578}A_{NaOH}}\right)^{2} u_{{}_{578}\bar{A}_{NaOH}}^{2} + \left(\frac{\partial\,{}_{434}A_{imp}}{\partial\,{}_{434}A_{NaOH}}\right)^{2} u_{{}_{434}\bar{A}_{NaOH}}^{2}}

(12)

\frac{\partial\,{}_{434}A_{imp}}{\partial (e_{3}/e_{2})} = -{}_{578}\bar{A}_{NaOH}, \quad \frac{\partial\,{}_{434}A_{imp}}{\partial\,{}_{578}A_{NaOH}} = -\overline{e_{3}/e_{2}}, \quad \frac{\partial\,{}_{434}A_{imp}}{\partial\,{}_{434}A_{NaOH}} = 1

(13)

The uncertainties of the input parameters (e₃/e₂, ₅₇₈A_NaOH, and ₄₃₄A_NaOH) were estimated from the standard deviations of the means of the parameters $(\bar{X})$ calculated from repeated measurements of the dyes over the course of the study. The sensitivity coefficients were calculated using the mean values of the parameters in Eq. 13. The expanded uncertainty calculated from Eq. 12 thus includes only intermediate precision effects on ₄₃₄A_imp. Details of the uncertainty analysis are provided in Table S1. Because both the reference and test sample dyes were measured on the same spectrophotometer within the same time period, measurement biases that affect the terms e₃/e₂, R_NaOH, and ₄₃₄A_NaOH in Eq. 3 are likely the same and thus cancel when calculating ₄₃₄A_imp. Furthermore, the purity of the reference dye was verified, so biases in ₄₃₄A_imp due to dye impurities are negligible. Therefore, although the calculated uncertainties do not explicitly include systematic contributions, these systematic uncertainty sources do not affect the interpretation of the ₄₃₄A_imp values calculated with our own e₃/e₂.
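The propagation in Eqs. 12 and 13 can be written out as a short Python sketch. It assumes the linear form ₄₃₄A_imp = ₄₃₄A_NaOH − (e₃/e₂) · ₅₇₈A_NaOH, which is the form implied by the sensitivity coefficients in Eq. 13; the function names are hypothetical, and the coverage factor k is supplied by the caller rather than derived here.

```python
import math


def a_imp_434(a434, a578, e3_e2):
    """Impurity absorbance at 434 nm (linear form assumed from Eq. 13)."""
    return a434 - e3_e2 * a578


def expanded_uncertainty_a_imp(a578, e3_e2, u_a434, u_a578, u_e3_e2, k):
    """Expanded uncertainty of the mean impurity absorbance (Eq. 12).

    All inputs are mean values and standard deviations of the mean.
    """
    # Sensitivity coefficients from Eq. 13
    c_ratio = -a578    # d(A_imp) / d(e3/e2)
    c_a578 = -e3_e2    # d(A_imp) / d(578A_NaOH)
    c_a434 = 1.0       # d(A_imp) / d(434A_NaOH)
    u_c = math.sqrt((c_ratio * u_e3_e2) ** 2
                    + (c_a578 * u_a578) ** 2
                    + (c_a434 * u_a434) ** 2)
    return k * u_c
```

A dye would then be flagged as impure when the calculated ₄₃₄A_imp is positive and larger than this expanded uncertainty, following the criterion stated earlier in this section.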

SIMCA model training and validation

mCP spectra in pH 12 solutions were measured over a period of eight months (March 2022 to November 2022) to build the dataset for the SIMCA model. The full dataset was divided into a training dataset (78 spectra) and a test dataset (93 spectra) as shown in Table 1. The AL883-67, FB4, and USF dye solutions were chosen to be included in the training dataset, as their purity determined from HPLC was ≥99.95 %. The test dataset consisted of measurements of the other dye solutions which include unpurified and purified dyes (with varying degrees of purity).

In addition to the background and baseline corrections as described previously, the spectra were normalized to unit length vectors to minimize variations from differences in dye amount content. An exploratory PCA analysis was then conducted on the full dataset to identify bad measurements. The first principal component of the dataset represents the mean spectrum of mCP at pH 12, while the other components reflect other contributions to the variance of the data, including the effects of impurities and instrumental factors. One of the principal components was exclusively associated with a single measurement (Sample 17), and thus this sample was excluded from further analysis (Fig. S1). Additional outliers in the training dataset were identified by running the DD-SIMCA program in the robust PCA mode and excluding measurements that fell outside the outlier significance level (γ = 0.001) on the acceptance plot following the protocol described in Pomerantsev and Rodionova (2014) and Zontov et al. (2017). The SIMCA model was constructed from the cleaned training dataset, setting the α level to 0.05 and varying the number of principal components from 1 to 10. The Type I error was calculated using the training dataset to verify that it is close to the value used to define the acceptance area, thus indicating that the model is fitting the training data adequately, while the test dataset was used to validate the predictive quality of the models and select the optimal model based on the Type I and Type II errors (Fig. 4).
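The unit-length normalization step can be illustrated with a minimal Python sketch (the function name is hypothetical, and a spectrum is represented simply as a list of absorbances on a common wavelength grid). Dividing each spectrum by its Euclidean norm removes overall scale, so two spectra that differ only in dye amount map to the same vector.

```python
import math


def normalize_unit_length(spectrum):
    """Scale a spectrum (list of absorbances) to unit Euclidean length."""
    norm = math.sqrt(sum(a * a for a in spectrum))
    if norm == 0:
        raise ValueError("cannot normalize an all-zero spectrum")
    return [a / norm for a in spectrum]
```

This sketch covers only the scaling step; the background and baseline corrections, the PCA, and the DD-SIMCA classification are not reproduced here.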

Fig. 4. — The percentage of samples misclassified by the SIMCA model as a function of the number of principal components used in the model. The percent error for the “pure” dye samples in the training and test datasets represent Type I errors, and the percent errors for the other dyes with residual impurities represent Type II errors.

Testing the SIMCA model on a different spectrophotometer

The performance of the SIMCA model was tested on the data of Takeshita et al. (2021), who measured the spectra of 9 different lots of purified mCP in NaOH/NaCl solutions (pH ≈ 12) at 25 °C on an Agilent 8453 spectrophotometer. This dataset contained a total of 29 spectra, which were divided into a calibration dataset (used for instrument standardization) consisting of up to 6 spectra from Dye 1 (the reference pure dye in their study) and a test dataset consisting of 9 dye spectra with detectable residual impurities (Dyes 6, 8, and 9) and 14 pure dye spectra. Dye 7 was not included in our analysis because it was measured one year apart from the other measurements and did not have sufficient accompanying measurements of the standardization samples to successfully apply instrument standardization techniques. The raw spectra from the Takeshita et al. (2021) dataset were corrected for the background and baseline change and normalized to unit length prior to further analysis.

The Dye 1 spectra were selected as the instrument standardization samples and matched with the same number of spectra from a subset of the FB4 measurements in the dataset in Table 1. Although these dyes may not have been from the same lot, they were both produced by the same laboratory at USF and were of sufficiently similar purity to be used as standardization samples. Two sets of three spectra of Dye 1 were measured 11 days apart, a period that included the day on which the other dye samples were measured. Thus, the set of measurements used for instrument standardization was representative of the short-term variability of the spectrophotometer at the time that the test samples were measured. DS and PDS were tested for the standardization. The conditions for both methods were optimized by varying the number of standardization samples from 3 to 6 and the window size (for PDS) from 51 to 301 and choosing the conditions that minimized the Type I and Type II errors in the test dataset. The Dye 1 spectra not used for standardization were included in the test dataset. The subset selection and instrument standardization were performed with the functions available in the PLS Toolbox version 8.9.2 from Eigenvector Research, Inc.

Results

Assessment of dye purity from apparent pH biases

The apparent pH biases measured in the Tris/artificial seawater buffers over a pH range of ≈ 7.2 to ≈ 8.2 indicated that five of the seven lots of purified dyes (≥ 99.86 % purity from HPLC) had consistent pH performance with the reference USF dye to within 0.0019 units, with most of the ΔpH values indistinguishable from zero within the precision of the measurements (Fig. 3). The remaining two lots of purified dyes (AL883-41 and Loba Chemie) with higher levels of residual impurities (99.1 % and 99.5 % purity from HPLC, respectively) showed significant pH-dependent ΔpH values as large as −0.0065 at pH ≈ 8.19 (Fig. 3). The unpurified mCP from Acros, Alfa Aesar, and Ricca showed even stronger pH-dependent differences, which were as large as −0.0075 at pH ≈ 7.19 and −0.0156 at pH ≈ 8.19 and thus fell outside the GOA-ON tolerance limits for climate quality pH measurements over the full pH range. These observations are consistent with previous studies which reported significant pH-dependent biases in unpurified dyes and inadequately purified dyes (Douglas and Byrne, 2017a; Liu et al., 2011; Takeshita et al., 2021; Yao et al., 2007).

These results demonstrate that measuring the apparent pH biases in buffered solutions is an effective approach to evaluating whether a dye sample has been adequately purified to be fit for purpose for climate quality pH measurements. However, this approach necessarily requires a sample of known pure dye for comparison, and its sensitivity is limited by the precision in measuring ΔpH. With a typical repeatability standard deviation of 0.0004 for spectrophotometric pH measurements, the expected precision in ΔpH, expressed as an expanded uncertainty (95 % coverage probability), is 0.0008. Values for the expanded uncertainty in ΔpH, shown by the error bars in Fig. 3, were slightly larger than this value, ranging from 0.0009 for the pH ≈ 7.52 buffer to 0.0014 for the pH ≈ 8.19 buffer. While most of the additional variance in ΔpH could be explained by the temperature variability during the measurements, this was not the case for measurements of the pH ≈ 8.19 buffer, where the temperature variability was estimated to contribute 0.0002 to the standard uncertainty of the pH measurements, leaving an additional unexplained standard uncertainty contribution of 0.0004. Although the exact source of the additional uncertainty is unknown, it is possible that gas exchange over the course of the run (≈ 11 h) could have contributed to increased variability in the pH measurements. Therefore, our detection limit for the apparent pH biases is conservatively estimated as 0.0014.

Assessment of dye purity from ₄₃₄A_imp

Assessing dye purity from ₄₃₄A_imp can be ambiguous because the values are dependent on the value of e₃/e₂ used in Eq. 3, of which there are several options from the literature. Additionally, without information on the uncertainty of the literature e₃/e₂ values, it is difficult to gauge whether the calculated ₄₃₄A_imp values are significantly different from zero. All eight lots of the purified dyes in this study had negative ₄₃₄A_imp values when calculated using the Liu et al. (2011) value of e₃/e₂ in Eq. 3 (Fig. 6). The two lots with higher levels of residual impurities (AL883-41 and Loba Chemie) had ₄₃₄A_imp values closest to zero, even though these two dyes contained significant levels of impurities (see Fig. 3). This suggests a bias in the ₄₃₄A_imp values calculated with the e₃/e₂ of Liu et al. (2011). Using the DeGrandpre et al. (2014) value of e₃/e₂ in Eq. 3, AL883-41 and Loba Chemie had positive ₄₃₄A_imp values significantly different from zero. However, most of the other purified dyes that had consistent pH performance with the reference dye (with the exception of AL883-67t and AL883-22) still had negative ₄₃₄A_imp values that were significantly different from zero. The unpurified dyes (Acros, Ricca, and Alfa Aesar) had significantly positive ₄₃₄A_imp values across all choices of e₃/e₂ used in the calculation (Fig. 6).

Fig. 6. — Estimated impurity absorption at 434 nm (₄₃₄A_imp, normalized to an isosbestic absorbance of 1) for each lot of mCP in Table 1. The ₄₃₄A_imp values were calculated according to Eq. 3 with three different values of the constant e₃/e₂ as indicated in the legend: from Liu et al. (2011), DeGrandpre et al. (2014), and this study (calculated from measurements of the USF dye). The error bars represent the 95 % confidence interval of the mean (for ₄₃₄A_imp values calculated with the literature constants) or expanded uncertainties (at the 95 % coverage probability, for ₄₃₄A_imp values calculated with the e₃/e₂ determined in this work).

Previous studies (Takeshita et al., 2021; Woosley, 2021) also found negative ₄₃₄A_imp values for purified dyes when using the Liu et al. (2011) value of e₃/e₂ and found that ₄₃₄A_imp values calculated with the DeGrandpre et al. (2014) value of e₃/e₂ were more consistent with dye purity. These studies attributed the difference between the Liu et al. (2011) and DeGrandpre et al. (2014) values of e₃/e₂ (which were determined in an artificial seawater and NaCl background, respectively) to the effect of the background medium and recommended that the DeGrandpre et al. (2014) value of e₃/e₂ be used for calculating ₄₃₄A_imp, as the measurements for ₄₃₄A_imp assessment are also made in a NaCl background. However, this does not explain the negative ₄₃₄A_imp values we observed in some batches of purified dye when calculated with the DeGrandpre et al. (2014) value of e₃/e₂.

An alternative hypothesis is that there may be residual impurities in the batches of dye characterized in other studies. Any dye with a higher purity would therefore have a negative ₄₃₄A_imp value when calculated with the values of e₃/e₂ from those studies. However, limited data on the e₃/e₂ of mCP in artificial seawater in the literature and questions about the purity of the dyes make it impossible to rule out either hypothesis. (See Table S2 for a comparison of the literature values of e₃/e₂ in various media and the values measured in this study.) Additionally, measurement biases on different spectrophotometers could contribute to the discrepancies. Further research is needed to investigate the cause of the negative ₄₃₄A_imp values and the differences in the e₃/e₂ of mCP in artificial seawater and NaCl solutions.

The ₄₃₄A_imp values calculated using the e₃/e₂ value of the reference USF dye in this study were slightly different from the ₄₃₄A_imp values calculated with the e₃/e₂ value of DeGrandpre et al. (2014), which may be due to differences in the purity of the two dyes or biases in the measurements of e₃/e₂ on different spectrophotometers. When calculated with the e₃/e₂ value of the USF dye, the ₄₃₄A_imp values identified AL883-22, AL883-67t, and AL883-67 as having residual impurities, which is consistent with results from HPLC analysis (Fig. 6, Table 1). FB4 had a slight negative ₄₃₄A_imp value that was significantly different from zero. Compared to the ₄₃₄A_imp assessments made with the e₃/e₂ value of DeGrandpre et al. (2014), the ₄₃₄A_imp values calculated with the e₃/e₂ value of this study were more sensitive to residual impurities, such that the impurities were detectable even in a dye that was 99.95 % pure (e.g., AL883-67t and AL883-67).

Optimization and performance of the SIMCA model

The Type I and Type II errors in the test dataset were minimized with a four principal component SIMCA model (Fig. 4). However, this model resulted in higher Type I error (≈ 14 %) in the dataset of Takeshita et al. (2021) and poorer discrimination between the dye samples with residual impurities compared to the five principal component model (see next section and Fig. S2). It may be that an additional principal component is required to model data measured on a different spectrophotometer due to residual errors after instrument standardization. The five principal component model was therefore chosen as the optimal model, despite having slightly higher Type I and Type II errors in the primary test dataset.

With five principal components, the Type I error was 3.9 % and 7.4 % when evaluated with the training (n = 76 pure dye spectra, 2 outliers excluded) and test datasets (n = 27 pure dye spectra), respectively, with both values close to the target of 5 % Type I error. The Type II error was dependent on the level of impurities in the dyes. Unpurified dyes with high levels of impurities (e.g., Acros, Alfa Aesar, and Ricca) were always correctly identified by models with greater than 2 principal components (i.e., 0 % Type II error). On the other hand, the Type II error for the purified dyes with residual impurities (e.g., AL883-22, AL883-41, and Loba Chemie) was highly sensitive to the number of principal components in the model. AL883-22 (99.86 % purity from HPLC) appeared to be near the limit of detection and had among the highest Type II error (> 75 %) for any number of principal components tested. Models with greater than five principal components resulted in 100 % Type II error for AL883-41, with no significant improvement in the Type I error, while models with fewer than three principal components had large Type II errors for the AL883-22, AL883-41, and Loba Chemie dyes.
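The error percentages discussed here can be computed from the classification results as in the following minimal Python sketch. The function name and input layout are hypothetical: `is_pure` holds the known status of each spectrum and `accepted` holds whether the model placed it inside the acceptance region.

```python
def error_rates(is_pure, accepted):
    """Return (Type I error %, Type II error %) for a set of spectra.

    Type I: a pure dye spectrum falls outside the acceptance region.
    Type II: an impure dye spectrum falls inside the acceptance region.
    """
    pure = [acc for p, acc in zip(is_pure, accepted) if p]
    impure = [acc for p, acc in zip(is_pure, accepted) if not p]
    type_i = (100 * sum(not a for a in pure) / len(pure)
              if pure else float("nan"))
    type_ii = (100 * sum(bool(a) for a in impure) / len(impure)
               if impure else float("nan"))
    return type_i, type_ii
```

As a point of reference, 2 rejected spectra out of 27 corresponds to 7.4 %, and 3 out of 76 corresponds to 3.9 %; these counts are inferred from the reported percentages and sample sizes rather than stated in the text.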

The classification results obtained from the optimized model for the test dataset are visualized in the acceptance plot (Fig. 1). As shown in Fig. 1 and Fig. 4, the optimized SIMCA model correctly identified most of the dyes, including purified dyes with residual impurities. Although AL883-22 was the exception, ≈ 17 % of the measurements fell outside the acceptance region, making it unlikely that this group of measurements represents a pure dye given a Type I error of 7.4 %. Unlike the ₄₃₄A_imp assessments, SIMCA was unable to detect residual impurities at the level of 99.95 % purity in AL883-67t, because the model was trained with dye samples with ≥ 99.95 % purity. Compared to assessments of dye purity from apparent pH biases, SIMCA was more sensitive, detecting even the very low level of residual impurities in AL883-22, while the apparent pH biases did not indicate this dye as impure (Fig. 1 and Fig. 3). Furthermore, all the dyes identified as pure by the SIMCA model had apparent pH biases smaller than 0.002 (absolute value), thus establishing its suitability as a method for verifying dye purity for climate quality pH measurements over the seawater pH range.

Performance of the SIMCA model on different spectrophotometers

The five principal component SIMCA model correctly classified the dye samples in the dataset of Takeshita et al. (2021) after performing instrument standardization to adjust for differences between spectrophotometers. The smallest Type I and Type II errors were obtained with PDS using six standardization samples and a window size of 101 (Fig. 5, Fig. S3). DS was also tested, but the resulting Type II error was inferior to that of PDS, likely because of the small number of standardization samples available in the dataset (Fig. S3). With the optimized conditions for PDS, the SIMCA model correctly identified all the pure dyes and Dyes 6, 8, and 9 as containing impurities, which is in agreement with the results of Takeshita et al. (2021). Dyes 8 and 9 contained impurities identified from HPLC, while Dye 6 did not appear to have impurities detectable by HPLC and assessment of ₄₃₄A_imp. However, this dye showed pH-dependent pH differences that grew to ≈ 0.003 at pH > 8, suggesting that it may have contained lower levels of impurities than Dyes 8 and 9. Accordingly, on the SIMCA acceptance plot (Fig. 5), Dye 6 was near the detection limit just beyond the boundary of the acceptance region.

Fig. 5. — Acceptance plot showing the classification of the dye samples in the Takeshita et al. (2021) dataset with the optimized SIMCA model. The raw spectra were transformed with Piecewise Direct Standardization to correct for instrumental differences prior to classification with the SIMCA model. The dye samples are identified by the same numbers as in Takeshita et al. (2021).

Discussion

Comparison of alternative methods for verifying dye purity

In this study, a novel and sensitive method for detecting dye impurities using SIMCA was developed. This method and two other alternative methods to HPLC were evaluated and found to be suitable for verifying that residual impurities in purified dyes are sufficiently low such that the apparent pH biases are within the GOA-ON climate quality tolerance limits. Additionally, these alternative methods may have an advantage over HPLC in that they do not require a minimum separation in retention times between the impurities and mCP, while this may be a source of bias in purity assessments from HPLC if there are impurities close to the mCP peak and separation conditions are not optimized (Rivaro et al., 2021). These alternative methods can therefore serve as an additional check on dye purity. The advantages and disadvantages of each approach and recommendations on their use are summarized in this section.

Assessment of apparent pH biases relative to a reference purified dye provides the most direct evaluation of the dye performance as a pH indicator. However, this approach requires a complex preparation of buffered artificial seawater solutions over the range of seawater pH or access to natural seawater samples. Since the dye impurities are known to have the largest impact at high pH (Douglas and Byrne, 2017a; Liu et al., 2011; Takeshita et al., 2021; Yao et al., 2007), it may be sufficient to check the apparent pH bias at a single pH (e.g., pH ≈ 8.2). However, if a lot-specific impurity correction curve is desired, then it will be necessary to measure the apparent pH biases over the full seawater pH range (Yao et al., 2007).

₄₃₄A_imp assessments require relatively simple measurements in a NaOH solution in NaCl background at pH 12. However, due to the uncertainty in the literature values of e₃/e₂, or possibly due to the lack of information on their uncertainty, ₄₃₄A_imp estimates calculated with these values provide an ambiguous assessment of dye purity. This study and previous studies (Takeshita et al., 2021; Woosley, 2021) suggest that the ₄₃₄A_imp approach could be a viable and sensitive approach to assessing dye purity. However, the implementation of this method could be improved. Because the calculated values of ₄₃₄A_imp are highly sensitive to e₃/e₂, which can vary between different instruments, over time on the same instrument, and with differences in the purity of a dye batch, we recommend that the user measure the e₃/e₂ value on a lot of known pure dye over the same time period as the measurements on the test dye sample and calculate ₄₃₄A_imp values relative to the reference dye. The uncertainty in ${}_{434}\bar{A}_{imp}$ can then be estimated by propagating the uncertainties of the various parameters in Eq. 3, estimated from the standard deviation of the mean of the measurements. To achieve the same sensitivity to residual impurities as demonstrated in Fig. 6 (i.e., the same expanded uncertainty), a sufficient number of measurements of the sample and reference dye is required. While each of the ₄₃₄A_imp values in Fig. 6 was calculated from over 30 measurements (of the sample and reference dye combined) over the course of the study, a similar uncertainty could be achieved with fewer measurements under repeatability conditions. For instance, the repeatability standard deviations of our absorbance measurements at 434 nm and 578 nm were on average 0.0011 and 0.0062, respectively, and thus an expanded uncertainty in ₄₃₄A_imp of ≤ 0.001 (similar to the values in Table S1) could be achieved with a minimum of 11 measurements each of the sample and reference dye (total of 22 measurements). Users will need to determine the number of measurements needed to achieve their desired uncertainty on their instrument.

Both ₄₃₄A_imp assessments and SIMCA can be more sensitive to low levels of impurities than apparent pH bias assessments. Ensuring accurate classification results with SIMCA, however, will require that the user validate and periodically check the model performance on their instrument, and thus, this approach will require more measurements than ₄₃₄A_imp or apparent pH bias assessments.

Guidelines for using the SIMCA model

Users applying the SIMCA model to classify dye samples on their spectrophotometers will need to measure a small number of instrument standardization samples and additional validation samples. Our results suggest that adequate instrument standardization can be achieved with PDS with as few as 6 standardization samples. However, users will need to optimize the number of standardization samples and window size for PDS for their instrument. The standardization can be achieved by repeatedly measuring the spectra at pH 12 of a dye from the same lot as any of the pure dyes in Table 1 (or a dye with verified sufficiently similar purity, ≥ 99.95 %) and matching these measurements with a subset of our measurements of that dye using the subset selection method described in the Background section. These measurements should be spread out over the same time period that the validation and unknown samples are measured to capture the full variability of the spectrophotometer response at that time.

Validation samples are needed to optimize the instrument standardization parameters and verify model performance. This set of measurements should include a known pure dye (ideally from a different lot than the one used for standardization), an unpurified dye, and a purified dye with residual impurities (if available). From these measurements, the user should verify that all the unpurified dye measurements fall outside the acceptance region and that the Type I error of the model is within expected or acceptable levels. Evaluating the Type I error will require making a sufficiently large number of measurements on a pure dye (e.g., no more than 1 of 20 measurements should fall outside the acceptance region for a Type I error of 5 %). If a purified dye with residual impurities is available, these measurements can provide additional information on the sensitivity of the model to low levels of impurities. Alternatively, the user can prepare a pure dye solution contaminated with a small amount of unpurified dye corresponding to an apparent pH bias of ≈ 0.003 or smaller at pH ≈ 8.

Because spectrophotometers may drift over time, users will need to verify and maintain the performance of the model, especially if a substantial amount of time has elapsed since the measurement of the standardization and validation samples. The model can be maintained by periodically measuring additional standardization and validation samples to check that model errors are still within acceptable limits and updating the instrument standardization if corrective action is necessary (see Wise and Roginski (2015)). Alternatively, if the SIMCA model is routinely used for acceptance testing, a small number of standardization and validation samples can be included in each measurement run to continuously collect data for maintaining the model.

Because the UV-visible absorption spectra of mCP are dependent on the solution medium (Liu et al., 2011), unknown dye samples must be tested in the same medium that was used to train the SIMCA model, regardless of the medium of the dye’s intended application. Additionally, a cell with good transmission from 350 nm to 750 nm should be used for the measurement. Provided these basic requirements are met, the model, which has been validated against HPLC, can be reliably applied to verify the purity of mCP intended for use across a wide range of salinities in oceanic, coastal, and estuarine waters.

Recommendations for acceptance testing with SIMCA

Laboratories that produce and distribute purified dyes should conduct thorough tests to verify that the purity is within specifications. When conducting acceptance testing on an unknown sample of purified dye, at least 12 measurements on the sample should be made to assess whether the number of measurements that fall outside the acceptance region is within expectations given the model Type I and Type II errors. While the model Type I error is low (≈ 5 %), the Type II error may be >75 % for a dye with low levels of residual impurities near the model detection limit. Hence, a sufficient number of measurements is needed to confidently classify the dye. If a significant number of measurements fall outside the acceptance region, additional tests (i.e., apparent pH bias assessments in buffered solutions or seawater samples) can be performed to check if the apparent pH biases are within the climate quality tolerance limits. These guidelines are summarized in the flowchart in Fig. 7.
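One way to judge whether the number of rejected measurements is "within expectations" is a binomial calculation, sketched below in Python. This is an illustration rather than the procedure in Fig. 7: it assumes that the measurements are independent, that a pure dye is rejected with probability equal to the Type I error, and that a 5 % significance threshold (a hypothetical choice) defines what counts as implausible. The function names are also hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """Probability of k or more rejections in n independent measurements."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def max_plausible_rejections(n, type_i_error=0.05, significance=0.05):
    """Largest rejection count still plausible for a pure dye.

    Counts above the returned value would occur for a pure dye with
    probability below `significance`.
    """
    k = 0
    while prob_at_least(k + 1, n, type_i_error) >= significance:
        k += 1
    return k
```

Under these assumptions, with 12 measurements and a 5 % Type I error, up to 2 rejections remain plausible for a pure dye, whereas 3 or more would be expected in only about 2 % of cases and would justify the follow-up tests described above.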

Fig. 7. — Flowchart summarizing the procedures and guidelines for using the SIMCA model for acceptance testing of purified mCP.

Conclusion

In this study, we developed a model to detect impurities in a widely used indicator dye for seawater pH measurements and demonstrated its performance on dye samples measured on two different spectrophotometers. Widespread and consistent implementation of procedures for verifying dye purity will ensure the consistency of spectrophotometric pH measurements and conformance with GOA-ON climate quality requirements for pH (±0.003 uncertainty in pH). This work therefore represents a key step towards establishing a quality control framework for seawater pH measurements.

While the SIMCA method has excellent sensitivity to residual dye impurities, any of the methods evaluated in this study are suitable for assessing dye purity. We recommend that laboratories that purify and distribute mCP conduct rigorous acceptance testing, using the most suitable method for their capabilities and, ideally, cross-checking against a second method. A key step to the harmonization of these various approaches is an appropriate reference material which will enable assessment of dye purity relative to a common standard. While such a reference material is not yet available, the SIMCA method and the other methods evaluated in this study are still useful for ensuring the quality of purified mCP.

Finally, we advocate for further community discussion and agreement on quality specifications for the purity of indicator dyes and standard procedures for verifying dye purity, as well as continued interlaboratory comparisons to confirm the consistency of purified dyes and spectrophotometric pH measurements. These discussions and exercises will be critical especially as several different methods for purifying dyes have been developed (Liu et al., 2011; Patsavas et al., 2013; Rivaro et al., 2021) and more laboratories are producing purified dyes.

Supplementary Material

NIHMS1968317-supplement-Supplementary_Material.docx (175.2 KB, docx)

Acknowledgements

We would like to thank Benjamin Place for help with the HPLC measurements of the mCP samples. Xinyu Li prepared the solutions of AL883-67t. M.B. Fong was supported by a fellowship from the NIST NRC Postdoctoral Research Associateship Program. Work and personnel at MBARI were funded by the David and Lucile Packard Foundation, NSF OCE-2049117, and NSF OCE-1736864. We also thank David Duewer, David Sheen, Kevin Coakley, and two anonymous reviewers whose comments helped to improve this manuscript. Certain commercial equipment, instruments, or materials are identified in this manuscript to foster understanding. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials or equipment identified are necessarily the best available for the purpose.

Glossary of terms and symbols

General acronyms

DS, PDS: Direct Standardization, Piecewise Direct Standardization
GOA-ON: Global Ocean Acidification Observing Network
HPLC: High performance liquid chromatography
mCP: m-cresol purple
NIST: National Institute of Standards and Technology
PCA, PCR: Principal Components Analysis, Principal Components Regression
PLS: Partial Least Squares
PRT: Platinum Resistance Thermometer
RTD: Resistance Temperature Detector
SIMCA: Soft Independent Modeling of Class Analogy
DD-SIMCA: Data-Driven Soft Independent Modeling of Class Analogy
SRM: Standard Reference Material
USF: University of South Florida

mCP and spectrophotometric pH

HI⁻, I²⁻: Singly protonated and fully deprotonated species of mCP
A₄₃₄, A₅₇₈: Absorbance measured at 434 nm and 578 nm, respectively
₄₃₄A_imp: Absorbance of a mCP impurity at 434 nm
₄₃₄A_NaOH, ₅₇₈A_NaOH: Absorbance at 434 nm and 578 nm, respectively, measured in a mCP solution and a NaOH background at pH 12
ε: Molar absorption coefficient
₄₃₄ε_HI, ₅₇₈ε_HI, ₄₃₄ε_I, ₅₇₈ε_I: Molar absorption coefficients of species HI⁻ and I²⁻ measured at 434 nm and 578 nm, as indicated in the subscripts
e₁, e₂, e₃: Molar absorption coefficient ratios as defined in Eq. 2
e₃/e₂: Ratio of e₃ and e₂, determined from a measurement at pH 12
ΔpH: Difference in pH measured with the sample dye relative to the reference dye, as defined in Eq. 10.
pH_samp: pH measured with the sample dye
${\bar{pH}}_{ref}$: Mean of the pH values measured with reference dye
R: Ratio of the absorbance at 578 nm to the absorbance at 434 nm
R_NaOH: The ratio R measured in a NaOH solution at pH 12

Uncertainty propagation

k: Coverage factor used to estimate the expanded uncertainty
DF_eff: Effective degrees of freedom used to estimate the coverage factor
$u_{{\bar{pH}}_{ref}}$: Standard uncertainty of the reference dye pH, estimated from the standard deviation of the mean. This is an uncertainty contribution to ΔpH
$u_{{pH}_{samp}}$: Standard uncertainty of the sample dye pH. This is an uncertainty contribution to ΔpH
U_ΔpH: Expanded uncertainty of ΔpH at 95% coverage probability
$u_{\frac{{\bar{e}}_{3}}{e_{2}}}, u_{{}_{434}{\bar{A}}_{NaOH}}, u_{{}_{578}{\bar{A}}_{NaOH}}$: Standard uncertainties of the means of e₃/e₂, ₄₃₄A_NaOH, and ₅₇₈A_NaOH, respectively. These are contributions to the uncertainty in ₄₃₄A_imp
$\frac{\partial\,{}_{434}A_{imp}}{\partial (e_{3}/e_{2})}, \frac{\partial\,{}_{434}A_{imp}}{\partial\,{}_{578}A_{NaOH}}, \frac{\partial\,{}_{434}A_{imp}}{\partial\,{}_{434}A_{NaOH}}$: Sensitivity coefficients calculated as the partial derivatives of ₄₃₄A_imp with respect to the input parameters e₃/e₂, ₅₇₈A_NaOH, and ₄₃₄A_NaOH. Used in the uncertainty propagation calculations to calculate the uncertainty in ₄₃₄A_imp (Eq. 12).
$U({}_{434}\bar{A}_{imp})$: Expanded uncertainty of the mean of ₄₃₄A_imp at 95% coverage probability

SIMCA

A: Data matrix of spectra (e.g., I spectra measured at J wavelengths)
T: PCA scores matrix
V^T: PCA loadings matrix
E: Residuals matrix in a PCA model
t(i, j): PCA score for the ith sample in the jth column of the data matrix
e(i,j): Residual for the ith sample in the jth column of the data matrix
λ_n: nth eigenvalue of the variance-covariance matrix A^TA
h_i, v_i: Scores distance and orthogonal distance, respectively, of the sample i in the PCA model space
c: Total distance, which is a combination of the scores and orthogonal distance as defined in Eq. 6
c_crit: Critical distance for defining the acceptance region. See Eq. 7.
h₀, v₀: Scaling factors for h and v, used for calculating the total distance c
DF_h, DF_v: Degrees of freedom for h and v
α: Type I error
γ: Outlier significance level

Instrument Standardization

${\bar{R}}_{1}$: Response matrix for a subset of the calibration samples measured on the primary instrument
${\bar{R}}_{2}$: Response matrix for a subset of the calibration samples measured on the secondary instrument
F: Transformation matrix that maps the response of the secondary instrument to the primary instrument

References

2021. PLS_Toolbox with MIA_Toolbox 8.9.2. Eigenvector Research, Inc., Manson, WA USA: 98831.
Byrne RH and Breland JA, 1989. High-Precision Multiwavelength pH Determinations in Seawater Using Cresol Red. Deep-Sea Research Part a-Oceanographic Research Papers, 36(5): 803–810.
Byrne RH, Mecking S, Feely RA and Liu X, 2010. Direct observations of basin-wide acidification of the North Pacific Ocean. Geophysical Research Letters, 37(2).
Carter BR et al., 2018. Updated methods for global locally interpolated estimation of alkalinity, pH, and nitrate. Limnology and Oceanography: Methods, 16(2): 119–131.
Carter BR, Radich JA, Doyle HL and Dickson AG, 2013. An automated system for spectrophotometric seawater pH measurements. Limnology and Oceanography: Methods, 11(1): 16–27.
Clayton TD and Byrne RH, 1993. Spectrophotometric Seawater pH Measurements - Total Hydrogen-Ion Concentration Scale Calibration of m-Cresol Purple and at-Sea Results. Deep-Sea Research Part I-Oceanographic Research Papers, 40(10): 2115–2129.
Clayton TD et al., 1995. The role of pH measurements in modern oceanic CO₂-system characterizations: Precision and thermodynamic consistency. Deep Sea Research Part II: Topical Studies in Oceanography, 42(2): 411–429.
DeGrandpre MD et al., 2014. Considerations for the measurement of spectrophotometric pH for ocean acidification and other studies. Limnology and Oceanography: Methods, 12(12): 830–839.
DelValls TA and Dickson AG, 1998. The pH of buffers based on 2-amino-2-hydroxymethyl-1,3-propanediol (‘tris’) in synthetic sea water. Deep Sea Research Part I: Oceanographic Research Papers, 45(9): 1541–1554.
Dickson AG, 2010. The carbon dioxide system in seawater: equilibrium chemistry and measurements. In: Riebesell U, Fabry VJ, Hansson L and Gattuso J-P (Editors), Guide to best practices for ocean acidification research and data reporting. Publications Office of the European Union Luxembourg, pp. 260.
Dickson AG, Sabine CL and Christian JR, 2007. Guide to Best Practices for Ocean CO₂ Measurements. PICES Special Publication, 191 pp.
Douglas NK and Byrne RH, 2017a. Achieving accurate spectrophotometric pH measurements using unpurified meta-cresol purple. Marine Chemistry, 190: 66–72.
Douglas NK and Byrne RH, 2017b. Spectrophotometric pH measurements from river to sea: Calibration of mCP for 0≤S≤40 and 278.15≤T≤308.15K. Marine Chemistry, 197: 64–69.
JCGM, 2008. Evaluation of measurement data—Guide to the expression of uncertainty in measurement, pp. 1–116.
Johnson KS et al., 2016. Deep-Sea DuraFET: A Pressure Tolerant pH Sensor Designed for Global Sensor Networks. Analytical Chemistry, 88(6): 3249–3256.
Johnson KS et al., 2017. Biogeochemical sensor performance in the SOCCOM profiling float array. Journal of Geophysical Research: Oceans, 122(8): 6416–6436.
Liu X, Patsavas MC and Byrne RH, 2011. Purification and characterization of meta-cresol purple for spectrophotometric seawater pH measurements. Environmental Science & Technology, 45(11): 4862–4868.
Loucaides S et al., 2017. Characterization of meta-Cresol Purple for spectrophotometric pH measurements in saline and hypersaline media at sub-zero temperatures. Scientific Reports, 7(1): 2481.
Matsumoto GI et al., 2022. The Global Ocean Biogeochemistry (GO-BGC) Array of Profiling Floats to Observe Changing Ocean Chemistry and Biology. Marine Technology Society Journal, 56(3): 122–123.
Maurer TL, Plant JN and Johnson KS, 2021. Delayed-Mode Quality Control of Oxygen, Nitrate, and pH Data on SOCCOM Biogeochemical Profiling Floats. Frontiers in Marine Science, 8.
Müller JD and Rehder G, 2018. Metrology of pH Measurements in Brackish Waters—Part 2: Experimental Characterization of Purified meta-Cresol Purple for Spectrophotometric pH_T Measurements. Frontiers in Marine Science, 5.
Newton JA, Feely RA, Jewett EB, Williamson P and Mathis J, 2014. Global Ocean Acidification Observing Network: Requirements and Governance Plan.
Okazaki RR et al., 2017. Evaluation of marine pH sensors under controlled and natural conditions for the Wendy Schmidt Ocean Health XPRIZE. Limnology and Oceanography: Methods, 15(6): 586–600.
Oliveri P and Downey G, 2012. Multivariate class modeling for the verification of food-authenticity claims. TrAC Trends in Analytical Chemistry, 35: 74–86.
Olsen A et al., 2016. The Global Ocean Data Analysis Project version 2 (GLODAPv2) – an internally consistent data product for the world ocean. Earth Syst. Sci. Data, 8(2): 297–323.
Patsavas MC, Byrne RH and Liu X, 2013. Purification of meta-cresol purple and cresol red by flash chromatography: Procedures for ensuring accurate spectrophotometric seawater pH measurements. Marine Chemistry, 150: 19–24.
Pearson K, 1901. LIII. On lines and planes of closest fit to systems of points in space. The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 2(11): 559–572.
Pomerantsev AL, 2008. Acceptance areas for multivariate classification derived by projection methods. Journal of Chemometrics, 22(11-12): 601–609.
Pomerantsev AL and Rodionova OY, 2014. Concept and role of extreme objects in PCA/SIMCA. Journal of Chemometrics, 28(5): 429–438.
Pratt KW, 2014. Measurement of pHT values of Tris buffers in artificial seawater at varying mole ratios of Tris:Tris·HCl. Marine Chemistry, 162: 89–95.
Rivaro P, Vivado D, Falco P and Ianni C, 2021. HPLC-DAD Purification and Characterization of Meta-Cresol-Purple for Spectrophotometric Seawater pH Measurements, Water.
Rodionova OY, Oliveri P and Pomerantsev AL, 2016a. Rigorous and compliant approaches to one-class classification. Chemometrics and Intelligent Laboratory Systems, 159: 89–96.
Rodionova OY, Titova AV and Pomerantsev AL, 2016b. Discriminant analysis is an inappropriate method of authentication. TrAC Trends in Analytical Chemistry, 78: 17–22.
Takeshita Y et al., 2021. Consistency and stability of purified meta-cresol purple for spectrophotometric pH measurements in seawater. Marine Chemistry, 236.
Talley LD et al., 2016. Changes in Ocean Heat, Carbon Content, and Ventilation: A Review of the First Decade of GO-SHIP Global Repeat Hydrography. Annual Review of Marine Science, 8(1): 185–215.
Wang Y, Veltkamp DJ and Kowalski BR, 1991. Multivariate instrument standardization. Analytical Chemistry, 63(23): 2750–2756.
Williams NL et al., 2016. Empirical algorithms to estimate water column pH in the Southern Ocean. Geophysical Research Letters, 43(7): 3415–3422.
Wise BM and Roginski RT, 2015. A Calibration Model Maintenance Roadmap. IFAC-PapersOnLine, 48(8): 260–265.
Wold S, 1976. Pattern recognition by means of disjoint principal components models. Pattern Recognition, 8(3): 127–139.
Woosley RJ, 2021. Long-term stability and storage of meta-cresol purple solutions for seawater pH measurements. Limnology and Oceanography: Methods, 19(12): 810–817.
Yao W, Liu X and Byrne RH, 2007. Impurities in indicators used for spectrophotometric seawater pH measurements: Assessment and remedies. Marine Chemistry, 107(2): 167–172.
Zhang HN and Byrne RH, 1996. Spectrophotometric pH measurements of surface seawater at in-situ conditions: Absorbance and protonation behavior of thymol blue. Marine Chemistry, 52(1): 17–25.
Zontov YV, Rodionova OY, Kucheryavskiy SV and Pomerantsev AL, 2017. DD-SIMCA – A MATLAB GUI tool for data driven SIMCA approach. Chemometrics and Intelligent Laboratory Systems, 167: 23–28.

