Author manuscript; available in PMC: 2025 Sep 16.
Published in final edited form as: Phys Rev E. 2025 Feb;111(2-1):024412. doi: 10.1103/PhysRevE.111.024412

Uncertainty Quantification of Antibody Measurements: Physical Principles and Implications for Standardization

Paul N Patrone 1, Lili Wang 1, Sheng Lin-Gibson 1, Anthony J Kearsley 1
PMCID: PMC12434682  NIHMSID: NIHMS2107449  PMID: 40103042

Abstract

Harmonizing serology measurements (i.e. rendering them interchangeable) is critical for comparing results across different diagnostics platforms, developing associated reference materials, and thereby informing medical decisions. However, the theoretical foundations of such tasks have yet to be fully explored in terms of antibody thermodynamics and uncertainty quantification (UQ). In the context of SARS-CoV-2, for example, this has restricted the usefulness of standards currently deployed, limited the scope of materials considered as viable standards, and ultimately decreased confidence in serology. To address these problems, we develop rigorous theories of antibody normalization and harmonization. We begin by proposing a mathematical definition of harmonization equipped with structure needed to quantify uncertainty associated with the choice of standard, assay, etc. We then show how a thermodynamic description of serology measurements (i) relates this structure to the Gibbs free-energy of antibody binding, and thereby (ii) induces a regression analysis that directly harmonizes measurements. We supplement this with a novel, optimization-based normalization (not harmonization!) method that validates consistency between the behavior of a reference material and biological samples. A key result of these analyses is that under physically reasonable conditions, the choice of reference material does not increase uncertainty associated with harmonization. We validate the main ideas via an interlab study that considers monoclonal antibodies as a reference for SARS-CoV-2 serology measurements and discuss connections to correlates of protection.

Keywords: Thermodynamics, SARS-CoV-2, Uncertainty, Antibody, Serology

I. PREFACE

This manuscript arose from the confused and sometimes murky world of SARS-CoV-2 antibody testing as it existed in late 2021 and 2022 [1, 2]. While it is hard to estimate the total number – hundreds? – of blood and saliva assays that were developed during the pandemic, one thing is clear: the community never agreed upon a scale for comparing the resulting measurements [3–7]. Thus, it was impossible to say, for example, how many anti-SARS-CoV-2 antibodies anyone had, much less determine if someone was immune from infection. The issue, it turns out, was not lack of standards. Rather, we argue in this work that the community required new methods for interpreting measurements of those standards. In other words, the problem was one of physics and math, not medicine and biology.

In the exposition that follows, we frame this argument in the context of two questions: (i) what are the physical processes that cause antibody measurements to differ; and (ii) how do we account for this physics to harmonize (i.e. render interchangeable) serology measurements? In answering these questions, we decided to follow the series of steps that a data analyst would need to achieve harmonization in practice. In part, this is to facilitate reproducibility and promote the perspective that validation steps can (and in our opinion, should) be incorporated into all aspects of the data flow. But it also highlights an important difference between harmonization and the concept of normalization, which we feel has been a point of confusion. However, this exposition necessarily makes the manuscript highly interdisciplinary, leveraging ideas from not only statistical mechanics and thermodynamics, but also applied math, optimization, biology, and immunology. We anticipate that readers will have a typical background in physics and/or applied mathematics, but not necessarily any of the other fields. If needed, readers can consult Appendix A for more information and context on serology before proceeding to Sec. II, which serves as a formal introduction for this manuscript. Where possible, we have also put technical math details in the appendices.

II. INTRODUCTION

The COVID-19 pandemic highlighted the importance of antibody tests as a means to characterize humoral response, e.g. in high-risk populations such as cancer patients [8]. However, the rapid development of many different SARS-CoV-2 assays led to questions regarding the degree to which such measurements quantify immunity [9–12]. In response, public health and research institutions established reference materials to harmonize antibody scales [8, 13–15]. While these efforts improved correlation between measurements, they did not conclusively achieve harmonization [3–6]. They also led to new questions: how do we compare reference materials, and how do we select the “best” one for a given clinical or research setting? Moreover, the development of antibody standards did not suggest an obvious definition for correlates of protection, which remains an elusive concept to this day [9–12, 16].

The difficulty of addressing these questions arises from a fundamental ambiguity in the information extracted from serology measurements [2, 7]. In theory, the goal is to quantify titers or “concentrations” of antibodies that target a given antigen. Binding assays address this by quantifying the relative number of antibodies in a sample that attach to a substrate. However, the corresponding binding kinetics depend on the assay itself, e.g. through the details of this substrate [17]. Thus, it is more appropriate to assert that such assays characterize the properties of a chemical reaction, for which there is no concept of an absolute and independent “bound antibody number” absent information about the other reactants. This ambiguity becomes worse when we recognize that all reference materials suffer the same problem, which leads to the possibility of expressing an ill-defined sample concentration in terms of a similarly ill-defined standard. Typical single-point reference calibrations, e.g. based on end-point titers [18], also suffer from extrapolation error [19] and thereby further complicate harmonization. Thus, meaningful comparison of antibody measurements and standards cannot be realized without accounting for the physical processes and sources of uncertainty that affect their use.

The present work addresses these problems via a hierarchy of data analyses that marry concepts from statistical mechanics with uncertainty quantification to establish a physics-based foundation for harmonization. We first motivate this hierarchy through a Gibbs free-energy description of antibody binding, which (surprisingly) implies that only the assays, not the reference, control the degree to which harmonization is possible. Importantly, this theory induces a probabilistic model that can be used to validate the thermodynamic assumptions by quantifying – and in some cases removing – measurement variability due to: (i) choice of reference material; (ii) assay; (iii) instrument and operator effects; (iv) uncertainty inherent in samples; and (v) interactions between these elements. This analysis can also be used to determine if and by how much a specific reference material increases uncertainty in harmonization, which becomes a metric for comparing standards. Throughout, we validate these ideas in the context of an interlab study that considers synthetic, monoclonal antibodies (mAbs) as serology standards [20].

A key theme that permeates this work is the need to incorporate uncertainty quantification (UQ) into all aspects of the data analysis. While this entails obvious tasks such as statistical modeling and uncertainty propagation [21], we adopt the broader definition of UQ as encompassing verification and validation exercises that assess and increase confidence in models and measurement processes per se. For example, synthetic mAbs are viewed as being fundamentally different from human antibodies, and thus unsuitable for the purposes of harmonization. To address this stigma, our workflow includes a novel normalization procedure to determine if both types of antibodies exhibit the same behavior in a measurement system. This validation step can thereby indirectly assess whether mAbs are governed by the same physical processes as human antibodies, which is a prerequisite for using the former as a standard for the latter.

A main challenge in this work is the need to revisit and clarify seemingly elementary and resolved ideas in serology. For example, the concept of harmonization has been used in a variety of contexts [22–24], but to the best of our knowledge, it has not been defined with the precision needed to fully ground it in metrology. Thus, a preliminary task in our analysis is to mathematically define this concept and equip it with the structure needed to permit later UQ. The implications of this exercise are not trivial. It uncovers hidden structure in the definition of harmonization that has yet to be exploited and provides a direct connection to the physics of antibody measurements. Moreover, it illustrates why harmonization and normalization are not the same task.

A secondary motivation for this work is the fact that current methods to assess fitness of purpose for serology standards are insufficiently grounded in the physics of harmonization and focus primarily on their technical performance, sometimes ignoring issues such as ease of manufacturing, distribution times, etc. For example, there is widespread belief that harmonization via SARS-CoV-2 standards can only be achieved using human-derived, pooled references, although to the best of our knowledge there are no studies validating this conjecture. As a result, the development of current SARS-CoV-2 standards (which are all human derived) took nearly a year [14, 15] despite being needed much sooner. Moreover, such standards are inherently limited stock and must contend with changes between lots [25]. These issues suggest the need to better understand the impacts of using alternatives such as mAbs, which permit better scale-up and quicker development. Indeed, our companion manuscript finds – and the present work justifies – that mAbs and human-derived standards are identical from a performance standpoint when using our physically derived definition of harmonization [26].

A limitation of our work is that we do not define or fully connect our work with a real-world definition of immunity. This is due primarily to lack of available data [27, 28]. Studies that address this problem would need to connect information about neutralization measurements to notions of risk and clinical outcomes. To the best of our knowledge, these latter tasks remain open challenges in the SARS-CoV-2 testing community and thus fall beyond the scope of our work [27, 28]. However, we point to extensions of our basic definitions and theory to neutralization assays, a more direct but complicated tool for characterizing immunity.

The rest of the manuscript is organized as follows. In Sec. III, we provide physical and mathematical definitions of key concepts used throughout the manuscript. Section IV gives a global overview of our analysis hierarchy by considering normalization in the context of signal generation (IVA) and deriving a thermodynamic theory of antibody binding that induces a probabilistic perspective needed for harmonization (IVB). Section V fully develops the remaining elements of this hierarchy in terms of a regression analysis. Section VI provides historical context for our work, discusses limitations, and considers future directions. Appendices provide additional background on serology testing and summarize technical mathematical arguments.

III. NOTATION AND TERMINOLOGY

Given the interdisciplinary nature of this work, we begin by considering definitions and conventions that inform our mathematical analysis.

A. Definitions

  1. A concentration ĉ is normalized if it is given in units of antibodies per volume.

  2. The word sample always refers to a specimen taken from a human and to which an assay (i.e. a serology test) is applied.

  3. The words reference and standard always refer to a measurand, which can be synthetic or human-derived, used to normalize measurements taken on samples.

  4. In physical terms, we interpret harmonization as the process of determining a mathematical rule that tells one how to modify the numerical value of normalized antibody concentrations for each assay so that their corresponding measurements all agree and can be used interchangeably. This rule is only considered meaningful if it does not depend on the sample, but only the assay, reference, and concentration values. In more mathematical terms, let s = 1, 2, …, S and n = 1, 2, …, N index samples and assays for some maximum values S and N. Also let r denote a fixed reference. We say that the assays are harmonized by reference r if for any normalized concentrations ĉ_{s,n,r} and ĉ_{s,n′,r} (corresponding to sample s measured with assays n and n′ and normalized by r), we can find a function T(n, c, r) such that
    T(n, ĉ_{s,n,r}, r) = T(n′, ĉ_{s,n′,r}, r) = χ_{s,r}, (1)
    where χ_{s,r} is an assay-independent consensus value associated with sample s. Consistent with the above physical intuition, this function does not depend directly on the sample index s, only its normalized concentrations ĉ_{s,n,r}. See Refs. [22, 23] and the references therein for related ideas. We always assume that χ_{s,r} has the same units as ĉ, i.e. antibodies per unit volume.
  5. If we identify parameters ϵ_{s,n} (which could be random) such that
    T(n, ĉ_{s,n,r}, r)(1 + ϵ_{s,n}) = T(n′, ĉ_{s,n′,r}, r)(1 + ϵ_{s,n′}) = χ_{s,r}, (2)
    we say that the assays can be approximately harmonized with a relative uncertainty quantified by the ϵ. In principle we could let ϵ_{s,n} also depend on r, but in later sections we find this assumption unnecessary. In a slight abuse of terminology, we sometimes refer to the concentrations T(n, ĉ_{s,n,r}, r) as harmonized (without the modifier “approximately”) when the meaning is clear from context.

B. Notation and Conventions

  • For clarity, we reserve certain indices for special purposes. Lowercase m refers to either a sample or a reference. Lowercase s and r refer exclusively to samples and references. Lowercase n always refers to an assay. Lowercase k is used generically as an integer index except in any of the previously mentioned cases. We caution the reader that throughout the manuscript, indices are often the primary (and sometimes only) way we indicate functional dependence between variables. For example, we use F_s to denote a fluorescence value that changes with the discrete sample index s, whereas c_{s,n} is a concentration depending on both the discrete sample and assay indices s and n.

  • In certain cases, we need to non-dimensionalize the arguments of transcendental functions by dividing through by the units. In such cases, we use the symbol U_X to indicate a quantity whose value is 1 multiplied by the units associated with the quantity X; e.g. U_{K̂} carries the units of the equilibrium constant K̂.

  • We treat Gibbs free-energies G (and differences ΔG thereof) as dimensionless, having been divided by the temperature expressed in units of energy.

  • We denote a normal random variable with mean μ and variance σ² via 𝒩(μ, σ²).

IV. PHYSICAL PRINCIPLES OF OUR ANALYSIS

The task of harmonizing measurements necessarily requires that they first be put on some scale, i.e. via normalization. Thus, it stands to reason that harmonization can be impacted in a detrimental way if we do not first consider normalization per se.

In the subsections that follow, we consider both tasks from a physics-based perspective, as this suggests validation exercises and uncertainty models that can be used to interrogate the quality of data. Subsection IV A analyzes normalization in the generic context of signal generation and yields a test for answering the question, “when does a reference material behave like a typical sample?” Subsection IV B unravels the underlying thermodynamic processes and answers the question, “what causes measurements to differ between instruments?”

The reader should keep in mind that for both normalization and harmonization, the technical exposition is always as follows. We first solve the “forward problem” by formulating and analyzing a model of the measurement (whose output values are known) as a function of physical parameters, which are often unknown in practice. This model then motivates a regression analysis that solves the “reverse problem” by extracting the unknown parameters from the data.

A. Generic Aspects of Signal Generation: Implications for Antibody Normalization and Units

Normalization aims to quantify the concentration c of antibodies that bind to a substrate. However, this concentration is never measured directly; neither is it possible to measure the total concentration of antibodies except as part of the manufacturing process for certain reference materials. Instead, the instrument outputs some numerical value such as median fluorescence intensity (MFI) F expressed, for example, in units of voltage. This F is typically assumed to be proportional to the bound concentration. The constant of proportionality, which we denote Γ, has units such as voltage (i.e. some proxy for MFI) times volume per number of bound antibodies; viz.

F=Γc. (3)

In turn, c is assumed to be proportional to the total concentration y of antibodies of a fixed type via the theory described in Sec. IV B; see also Refs. [18, 19]. We denote the corresponding proportionality constant by K, which is dimensionless but should be thought of as having units of bound antibody number per total antibody number. Thus, one trivially finds

c = Ky, F = ΓKy, (4)

where the product ΓK has units of voltage per concentration of total antibodies.

Normalization is typically performed by computing a ratio of the form

ŷ_{s,r} = F_s/F_r (5a)
= c_s/c_r = (K_s y_s)/(K_r y_r), (5b)

where subscripts s and r denote corresponding quantities for a sample and reference [18, 29]. While this ŷ_{s,r} is ostensibly a normalized antibody value, it is dimensionless; i.e. it is the number of bound sample antibodies per bound reference antibody. To make ŷ_{s,r} consistent with Definition I, we multiply by y_r, which is assumed to be known, yielding

ĉ_{s,r} = ŷ_{s,r} y_r. (6)

The left-hand side (LHS) of Equation (6) has units of total antibodies per volume.
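Single-point normalization via Eqs. (5a) and (6) amounts to one division and one multiplication. The sketch below is a minimal Python illustration using made-up fluorescence readings and a hypothetical reference concentration; it is not a prescription for any particular instrument.

```python
# Single-point normalization, Eqs. (5a) and (6). All numbers are
# hypothetical; units are indicated in the comments.

def normalize(F_s, F_r, y_r):
    """Return (y_hat, c_hat) for one sample/reference fluorescence pair.

    F_s, F_r : sample and reference signals (e.g., volts)
    y_r      : known total antibody concentration of the reference
    """
    y_hat = F_s / F_r      # Eq. (5a): dimensionless scaled concentration
    c_hat = y_hat * y_r    # Eq. (6): total antibodies per volume
    return y_hat, c_hat

y_hat, c_hat = normalize(F_s=1200.0, F_r=400.0, y_r=2.0e12)
print(y_hat, c_hat)        # 3.0 and 6e12 antibodies per unit volume
```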

Four comments are in order.

First, recognize that many of the above quantities (ĉ_{s,r}, ŷ_{s,r}, c_r, c_s, etc.) depend on the choice of assay. In subsequent sections we make this explicit by including the subscript n. Here, we have suppressed the assay dependence since it is not central to a discussion of units.

Second, a more appropriate definition of ĉ_{s,r} would be ŷ_{s,r} c_r, since this has the biologically relevant units of bound antibodies per volume [14, 15, 24]. However, K_r is generally unknown, which implies the same for c_r. Thus, one can only normalize bound antibody concentration up to an unknown scale factor associated with K_r, which turns out to be an equilibrium constant; see Sec. IV B.

Third, while ŷ_{s,r} and ĉ_{s,r} are nominally different, they are interchangeable from a theoretical standpoint because of Eq. (6). The quantity ŷ_{s,r} is mathematically more convenient because it is dimensionless, and we often use this form of the normalized measurement. We refer to ŷ_{s,r} as a scaled antibody concentration and take 𝓎_{s,r} = χ_{s,r}/y_r to be the corresponding consensus value; see Definition IV.

Fourth, normalization is typically accomplished by computing the ratio given by Eq. (5a), i.e. directly in terms of the measurement outputs at a single concentration. However, nothing restricts us from iteratively diluting a sample and measuring the associated range of fluorescence values. Mathematically, this amounts to multiplying y_m by a dilution factor d, which can take any positive value less than one (0 < d < 1). Equation (5b) then implies that the scaled antibody concentration is in fact an invariant quantity if both the reference and sample are diluted by the same amount.

This simple observation leads to a normalization method that estimates yˆs,r and validates whether a reference material exhibits the same behavior as a typical sample.

To realize this in practice, we solve the forward modeling problem via a few simple observations. First let x = cd be a diluted bound concentration, and define F(x) to be the fluorescence as a function of x. It is reasonable to assume that F(x) is strictly monotone increasing in x; i.e. more bound antibodies increase the measurement signal. Moreover, even when c_r and c_s are unknown, we can always measure the associated fluorescence curves F_r(d) and F_s(d) by varying the dilution factor. It is not necessary that F(x), F_r(d), or F_s(d) be linear, as photodetectors may saturate if there is too much measurand in an instrument. But we do always assume that concentrations c are linear in d.

In this context, the fundamental assumption of serology measurements can be stated as the joint requirements

F_r(d) = F(c_r d), F_s(d) = F(c_s d), (7)

and

F(x_s) = F(x_r) ⟺ x_s = x_r, (8)

where the latter is guaranteed by the monotonicity of F(x). To make use of this, assume that we find a quantity α_s such that F_r(d) = F_s(α_s d). Then Eqs. (7) and (8) imply that this α_s is in fact the inverse of the scaled concentration; viz.

c_s/c_r = 1/α_s = ŷ_{s,r}. (9)

Moreover, we may define

ĉ_{s,r} ≡ c_r/α_s (10)

to be the normalized concentration.

These observations imply that the reverse problem, i.e. normalization, is tantamount to finding a per-sample scale factor α_s that, when applied to the argument of F_s(d), collapses this dilution curve onto the reference dilution curve F_r(d); see Fig. 1. This has several benefits over normalization via Eq. (5a): by leveraging more data one reduces susceptibility to noise, and it is not necessary to identify a linear regime for the function F(x). In practice, however, finding scale factors α_s is more computationally challenging. For example, typical dilution series only sample a few points from F_r(d) and F_s(d), which can make it difficult to test for data collapse. Moreover, the underlying F(x) is generally not known a priori. In Appendix B, we discuss a constrained optimization approach that overcomes both issues, leveraging only generic information about the structure of F(x).
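To make the collapse idea concrete, the following sketch fits a per-sample scale factor by brute-force grid search. It is not the constrained-optimization method of Appendix B: we assume a hypothetical saturating response F(x) = x/(1+x) purely to generate synthetic data, and only the monotonicity of F matters for the logic.

```python
import numpy as np

# Minimal sketch of normalization by dilution-curve collapse (Sec. IV A).
# The signal model F(x) = x/(1+x) below is an assumption used only to
# generate synthetic data; all concentrations are illustrative.

def F(x):
    """Hypothetical monotone (saturating) fluorescence response."""
    return x / (1.0 + x)

d = np.logspace(-3, 0, 12)            # dilution factors, 0 < d <= 1
c_r, c_s = 1.0, 4.0                    # reference and sample bound conc.
F_r, F_s = F(c_r * d), F(c_s * d)      # Eq. (7): measured dilution curves

def fit_alpha(d, F_r, F_s):
    """Grid-search a scale factor alpha with F_r(d) ~ F_s(alpha * d),
    comparing the curves by interpolation on a log-dilution axis."""
    logd = np.log(d)
    best_a, best_err = None, np.inf
    for a in np.logspace(-2, 2, 4001):
        # F_s evaluated at alpha*d by linear interpolation in log d;
        # points falling outside the measured range are discarded.
        Fs_shift = np.interp(logd + np.log(a), logd, F_s,
                             left=np.nan, right=np.nan)
        ok = ~np.isnan(Fs_shift)
        if ok.sum() < 3:
            continue
        err = np.mean((F_r[ok] - Fs_shift[ok]) ** 2)
        if err < best_err:
            best_a, best_err = a, err
    return best_a

alpha = fit_alpha(d, F_r, F_s)
print(1.0 / alpha)   # estimate of y_hat = c_s/c_r, Eq. (9); close to 4
```

The grid search stands in for the optimization of Appendix B; its resolution and the interpolation error both limit accuracy when a dilution series has few points.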

FIG. 1.

Plots of synthetic data motivating Eqs. (7) and (8). Note that colors are matched to axes and do not have interchangeable interpretations. Our normalization algorithm prescribes a mathematical method for reconciling their differences. In both plots, d corresponds to a dilution factor. In the top plot, the left-most blue curves correspond to c=5 and c=2 from left to right. In all plots, the black dilution curve corresponds to a reference material. Changing the concentration of antibodies in the reference traces out the dilution curve. Equivalently, if we fix the concentration (say to c=1), diluting (d<1) or concentrating (d>1) the reference yields the same curve. Importantly, this dual interpretation only applies to the reference. Taking a sample for which c>1 implies that the sample must be diluted more relative to the reference to generate the same fluorescence signal. In order to collapse the blue curve onto the reference, we must scale the dilution by a factor of c (i.e. c_s/c_r for c_r=1 and c_s=c), which corresponds to a horizontal shift on a log scale (bottom plot). A similar interpretation applies to the case c<1.

Figure 2 shows an example of this analysis applied to 34 SARS-CoV-2 positive samples that were normalized on the scale of the mAb reference material. We emphasize that while the reference material is a synthetic antibody, the points sampled from its dilution curve fall on the composite function F(x) constructed from all of the data. The degree of data collapse serves as validation that the mAb is not inherently different from a human sample with regard to its behavior under dilution. While beyond the scope of this manuscript, one could establish a cutoff criterion so that a reference is only considered viable if data collapse relative to F(x) can be achieved to within a certain level.

FIG. 2.

Raw data and data collapse associated with 34 SARS-CoV-2 positive samples and a mAb reference material, all measured via a ligand binding assay. See Ref. [20] for experimental details. Top: Raw data associated with the samples. The reference is labeled with the black circles. Lines are guides for the eye, whereas the overlaid discrete data points are the dilution-fluorescence pairs measured by the instrument. The axes are flipped relative to Fig. 1. By eye, it is plausible that a vertical shift applied to each dilution curve is sufficient to collapse them. Bottom: Data collapse associated with minimizing Eq. (B8). The reference material is assumed to have a dimensionless concentration of 1. The solid black curve is a reconstruction of the function f(𝓍); see Eq. (B1) and surrounding text. Note that the reference material has the average (weakly) non-linear behavior of the sample data after transformation. The inset shows the residuals, defined here as the difference between the transformed raw data and the estimated dilution curve in black.

B. Thermodynamics of Antibody Measurements

Having analyzed serology at the level of signal generation, we now turn to thermodynamic considerations that (i) explain why different measurement systems yield distinct normalized concentrations and (ii) clarify the assumptions underlying Eq. (4). In this section, we focus primarily on the physics of harmonization per se. In Sec. V we consider the forward and reverse modeling problems associated with this task.

A binding antibody assay can be viewed as a chemical reaction [17, 30]. A free antibody Y attaches to a substrate B to create a bound pair YB; viz.

Y + B ⇌ YB ≡ C, (11)

where C stands for the antibody-substrate complex. This reaction is assumed to be reversible, so that the system reaches an equilibrium described by detailed balance [30, 31]. For a fixed antibody type Y_m (associated with the mth sample or reference) and substrate B_n associated with the nth assay or antigen epitope, the equilibrium constant K̂ can depend on both the antibody and the substrate. That is,

c_{m,n}/[(y_m − c_{m,n})(b_n − c_{m,n})] = K̂_{m,n}, (12)

where c_{m,n}, y_m, and b_n are the concentrations of C_{m,n}, all antibodies (free and bound), and all substrates B_n. The n-dependence of K̂_{m,n} reflects the fact that changing the substrate can alter the equilibrium concentration, and hence number of bound antibodies. The physical intuition for this dependence arises from the definition

K̂_{m,n}/U_{K̂} = e^{−ΔG_{m,n}} ⟺ ΔG_{m,n} = −ln(K̂_{m,n}/U_{K̂}), (13)

where ΔG_{m,n} is the Gibbs free-energy change associated with Eq. (11) [32]. That is, changing the substrate and/or antibody alters the free-energy landscape, and thus the equilibrium constant. Note that Eq. (12) only models antibody affinity, but not avidity (i.e. capacity for multivalent binding); see Ref. [33] for justification in the context of the examples considered herein.

From a measurement standpoint, it is desirable for c_{m,n} to be independent of the substrate concentration; doing so ensures that the former increases linearly with the total antibody concentration [7, 29]. A straightforward Taylor expansion of Eq. (12) reveals that this condition is approximately satisfied if either (i) b_n ≪ y_m and K̂_{m,n} ∼ 𝒪(1) unit volume, or (ii) K̂_{m,n} ≪ 1 unit volume for b_n and y_m order-one concentrations. We henceforth assume that either (i) or (ii) is true, which yields the approximate model

c_{m,n} = K_{m,n} y_m, (14)

where K_{m,n} is an appropriately rescaled equilibrium constant.
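The linearization behind Eq. (14) can be checked numerically: the mass-action relation in Eq. (12) is a quadratic in the bound concentration, and in regime (ii) its physical root is nearly proportional to the total antibody concentration. The sketch below uses illustrative values in arbitrary concentration units.

```python
import math

# Numerical check of the linearization behind Eq. (14): solve the
# mass-action relation c / ((y - c)(b - c)) = K (Eq. (12)) exactly and
# compare with c ~ K*b*y in regime (ii), K << 1 unit volume. All values
# are illustrative, in arbitrary concentration units.

def bound_concentration(K, y, b):
    """Physical root of the quadratic K*(y - c)*(b - c) = c, written
    in a numerically stable form; satisfies 0 <= c <= min(y, b)."""
    B = -(K * (y + b) + 1.0)
    C = K * y * b
    disc = math.sqrt(B * B - 4.0 * K * C)
    return 2.0 * C / (-B + disc)      # smaller quadratic root

K, b = 1.0e-3, 1.0
for y in (0.5, 1.0, 2.0):
    c_exact = bound_concentration(K, y, b)
    c_linear = K * b * y              # Eq. (14) with the K*b rescaling
    print(y, c_exact, c_linear)       # nearly equal in this regime
```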

An interesting question that motivates our harmonization analysis is whether the equilibrium constant is separable, meaning it can be expressed in the form

K_{m,n} = κ_{Y_m} κ_{B_n}, (15)

for some constants κ_{Y_m} and κ_{B_n} depending only on the sample or assay, but not both [34]. Physically, Eq. (15) is interpreted as the condition in which the antibodies’ contribution to the equilibrium constant is independent of the substrate contribution. In the context of Eq. (13), this implies that the change in free energy can be expressed as a sum

ΔG_{m,n} = ΔĜ_m + ΔĜ_n, (16)

where ΔĜ_m and ΔĜ_n are distinct quantities depending on either the sample or substrate, but not both. Heuristically, Eq. (16) is plausible if any antibody that binds to a fixed antigen (corresponding to constant n) always changes the latter’s conformation in the same way, so that ΔG_{m,n} only varies with the internal energy and entropy differences between the antibodies themselves.

To understand the usefulness of separability, recall that antibody normalization amounts to determining the ratio c_{s,n}/c_{r,n} for sample s, reference r, and a fixed assay n. Separability then amounts to the condition that

c_{s,n}/c_{r,n} = (κ_{Y_s} y_s)/(κ_{Y_r} y_r), (17)

which is equivalent to

ln(c_{s,n}/c_{r,n}) = ln(y_s/y_r) − ΔĜ_s + ΔĜ_r. (18)

In other words, relative concentrations of bound antibodies are independent of the assay being used for the measurement, since the right-hand side (RHS) has no dependence on n. Separability therefore implies that normalization automatically harmonizes assays in the sense of Definition IV, and we can simply set the function T(n, c, r) = c.
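This separability argument can be illustrated numerically: when the equilibrium-constant matrix is the rank-one outer product of Eq. (15), the normalized ratios are identical across assays. All parameter values below are made up for illustration.

```python
import numpy as np

# Numerical illustration of Eqs. (15)-(18): if K[m, n] is separable,
# K[m, n] = kappa_Y[m] * kappa_B[n] (rank one), the normalized ratio
# c_{s,n}/c_{r,n} is identical across assays n, so normalization alone
# harmonizes. All parameter values are illustrative.

rng = np.random.default_rng(0)
S, N = 5, 4                                # samples, assays
kappa_Y = rng.uniform(0.5, 2.0, size=S)    # antibody factors
kappa_B = rng.uniform(0.5, 2.0, size=N)    # substrate factors
y = rng.uniform(1.0, 10.0, size=S)         # total antibody concentrations

K = np.outer(kappa_Y, kappa_B)             # Eq. (15); rank(K) == 1
c = K * y[:, None]                         # Eq. (14): c[m, n] = K[m, n]*y[m]

r = 0                                      # use sample 0 as the reference
ratios = c / c[r]                          # c_{s,n}/c_{r,n}, shape (S, N)
print(np.ptp(ratios, axis=1))              # spread over assays: ~0 per sample
```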

If we relax the separability assumption, harmonization is no longer guaranteed. To see this, assume that

K_{m,n} = κ_{Y_m} κ_{B_n} exp(−Δg_{m,n}), (19)

where Δg_{m,n} is a relative free-energy deviation from separability. We require that Δg_{m,n} depend non-trivially on both of its indices. More precisely, when viewed as a matrix with elements indexed by s and n, we require K_{s,n} to have rank greater than one. Taking the logarithm of c_{s,n}/c_{r,n} yields

ln(c_{s,n}/c_{r,n}) = ln[(κ_{Y_s} y_s)/(κ_{Y_r} y_r)] − Δg_{s,n} + Δg_{r,n}. (20)

The term Δg_{s,n} is problematic; it implies that the normalized concentration depends on the free-energy of the specific sample-assay pair, which is nominally inconsistent with harmonization.

In practice, we expect that Eq. (20) is a more realistic description of antibody measurements; biological variation between human samples and differences in substrate epitopes will cause some antibodies to bind differently to some assays, invalidating the heuristic picture described below Eq. (16). Thus it is not clear that exact harmonization is possible. However, we can still recover the weaker notion of approximate harmonization given by Def. V. In particular, were it possible to determine the Δg_{r,n}, one could define a transformation

T(n, ĉ, r) = ĉ exp(−Δg_{r,n}), (21)

where we now reveal the explicit dependence of T on the reference material. Combined with Eq. (20), this would imply that

T(n, ĉ_{s,n,r}, r)(1 + ϵ_{s,n}) = T(n′, ĉ_{s,n′,r}, r)(1 + ϵ_{s,n′}) = χ_{s,r} = 𝓎_{s,r} y_r, (22)
ϵ_{s,n} = exp(Δg_{s,n}) − 1 (23)

yields a consensus value.

A notable conclusion of Eq. (20) is that lack of harmonization has nothing to do with the choice of reference material. It is due solely to sample-assay-dependent effects ϵ_{s,n}, since removing these implies that Eq. (21) is an exact harmonization rule according to Definition V. Physically, this conclusion arises from the simple fact that all samples are normalized by the same reference material, so they share a common bias associated with Δg_{r,n}. Our companion manuscript validates this result for a collection of several monoclonal antibodies [20]. The next section motivates an analysis to test the validity of Eq. (22).
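The mechanics of Eqs. (19)-(23) can be sketched with synthetic free-energy deviations: applying the transformation of Eq. (21) removes the common reference bias exactly, leaving only the sample-assay factors exp(Δg_{s,n}). All parameter values below are illustrative.

```python
import numpy as np

# Sketch of approximate harmonization per Eqs. (19)-(23), with synthetic
# free-energy deviations dg[m, n]. Applying T(n, c, r) = c * exp(-dg[r, n])
# removes the common reference bias; the residual assay dependence is
# exactly eps[s, n] = exp(dg[s, n]) - 1. All values are illustrative.

rng = np.random.default_rng(1)
S, N, r = 4, 3, 0                          # samples, assays, reference
kappa_Y = rng.uniform(0.5, 2.0, S)
kappa_B = rng.uniform(0.5, 2.0, N)
y = rng.uniform(1.0, 10.0, S)
dg = rng.normal(0.0, 0.1, size=(S, N))     # deviations from separability

K = np.outer(kappa_Y, kappa_B) * np.exp(-dg)   # Eq. (19)
c = K * y[:, None]                             # Eq. (14)
y_hat = c / c[r]                               # scaled concentrations

T = y_hat * np.exp(-dg[r])                     # Eq. (21), scaled form
consensus = (kappa_Y * y) / (kappa_Y[r] * y[r])
eps = np.exp(dg) - 1.0                         # Eq. (23)
# Eq. (22): T * (1 + eps) reproduces the consensus for every assay
print(np.max(np.abs(T * (1.0 + eps) - consensus[:, None])))
```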

V. INDUCED PROBABILISTIC PERSPECTIVE

Equation (22) raises two questions: (i) how do we validate the underlying model; and (ii) how do we use it to harmonize assays? Our main goal in Sec. V A is to show how Eq. (22) induces a probabilistic model that describes the serology measurements. In Sec. V B, we show how to perform a regression analysis on this model that answers questions (i) and (ii).

A. The Forward Problem: Modeling Harmonized Measurements

Observe that the quantities Δg_{s,n}, Δg_{r,n}, and thus 𝓎_{s,r} are in general unknown, since it is unreasonable to perform detailed measurements characterizing equilibrium constants for all samples and assays. For a fixed assay, however, Δg_{r,n} is common to all samples, whereas Δg_{s,n} is sample dependent. Rearranging Eq. (20) implies that

$\ln \hat{y}_{s,n,r} + \Delta g_{r,n} + \Delta g_{s,n} = \ln(𝓎_{s,r})$ (24)

which suggests interpreting Δgr,n as a constant, reference-dependent bias and Δgs,n as a sample-assay-dependent realization of a random variable.

Because the Δgs,n correspond to Gibbs Free-Energy changes associated with sample antibodies, we treat this quantity as an s-dependent realization of a mean-zero normal random variable with variance ςn2. That is, we assume

$\Delta g_{s,n} = \mathcal{N}_s(0, \varsigma_n^2),$ (25)

where ςn2 is an unknown constant, and the 𝒩s are independent and identically distributed normal random variables; i.e. the expectation $E[\Delta g_{s,n}\,\Delta g_{s',n'}] = 0$ if $s \ne s'$ or $n \ne n'$. If we also treat Δgr,n and 𝓎s,r as unknown scalars, Eq. (24) yields a model of the normalized data yˆs,n,r that nominally solves the forward problem. Moreover, determining the unknowns in Eqs. (24) and (25) provides all of the information needed to construct T and ϵ in Eq. (22), thereby harmonizing the assays according to Def. VI.
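To make the forward model concrete, the following Python sketch simulates Eqs. (24) and (25) with entirely hypothetical sample counts, consensus values, and free-energy parameters (none of these numbers come from the study):

```python
import numpy as np

rng = np.random.default_rng(0)

S, A = 5, 3  # hypothetical numbers of samples and assays
ln_consensus = rng.normal(6.0, 1.0, size=S)   # ln(consensus values), arbitrary units
dg_ref = rng.normal(0.0, 0.5, size=A)         # reference-dependent biases Dg_{r,n}
varsigma2 = 0.2                               # assay free-energy variance (shared here)

# Eq. (25): sample-assay free energies as i.i.d. mean-zero normal draws
dg_sample = rng.normal(0.0, np.sqrt(varsigma2), size=(S, A))

# Eq. (24): ln(yhat_{s,n,r}) = ln(consensus) - Dg_{r,n} - Dg_{s,n}
ln_yhat = ln_consensus[:, None] - dg_ref[None, :] - dg_sample

# After correcting for the reference bias, residuals relative to the
# consensus are exactly the sample-assay terms Dg_{s,n}
residuals = ln_consensus[:, None] - dg_ref[None, :] - ln_yhat
```

Here the only reference-dependent quantity is the common bias `dg_ref`, which illustrates why the choice of standard does not enter the residuals about the consensus.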

In order for such an analysis to be meaningful, however, it is necessary to account for uncertainty inherent in the measurement process. The true yˆs,n,r are never known exactly due to effects such as pipetting error, instrument artifacts, etc. Thus, it is important to ensure that the associated measurement variability is not confused with the Δgr,n and Δgs,n. We therefore postulate that the quantity ys,n,r one measures is related to yˆs,n,r via the equation

$\ln \hat{y}_{s,n,r} = \ln y_{s,n,r} + \delta_{s,n,r}$ (26)

where δs,n,r is a “within-lab” uncertainty that models the effects described immediately above; see Ref. [35]. The dependence of δ on s, n, and r is largely incidental, since this quantity should be estimated separately for each such triple. The δ does not account for assay- and reference-specific effects. Combining Eqs. (24) and (26) then yields

$\ln y_{s,n,r} + \Delta g_{r,n} + \Delta g_{s,n} + \delta_{s,n,r} = \ln(𝓎_{s,r}).$ (27)

In practice, the values of ys,n,r and δs,n,r are given by direct measurement outputs, while the remaining quantities must be determined from the data via regression; such details are postponed until Sec. V B.

We end this section with two small technical issues.

First, Eq. (27) nominally complicates our ability to determine whether the Δgs,n are reference independent, since it re-introduces reference-dependence into residuals of the data. However, we always find that δs,n,r exhibits separability of the form

$\delta_{s,n,r} = \delta_{s,n} + \delta_{r,n}.$ (28)

Under this condition, δr,n can be absorbed into Δgr,n, and we again deduce that the residuals of bias-corrected antibody measurements (relative to the consensus value) should be reference independent. As a result, Eq. (28) restores our ability to validate the model.

In the context of single-point normalization having the form of Eq. (5a), it is straightforward to prove that Eq. (28) is exact. In particular, we can assume that $F_{s,n}$ and $F_{r,n}$ are measured up to some relative uncertainty, e.g. $F_{s,n} = \bar{F}_{s,n}(1 + \varepsilon_{s,n})$, where $\bar{F}_{s,n}$ is a true or expected fluorescence, and $\varepsilon_{s,n}$ is the measurement uncertainty. Then ratios of the form $\ln(F_{s,n}/F_{r,n})$ can be expressed directly in the form of Eq. (28). For the analysis discussed in Sec. IV A, it is more difficult to prove that Eq. (28) holds, but in practice, we find that it is valid. See Figs. 3 and 4, for example.
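As a numerical aside, separability of the form of Eq. (28) can be tested with a standard two-way additive decomposition per assay. The Python sketch below uses synthetic, exactly separable data; the function name and array sizes are ours, not the paper's:

```python
import numpy as np

def separability_residual(delta):
    """Fit delta[s, n, r] ~ d_sn[s, n] + d_rn[r, n] by least squares
    (row/column means per assay) and return the worst-case misfit.
    Values near zero support the separability of Eq. (28)."""
    S, A, R = delta.shape
    worst = 0.0
    for n in range(A):  # each assay decomposes independently
        M = delta[:, n, :]                               # S x R slice
        d_sn = M.mean(axis=1, keepdims=True)             # sample effects
        d_rn = M.mean(axis=0, keepdims=True) - M.mean()  # reference effects
        worst = max(worst, np.abs(M - d_sn - d_rn).max())
    return worst

# exactly separable synthetic data should give a residual of ~zero
rng = np.random.default_rng(2)
d_sn = rng.normal(size=(4, 2, 1))   # hypothetical delta_{s,n}
d_rn = rng.normal(size=(1, 2, 3))   # hypothetical delta_{r,n}
resid = separability_residual(d_sn + d_rn)
```

Applied to measured within-lab uncertainties across several references, a small residual would support absorbing the reference part into Δgr,n as described above.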

FIG. 3.

Example of the analysis leading to the solution given by Eq. (34). Symbols have the same meanings in all figures and are defined in the top plots. Top left: Normalized data for 38 positive samples measured via 6 different assays. Top right: Normalized data with consensus estimates given by Eq. (34). Bottom left: Harmonized data, i.e. normalized data corrected for the assay-dependent biases Δgr,n so that all samples are distributed about the consensus values. Bottom right: Difference between the harmonized data and consensus values. In all figures, the vertical red bars centered at each datapoint are one-sigma confidence intervals. For the lab-specific datapoints, these confidence intervals are given by δs,n,r. For the consensus values, the confidence intervals are computed from the distribution of values arising from the jackknife analysis, as described in the main text.

FIG. 4.

The difference of residuals Δgs,n,WHOΔgs,n,mAb computed from the WHO standard and the 3 other mAbs used in the interlab study associated with Ref. [20]. Symbols and colors have the same meaning as in Fig. 3. Each point with a fixed color and symbol is associated with a different mAb. The difference in residuals is typically less than 0.05, which corresponds to a relative variation in antibody number of 5 %. This demonstrates that the quantity Δgs,n, which is associated with the sample-assay dependent randomness about the consensus value, does not depend on the reference material. The black dots are the distributions of values obtained from the jackknife analysis as described in the main text. Note that only 0.43% of the values in the plot have a magnitude greater than 0.1. Also, for the three samples with the lowest consensus values, only one lab was able to yield uncensored concentrations. As a result, the jackknife analysis yields a total of eight datapoints that fall outside the scale of the plot. These datapoints have characteristic values of 2 in the units of these axes.

Second, we can validate Eq. (27) by testing the degree to which the Δgs,n empirically depend on the reference. Despite the assumption that these quantities are random, Eq. (24) indicates that they remain point-wise constant with changing r, provided the Δgr,n are correctly determined. This restrictive criterion implies that the residuals between the bias-corrected and consensus antibody estimates should be invariant to the reference, which can be directly checked. We pursue this and related ideas in Sec. V B.

B. The Reverse Problem: Harmonization via Regression

We now address the task of estimating the quantities Δgr,n, Δgs,n, and δs,n,r.

While there are a variety of methods for estimating δs,n,r (e.g. [36, 37]), we consider the common situation wherein measurements are repeated. Extensions and alternatives are discussed in Sec. VI. Assume therefore that we are given S samples, A assays, and one reference material. Each sample and the reference are measured τ times for each assay, where t ∈ {1, 2, …, τ} indexes these time-points. We use the analysis of Appendix B to normalize all antibody measurements relative to the reference dilution curve measured at the same time. Denote the corresponding antibody levels by y˜s,n,r,t. Assuming that these measurements are independent in the t index, we construct the arithmetic mean estimator

$\ln y_{s,n,r} = \frac{1}{\tau} \sum_{t=1}^{\tau} \ln \tilde{y}_{s,n,r,t}$ (29)

and sample standard uncertainty [38]

$\sigma_{s,n,r}^2 = \frac{1}{\tau(\tau-1)} \sum_{t=1}^{\tau} \left[ \ln y_{s,n,r} - \ln \tilde{y}_{s,n,r,t} \right]^2,$ (30)

where we approximate the variance $\mathrm{Var}[\delta_{s,n,r}] = \sigma_{s,n,r}^2$. Note that the estimate for ys,n,r corresponds to a geometric mean of antibody concentrations. See Eqs. (26) and (27). Given a few replicates τ per sample, we make the additional minimal assumption that δs,n,r is a mean-zero normal random variable with variance σs,n,r2.
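For reference, the estimators of Eqs. (29) and (30) amount to only a few lines of Python; the replicate values below are hypothetical:

```python
import numpy as np

def log_mean_and_uncertainty(y_tilde):
    """Eq. (29): arithmetic mean of log-concentrations (a geometric mean
    of concentrations), and Eq. (30): its sample standard uncertainty."""
    ln_y = np.log(np.asarray(y_tilde, dtype=float))
    tau = ln_y.size
    mean_ln_y = ln_y.mean()                                     # Eq. (29)
    var = ((mean_ln_y - ln_y) ** 2).sum() / (tau * (tau - 1))   # Eq. (30)
    return mean_ln_y, var

# hypothetical replicate concentrations for one (s, n, r) triple
replicates = [104.0, 98.0, 101.0]
ln_y, sigma2 = log_mean_and_uncertainty(replicates)
```

Exponentiating `ln_y` recovers the geometric mean of the replicates, consistent with the remark in the text.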

Returning to Eq. (27),

$\ln y_{s,n,r} + \Delta g_{r,n} + \Delta g_{s,n} + \delta_{s,n,r} = \ln(𝓎_{s,r}),$

assume that r is fixed and observe that the scaled and consensus values ys,n,r and 𝓎s,r depend on the reference material r. Provisionally assume that the assay-sample dependent effects Δgs,n are independent of r, which we check after-the-fact by varying the reference.

In light of Eq. (27), the $\ln y_{s,n,r}$ are distributed as

$\ln y_{s,n,r} = \mathcal{N}\!\left( \ln(𝓎_{s,r}) - \Delta g_{r,n},\; \varsigma_n^2 + \sigma_{s,n,r}^2 \right),$

which has the corresponding probability density

$P(\ln y_{s,n,r} \mid \Delta g_{r,n}, 𝓎_{s,r}, \varsigma_n^2) = \dfrac{ \exp\!\left\{ - \left[ \ln(𝓎_{s,r}) - \Delta g_{r,n} - \ln(y_{s,n,r}) \right]^2 \big/ \left[ 2 (\varsigma_n^2 + \sigma_{s,n,r}^2) \right] \right\} }{ \sqrt{ 2 \pi (\varsigma_n^2 + \sigma_{s,n,r}^2) } }$ (31)

In the event that ys,n,r falls below a detection threshold yθ (corresponding to dilution curves for which no measurement is above the signal-to-noise of 1/10 in the examples above), we can at most define the censored probability

$\mathcal{P}(y_{s,n,r} \le y_\theta) = \int_{-\infty}^{\ln y_\theta} P(z \mid \Delta g_{r,n}, 𝓎_{s,r}, \varsigma_n^2)\, dz$ (32)

in terms of Eq. (31), where we take $y_\theta$ to be the smallest measured value of $y_{s,n,r}$, and z is a dummy integration variable representing ln(ys,n,r); see also Refs. [39, 40]. [For simplicity of notation, we have suppressed the dependence of 𝒫(ys,n,r ≤ yθ) on (Δgr,n, 𝓎s,r, ςn2).]

$\ell = - \sum_{s,n:\, y_{s,n,r} > y_\theta} \ln P(\ln y_{s,n,r} \mid \Delta g_{r,n}, 𝓎_{s,r}, \varsigma_n^2) + \tilde{\epsilon}_3 \sum_n \Delta g_{r,n}^2 - \sum_{s,n:\, y_{s,n,r} \le y_\theta} \ln \mathcal{P}(y_{s,n,r} \le y_\theta)$ (33)

by summing Eq. (31) over all samples and assays. Observe that ℓ is a function of the consensus values 𝓎s,r, the reference-dependent free-energy biases Δgr,n, and the assay-dependent free-energy variances ςn2. The value of $\tilde{\epsilon}_3$ can be set to any positive value (we set $\tilde{\epsilon}_3 = 1$), as the regularization only serves to remove a connected set of minimizers associated with ambiguity in the value of Δgr,n. (That is, we can only determine the Δgr,n up to an additive constant, necessitating the regularization.) Minimizing Eq. (33) yields estimates of these parameters via

$\{\Delta g_{r,n}^\star\},\, \{𝓎_{s,r}^\star\},\, \{\varsigma_n^{2,\star}\} = \operatorname*{argmin}_{\Delta g_{r,n},\, 𝓎_{s,r},\, \varsigma_n^2} \ell.$ (34)

While Eq. (34) is ultimately a choice of how to define consensus values, it admits an interpretation that is consistent with physical intuition. In particular, recognize that Δgs,n (i.e. the sample-assay component of free energy) is treated as random because in general, we do not know a priori how a sample will interact with an assay. In minimizing l, we therefore seek the consensus values (up to a constant offset fixed by the regularization) and assay-dependent variability that maximizes the probability of the measured data. Thus, we interpret Eq. (34) as the most likely probabilistic representation of the data [41]. Moreover, including σs,n,r2 in the definition of l accounts for the fact that measurements with high uncertainty contribute less to our knowledge of the corresponding consensus values and assay-dependent uncertainties. See Sec. VI for further analysis of this model.

To characterize uncertainties associated with the parameters $\Delta g_{r,n}^\star$, $𝓎_{s,r}^\star$, and $\varsigma_n^{2,\star}$, one can perform, for example, a jackknife-style analysis [42, 43]. In the examples that follow, we perform this analysis for fixed reference r by omitting each ys,n,r from the optimization one at a time [44]. We then take the standard deviations of the resulting distributions of parameter estimates as the corresponding confidence intervals.
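The jackknife step can be sketched generically; here the estimator is a simple mean over hypothetical values rather than the full optimization of Eq. (34), but the leave-one-out structure is the same:

```python
import numpy as np

def jackknife_spread(data, estimator):
    """Refit the estimator with each datum omitted one at a time and
    report the standard deviation of the resulting estimates, in the
    spirit of the confidence intervals described in the text."""
    n = len(data)
    estimates = np.array([estimator(np.delete(data, i)) for i in range(n)])
    return estimates.std()

# hypothetical log-concentrations standing in for the y_{s,n,r}
x = np.array([1.0, 2.0, 3.0, 4.0])
spread = jackknife_spread(x, np.mean)
```

In the actual analysis, `estimator` would be the full minimization of Eq. (33), re-run once per omitted measurement.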

Figure 3 shows the results of this analysis applied to mAb 1 considered in Ref. [20]. The top plots show the normalized log-concentrations $\ln y_{s,n,r}$ with and without the consensus values for six different labs and 38 positive samples, where $y_r = 1.2 \times 10^6$ is the number of ng/mL of antibody in the undiluted reference. The bottom plots show bias-corrected (i.e. harmonized) estimates $\ln y_{s,n,r} + \Delta g_{r,n}$. The remaining variation is quantified by $\Delta g_{s,n} + \delta_{s,n}$ [recall Eq. (28)]. Figure 4 shows the differences in residuals between harmonized concentrations and consensus values according to Eq. (34) using the WHO standard and three different mAbs described in Ref. [20]. The bottom plot in particular shows that the residuals change by less than roughly 0.05 on a log scale when switching reference material, which corresponds to roughly 5 % relative variation in antibody concentration. This validates that to within good approximation, the residuals can be expressed only in terms of s and n via $\Delta g_{s,n} + \delta_{s,n}$, as predicted by our thermodynamic model. In other words, the lack of exact harmonization is due almost entirely to coordinated assay-sample effects.

To further illustrate this last point independently from the noise terms δs,n and δs,n,r, recognize that the variance ςn2 is an estimate of the statistical properties of the Δgs,n via Eq. (25). Figure 5 shows both the average of these variances over all n, as well as their assay-specific values. All estimates are independent of the reference material used.

FIG. 5.

Further confirmation that approximate harmonization via Eq. (34) is reference independent. Left: Square root of the average of ςn2 over all n assays. Note that the standard deviation is approximately constant across all standards. Right: Maximum likelihood estimate of ςn as a function of assay. On the right plot, the vertical red bars for each datapoint are one standard deviation confidence intervals associated with the jackknife analysis described in the main text. These confidence intervals are generally the same size as or smaller than the corresponding symbols and in some cases, are not visible.

VI. DISCUSSION: BROADER IMPLICATIONS FOR SEROLOGY

A. Deeper Comparison with Past Works

Several studies have considered the impacts of normalization and harmonization, both from the standpoint of establishing reference materials [14, 15, 24] and deploying them in real-world settings [36]. However, none of these works formally defined the relationship between normalization and harmonization, implicitly taking these tasks to be identical. More specifically, the authors tended to use manufacturer-specified scaling values (relative to a pooled standard) to harmonize measurements without the bias-correction term corresponding to Δgr,n. In terms of our analysis, this amounts to the assumption that T(n,cˆ)=cˆ; i.e. the equilibrium constants are separable.

This lack of distinction between the concepts of harmonization and normalization may therefore be responsible for significant confusion within the serology community. For example, a universal conclusion of Refs. [36] has been that “harmonized” (i.e. normalized) measurements are not interchangeable, even when using human-derived standards. Interestingly, the authors also observed at least two common trends: (i) normalized antibody measurements are correlated between different assays; and (ii) for a fixed sample, variability of normalized concentrations between assays increases with increasing antibody titers.

Our analysis provides a likely explanation for both observed results. Considering (i), the implicit choice T(n,cˆ)=cˆ ignores the reference-dependent bias term $\exp(\Delta g_{r,n})$ appearing in Eqs. (21) and (22). Thus, it is not surprising that the harmonized values are proportional but not identical on average. We predict this missing constant of proportionality should be $\exp(\Delta g_{r,n} - \Delta g_{r,n'})$ for two different assays $n$ and $n'$. Concerning (ii), the increasing uncertainty with antibody concentration is a direct manifestation of the uncertainty ϵs,n appearing in Eq. (22), which we have already shown yields the relative uncertainty $T(n, \hat{c}_{s,n,r}, r)\,\epsilon_{s,n}$ about consensus values. We anticipate that a post-hoc analysis of the results in Refs. [36] would lead to consistent predictions of the aforementioned physical quantities. A potentially insurmountable, and thus disconcerting corollary is that the sample-assay dependent effects may be so large in real-world settings as to induce significant uncertainty, even using the harmonization techniques that we propose.

Another notable point of comparison is the calibration study performed in Ref. [24]. The authors determined that in establishing the reference material, normalization via a human-derived, pooled standard harmonized samples to within a factor of two or better on average. This contrasts with Ref. [20], wherein we found that after correcting for reference-induced biases (i.e. Δgr,n), harmonization could only be achieved to within a factor of approximately 2.5 (i.e. exp(0.9)) on average; see Fig. 5. Without the bias correction, harmonization was only achieved to within a factor of roughly 12 on average [20]. The result of Ref. [24] is especially surprising because the study therein included IgG measurements from SARS-CoV-2 spike, receptor binding domain (RBD), and nucleocapsid (N) assays, whereas we only considered the first two. In the former study, one might expect that the corresponding set of Δgr,n and Δgs,n would be larger, considering that more types of assays were used.

A deeper analysis of the experimental design in Ref. [24] suggests a resolution to this disparity. It is noteworthy that the corresponding validation study only attempted to harmonize 5 samples (compared to our 38), four of which were also used to develop the standard. In the context of Eq. (24), it is likely that this choice of validation samples causes cancellation between the terms Δgr,n and Δgs,n. Ultimately, this would lead one to underestimate the true uncertainties associated with harmonization via normalization alone (i.e. without using our bias-correction). Thus, we predict that a different validation study not using samples also found in the standard would yield results comparable to ours. This prediction is consistent with Refs. [36].

B. On the Mathematical Interpretation of a Consensus Value and Antibody Standard

It is notable that in Figs. 3 and 4, the measurements from Lab 2 are nearly identical to the consensus values. This begs the obvious question of why the analysis yields such a result, and more generally, what is the interpretation of our consensus value?

In this instance, we note that the inter-day variation δs,n,r for Lab 2 is nearly a decade smaller than the corresponding uncertainties for the next closest lab. Thus, it stands to reason that estimation of the consensus value is dominated by those measurements having the smallest uncertainties. Ostensibly this is problematic: a precise measurement is not necessarily accurate. However, we recall that antibody concentrations can only be estimated up to an unknown multiplicative factor, since the equilibrium constants are rarely, if ever, determined; see Secs. IV A and IV B.

These observations suggest that a reasonable definition of consensus is one that minimizes disagreement between the results of different labs. The maximum likelihood estimate given by Eq. (34) interprets disagreement as corresponding to low probability of the observed measurements under the assumption that the joint sample-assay component Δgs,n of the free energy is random. [This modeling choice is justified by the fact that Δgs,n characterizes the immune response of an individual who is arbitrarily selected from a large population.] In this context it is reasonable that a measurement with high uncertainty contributes less to the consensus estimate; its underlying true value may be closer to the remaining measurements than is reflected by its nominal value.

The subjectivity of this decision highlights the fact that from a purely performance standpoint, there may not exist a universal, best reference material for antibodies without further knowledge of equilibrium constants. This stands in contrast to hardened metrological standards based on fundamental physical constants, e.g. as established by the connection between mass and Planck’s constant [45, 46]. Absent such relationships, we are forced to choose definitions for “harmonization” and “consensus antibody value,” and these necessarily control what we mean by a best standard. In our analysis, the concept of “best” is associated with the reference that induces the least additional uncertainty (again a choice) into harmonization, which is in turn fixed by Def. V, Eq. (34), and the supporting modeling assumptions. However, our choices are not unique, and others may lead to distinct notions of a best standard.

A key challenge for the serology community is therefore to agree on a harmonization convention, which is necessary before standards can even be fully established. The difficulty of addressing this problem is seen in Defs. IV and V. They provide a generic structure of what harmonization entails but, critically, do not propose a functional form of the mapping T(c,n) or noise ϵs,n. This latter task is complicated by virtue of being a task in mathematical modeling, although historically it has not been viewed as such.

Here we propose that the modeling paradigm should be dictated by its usefulness. In this respect, our approach has certain benefits grounded in its connection to physics. Because the underlying statistical model is induced by a thermodynamic description of antibody kinetics, it provides an intuitive justification for various choices. For example, Eq. (29), which is a geometric mean over antibody number, is revealed to be an arithmetic mean over Gibbs free-energies, a quantity for which this type of averaging may be appropriate. Perhaps more importantly, the thermodynamics reveals how the reference-dependent bias can be removed as a source of uncertainty. Critically, the resulting equivalence of all standards for purposes of harmonization enables one to consider a broader definition of fitness for purpose. Issues such as development times, manufacturing and distribution constraints, and traceability can become deciding factors in what constitutes a best serology standard.

Ultimately the generality of Def. V permits multiple interpretations of harmonization, and it is plausible that other approaches may further reduce uncertainty. While we believe that our underlying approach is useful, our primary goal in this section and the previous is to highlight the importance of rigorously defining and distinguishing the concepts of harmonization and normalization. These definitions and their realizations (e.g. via mathematical models) play a fundamental metrological role in ensuring reproducibility of measurements and developing reference materials.

C. On the Physical Interpretations of Gibbs Free-Energies and Consensus Values

Antibodies in serology samples are typically polyclonal, as evidenced by the fact that distinct SARS-CoV-2 antigens (for example) can be detected in the same blood [20, 24]. Thus, the reaction process described by Eq. (11) is a simplification of the true chemistry underlying serology assays. A more accurate representation would consider a collection of simultaneous reactions for each type of antibody with associated reaction kinetics. From a mathematical standpoint, however, this is problematic since one does not know a priori how many reactions to model. Moreover, it is reasonable to assume that one type of antibody (or perhaps a small subset thereof) dominates the chemistry of a single assay, which justifies Eq. (11).

This suggests a need to re-interpret Δgs,n. In Sec. IV, this quantity represents the sample-assay specific contributions to the free energy under the assumption that a single antibody interacts with the assay. For a human-derived sample, we must allow, at a minimum, that the specific type of antibody interacting with the assay can vary with the latter. While this seems obvious – e.g. some SARS-CoV-2 assays distinguish anti-nucleocapsid from anti-spike antibodies – it weakens the concept of a consensus value. That is, χs,r is not the total or even average concentration of antibodies of a specific type (e.g. anti-spike) in a sample. At best, we can say that it is a characteristic concentration conditioned on the number and types of assays used to construct it. In the context of Fig. 3, for example, we might say that the consensus is the typical concentration of anti-SARS-CoV-2 antibodies across all types considered in Ref. [20].

It is important to note that these observations do not change the underlying structure of our analysis, only its interpretation. Likewise, these conclusions do not meaningfully change if we take the reference material to be a human-derived, pooled standard. In such cases, we must re-interpret Δgr,n in the same way as we have done for Δgs,n. For human-derived standards, it is also reasonable that the raw, normalized antibody concentrations might exhibit less variation relative to the consensus as compared to mAbs, since the reference may have a collection of antibodies that will respond to each assay as might a test sample. Indeed, this effect is evident in Ref. [20]. However, as Figs. 4 and 5 illustrate, this decrease in variance of the raw data does not impact the final uncertainty estimates of Δgs,n or reference-induced uncertainty, which we find to be constant. Nor does it in any way change our harmonization method.

D. Probabilistic Connection to Neutralization

Neutralizing assays (cf. Appendix A) are often seen as a better (yet still imperfect) measurement for assessing immunity against a pathogen. Such assays are also significantly more expensive than the binding assays considered in this work. This begs the question of the extent to which we can use binding assays as a proxy for neutralization assays.

To better understand this issue, observe that Eq. (27) yields a probability density for a consensus value given a sample measurement from a specific assay normalized to reference r. We can denote this function by $P(𝓎_{s,r} \mid y_{s,n,r}, r, n)$, meaning that the probability of a specific 𝓎s,r is conditioned on the triple (ys,n,r, r, n). If we likewise construct a probability density $P(\nu \mid 𝓎)$ of a neutralizing value ν (e.g. expressed as a 50 % neutralizing titer or NT50) conditioned on the consensus value, then the probability density of a neutralizing value given a scaled binding value is [41]

$P(\nu \mid y_{s,n,r}, r, n) = \int d𝓎\, P(\nu \mid 𝓎)\, P(𝓎 \mid y_{s,n,r}, r, n).$ (35)

Equation (35) provides actionable information. Defining νt as a lower neutralizing threshold that guarantees a degree of immunity, one can quantify the probability $\mathcal{P}(\nu \ge \nu_t)$ that a person is protected by computing the integral

$\mathcal{P}(\nu \ge \nu_t \mid y_{s,n,r}, r, n) = \int_{\nu_t}^{\infty} d\nu\, P(\nu \mid y_{s,n,r}, r, n).$ (36)

One can then find the minimum measured binding level $y_{\min}$ that guarantees the corresponding ν is above νt with confidence $\mathcal{P}(\nu \ge \nu_t) > 95\,\%$, for example. In this case, $y_{\min}$ could be interpreted as the 95 % “correlate level” for νt.

This suggests that the following may be a useful definition for correlates of protection. Let X and Z denote the outputs of two distinct types of measurements (e.g. a binding and neutralizing assay), and let p be a percentage satisfying 0 < p ≤ 1. We say that $X^\star$ is the p-correlate level for $Z^\star$ if it is the smallest value such that $X \ge X^\star$ implies that the probability of $Z \ge Z^\star$ is greater than p. Mathematically,

$X^\star = \min_{\mathcal{X}} \left\{ \mathcal{X} : X \ge \mathcal{X} \implies \mathcal{P}(Z \ge Z^\star) \ge p \right\},$ (37)

where $\mathcal{P}(Z \ge Z^\star)$ is the probability that $Z \ge Z^\star$. While seemingly abstract, this definition enables statements of the form, “A binding level of at least $X^\star$ implies that a neutralization level is at least $Z^\star$ with a probability of 95 % or greater.” Note that $X^\star$ is a function of $Z^\star$ and the probability p.
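To illustrate how Eqs. (35)-(37) could be realized numerically, the Python sketch below assumes simple normal models on the log scale for both conditional densities; every distribution, parameter, and threshold here is invented for illustration and carries no data from the study:

```python
import numpy as np
from scipy.stats import norm

def p_consensus_given_binding(ln_c, ln_y):
    # assumed density of the log consensus value given a harmonized binding value
    return norm.pdf(ln_c, loc=ln_y, scale=0.3)

def p_neut_exceeds(ln_c, ln_nu_t):
    # assumed probability that log-neutralization exceeds the threshold,
    # i.e. the inner integral of Eq. (36) already carried out over nu
    return norm.sf(ln_nu_t, loc=0.8 * ln_c, scale=0.5)

def prob_protected(ln_y, ln_nu_t):
    """Eqs. (35)-(36): marginalize over the consensus value by quadrature."""
    grid = np.linspace(ln_y - 5.0, ln_y + 5.0, 2001)
    integrand = p_neut_exceeds(grid, ln_nu_t) * p_consensus_given_binding(grid, ln_y)
    dx = grid[1] - grid[0]
    return float(np.sum(integrand) * dx)

def correlate_level(ln_nu_t, p=0.95):
    """Eq. (37): smallest binding level whose protection probability is >= p."""
    for ln_y in np.linspace(0.0, 20.0, 4001):
        if prob_protected(ln_y, ln_nu_t) >= p:
            return ln_y
    return np.inf

x_star = correlate_level(ln_nu_t=3.0, p=0.95)  # 95 % correlate on the log scale
```

In practice, the conditional densities would come from the fitted model of Eq. (27) and an empirical binding-to-neutralization relationship, not from the assumed Gaussians above.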

To realize this type of analysis in practice, however, it is advisable to quantify uncertainties associated with the parameters appearing in Eq. (27). In addition to the variance associated with δs,n,r, which can be estimated from repeat measurements, one should also estimate the uncertainty in the Δgr,n and ςn2. This latter task can be accomplished, e.g. via a jackknife-type estimator as we have done.

E. Additional Limitations and Extensions

The primary limitation of this work is the thermodynamic model inducing the probabilistic analysis. Where the underlying physical assumptions are violated, our analysis may not be valid. Examples are discussed in the previous section. Other assays that may invalidate our analysis are those in which antibody avidity plays an important role, since the chemical reactions may be dominated by more complicated binding interactions not described by the Gibbs free-energy of Eq. (13) [31, 47]. However, the invariance of the residuals Δgs,n+δs,n with respect to the reference provides a powerful tool that can be used to check the appropriateness of the assumptions, and thus our analysis.

Despite this limitation, our analyses provide several routes for generalization and/or incorporation of new physical information. For example, the normalization procedure discussed in Sec. IV A can be augmented with constraints to test for the degree of collapse among dilution curves or different assumptions about the structure of the underlying curve. In the event that the residuals are too large, for example, one could hold out such samples for further investigation. See Ref. [48] for related methods. The probabilistic modeling of the within-lab uncertainties δs,n,r can also be refined via more sophisticated techniques, e.g. bootstrap-type methods [36, 37].

Finally, we observe that probabilistic modeling of the individual terms δs,n and δr,n appearing in Eq. (28) may be useful in applications that seek to estimate correlates of protection in terms of $P(𝓎_{s,r} \mid y_{s,n,r}, \Delta g_{r,n}, \varsigma_n^2)$. While not needed to validate the analysis herein, the separability of δs,n,r implies that the Δgr,n have additional uncertainty associated with the contribution from δr,n. Thus, it may be desirable to treat Δgr,n as a random variable whose mean is given by our MLE analysis and whose distribution is given by the δr,n. Propagating the latter uncertainty into $P(𝓎_{s,r} \mid y_{s,n,r}, \Delta g_{r,n}, \varsigma_n^2)$ would then provide more realistic estimates of this distribution.

F. Concluding Thoughts

The main objective of this work is to provide a theoretical foundation for tasks such as antibody normalization, harmonization, and estimating correlates of protection. As exercises in metrology, however, these tasks are challenging because their uncertainties are dominated by significant epistemic effects, e.g. lack of knowledge as to which antibodies are being detected, how they interact with the measurement system, and even what we mean by an antibody concentration. This is not a criticism so much as an observation: serology testing and immunity are difficult to understand due to complicated thermodynamic effects and inherent multiscale phenomena. Our approach has therefore been to identify those aspects that cannot be made precise and leverage UQ as a means to quantify our lack of understanding. This approach is enticing because it allows one to make informed decisions based on imperfect knowledge. It also suggests routes for optimizing – both informally and more mathematically – aspects of serology testing, as well as diagnostics in general. Looking forward then, our hope is that this work motivates a wider adoption of UQ within the biomedical community as a route to establishing rigorous principles of biometrology.

Acknowledgements:

The authors thank Drs. Ronald Boisvert, Charles Romine, and especially Barry I. Schneider for helpful discussions during the preparation of this manuscript. This manuscript is an official contribution of the National Institute of Standards and Technology and is not subject to copyright in the United States.

The NIST Research Protections Office has approved the use of data described herein.

Appendix A: Introduction to Serology

To understand our motivations for studying serology, it is useful to have basic knowledge of the properties of antibodies, the processes that create them, and the methods by which they are measured.

Antibodies are a key part of the humoral immune system, which is mediated by macromolecules in extracellular fluid (i.e. “humors” or body-fluids). Humoral immunity is itself part of a larger adaptive immune response associated with those biological phenomena that change in response to pathogens in order to better fight infections. A key process of this adaptive immunity is so-called “affinity maturation,” wherein certain T-cells force B-cells to undergo a process of hyper-evolution. As a result, the latter create antibodies that strongly bind to a specific chemical target (epitope), e.g. part of a virus. Physically, this process tailors the Gibbs free-energy of reaction between the antibody and target such that binding of the two is heavily favored over the reverse reaction. The B-cells themselves act as a sort of “memory” of the disease, and antibodies act as long-term (months or longer) defenses that disable their target antigen upon re-exposure. In this way, the body can more quickly respond to reinfection, and hopefully one experiences fewer symptoms, if any at all.

This overall picture explains why antibodies are objects of interest in diagnostic and public-health settings: they indicate when someone has been infected by a specific pathogen, especially after the disease has run its course. To a lesser extent, detecting antibodies also provides a degree of confidence that an individual has developed adaptive immunity to a disease, although such inferences are nuanced.

In fact, our description of humoral immunity is as oversimplified as it is intuitive. To highlight just a few relevant issues: (i) the human body produces not one, but at least five major types of antibodies, each with different structures and purposes; and (ii) any given antibody type will have a panoply of subtypes, each of which is specific to (i.e. binds with high affinity to) a different epitope, sometimes from the same pathogen. Antibody types are often called immunoglobulins, and convention dictates that they are denoted by the acronym “IgX” followed by the target epitope, where “X” stands for G, M, A, D, and E. IgG corresponds to the type best described by our picture above: they tend to have high-specificity and be long-lasting. IgM antibodies often appear within days to weeks of infection, are less specific, and do not last as long. IgA antibodies are often found in external fluid secretions and are often a relevant measurand when doing saliva testing. In the context of serology testing, the other two types of antibodies do not concern us.

Issues (i) and (ii) both lead to serious challenges when considering serology testing from a metrology standpoint, but to understand these, one must know how antibody measurements are performed. In an ideal biological setting, binding is described by a chemical reaction associated with Eq. (11). The goal of measuring antibody titers or “levels” is to estimate the concentration of bound complexes under limiting cases, i.e. when antibodies are the limiting reagent. Thus, the measurement process mirrors biology. In a binding assay, a blood or saliva sample is exposed to a substrate, i.e. a material containing copies of the target epitope of interest. After antibodies are bound, they are labeled with fluorescent tags, and the total fluorescence is measured as a proxy for number of bound complexes. Neutralizing assays are more complicated and involve a series of steps to determine what concentration of a sample is sufficient to inhibit growth of a target pathogen. This is a more direct (albeit incomplete) measure of “immunity.”

In this context, the presence of many different antibody types complicates the process of measuring the reaction associated with Eq. (11), especially when they target the same or similar antigens. In a binding assay, this leads to multiple reactions competing for the same epitope. Experimentalists have means of distinguishing signals from different antibody types, but the thermodynamics is not described by a simple reaction. One might hazard that the true chemical equation looks something like

$$\sum_i a_i Y_i + B \;\rightleftharpoons\; \sum_i c_i Y_i B, \qquad \text{(A1)}$$

where Yi is the ith antibody type binding to substrate B, YiB is the corresponding bound complex, and the ai and ci are unknown coefficients describing the detailed balance associated with this system. The thermodynamics of this situation are full of unknowns, so that it is not entirely clear what one means by the concentration of a bound complex. We take a somewhat vague definition: the measurand is any bound complex reacting like a given antibody type (e.g. IgG) in a competitive binding environment. This may also be the most biologically relevant definition, since it corresponds to competitive reactions that happen in vivo. But then we cannot say that we are quantifying the concentration of IgG antibodies, but rather only of IgG-like molecules.

This problem is compounded by two other conventions in serology. First, it was normal during the SARS-CoV-2 pandemic to develop standards based on individual or pooled blood samples (i.e. drawn from multiple individuals and mixed together) having large but finite volumes. Second, common practice dictated defining concentration of antibodies for IgG, IgM, and IgA assays on the same scale relative to such standards. From the above discussion, it should be clear that this further confuses what we mean by concentration of antibodies. Pooled standards increase competitive binding, and combining multiple Ig types onto one scale means that we are abstracting to some generic notion of “concentration count.” On the surface this would seem to simplify the situation, since number can be understood easily in metrological terms. But as the main manuscript shows, this notion of “counting” is neither absolute nor even unambiguously relative. The issue comes back to the concept of Gibbs free energies, and the fact that only energy differences are meaningful.

For the purposes of this overview, our main takeaway is this: concentration measurements in serology use the language of counting in metrology, but fundamentally they are context-dependent. One does not measure the number of bound antibodies in isolation, but rather as impacted by effects such as competitive binding of other antibodies. A main goal of this manuscript is to show that the assay being used to perform the measurements is also a key part of that context.

To directly compare concentration measurements on a scale that permits quantitative comparison, it is necessary to account for this context and the uncertainty it induces. This leads to the concept of harmonization, i.e. the process of making measurements from different instruments interchangeable. In serology, this is often confused with normalization, i.e. the process of putting a measurement from a given instrument on a scale. Normalization alone may be insufficient for harmonization if the context-dependent scales associated with each instrument are themselves different. In practice, this is in fact the case, since each assay contributes differently to a free-energy of reaction.

A proposed solution to some of these problems has been to use monoclonal antibodies (or mAbs) as reference materials. Monoclonals are synthetic (laboratory-made) antibodies whose properties can be tailored for different applications. A key feature of mAbs is the fact that they can be manufactured to have little, if any, variability between realizations of the same molecule. From a measurement standpoint this is desirable, since it eliminates the competitive binding problem for the reference material. Monoclonal antibodies also clarify interpretation of the measurements: harmonized results can be interpreted as quantifying the effective number of mAbs of a given type in a sample.

Appendix B: Affine Transformations for Dilution Series: A Global Method for Antibody Normalization

In this section, we address the numerical issues associated with estimating the scaled antibody level ŷ_{s,n,r} in terms of the invariant quantity α_s defined in Sec. IV A. The analysis herein generalizes the methods of Refs. [18, 29].

Without loss of generality, we assume that the measurement value F corresponds to MFI. Let x = cd denote the concentration of bound antibodies in a sample with concentration c and dilution factor d, and assume that there exists a range [xmin, xmax] over which F(x) is a strictly monotone increasing function of x. The physical interpretation is straightforward: more antibodies yield more fluorescence. We supplement this with the assumption that F(x) approaches lower and upper limits Fmin and Fmax (which we can always take to be positive) as x → 0 or x → ∞. Physically, Fmin and Fmax can be interpreted as a noise floor and detector saturation threshold. It is important to distinguish these sources of nonlinearity, which are instrument artifacts, from effects associated with nonlinear dependence of cs,n on ys as expressed by Eq. (12). We always assume that c (and thus x) is linear in y, even when the fluorescence F is nonlinear in c.

In practice, estimating the αs via data collapse is complicated by three issues.

First, measurements are often taken at a few serial dilutions d_i (i = 1, 2, …, D) spanning several decades.

Thus F(x) tends to be given on a sparse grid whose characteristic spacing grows exponentially. To make the spacing more uniform, we take a logarithm of F and express the resulting function in terms of 𝓍 = ln(x), which transforms the measurement domain to −∞ < 𝓍 < ∞. This transformation also preserves strict monotonicity of f(𝓍) = ln(F(e^𝓍)/UF). As an added benefit, we find that f(𝓍) is typically sigmoidal. That is, for some inflection point 𝓍_I, f(𝓍) is convex (concave) when 𝓍 ≤ 𝓍_I (𝓍 ≥ 𝓍_I). While not strictly necessary, this assumption is so convenient that we leverage it throughout. Section VI discusses generalizations and limitations of this choice.
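The change of variables above can be sketched numerically. In this illustration, the 4-fold dilution grid, the scale factor UF, and the logistic form of the fluorescence response are all hypothetical stand-ins chosen only to exhibit the uniform log-spacing and the sigmoidal shape of f(𝓍); they are not values or models from the study.

```python
import numpy as np

# Hypothetical serial 4-fold dilutions spanning several decades.
d = np.array([1 / 6400, 1 / 1600, 1 / 400, 1 / 100])
x_log = np.log(d)  # 𝓍 = ln(x): exponential spacing becomes uniform (ln 4)

def f_of_x(x, F_min=10.0, F_max=1e4, x_I=np.log(1 / 800), width=1.0):
    """Illustrative f(𝓍) = ln(F(e^𝓍)/UF) with UF = 1 and a generic
    saturating (logistic) response F between a noise floor F_min and a
    saturation level F_max. All parameter values are hypothetical."""
    F = F_min + (F_max - F_min) / (1.0 + np.exp(-(x - x_I) / width))
    return np.log(F)
```

Because F is strictly increasing and positive, f(𝓍) inherits strict monotonicity, consistent with the assumption in the text.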

Second, the sparsity of the di means that f(𝓍) is only known at a few points, which makes it challenging to determine whether two dilution series coincide. We address this problem by determining multiple αs simultaneously, requiring that they all fall on the same curve. Here the sigmoid structure of f(𝓍) plays an important role by ensuring that this curve has a physically reasonable structure.

Third, we anticipate that the αs are to be determined by some numerical method that iteratively varies these parameters to find their optimal values. However, doing so makes the grid of 𝓍 values dependent on the αs. This motivates us to treat the fluorescence values of each measurement as the independent variables, since these always define a fixed grid. By the strict monotonicity of f(𝓍), we may then write 𝓍=𝓍(f) as a function of the fluorescence values, which effectively “flips” our perspective about the line 𝓍=f. Recalling that 𝓍=ln(x)=ln(cd), we now see that the equality

𝓍 = ln(ĉ_{s,n,r} d) = ln(c_{r,n} d/α_s) = ln(c_{r,n} d) − ln α_s (B1)

reinterprets ln α_s as a constant vertical offset accounting for the difference between a reference and sample dilution series.
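This offset interpretation of Eq. (B1) is easy to verify numerically. The dilution grid, reference concentration, and scale factor α_s below are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical inputs: reference concentration, sample scale factor, dilutions.
d = np.array([1e-2, 1e-3, 1e-4])
c_rn = 1.0
alpha_s = 4.0

x_ref = np.log(c_rn * d)               # reference dilution series (log scale)
x_sample = np.log(c_rn * d / alpha_s)  # sample series, per Eq. (B1)
offset = x_ref - x_sample              # constant vertical offset, = ln(α_s)
```

On the log scale the two series are parallel, so a single number, ln(α_s), collapses one onto the other.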

To realize these ideas mathematically, let fi,r be the fluorescence measurements associated with the reference material at dilution di. Assume S samples indexed by s having unknown normalized concentrations cˆs,n,r. For each of these samples, we assume corresponding measurements fi,s for dilutions di. In practice, the dilutions can be different for each sample, although for simplicity we assume the same set for each sample. Generalizations are trivial and left for the reader.

From this data, we create a single vector of elements fj comprising the fi,s and fi,r in ascending order, without regard to their sample number or status as a reference. It is not problematic if values of fj are repeated. Also let αsj and dij denote the corresponding values of α and d for the jth fluorescence value, where sj can be a specific value of s or denote the reference r. To find the αsj, we postulate the existence of true log-antibody numbers 𝓍ˆj, which should be sufficiently close to the values predicted by the cr,n dij/αsj. In fact, under noiseless conditions, one expects

𝓍ˆ_j − ln(c_{r,n} d_{i_j}) + ln α_{s_j} = 0. (B2)

In practice, there will be noise, which suggests the objective

ℒ̂ = Σ_j [𝓍ˆ_j − ln(c_{r,n} d_{i_j}) + ln α_{s_j}]². (B3)

Assuming a value for cr,n, which we can take to be 1 for convenience, we minimize Eq. (B3) as a function of the 𝓍ˆj and the αsj, subject to the constraints that

𝓍ˆ_j = 𝓍ˆ_{j′} if f_j = f_{j′}, (B4)
α_{s_j} = 1 if s_j = r. (B5)

Note that setting cr,n=1 amounts to an arbitrary rescaling of the reference concentration, which can be undone by multiplying all concentrations by the appropriate units and scale factor at the end of all calculations; see Ref. [26].
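The structure of Eq. (B3) can be made concrete for a single sample: holding the 𝓍ˆ_j fixed, stationarity with respect to ln α_s gives a closed-form minimizer, namely the mean offset between ln(c_{r,n} d) and 𝓍ˆ. The sketch below checks this on synthetic noiseless data; the dilution grid and the "true" scale factor α_s = 2.5 are hypothetical.

```python
import numpy as np

def objective(x_hat, log_crd, log_alpha):
    """Eq. (B3)-style sum of squared residuals for a single sample."""
    return np.sum((x_hat - log_crd + log_alpha) ** 2)

# Synthetic noiseless data with a hypothetical true scale factor.
d = np.array([1e-1, 1e-2, 1e-3])
c_rn = 1.0                              # reference concentration set to 1 (see text)
true_alpha = 2.5
x_hat = np.log(c_rn * d / true_alpha)   # "true" log antibody numbers
log_crd = np.log(c_rn * d)

# Setting d/d(ln α) of Eq. (B3) to zero yields the mean residual offset.
log_alpha_opt = np.mean(log_crd - x_hat)
```

In the noiseless case this recovers ln(α_s) exactly and drives the objective to zero, consistent with Eq. (B2).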

By itself, Eq. (B3) does not define a well-posed optimization problem. For example, if the fj all correspond to distinct samples, then for any values of the αsj there exist 𝓍ˆj yielding ℒ̂ = 0. In such cases, we require that our analysis reduce to single-point normalizations based on Eq. (5a), which enforces theoretical consistency with past work, e.g. Refs. [18, 19]. We therefore add two constraints and a small regularization. First, letting jk denote indices associated with unique values of fj in ascending order, we require that

𝓍ˆ_{j_{k+1}} ≥ 𝓍ˆ_{j_k}. (B6)

That is, the antibody levels must increase with increasing fluorescence. Second, we require that 𝓍(f) be concave for fjk up to some inflection point fI and convex for all fjk ≥ fI. This is equivalent to the sigmoid assumption reflected about the line 𝓍 = f. To enforce this constraint, we construct a second-order finite-difference matrix A_{m,jk}(p) in terms of an undetermined inflection index p using the procedure in the appendix of Ref. [49], where

Σ_{j_k} A_{m,j_k} 𝓍_{j_k} ≤ 0, m ≤ p, (B7a)
Σ_{j_k} A_{m,j_k} 𝓍_{j_k} ≥ 0, m > p. (B7b)
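One way to realize a matrix with the sign properties of Eqs. (B7a)–(B7b) is via standard second-order finite differences on the (generally nonuniform) grid of unique fluorescence values. The construction actually used in the text is that of Ref. [49]; the following is a generic stand-in with the same intent, applied to a hypothetical grid.

```python
import numpy as np

def second_difference_matrix(f):
    """Rows approximate the second derivative of 𝓍(f) on a nonuniform grid f.

    A generic finite-difference construction (a stand-in for the procedure
    of Ref. [49]): sign constraints on A @ x then encode the
    concave-then-convex shape required by Eqs. (B7a)-(B7b).
    """
    n = len(f)
    A = np.zeros((n - 2, n))
    for m in range(1, n - 1):
        h0, h1 = f[m] - f[m - 1], f[m + 1] - f[m]
        # Standard three-point stencil for x''(f_m) on a nonuniform grid.
        A[m - 1, m - 1] = 2.0 / (h0 * (h0 + h1))
        A[m - 1, m] = -2.0 / (h0 * h1)
        A[m - 1, m + 1] = 2.0 / (h1 * (h0 + h1))
    return A
```

This stencil is exact for quadratics, so applied to a convex function it yields nonnegative values (matching (B7b)) and to a concave function nonpositive values (matching (B7a)).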

Third, we modify the objective function to be

ℒ̂′ = ℒ̂ + ϵ˜_1 Σ_{k=k_low}^{k_high} [(𝓍_{j_{k+1}} − 𝓍_{j_k})/(f_{j_{k+1}} − f_{j_k}) − 1]² + ϵ˜_2 Σ_{m=1}^{N_{j_k}} [Σ_{j_k} A_{m,j_k} 𝓍_{j_k}]², (B8)

where ϵ˜_1 and ϵ˜_2 are small regularization parameters, and klow, khigh are user-defined lower and upper limits between which we expect the fluorescence signals to be approximately linear in antibody number. Thus, the regularization term associated with ϵ˜_1 ensures that the reconstructed dilution curve has a linear region when there is only one measurement per sample. The regularization associated with ϵ˜_2 penalizes excessive curvature. These parameters are chosen to have values that are roughly three decades smaller than the characteristic value of ℒ̂ near its minimum, or, if ℒ̂ = 0 is in the feasible set, we define ϵ˜_1 = ϵ˜_2 = 10⁻³.

To determine the remaining parameters, we minimize Eq. (B8) with respect to the 𝓍ˆj, αsj, and p, subject to Eqs. (B4)–(B5) and the inequality constraints (B6)–(B7b). It is straightforward to show that when the data is noiseless and the dilution curve is linear, the minimum of Eq. (B8) is unique and yields the true values of αs and 𝓍. Thus, our normalization procedure generalizes the techniques in Refs. [18, 19] and reduces to those approaches when only a single dilution is analyzed for each sample and reference.

Figure 2 illustrates the results of this analysis applied to a collection of 38 SARS-CoV-2 positive samples and a mAb reference material, all measured using a ligand binding assay. For this analysis we set ϵ˜_1 = ϵ˜_2 = 10⁻³ and klow = khigh to be the index associated with the measurement closest to the median fluorescence. By eye, it is clear that the raw dilution curves all have approximately the same shape (top subplot). After collapse, we find that, with the exception of a few low-fluorescence data points, the characteristic deviation from the estimated dilution curve 𝓍 is less than 5 %, which is well within characteristic uncertainties associated with pipetting and sample preparation. We speculate that the few data points showing significant deviation exhibit noise associated with being near the instrument noise floor. Note also that we do not need to specify either a linear range or a functional form of the dilution curve. See Ref. [20] for more examples of this analysis applied to an interlab study with multiple distinct reference materials and assays.

As a cautionary remark, the low-fluorescence data of Fig. 2 reveals the challenges of dealing with data fully at the noise-floor or upper saturation threshold. In such cases, the amount of relevant physical information is dwarfed by instrument artifacts, which violates the assumptions underlying the optimization. Thus, while our analysis does not need a linear fluorescence region per se, it does require the signal-to-noise ratio of the data to be suitably large. To ensure this is the case, we remove from our analysis all data-points for which the signal-to-noise is less than roughly 1/10. While this choice is subjective, we find for the examples herein that it yields reasonable results.
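The signal-to-noise screen described above reduces to a simple mask on the data. The manuscript does not pin down a specific formula for signal-to-noise, so the ratio used below, (F − Fmin)/Fmin relative to the noise floor, is an illustrative choice, and the MFI values and noise floor are hypothetical.

```python
import numpy as np

def snr_mask(F, F_min, threshold=0.1):
    """Keep points whose signal-to-noise exceeds roughly 1/10.

    (F - F_min)/F_min is one plausible definition of signal-to-noise
    relative to the noise floor F_min; this is an illustrative choice,
    not necessarily the one used in the study.
    """
    F = np.asarray(F, dtype=float)
    return (F - F_min) / F_min >= threshold

F = np.array([10.5, 11.2, 50.0, 900.0])  # hypothetical MFI values
keep = snr_mask(F, F_min=10.0)           # mask: drop points at the noise floor
```

Points effectively at the noise floor are excluded before the optimization, consistent with the cutoff discussed in the text.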

Footnotes

1

Certain commercial products are referenced (directly or indirectly) in this manuscript to clarify our theoretical analysis. Such reference does not imply endorsement or approval of any kind by NIST.

2

This mirrors the challenge of estimating binding kinetics for emerging SARS-CoV-2 variants.

3

When yr is unknown, as is the case for human-derived, pooled samples, its value is assigned arbitrarily [14, 15, 24]. In this case, it is equally valid to fix cr instead. This is purely a semantic choice that does not play a role in our analysis.

4

In fact, serology measurements are often performed on such a dilution series, although often only one dilution is used for data analysis.

5

Throughout we use the phrase data collapse to refer to a transformation that maps a function or dataset onto another function or dataset.

6

It is necessary to distinguish nonlinearity in the measurement due to Eq. (12) from nonlinear effects due to detection equipment such as photodetectors. This distinction is important for the analysis in Sec. IV A.

7

When Kˆm,n1 unit volume, Km,n=Kˆm,nbn. When bnym, one finds Km,n=1.

8

This property is also called rank factorization, since it amounts to representing a matrix in terms of objects with lower rank.

9

Note that the κYm and κBn are not determined uniquely by Eq. (15). We can always define new constants κˆYm=κYm/α and κˆBn=κBnα for any positive constant α such that the product Km,n=κˆYmκˆBn=κYmκBn is unchanged.

10

We do not explore the theoretical question of whether separability is necessary for harmonization.

11

Recall that the logarithm of concentration is linear in the Gibbs free energy. Thus Eq. (29) can also be viewed as an estimate of the average ΔG.

12

In principle, jackknife methods can be used to quantify and remove a bias associated with an estimator. In the experiments accompanying this manuscript, however, the measurements for some samples fell below the detection threshold for all but one lab. Leaving out the single uncensored concentration for that sample therefore couples the corresponding consensus value to only the terms Δgr,n and ςn2. In practice, we find that this yields jackknife bias correction terms that are unphysically large, so that we do not consider them further. Moreover, censoring is known to cause problems for resampling-based techniques [44]. As a result, we only consider the standard deviations from the jackknife distributions, which we treat as reasonable proxies for the true parameter uncertainties. A further analysis of this situation is beyond the scope of the present manuscript.

13

In contrast, using different study designs and analyses, Ref. [36] found that normalization still yielded up to 20-fold systematic discrepancies between assays. Comparing these results to Ref. [24] is challenging due to differences in the way results were reported. Reference [24] provided aggregate coefficients of variation across assays, whereas the other studies explicitly correlated normalized results between assays head-to-head.

References

  • [1].Abbasi J, JAMA 326, 1781 (2021). [DOI] [PubMed] [Google Scholar]
  • [2].West RM, Kobokovich A, Connell N, and Gronvall GK, mSphere 6, 10.1128/msphere.00201 (2021), https://journals.asm.org/doi/pdf/10.1128/msphere.00201-21. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [3].Muller L, Kannenberg J, Biemann R, Hönemann M, Ackermann G, and Jassoy C, Journal of Clinical Virology 155, 105269 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [4].Perkmann T, Perkmann-Nagele N, Koller T, Mucher P, Radakovics A, Marculescu R, Wolzt M, Wagner OF, Binder CJ, and Haslacher H, Microbiology Spectrum 9, e00247 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [5].Giavarina D and Carta M, Diagnosis 9, 274 (2022). [DOI] [PubMed] [Google Scholar]
  • [6].Infantino M, Pieri M, Nuccetelli M, Grossi V, Lari B, Tomassetti F, Calugi G, Pancani S, Benucci M, Casprini P, Manfredi M, and Bernardini S, International Immunopharmacology 100, 108095 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [7].Prechl J, Biologia Futura 72, 37 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [8].“Serological sciences network,” https://www.cancer.gov/research/key-initiatives/covid-19/coronavirus-research-initiatives/serological-sciences-network, accessed: 2022-10-11. [Google Scholar]
  • [9].Krammer F, The Lancet 397, 1421 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [10].Feng S, Phillips DJ, White T, Sayal H, Aley PK, Bibi S, Dold C, Fuskova M, Gilbert SC, Hirsch I,Humphries HE, Jepson B, Kelly EJ, Plested E, Shoemaker K, Thomas KM, Vekemans J, Villafana TL, Lambe T, Pollard AJ, Voysey M, Adlou S, Allen L, Angus B, Anslow R, Asselin M-C, Baker N, Baker P, Barlow T, Beveridge A, Bewley KR, Brown P, Brunt E, Buttigieg KR, Camara S, Charlton S, Chiplin E, Cicconi P, Clutterbuck EA, Collins AM, Coombes NS, Clemens SAC, Davison M, Demissie T, Dinesh T, Douglas AD, Duncan CJA, Emary KRW, Ewer KJ, Felle S, Ferreira DM, Finn A, Folegatti PM, Fothergill R, Fraser S, Garlant H, Gatcombe L, Godwin KJ, Goodman AL, Green CA, Hallis B, Hart TC, Heath PT, Hill H, Hill AVS, Jenkin D, Kasanyinga M, Kerridge S, Knight C, Leung S, Libri V, Lillie PJ, Marinou S, McGlashan J, McGregor AC, McInroy L, Minassian AM, Mujadidi YF, Penn EJ, Petropoulos CJ, Pollock KM, Proud PC, Provstgaard-Morys S, Rajapaska D, Ramasamy MN, Sanders K, Shaik I, Singh N, Smith A, Snape MD, Song R, Shrestha S, Sutherland RK, Thomson EC, Turner DPJ, Webb-Bridges A, Wrin T, Williams CJ, and the Oxford COVID Vaccine Trial Group, Nature Medicine 27, 2032 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [11].Perry J, Osman S, Wright J, Richard-Greenblatt M, Buchan SA, Sadarangani M, and Bolotin S, PLOS ONE 17, 1 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [12].Abbasi J, JAMA 327, 115 (2022). [DOI] [PubMed] [Google Scholar]
  • [13].Karger AB, Brien JD, Christen JM, Dhakal S, Kemp TJ, Klein SL, Pinto LA, Premkumar L,Roback JD, Binder RA, Boehme KW, Boppana S, Cordon-Cardo C, Crawford JM, Daiss JL, Dupuis AP, Espino AM, Firpo-Betancourt A, Forconi C, Forrest JC, Girardin RC, Granger DA, Granger SW, Haddad NS, Heaney CD, Hunt DT, Kennedy JL, King CL, Krammer F, Kruczynski K, LaBaer J, Lee FE-H, Lee WT, Liu S-L, Lozanski G, Lucas T, Mendu DR, Moormann AM, Murugan V, Okoye NC, Pantoja P, Payne AF, Park J, Pinninti S, Pinto AK, Pisanic N, Qiu J, Sariol CA, Simon V, Song L, Steffen TL, Stone ET, Styer LM, Suthar MS, Thomas SN, Thyagarajan B, Wajnberg A, Yates JL, and Sobhani K, medRxiv; (2022), 10.1101/2022.02.27.22271399. [DOI] [Google Scholar]
  • [14].Kristiansen PA, Page M, Bernasconi V, Mattiuzzo G, Dull P, Makar K, Plotkin S, and Knezevic I, The Lancet 397, 1347 (2021). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [15].Knezevic I, Mattiuzzo G, Page M, Minor P, Griffiths E, Nuebling M, and Moorthy V, The Lancet Microbe 3, e235 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [16].Niu L, Wittrock KN, Clabaugh GC, Srivastava V, and Cho MW, Frontiers in Immunology 12 (2021), 10.3389/fimmu.2021.647934. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [17].Braden BC and Poljak RJ, in Idiotypes in Medicine: Autoimmunity, Infection and Cancer, edited by Shoenfeld Y, Kennedy RC, and Ferrone S (Elsevier Science B.V., Amsterdam, 1997) pp. 37–50. [Google Scholar]
  • [18].Frey A, Di Canzio J, and Zurakowski D, Journal of Immunological Methods 221, 35 (1998). [DOI] [PubMed] [Google Scholar]
  • [19].Barrette RW, Urbonas J, and Silbart LK, Clinical and Vaccine Immunology 13, 802 (2006). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [20].Wang L, Patrone PN, Kearsley AJ, and Lin-Gibson S, Submitted (2023). [Google Scholar]
  • [21].Smith R, Uncertainty Quantification: Theory, Implementation, and Applications, Computational Science and Engineering (Society for Industrial and Applied Mathematics, 2013). [Google Scholar]
  • [22].Tate JR, Johnson R, and Legg M, Clin Biochem Rev 33, 81 (2012). [PMC free article] [PubMed] [Google Scholar]
  • [23].McLawhon RW, Clinical Chemistry 57, 936 (2011), https://academic.oup.com/clinchem/article-pdf/57/7/936/32655704/clinchem0936.pdf. [DOI] [PubMed] [Google Scholar]
  • [24].Kemp TJ, Quesinberry JT, Cherry J, Lowy DR, and Pinto LA, Journal of Clinical Microbiology 60, e00995 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [25].Bentley EM, Atkinson E, Rigsby P, Elsley W, Bernasconi V, Kristiansen P, Harvala H, Turtle LC, Dobson S, Wendel S, Anderson R, Kempster S, Duran J, Padley D, Almond N, Rose NJ, Page M, and Mattiuzzo G, Expert Committee on Biological Standardization, 1 (2022). [Google Scholar]
  • [26].Wang L, Patrone PN, Kearsley AJ, Izac JR, Gaigalas AK, Prostko JC, Kwon HJ, Tang W, Kosikova M, Xie H, Tian L, Elsheikh EB, Kwee EJ, Kemp T, Jochum S, Thornburg N, McDonald LC, Gundlapalli AV, and Lin-Gibson S, International Journal of Molecular Sciences 24 (2023), 10.3390/ijms242115705. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [27].Nixon K, Jindal S, Parker F, Reich NG, Ghobadi K, Lee EC, Truelove S, and Gardner L, The Lancet Digital Health 4, e738 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [28].Dron L, Kalatharan V, Gupta A, Haggstrom J, Zariffa N, Morris AD, Arora P, and Park J, The Lancet Digital Health 4, e748 (2022). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [29].Andreasson U, Perret-Liaudet A, van Waalwijk van Doorn LJC, Blennow K, Chiasserini D, Engelborghs S, Fladby T, Genc S, Kruse N, Kuiperij HB, Kulic L, Lewczuk P, Mollenhauer B, Mroczko B, Parnetti L, Vanmechelen E, Verbeek MM, Winblad B, Zetterberg H, Koel-Simmelink M, and Teunissen CE, Frontiers in Neurology 6 (2015), 10.3389/fneur.2015.00179. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [30].Holland T and Holland H, Journal of Microscopy 214, 1 (2004). [DOI] [PubMed] [Google Scholar]
  • [31].Božič B, Čučnik S, Kveder T, and Rozman B, in Autoantibodies (Third Edition), edited by Shoenfeld Y, Meroni PL, and Gershwin ME (Elsevier, San Diego, 2014) third edition ed., pp. 43–49. [Google Scholar]
  • [32].Pathria R, Statistical Mechanics (Elsevier Science, 2016). [Google Scholar]
  • [33].Erickson HP and Corbin Goodman L, Biochemistry (2022), 10.1021/acs.biochem.2c00291. [DOI] [Google Scholar]
  • [34].Lay D, Lay S, and McDonald J, Linear Algebra and Its Applications (Pearson, 2016). [Google Scholar]
  • [35].Rukhin AL, Metrologia 46, 323 (2009). [Google Scholar]
  • [36].Stine R, Sociological Methods & Research 18, 243 (1989). [Google Scholar]
  • [37].Chernick MR, González-Manteiga W, Crujeiras RM, and Barrios EB, “Bootstrap methods,” in International Encyclopedia of Statistical Science, edited by Lovric M (Springer Berlin Heidelberg, Berlin, Heidelberg, 2011) pp. 169–174. [Google Scholar]
  • [38].BIPM IEC, IFCC ILAC, ISO IUPAC, IUPAP, and OIML, “Evaluation of measurement data — Guide to the expression of uncertainty in measurement,” Joint Committee for Guides in Metrology, JCGM; 100:2008. [Google Scholar]
  • [39].Clifford Cohen A, Truncated and censored samples (CRC Press, 2016). [Google Scholar]
  • [40].Tony NHK, “Censoring methodology,” in International Encyclopedia of Statistical Science, edited by Lovric M (Springer Berlin Heidelberg, Berlin, Heidelberg, 2011) pp. 221–224. [Google Scholar]
  • [41].Rasmussen CE and Williams CKI, Gaussian Processes for Machine Learning (The MIT Press, 2005). [Google Scholar]
  • [42].The Annals of Mathematical Statistics 29, 614 (1958). [Google Scholar]
  • [43].Quenouille MH, Biometrika 43, 353 (1956). [PubMed] [Google Scholar]
  • [44].Portnoy S, Computational Statistics and Data Analysis 72, 273 (2014). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [45].Haddad D, Seifert F, Chao LS, Possolo A, Newell DB, Pratt JR, Williams CJ, and Schlamminger S, Metrologia 54, 633 (2017). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [46].Fang H, Bielsa F, Li S, Kiss A, and Stock M, Metrologia 57, 045009 (2020). [Google Scholar]
  • [47].Alberts B, Johnson A, Lewis J, Walter P, Raff M, and Roberts K, Molecular Biology of the Cell 4th Edition: International Student Edition (Routledge, 2002). [Google Scholar]
  • [48].Patrone PN, Romsos EL, Cleveland MH, Vallone PM, and Kearsley AJ, Analytical and Bioanalytical Chemistry 412, 7977 (2020). [DOI] [PMC free article] [PubMed] [Google Scholar]
  • [49].Patrone PN, Cooksey G, and Kearsley A, Phys. Rev. Applied 11, 034025 (2019). [DOI] [PMC free article] [PubMed] [Google Scholar]
