Abstract
We estimate the distribution of random parameters in a distributed parameter model with unbounded input and output for the transdermal transport of ethanol in humans. The model takes the form of a diffusion equation with the input being the blood alcohol concentration and the output being the transdermal alcohol concentration. Our approach is based on the idea of reformulating the underlying dynamical system in such a way that the random parameters are now treated as additional space variables. When the distribution to be estimated is assumed to be defined in terms of a joint density, estimating the distribution is equivalent to estimating the diffusivity in a multi-dimensional diffusion equation and thus well-established finite dimensional approximation schemes, functional analytic based convergence arguments, optimization techniques, and computational methods may all be employed. We use our technique to estimate a bivariate normal distribution based on data for multiple drinking episodes from a single subject.
Keywords: Distribution estimation, Biosensor data, Distributed parameter systems, Random parameters, Blood alcohol concentration, Transdermal alcohol concentration
1. Introduction
Researchers and clinicians studying and treating alcohol dependence have long sought the means to continuously and quantitatively monitor blood alcohol levels in naturalistic settings. The ability to do this would be extremely valuable for advancing a wide range of alcohol research and clinical treatment domains, including how alcohol concentrations relate to drinking motives, physical responses to alcohol, behaviors, decision-making, negative consequences, and environmental situational factors over the course of a drinking episode and across drinking episodes. At present the only ways to collect alcohol consumption data in the field are the self-report diary and the breath analyzer, both of which require the active participation of the subject and generally yield inaccurate estimates of blood alcohol level when obtained during naturalistic drinking episodes.
Recently, biosensors that measure transdermal alcohol concentration (TAC), the amount of alcohol diffusing through the skin (Fig. 1.1), have been developed and used, but primarily only as abstinence monitors (for example, in court-mandated monitoring of DUI offenders). Because TAC data do not consistently correlate with breath/blood alcohol concentrations (BrAC/BAC) across individuals, devices, and environmental conditions, these devices have not gained widespread acceptance in the research and clinical communities. Indeed, BAC and BrAC are currently, and historically have been, the standard measures of alcohol intoxication among alcohol researchers and clinicians, and unfortunately there is currently no well-established method for producing reliable estimates of BrAC/BAC (eBrAC/eBAC) from TAC data.
The transport and filtering of alcohol by the skin is affected by a number of factors that differ across individuals (e.g., skin layer thickness, porosity, and tortuosity, etc.) and across drinking episodes within individuals (e.g., body and ambient temperature, humidity, subject activity level, skin hydration, vasodilation, etc.). The implication is that, regardless of how reliable and accurate transdermal alcohol device hardware becomes at measuring TAC, the raw TAC data will never consistently map directly onto BrAC/BAC across individuals and drinking episodes.
In our work to date on determining eBrAC/eBAC from TAC (see, for example, [5] and [16]), we have taken a strictly deterministic approach to converting TAC to either BAC or BrAC. First principles physics-based models (a one-dimensional diffusion equation with either infinite or finite speed of propagation and input and output on the boundary) were fit to individual calibration data to capture the forward process - the transport of ethanol molecules from the blood, through the skin, and its measurement by the sensor. The result is TAC expressed as a convolution of BAC or BrAC with a kernel or filter. We then deconvolve an estimate of the BAC or BrAC from the biosensor measurements of TAC.
To greatly reduce the burden on researchers/clinicians and participants/patients and thus significantly increase the feasibility of using these devices, we have been investigating ways to eliminate the need to calibrate the sensor’s data analysis system to each individual, each sensor, and across varying current environmental conditions. In particular, we have been investigating the use of our first principles physics/physiological based models to describe the dynamics common to the entire population (interpreted broadly to include all individuals, devices, and environmental conditions) and then to attribute all un-modeled sources of uncertainty (primarily due to variations in physiology, hardware, and the environment) observed in individual data to random effects. We refer to this as a population model; it takes the form of our earlier deterministic transport models, but with the parameters now being random variables whose distributions are to be estimated based on aggregate population data.
We assume an underlying mathematical framework describing the system dynamics that are common to all individuals, environmental conditions, and devices in the population (e.g., the physics-based model for the transport of ethanol from the blood, through the skin, and measurement by the sensor), but that individual members of the population exhibit variation in the parameters appearing in the model (e.g., the rate at which the alcohol is transported, evaporates, etc.). We then assume that the sensor measures the sum or mean of all of these effects. This is realized in the form of a model based on random partial differential equations and boundary conditions, and then, instead of fitting the unknown parameters in the model directly to individual data, we now estimate the distribution of the random parameters (in the form of probability measures or density functions) based on aggregate population data. In this way, the fit describes the mean behavior of the population.
We consider the (in general, infinite dimensional) discrete time initial value problem given by
(1.1) |
(1.2) |
and assume that it describes the dynamics of a process common to the entire population. In addition, it is assumed that we are able to observe some function of the state, or solution, of (1.1), (1.2), x(u;q) = {xk(u,q)}, and the input, u = {uk}, as given by the output or observation equation
(1.3) |
In equations (1.1)–(1.3) it is assumed that the parameters q are contained in a set Q, the set of admissible parameters, and that their values are specific to a particular individual (interpreted broadly) member of the population. The objective is to estimate the parameters, q, or more precisely their distribution, based on population or aggregate data rather than data for a particular individual. We assume 𝓆 is a random vector with support set Q and that 𝓆~π(ρ), where π(ρ) is a family of distributions or push forward measures parameterized by ρ and defined on a sigma field of subsets of Q where , the set of feasible parameters. We will assume π(ρ) is defined in terms of a joint density function f(ρ); f(ρ) is absolutely continuous, non-negative, and
for A an event, where .
The statistical model for the population data upon which the estimation is to be based assumes that the observed data points can be represented by the expected value of the output of the model plus random error. We assume that we have ν subjects and μi noise-corrupted observations for the ith subject. For , we define
(1.4) |
for j = 0,1, …, μi, i = 1, …, ν, the mean behavior of the population at time tj given that the distribution of the parameters 𝓆 is described by the measure π(ρ). In (1.4), xi and yi are given by (1.1) – (1.3) with and x0 = xi,0. As is typically the case in standard linear regression, we assume our observations are given by
(1.5) |
where the εi,j, j = 1, …, μi, i = 1, …,ν, represent measurement noise which are assumed to be independent, identically-distributed random variables with mean 0 and common variance σ2, and represents the true values of the parameters.
Our estimation problem minimizes prediction error based on a least squares (or MLE if εi,j~N(0, σ2)) approach. For V = {Vi,j}, our estimator is given by
(1.6) |
where λ(ρ) is a regularization term, , the αi,j are positive weights, νi,j(ρ) is given by (1.4) and V = {Vi,j} by (1.5). In general (and in fact, in the problem of interest to us here), the fact that the dynamical system (1.1)–(1.3) may be (or is) infinite dimensional, means that the optimization problem given in (1.6) is not directly amenable to computation. Consequently, when we apply these ideas to the alcohol biosensor problem in Section 2 below, we replace the infinite dimensional state equation with a finite dimensional approximation. Relevant convergence issues which are the central focus of our research, are discussed in Section 4. We note that if one is trying to estimate the shape of f and the parameters ρ are the coefficients of basis elements with the dimension of the feasible parameter space increasing in dimension in order to achieve some level of convergence, then the presence of regularization in the form of the term λ(ρ), would likely be essential. In our studies here, we are not estimating the shape of f and the dimension of is fixed and relatively low (~10), and hence we found that regularization was unnecessary. Consequently we do not include λ(ρ) (i.e. we assume λ(ρ) = 0) in the remainder of our treatment below. It would not however, be difficult to include it, and we may do that in future work where the focus is either the estimation of the shape of the density or the corresponding measure, π, as in [4].
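A minimal sketch of the pooled least squares criterion, assuming the cost in (1.6) is the weighted sum of squared residuals between the data Vi,j and the model's population mean output, with the regularization term set to zero as stated above. The function name `pooled_cost` and the list-of-arrays data layout are hypothetical, chosen only for illustration.

```python
import numpy as np

def pooled_cost(V, v_model, alpha=None):
    """Weighted pooled least-squares cost: sum_i sum_j alpha_ij * (V_ij - v_ij)^2.

    V, v_model, alpha: lists with one 1-D array per subject/episode i; the
    arrays may have different lengths mu_i. alpha=None means unit weights.
    (Layout and name are illustrative assumptions, not the authors' code.)
    """
    total = 0.0
    for i, (Vi, vi) in enumerate(zip(V, v_model)):
        Vi = np.asarray(Vi, dtype=float)
        vi = np.asarray(vi, dtype=float)
        w = np.ones_like(Vi) if alpha is None else np.asarray(alpha[i], dtype=float)
        total += float(np.sum(w * (Vi - vi) ** 2))
    return total

# Two episodes with different numbers of observations.
data = [[1.0, 2.0], [3.0]]
model = [[1.0, 1.0], [1.0]]
print(pooled_cost(data, model))                              # 5.0
print(pooled_cost(data, model, alpha=[[1.0, 2.0], [0.5]]))   # 4.0
```

In an actual fit, `v_model` would be recomputed from the population model for each candidate ρ inside an optimizer.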
Our problem as posed is a classic mixed effects model (see, for example, [6,7], and the numerous references contained therein) featuring both inter- and intra-subject variation and uncertainty. The standard statistical model requires a function, f, which describes the relationship between the observations and the design variables, of the form
$$y_{ij} = f(x_{ij}, q_i) + \varepsilon_{ij},$$
where yij is the jth observation from the ith individual, the xij are the known design or state variables, the qi’s are the parameters specific to the ith individual, and the εij (typically ~N(0, σ²)) are i.i.d. with mean zero and common variance σ². Moreover, it is also assumed that qi = h(ci, μ, ηi), where the ci’s are the known vectors of covariates, μ is the unknown vector of fixed effects, and the ηi are random, for example, ηi~N(0, Ω). In this model, the parameters to be estimated are μ, σ, and Ω, with Ω the covariance matrix. When f and h are linear, estimation of these parameters is relatively straightforward using standard approaches. However, when f is nonlinear, or even more complex, as in our case here, where the PDE-based model admits no closed form finite dimensional representation and there is a clear lack of independence in the observations, a number of significant statistical and computational challenges result. Indeed, the state variables, xij, in our model are the solution of a partial differential equation, and consequently the representation for the function f is in terms of a semigroup of infinite dimensional operators on a Hilbert space whose dependence on the parameters, qi, is highly nonlinear. As a result, the use of any estimation technique that relies on a likelihood (e.g. MLE, EM, Stochastic Approximation Expectation Maximization, Bayesian estimation with MCMC, etc.) can be daunting (see, for example, [10]). Indeed, such an approach would, in one way or another, likely involve repeated simulation requiring repeated solution of the PDE, which is something we are trying to avoid. Another method involves estimating the state from the observations, then using nonlinear regression together with the PDE to estimate the parameters. In our case we have a one dimensional measurement of an infinite dimensional state, and observability could very likely be an issue. Also, if one opted to use the Kalman filter to estimate the state based on the observations, its implementation is essentially equivalent to solving the PDE and would itself require finite dimensional approximation. In general, this approach often yields inaccurate estimates.
Along with measurements across different subjects, we have longitudinal measurements for each subject, which of course one would expect to be dependent. In addition, from a pharmacokinetic (PK) standpoint, our definition of “subject” or population is somewhat non-standard in that it refers not only to individual participants and their various physiological differences, but also to environmental conditions such as temperature and humidity, as well as hardware related uncertainties. Thus, in our model, the qi’s describe unmodeled phenomena present both across different individuals and within the same individual. Indeed our working hypothesis is that each observation represents a sum or average of any number of diffusion processes all at work simultaneously. In addition, because human subjects are involved, it would be both costly and time consuming to collect simultaneous BrAC and TAC data from enough individuals wearing enough sensors and under sufficiently varied environmental conditions to estimate the distribution of the qi’s directly. Consequently, although it can have bias problems [14], it seemed most appropriate that our estimator in (1.6) take the form of what is known in the PK literature as the naïve pooled data estimator.
Our general approach relies to some extent on two relatively recent papers: 1) Banks and Thompson’s framework for the estimation of probability measures in random abstract evolution equations and the convergence of finite dimensional approximations in the Prohorov metric [4], and 2) Gittelson, Andreev, and Schwab’s theory for random abstract parabolic partial differential equations with dynamics defined in terms of coercive sesquilinear forms [9]. The approach in [9] is novel in the way that it treats the random variable as another “space-like” independent variable in the PDE. In this way, finite dimensional approximation is handled in much the same way that it is for the standard deterministic space variables. We use the framework in [9] together with the generation and approximation results from linear semigroup theory, (i.e. the Hille-Yosida-Phillips and Trotter Kato theorems, see, for example, [3, 11, 15]) to establish that the sufficient conditions for a Banks–Thompson-like convergence result are satisfied in the case of regularly dissipative systems with random parameters whose distributions can be described by appropriately parameterized probability density functions.
For a number of reasons, the approach we take here, and in particular our population model as defined in Section 3 below, is especially well suited for our estimation problem as given in (1.6) with the underlying dynamics (1.1), (1.2) being described by a random PDE. Indeed, 1) it does not require repeated simulation, 2) it takes particular advantage of the underlying parabolic structure of the model’s state equation, 3) it lends itself extremely well to functional analytic arguments for convergence of the estimators based on finite dimensional approximation, the central focus of this study, 4) based on our working hypothesis concerning our data as stated in the previous paragraph and the statistical model given in (1.5), the output of the population model is precisely what is required to evaluate the naïve pooled data based performance index, J (ρ; V) given in (1.6), and 5) it is especially well suited for deconvolving an estimate of BrAC from TAC which is our ultimate goal [20].
In this paper, we focus solely on the application of our framework to the alcohol biosensor problem outlined above. The abstract functional analytic approximation and convergence theory on which the results presented here are based was established in [18] and [19]. In addition, in our treatment here, we are only concerned with the fitting of the random parameters in the forward model. The inverse problem involving the deconvolution of the BAC/BrAC from the TAC signal once the forward model with random elements has been fit is treated elsewhere [20].
An outline of the remainder of the paper is as follows. In Section 2 we discuss our diffusion based distributed parameter model for the transdermal transport of ethanol, its abstract formulation as an infinite dimensional dynamical system with unbounded input and output, and the corresponding discrete-time input-output system on which our general framework is based. In Section 3 we discuss the treatment of initial value problems involving regularly dissipative operators with random coefficients as in [9] and [17], and apply it to the system of interest to us here discussed in Section 2. Section 4 outlines our finite dimensional approximation and convergence results, in Section 5 we discuss a consistency result for our estimator, and in Section 6 we present and discuss our numerical results using actual experimental/clinical data for the alcohol biosensor problem. Section 7 contains a brief discussion of future research, and some concluding remarks.
We use standard notation throughout. For example, we denote the space of square Lebesgue or Bochner integrable functions defined on an interval (a, b) with range in the normed linear space X by L2 (a, b; X), and we use C(a, b; X) when the functions are continuous. When , the range space is omitted. For normed linear spaces X and Y, denotes the space of bounded (continuous) linear operators defined on X with range in Y. Unless explicitly stated otherwise, all Hilbert space norms are the ones induced by the standard inner product on that space. We occasionally use “dot” notation to denote weak or strong derivatives with respect to time.
2. A Distributed Parameter Model for a Transdermal Alcohol Biosensor and its Abstract Formulation
Our ethanol transport model is based on diffusion, or Fick’s law, and consequently it is described by abstract parabolic operators. When formulated abstractly in a Gelfand triple setting, these operators are examples of what are known as regularly dissipative operators and can be shown to generate analytic or holomorphic semigroups (see [11, 15, 22]).
After converting to what are essentially dimensionless quantities (see [16]), we obtain the input/output model
(2.1) |
(2.2) |
(2.3) |
(2.4) |
(2.5) |
in the form of an initial-boundary value problem for a one dimensional diffusion equation with input and output on the boundary and two unknown parameters, q = (q1,q2). In the system (2.1) – (2.5), φ(t, η) is essentially the concentration of ethanol in the interstitial fluid in the epidermal layer of the skin at depth η and time t, u is the concentration of alcohol in the blood (BAC) as measured by a breath analyzer (BrAC), and y is the TAC. The boundary condition (2.2) models the evaporation of ethanol at the skin surface, and condition (2.3) captures the exchange of ethanol molecules between the (blood fed) dermal and epidermal layers of the skin. The output equation (2.5) models the biosensor measured TAC at the skin surface. We assume that there is no alcohol in the skin initially, so in general φ0 = 0 in (2.4).
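As a rough illustration of the forward model, the sketch below assumes that (2.1) – (2.5) take the form φt = q1 φηη on (0,1), q1 φη(t,0) = φ(t,0), q1 φη(t,1) = q2 u(t), φ(0,·) = 0, and y(t) = φ(t,0). This specific form is an assumption; it is chosen to be consistent with an input functional b(q) = q2 δ(· − 1) and an output functional c(q) = δ. The discretization (linear hat functions in η, backward Euler in time) and the names `galerkin_matrices` and `simulate_tac` are likewise illustrative choices rather than the scheme analyzed in the paper.

```python
import numpy as np

def galerkin_matrices(n, q1, q2):
    """Galerkin matrices for linear hat functions on a uniform mesh of [0, 1]."""
    h = 1.0 / n
    M = np.zeros((n + 1, n + 1))   # mass matrix
    K = np.zeros((n + 1, n + 1))   # stiffness matrix
    for e in range(n):             # assemble element by element
        idx = [e, e + 1]
        M[np.ix_(idx, idx)] += h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
        K[np.ix_(idx, idx)] += 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
    S = q1 * K
    S[0, 0] += 1.0                 # boundary term phi(0) * psi(0) (assumed form of (2.2))
    b = np.zeros(n + 1)
    b[n] = q2                      # input enters at eta = 1
    c = np.zeros(n + 1)
    c[0] = 1.0                     # output is the value at eta = 0
    return M, S, b, c

def simulate_tac(u, dt, n=16, q1=1.0, q2=1.0):
    """Backward-Euler solution of M x' = -S x + b u(t), y = c.x, with x(0) = 0.

    u[k] is held constant over step k; y[k] is the output at the end of step k.
    """
    M, S, b, c = galerkin_matrices(n, q1, q2)
    lhs = M + dt * S
    x = np.zeros(n + 1)
    y = np.zeros(len(u))
    for k, uk in enumerate(u):
        x = np.linalg.solve(lhs, M @ x + dt * b * uk)
        y[k] = c @ x
    return y

# Step response: under the assumed model, a constant input u settles to y = q2 * u.
y = simulate_tac(np.ones(400), dt=0.05, q1=1.0, q2=0.5)
```

Under these assumptions q2 acts as a steady-state gain from BAC to TAC, while q1 controls how quickly the signal crosses the skin layer.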
Using the tools of functional analysis and linear semi-group theory, we reformulate (2.1) – (2.5) as a discrete-time SISO system with state space an infinite dimensional Hilbert space. In (2.1)–(2.5) the input and output are on the boundary and consequently the resulting continuous time input and output operators are unbounded with respect to the usual state space for such a system, L2(0,1). However, in the discrete or sampled time formulation which is of primary interest to us here, they become bounded.
Let V and H be Hilbert spaces with the embeddings V ↪ H ↪ V* dense and continuous, where V* denotes the space of continuous linear functionals on V. Let 〈·,·〉 and | · | denote the H inner product and norm, respectively, and let ∥·∥ denote the norm on V. For q ∈ Q, with {Q, dQ} a compact metric space, let a(q; ·, ·) be a bilinear form on V × V satisfying the following three conditions:
(i) (Boundedness) |a(q; ψ1, ψ2)| ≤ α0∥ψ1∥∥ψ2∥, ψ1, ψ2 ∈ V, q ∈ Q,
(ii) (Coercivity) a(q; ψ, ψ) + λ0|ψ|² ≥ μ0∥ψ∥², ψ ∈ V, q ∈ Q,
(iii) (Measurability) For ψ1, ψ2 ∈ V, the map q → a(q; ψ1, ψ2) is measurable with respect to all of the measures π(ρ).
For q ∈ {Q, dQ} let b(q), c(q) ∈ V*, and consider an input/output system in weak form as given by
(2.6) |
where φ(0) = φ0 ∈ H and 〈·,·〉V*,V denotes the natural extension of the H inner product to the duality pairing between V and V*. If we set W(0, T) = {ψ : ψ ∈ L2(0, T; V), dψ/dt ∈ L2(0, T; V*)} and u ∈ L2(0, T), it can be shown (see, for example, [13]) that the system (2.6) admits a unique solution φ ∈ W(0, T) that depends continuously on u ∈ L2(0, T). It follows that W(0, T) ⊆ C(0, T; H) and that y ∈ L2(0, T).
For q ∈ Q, the q-dependent bilinear form on V × V, , defines a bounded linear operator by 〈A(q)ψ1, ψ2〉v*,v = −a(q; ψ1, ψ2), for ψ1, ψ2 ∈ V. Then, if we let denote any of the spaces V, H or V*, we can consider the linear operator A(q) to be the unbounded linear operator, where Dq = V in the case , and in the case or . It can then be shown (see, for example, [2,3,22]) that A(q) is a closed, densely defined unbounded linear operator on and it is the infinitesimal generator of an analytic semigroup of bounded linear operators, {eA(q)t: t ≥ 0} on .
For q ∈ Q, define the bounded linear operators and by 〈B(q)u, ψ〉V*,V = 〈b(q), ψ〉V*,Vu, and C(q)ψ = 〈c(q), ψ〉V*,V, respectively, for ψ ∈ V and . The input/output system can now be written formally in the standard state space form as
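$$\dot{x}(t) = A(q)x(t) + B(q)u(t), \quad t > 0, \qquad x(0) = x_0, \qquad y(t) = C(q)x(t), \quad t \ge 0, \tag{2.7}$$
where the state x(t) = φ(t,·). Using the fact that {eA(q)t: t ≥ 0} is an analytic semigroup on the spaces V, H and V*, it follows that the so-called mild solution ([11,15]) of the state equation in (2.7) is given by
$$x(t) = e^{A(q)t}x_0 + \int_0^t e^{A(q)(t-s)}B(q)u(s)\,ds, \quad t \ge 0, \tag{2.8}$$
where x in (2.8) is in W(0, T).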
Now let a sampling time τ > 0 and x0 ∈ V be given, and consider zero order hold inputs of the form u(t) = ui, t ∈ [iτ, (i + 1)τ), i = 0,1,2,…. Then, under the assumptions made thus far (see [19]), it follows from (2.8) and the properties of analytic semigroups generated by regularly dissipative operators that
$$x_{i+1} = \hat{A}(q)x_i + \hat{B}(q)u_i, \qquad y_i = \hat{C}(q)x_i, \qquad i = 0,1,2,\ldots, \tag{2.9}$$
where $\hat{A}(q) = e^{A(q)\tau}$, $\hat{B}(q) = \int_0^\tau e^{A(q)s}B(q)\,ds$, $\hat{C}(q) = C(q)$, xi = x(iτ) and yi = y(iτ), i = 0,1,2, ….
Boundedness of the operators $\hat{B}(q)$ and $\hat{C}(q)$ in (2.9) follows once again from the fact that {eA(q)t: t ≥ 0} is an analytic semigroup on V, H and V* ([2,3,13,22]). Indeed, the coercivity assumption, Assumption (ii) (possibly together with a change of variables), implies that we may assume without loss of generality that the operator A(q), from either H into H or V into V*, is invertible with bounded inverse. Consequently, it follows that
$$\hat{B}(q) = A(q)^{-1}\left(e^{A(q)\tau} - I\right)B(q).$$
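In finite dimensions the sampled-time operators can be computed with one matrix exponential. The sketch below assumes the standard zero order hold expressions, namely a state transition matrix e^{Aτ} and an input matrix ∫0^τ e^{As}B ds, and obtains both from the exponential of an augmented matrix. The helper `expm` (scaling and squaring with a truncated Taylor series) and the name `zoh_discretize` are illustrative; a library routine such as SciPy's `scipy.linalg.expm` could be used instead.

```python
import numpy as np

def expm(A):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=float)
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    As = A / (2.0 ** s)                      # scaled so that the 1-norm is at most 0.5
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, 20):                   # Taylor series of exp(As)
        term = term @ As / k
        E = E + term
    for _ in range(s):                       # undo the scaling by repeated squaring
        E = E @ E
    return E

def zoh_discretize(A, B, tau):
    """Return (A_hat, B_hat) with A_hat = e^{A tau}, B_hat = int_0^tau e^{A s} B ds."""
    n, m = B.shape
    aug = np.zeros((n + m, n + m))
    aug[:n, :n] = A
    aug[:n, n:] = B
    E = expm(aug * tau)                      # exp of [[A, B], [0, 0]] * tau
    return E[:n, :n], E[:n, n:]

A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
A_hat, B_hat = zoh_discretize(A, B, tau=0.1)
# When A is invertible, B_hat agrees with A^{-1} (A_hat - I) B.
print(np.allclose(B_hat, np.linalg.solve(A, (A_hat - np.eye(2)) @ B)))
```

The augmented-matrix form does not require A to be invertible, which can be convenient when no change of variables has been applied.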
Let Q be a closed and bounded subset of endowed with the Euclidean metric, let H = L2 (0,1) together with the standard inner product , and norm denoted by | · |, and let V be the Sobolev space V = H1(0,1) together with its standard inner product and norm denoted by ∥·∥. Then we have the usual dense and continuous embeddings 𝑉 ↪ 𝐻 ↪ 𝑉*, where 𝑉* denotes the space of distributions dual to 𝑉. The forms and functions , and are given by
(2.10) |
〈b(q), ψ〉v*,v = q2 ψ(1), and 〈c(q), ψ〉v*,v = ψ(0), for ψ ∈ V. It follows that b(q) = q2 δ (· − 1) ∈ V* and c(q) = δ ∈ V*, where δ denotes the Dirac delta distribution, or unit impulse at zero. It is not difficult to argue that Assumptions (i)-(iii) hold for the form a(·; ·, ·) as given in (2.10) above. See [19] for a more abstract, detailed and rigorous description of how we deal with input signals on the boundary of the domain.
3. Systems Governed by Regularly Dissipative Operators with Random Parameters
In this section, we use ideas from [9] to consider systems of the form (2.1)–(2.5) with the parameters q ∈ Q random. Let 𝓆 be a p dimensional random vector with support in where . Let be given by and , and let Θ be a parameter set that is a compact subset of for some r. We assume that 𝓆 has distribution described by the absolutely continuous cdf, , or equivalently by the push forward measure , where θ ∈ Θ.
Let a(q; ·, ·) denote a sesquilinear form on V × V satisfying the conditions (i) - (iii) given in Section 2, and in particular that the function q ↦ a(q; ν, w) is π–measurable for any ν, w ∈ V. By appropriately defining new function spaces, it is possible to embed the randomness in the sesquilinear form a(𝓆; ⋅, ⋅), or equivalently in the operator, A(𝓆), in (2.7) into these spaces. Consequently, the input output system (2.7) can be stated in a way that makes the stochasticity in the operators, the state and the output effectively invisible and thus amenable to analysis and approximation using standard (deterministic) linear semigroup theory. In effect, the random variables are treated the same way as the space variables in the underlying PDE.
Toward this end, we define and . It then follows that , , form a Gelfand triple of separable Hilbert spaces with by identifying with its dual and identifying with the Bochner space . We also define the π-averaged sesquilinear form by
(3.1) |
where, υ, . Assumptions (i) and (ii) guarantee that a(·, ·) given in (3.1) is a bounded and coercive sesqui-linear form on , and therefore that -a induces a bounded linear map from into . It follows (see, once again, for example, [2,3,22]) that the operator where is the infinitesimal generator of an analytic semigroup of bounded linear operators on , and, moreover, that can be extended to an analytic semigroup on and restricted to an analytic semigroup on .
We assume that the maps q ↦ 〈b(q), ψ(q)〉v*,v, and q ↦ 〈c(q), ψ(q)〉v*,v are π–measurable for any , and that ∥b (q)∥v* and ∥c(q) ∥v* are uniformly bounded for q ∈ Q. (Assuming that b, would be fine as well.) We then define the two bounded linear operators and by the expressions and , respectively, for and .
We then consider the continuous time input/output system set in the spaces , , , given by
(3.2) |
(3.3) |
The mild solution to the initial value problem given in (3.2), (3.3) is then given by
(3.4) |
and therefore, from (3.3), for 𝑡 ≥ 0, that
(3.5) |
Once again let the sampling time τ > 0 be given and consider zero order hold inputs of the form u(t) = ui, t ∈ [iτ, (i + 1)τ), i = 0,1,2, …. Set and let , i = 0,1,2, …. It then follows from the variation of parameters formula for systems governed by analytic semigroups, (3.4), and (3.5) that
(3.6) |
with , where , , and . Boundedness of the operators and follows from the fact that is an analytic semigroup on , and . Once again, without loss of generality, we may assume that is invertible with bounded inverse, from which it follows that .
It can be shown ([9,17]) that given by (3.4) agrees π-almost everywhere with x(t) given in (2.8) for all t ≥ 0, and consequently it follows that , for all t ≥ 0, and therefore that
(3.7) |
We refer to (3.2),(3.3) or (3.6),(3.7) as our population model.
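The idea that the population output is the π-average of the individual outputs can be illustrated on a deliberately simple stand-in. The sketch below replaces the diffusion model with the scalar equation dx/dt = −q x, x(0) = x0 (an assumption made only so that the answer can be checked in closed form), treats q as an extra independent variable discretized by piecewise constants on m bins, and averages the per-bin solutions with weights given by the density. The function name `population_output` is hypothetical.

```python
import numpy as np

def population_output(t, density, a, b, m=200, x0=1.0):
    """Approximate the mean of x(t; q) for dx/dt = -q x, x(0) = x0, q ~ density on [a, b].

    The interval [a, b] is split into m equal bins. On each bin q is frozen at
    the midpoint (piecewise-constant approximation in q) and the bin is weighted
    by its approximate probability mass.
    """
    edges = np.linspace(a, b, m + 1)
    mids = 0.5 * (edges[:-1] + edges[1:])
    weights = density(mids) * np.diff(edges)
    weights = weights / weights.sum()            # renormalize the truncated density
    t = np.atleast_1d(np.asarray(t, dtype=float))
    paths = x0 * np.exp(-np.outer(mids, t))      # one deterministic solve per bin
    return weights @ paths                       # weighted average over the bins

# Uniform density on [1, 3]: the exact mean is (e^{-t} - e^{-3t}) / (2 t).
t = 1.0
approx = population_output(t, lambda q: np.ones_like(q), 1.0, 3.0)[0]
exact = (np.exp(-t) - np.exp(-3.0 * t)) / (2.0 * t)
print(abs(approx - exact) < 1e-5)
```

Because the dynamics involve no derivatives in q, the bins decouple, so the averaged system reduces to independent deterministic solves followed by a weighted sum.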
4. Finite Dimensional Approximation, Convergence and Computational Considerations
In light of the final expression in Section 3, we may formulate our estimation/optimization problem (1.6) as follows: Given v data sets , determine , and θ* ∈ Θ, feasible, which minimize
where is given as in (3.6) with . ∎
Henceforth we assume that the measures are described by a family of joint density functions , and let . We will require the following assumptions on the family of densities :
(iv) The maps are continuous on for π-almost every .
(v) There exist positive constants γ, δ such that
for π-almost every .
Assumptions (i)-(v) are sufficient to establish that the maps are continuous for k = 0,1,2, …,μi − 1, and i = 1,2, …,ν, and therefore that the map is continuous. Consequently it follows from compactness that problem has a solution.
Solving problem requires finite dimensional approximation. For each N = 1,2, …., let , and . Let and be vectors in with , and let θN ∈ Θ, and set , , and . Let be a finite dimensional subspace of . Let be such that and . Let be the orthogonal projection of onto and define . Define by
where, vN, .
Using assumptions (i)-(v), it can then be shown (see Sirlanci et al. [18,19]) that on (i.e. that , t ≥ 0), where the constants M and λ0 are independent of N. Suppose further that for some λ ≥ λ0, as N → ∞, for every , where and denote respectively the resolvent operators and at λ. It then follows, from a version ([1,18,19]) of the Trotter-Kato theorem (see, for example, [11,15]) that allows for the state spaces to depend on the parameters, , that
for , uniformly in t in compact intervals of [0, ∞).
Define the operators and by
and
where , and and consider the sequence of finite dimensional optimization problems given by
Given ν data sets , determine , and θN* ∈ Θ, feasible, which minimize
where is given by
(4.1) |
with , where , , and . ∎
We can then prove the following convergence theorem (see [18,19]).
Theorem 4.1 Let assumptions (i)-(v) hold. Then for each N = 1,2,…., the problems given above admit a solution . Suppose further that
(vi) as N → ∞, .
Then there exists a subsequence, , with as j → ∞, and a solution to problem .
The optimization problems are solved numerically, typically via an iterative gradient-based scheme. Once a basis for the space is chosen, the operators in (4.1) can be represented as matrices and the value of the cost functional J and its gradient can then be computed. If for N = 1, 2, …, , then the matrix representation for is given by , for i, j = 1,2,…, KN. Matrix representations for the operators and are computed analogously. The matrix representations for , , and can then be used in a straight forward manner to compute the matrix representations for the operators , , and appearing in (4.1).
We compute using the adjoint [12]. For each i = 1,2, …, ν, set , j = 0,1,2, …, μi and define the adjoint systems
(4.2) |
Then at can then be computed as
(4.3) |
where .
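A minimal sketch of an adjoint gradient computation for a generic discrete-time system, assuming the model x_{k+1} = A(ρ)x_k + B(ρ)u_k, y_k = C x_k with x_0 = 0 and the unweighted cost J = Σ_k (y_k − V_k)². The recursion below is one standard way to organize such a computation and is not claimed to reproduce (4.2) and (4.3) exactly; `cost_and_gradient` and `toy_system` are hypothetical names.

```python
import numpy as np

def cost_and_gradient(A, B, C, dA, dB, u, V):
    """Cost J = sum_k (C x_k - V_k)^2 and its gradient from one adjoint sweep.

    A: (n, n), B: (n,), C: (n,); dA, dB: lists holding the derivatives of A and B
    with respect to each parameter; u: inputs u_0..u_{K-1}; V: data V_1..V_K.
    """
    K, n = len(u), A.shape[0]
    x = np.zeros((K + 1, n))
    for k in range(K):                         # forward sweep for the states
        x[k + 1] = A @ x[k] + B * u[k]
    r = x[1:] @ C - V                          # residuals r_1..r_K (r[k] holds r_{k+1})
    J = float(r @ r)
    grad = np.zeros(len(dA))
    z = np.zeros(n)
    for k in range(K - 1, -1, -1):             # backward sweep for the adjoint
        z = A.T @ z + 2.0 * C * r[k]           # adjoint state z_{k+1}
        for l in range(len(dA)):
            grad[l] += z @ (dA[l] @ x[k] + dB[l] * u[k])
    return J, grad

def toy_system(rho):
    """Two-state toy system with one parameter in A and one in B (illustration only)."""
    A = np.array([[rho[0], 0.1], [0.0, 0.5]])
    B = np.array([rho[1], 1.0])
    C = np.array([1.0, 1.0])
    dA = [np.array([[1.0, 0.0], [0.0, 0.0]]), np.zeros((2, 2))]
    dB = [np.zeros(2), np.array([1.0, 0.0])]
    return A, B, C, dA, dB

u = np.array([1.0, 0.5, 0.0, 0.0, 0.2])
V = np.array([0.9, 1.1, 0.8, 0.4, 0.3])
J, grad = cost_and_gradient(*toy_system(np.array([0.6, 0.8])), u, V)
```

The cost of the gradient is one forward and one backward sweep regardless of the number of parameters, which is the usual motivation for the adjoint approach.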
The tensor can be computed at the same time as the matrix is computed using the sensitivity equations. For t ≥ 0 set . Then ΦN is the unique principal fundamental matrix solution to the initial value problem
(4.4) |
Setting ΨN = ∂ΦN/∂ρ, differentiating (4.4) with respect to ρ, interchanging the order of differentiation, and using the product rule, we obtain
(4.5) |
Combining the two initial value problems given in (4.4) and (4.5), and then solving we obtain
(4.6) |
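The combined initial value problem can be solved with a single exponential of a block matrix. The sketch below assumes that Φ satisfies dΦ/dt = AΦ, Φ(0) = I, and that Ψ = ∂Φ/∂ρ satisfies dΨ/dt = AΨ + (∂A/∂ρ)Φ, Ψ(0) = 0, for a single scalar parameter ρ. Stacking the two gives a linear system with block lower triangular matrix [[A, 0], [∂A/∂ρ, A]], whose exponential contains Φ and Ψ in its first block column. The helper `expm` and the name `expm_with_sensitivity` are illustrative.

```python
import numpy as np

def expm(A):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    A = np.asarray(A, dtype=float)
    norm = np.linalg.norm(A, 1)
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    As = A / (2.0 ** s)
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, 20):
        term = term @ As / k
        E = E + term
    for _ in range(s):
        E = E @ E
    return E

def expm_with_sensitivity(A, dA, tau):
    """Return (Phi, Psi) = (e^{A tau}, d/d rho of e^{A tau}) from one block exponential."""
    n = A.shape[0]
    big = np.zeros((2 * n, 2 * n))
    big[:n, :n] = A
    big[n:, :n] = dA            # derivative of A with respect to rho
    big[n:, n:] = A
    E = expm(big * tau)
    return E[:n, :n], E[n:, :n]

def A_of(rho):
    """Toy parameter-dependent matrix (illustration only)."""
    return np.array([[-rho, 1.0], [0.0, -2.0 * rho]])

dA = np.array([[-1.0, 0.0], [0.0, -2.0]])
Phi, Psi = expm_with_sensitivity(A_of(0.7), dA, tau=0.3)
```

With several parameters, the same construction can be repeated once per component of ρ, giving the slices of the derivative tensor one at a time.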
5. Consistency of the Estimator
In the context of the alcohol biosensor problem of interest to us here, the estimator, defined in (1.6) is given by, , where , and θ* ∈ Θ are a solution to problem in Section 4.1. Under the following assumptions, using Theorem 4.2 in [4], it is possible to establish a consistency result for the estimator .
(a) The measurement noise {εi,j} in (1.5) is i.i.d. with respect to a probability space {Ω, Σ, P} with EP [εi,j] = 0 and Var[εi,j] = σ².
(b) The feasible set of parameters is compact (i.e. closed and bounded since it is finite dimensional) and has nonempty interior.
(c) For i = 1,2, …,ν, μi = μ, and μτ = T, for some positive integer μ and some T > 0, where τ is the sampling time defined in Section 3.
(d) Let V = {Vi,j} be as is given in (1.5) for some ρ0 ∈ int with as given in the definition of problem .
(e) For each i = 1,2,…, ν, and with given by (3.5), ρ0 is the unique minimizer of Ji,0 in , where
A straightforward application of Theorem 4.2 in [4] can then be used to establish the following lemma.
Lemma 5.1 Assume that Assumptions (i)-(v) and (a) – (e) hold. Then there exists an A ∈ Σ with P(A) = 1 such that for all ω ∈ A and Ji,μ(ρ; V) as given in (1.6) with αi,j = 1, we have
as ν, μ → ∞, τ → 0, with μτ = T, uniformly in ρ, for .
Theorem 5.1 (Consistency of the estimator ) Let be as defined in (1.6) together with problem . Then under the assumptions of Lemma 5.1, the estimator is consistent for ρ0. That is , as ν, μ → ∞, τ → 0, with μτ = T.
Proof The proof is quite similar to the proof of Theorem 4.3 in [4]. For ν = 1,2,…., let , let A ∈ Σ be as in the statement of Lemma 5.1, let ω ∈ A be fixed, and let δ > 0 be arbitrary. Then Lemma 5.1 implies that there exists an open local neighborhood of ρ0 of radius δ, Nδ(ρ0), such that by Assumption (e) there exists an ε > 0 for which J0(ρ) − J0(ρ0) > ε, for all , where is the compact (i.e. closed and bounded) set given by . Now once again by Lemma 5.1, there exist ν0, μ0, τ0, such that for all ν > ν0, μ > μ0, τ < τ0 with μτ = T, , , where J is as given in (1.6) or problem . Then with ν > ν0, μ > μ0, τ < τ0, μτ = T and ,
But . It follows that if ν > ν0, μ > μ0, τ < τ0, with μτ = T. Since δ > 0 was arbitrary and P (A) = 1, we have that , as ν, μ → ∞, τ → 0, with μτ = T.
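The chain of inequalities used in the proof can be sketched as follows. This is a hedged reconstruction of the standard argument rather than the authors' exact display, and the notation is assumed: \hat{\rho} denotes the estimator, J the sampled functional for the fixed ω, J_0 its limit, K_\delta the compact complement of N_\delta(\rho_0) in the feasible set, and ε/2 the uniform tolerance supplied by Lemma 5.1.

```latex
\begin{align*}
J_0(\hat{\rho}) - J_0(\rho_0)
  &= \bigl[J_0(\hat{\rho}) - J(\hat{\rho})\bigr]
   + \bigl[J(\hat{\rho}) - J(\rho_0)\bigr]
   + \bigl[J(\rho_0) - J_0(\rho_0)\bigr] \\
  &< \tfrac{\varepsilon}{2} + 0 + \tfrac{\varepsilon}{2} = \varepsilon .
\end{align*}
```

The middle bracket is nonpositive because \hat{\rho} minimizes J, and the two outer brackets are each smaller than ε/2 by the uniform convergence. Since J_0(\rho) - J_0(\rho_0) > \varepsilon for every ρ in K_\delta, the estimator cannot lie in K_\delta and must therefore lie in N_\delta(\rho_0).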
6. Numerical Results
The approximating finite dimensional subspaces were constructed as follows, based on a discretization of (η, q1, q2)-space. For each n = 1,2,…, let denote the set of standard linear B-splines on the interval [0,1] defined with respect to the usual uniform mesh, , and set (note that the are the usual “pup tent” or “chapeau” functions of height one and support of width ). If Pn:H → Hn denotes the orthogonal projection of H = L2(0,1) onto Hn, it is well known (see, for example, Schultz [21]) that limn→∞ Pn φ = φ in H for φ ∈ H and in V for φ ∈ V. Then for i = 1,2 and each mi = 1,2,…, let denote the set of standard 0th order B-splines (i.e. piecewise constant functions) on the interval [ai, bi] defined with respect to the usual uniform mesh, . If denotes the orthogonal projection of L2(ai, bi), it is not difficult to show [1,3] that in L2(ai, bi) for every ζ ∈ L2(ai, bi). Then let N denote the triple (n, m1, m2), and L the multi-index L = (j, j1, j2), where j ∈ {0,1,2, …, n}, j1 ∈ {1,2, …, m1}, and j2 ∈ {1,2, …, m2}. We then use tensor products to define as and set . It is then not difficult to argue that Assumption (vi) holds.
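The construction above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `hat`, `box`, and `tensor_basis` are hypothetical, and the half-open convention for the piecewise constant cells is an assumption made so that the cells do not overlap.

```python
def hat(j, n, x):
    """Linear B-spline ("hat" function) of height one centered at node j/n on [0, 1]."""
    return max(0.0, 1.0 - abs(n * x - j))


def box(j, m, a, b, q):
    """0th-order B-spline: indicator of the j-th of m equal cells of [a, b], j = 1..m."""
    h = (b - a) / m
    left, right = a + (j - 1) * h, a + j * h
    # Half-open cells [left, right); the endpoint b is assigned to the last cell.
    return 1.0 if (left <= q < right) or (j == m and q == b) else 0.0


def tensor_basis(L, N, bounds):
    """Tensor-product basis function for multi-index L = (j, j1, j2), N = (n, m1, m2)."""
    (j, j1, j2), (n, m1, m2) = L, N
    (a1, b1), (a2, b2) = bounds
    return lambda eta, q1, q2: (
        hat(j, n, eta) * box(j1, m1, a1, b1, q1) * box(j2, m2, a2, b2, q2)
    )


# Example: evaluate every basis function at one point of (eta, q1, q2)-space.
N = (4, 3, 2)
bounds = ((0.5, 2.0), (1.0, 3.0))  # hypothetical [a1, b1] and [a2, b2]
point = (0.37, 1.1, 2.9)
values = [
    tensor_basis((j, j1, j2), N, bounds)(*point)
    for j in range(N[0] + 1)
    for j1 in range(1, N[1] + 1)
    for j2 in range(1, N[2] + 1)
]
```

Both families form a partition of unity on their intervals, so the tensor-product values at any point of the domain sum to one, which is a convenient sanity check on the indexing.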
In both of the examples to follow we fit a truncated bivariate normal. Let , , and , and let denote the joint density for the bivariate normal with mean and covariance matrix Σ:
Let denote the corresponding cumulative distribution function. We then set
In order to guarantee that we only search over positive definite symmetric matrices Σ, we parameterize Σ as Σ = LLT, where
It then follows that
so long as L11 and L22 are both nonzero. Thus the optimization is over a feasible subset of ℝ⁹ (e.g., since our model is diffusion based, we would want a1, a2 > 0 or ᾱ > 0), with ρ = (a1, b1, a2, b2, μ1, μ2, L11, L21, L22).
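The parameterization can be made concrete with a short Python sketch. It assumes that L is the lower triangular factor with entries L11, L21, L22, so that Σ = LLᵀ has entries L11², L11·L21, and L21² + L22², with determinant (L11·L22)². The function names are hypothetical, and the normalizing mass of the rectangle [a1, b1] × [a2, b2] is approximated here by a midpoint rule as a stand-in for the cumulative distribution function used in the text.

```python
import math


def sigma_from_cholesky(l11, l21, l22):
    """Entries (s11, s12, s22) of Sigma = L L^T for lower-triangular L."""
    return l11 * l11, l11 * l21, l21 * l21 + l22 * l22


def bivariate_normal_pdf(q1, q2, mu1, mu2, l11, l21, l22):
    """Untruncated bivariate normal density with covariance L L^T."""
    s11, s12, s22 = sigma_from_cholesky(l11, l21, l22)
    det = s11 * s22 - s12 * s12  # equals (l11 * l22) ** 2
    d1, d2 = q1 - mu1, q2 - mu2
    quad = (s22 * d1 * d1 - 2.0 * s12 * d1 * d2 + s11 * d2 * d2) / det
    return math.exp(-0.5 * quad) / (2.0 * math.pi * math.sqrt(det))


def truncated_pdf(rho, m=200):
    """Density truncated to [a1, b1] x [a2, b2]; rho ordered as in the text."""
    a1, b1, a2, b2, mu1, mu2, l11, l21, l22 = rho
    h1, h2 = (b1 - a1) / m, (b2 - a2) / m
    # Midpoint-rule approximation of the probability mass of the rectangle.
    mass = h1 * h2 * sum(
        bivariate_normal_pdf(a1 + (i + 0.5) * h1, a2 + (j + 0.5) * h2,
                             mu1, mu2, l11, l21, l22)
        for i in range(m) for j in range(m)
    )

    def f(q1, q2):
        if not (a1 <= q1 <= b1 and a2 <= q2 <= b2):
            return 0.0
        return bivariate_normal_pdf(q1, q2, mu1, mu2, l11, l21, l22) / mass

    return f


# Hypothetical parameter vector, chosen only to exercise the functions.
rho = (0.0, 2.0, 0.0, 3.0, 1.0, 1.5, 0.4, 0.1, 0.5)
f = truncated_pdf(rho)
```

Because the optimizer works with L11, L21, and L22 directly, every iterate with nonzero diagonal entries corresponds to a valid covariance matrix, and no separate positive-definiteness constraint is needed.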
All computations were carried out in Matlab on either Mac or PC laptops or desktops. For higher dimensional problems with high resolution discretization, faster platforms such as a cluster may be required. The optimization problems () were solved iteratively using the Matlab Optimization Toolbox routine FMINCON. We computed the requisite gradients using the adjoint as shown in (4.2) – (4.6) above, and we also let FMINCON compute them using finite differences. Both yielded the same results. The finite difference calculations were faster, but the adjoint would likely be preferable for problems involving a higher dimensional parameter space such as would be encountered in the non-parametric case. This is because the number of integrations of the state equation required by the adjoint method does not increase with the number of parameters to be estimated, as it does with a finite difference scheme for computing the gradient of the cost functional. Initial estimates for the parameters were obtained by first fitting each dataset deterministically via nonlinear least squares to obtain estimates for q1 and q2. Then sample means, standard deviations, and covariances were used to compute initial estimates for a1, b1, a2, b2, μ1, μ2. The two random variables and were initially assumed to be independent, each with standard deviation one sixth of the length of the corresponding boundary of the initial domain. Some care must be exercised in choosing these initial guesses. Because of the nature of the approximation scheme we are using, if in any iteration the pdf becomes too flat, our Galerkin scheme’s mass and stiffness matrices can become singular or close to singular.
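The cost comparison between the two gradient computations can be illustrated on a toy problem. The Python sketch below is not the authors' scheme from (4.2) – (4.6). It assumes a generic discrete-time linear system x_{k+1} = A(ρ)x_k + b·u_k with scalar output c·x_k, a matrix A(ρ) = A0 + Σ ρ_p·A_p that is affine in the parameters, and a plain least-squares cost. All names and data are hypothetical.

```python
import numpy as np


def simulate(A, b, u):
    """States x_0..x_K of x_{k+1} = A x_k + b u_k with x_0 = 0."""
    x = np.zeros((len(u) + 1, A.shape[0]))
    for k, uk in enumerate(u):
        x[k + 1] = A @ x[k] + b * uk
    return x


def cost(rho, A0, dAs, b, c, u, v):
    """Least-squares misfit J(rho) = sum over k = 1..K of (c.x_k - v_k)^2."""
    A = A0 + sum(r * dA for r, dA in zip(rho, dAs))
    x = simulate(A, b, u)
    return float(np.sum((x[1:] @ c - v) ** 2))


def adjoint_gradient(rho, A0, dAs, b, c, u, v):
    """Gradient of J from one forward sweep and one backward (adjoint) sweep."""
    A = A0 + sum(r * dA for r, dA in zip(rho, dAs))
    x = simulate(A, b, u)
    grad = np.zeros(len(rho))
    z = np.zeros(A.shape[0])
    for k in range(len(u), 0, -1):
        # Adjoint recursion: z_k = A^T z_{k+1} + 2 (c.x_k - v_k) c, z_{K+1} = 0.
        z = A.T @ z + 2.0 * (x[k] @ c - v[k - 1]) * c
        # Each parameter adds only an inner product with the stored state.
        for p, dA in enumerate(dAs):
            grad[p] += z @ (dA @ x[k - 1])
    return grad


def fd_gradient(rho, A0, dAs, b, c, u, v, h=1e-6):
    """Central-difference gradient: two extra state simulations per parameter."""
    grad = np.zeros(len(rho))
    for p in range(len(rho)):
        e = np.zeros(len(rho))
        e[p] = h
        grad[p] = (cost(rho + e, A0, dAs, b, c, u, v)
                   - cost(rho - e, A0, dAs, b, c, u, v)) / (2.0 * h)
    return grad


# Hypothetical test problem with four parameters.
rng = np.random.default_rng(0)
n, K = 3, 40
A0 = 0.5 * np.eye(n)
dAs = [0.1 * rng.standard_normal((n, n)) for _ in range(4)]
b, c = np.ones(n), np.array([1.0, 0.0, 0.0])
u, v = np.sin(0.2 * np.arange(K)), rng.standard_normal(K)
rho = np.array([0.3, -0.2, 0.1, 0.4])
g_adj = adjoint_gradient(rho, A0, dAs, b, c, u, v)
g_fd = fd_gradient(rho, A0, dAs, b, c, u, v)
```

In this sketch the adjoint route always performs one state simulation and one backward sweep, while the finite difference route performs two state simulations for each parameter, so the gap widens as the parameter vector grows.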
A WrisTAS™ 7 alcohol biosensor (Fig. 1.1) was worn for 18 days by one of the co-authors (S.E.L.) and was set to measure the local ethanol vapor concentration over the skin surface at 5-minute intervals. In addition, she contemporaneously collected breath measurements.
The first drinking episode was conducted in the laboratory with BrAC measured and recorded every 15 minutes from the start of the drinking session until BrAC returned to 0.000. She then wore the TAC device in the field and consumed alcohol ad libitum for the following 17 days. For each drinking episode, BrAC readings were taken every 30 minutes until the BrAC returned to 0.000. Figure 6.1.a shows the entire 18 day TAC signal along with the contemporaneous BrAC measurements. The 11 individual drinking episodes are marked. The TAC measurements provided by the sensor are in units of milligrams per deciliter (mg/dl), and the BrAC measurements are in units of percent alcohol.
We fit the population consisting of all eleven drinking episodes, but we also visually stratified the population into two groups, one containing the seven episodes in which the peak BrAC was higher than the (bench calibrated) peak TAC (episodes 1, 2, 4, 6, 7, 8, and 11), and a second containing the remaining four drinking episodes in which the reverse was true (episodes 3, 5, 9, and 10). Our results for the first stratified group are shown in Figures 6.1.b – 6.1.i. In Figures 6.1.c, d, f, g, and i we plotted the training BrAC and TAC data for each of the episodes 1, 2, 6, 7, and 11 along with the resulting fitted population model estimate of TAC and the 75% credible band. In Figures 6.1.e and 6.1.h we plotted the results of a cross validation on episodes 4 and 8. In Figure 6.1.b we plotted the optimal fit population truncated bivariate normal pdf. The converged values for the parameters were , , , , , , and .
The credible bands were computed directly from samples, of , obtained from the optimal distribution for using importance sampling and the state as , where is given by (4.1). We note that, strictly speaking, this is not valid since our theory yields only that , and thus that pointwise evaluation in is undefined. However, the results seem quite reasonable and are extremely useful, and consequently we have included them. Our results for the full unstratified data set and for the second stratified group were similar. We also applied our scheme to a population consisting of multiple subjects, each with a single drinking episode, with the results being quite similar to those presented above [20].
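One way to turn weighted samples into a pointwise band is sketched below in Python. The text does not specify how the importance weights were formed or whether the band is central, so both are assumptions here: the band is taken to be the central 75% interval at each sampling time, and the weights are supplied by the caller. The function names are hypothetical.

```python
def weighted_quantile(values, weights, p):
    """Smallest value whose normalized cumulative weight reaches p (0 < p <= 1)."""
    total = float(sum(weights))
    acc = 0.0
    for val, w in sorted(zip(values, weights)):
        acc += w / total
        if acc >= p:
            return val
    return max(values)  # guard against round-off in the cumulative sum


def credible_band(trajectories, weights, level=0.75):
    """Pointwise central credible band from weighted sample trajectories.

    trajectories : list of M equal-length lists, one simulated output per sample
    weights      : list of M nonnegative importance weights
    """
    tail = (1.0 - level) / 2.0
    lower, upper = [], []
    for column in zip(*trajectories):  # loop over sampling times
        lower.append(weighted_quantile(column, weights, tail))
        upper.append(weighted_quantile(column, weights, 1.0 - tail))
    return lower, upper


# Hypothetical data: eight equally weighted samples at two sampling times.
trajectories = [[float(i), 2.0 * i] for i in range(1, 9)]
weights = [1.0] * 8
lower, upper = credible_band(trajectories, weights)
```

With equal weights this reduces to ordinary empirical quantiles. Unequal weights shift the band toward the samples that the fitted distribution favors.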
7. Discussion and Concluding Remarks
We note that our theory and general approach also apply to hyperbolic systems with either H-semicoercive or V-coercive damping [2], such as the telegraph equation, which can be used to model diffusion with finite speed of propagation. We are investigating elimination of the requirement that the measures π be defined in terms of a density. We believe that it is possible to directly apply the Prohorov metric based framework developed in [4] by using a different version of the Trotter-Kato-like semigroup approximation result in Section 4. We also believe that results for the estimation of functional parameters in parabolic systems could be used to estimate the pdfs non-parametrically. Indeed, in the system (3.2), (3.3), the pdf effectively plays the role of a non-constant coefficient in an abstract parabolic system. Thus, we should be able to estimate both the support and the shape of the density by parameterizing the pdf as a linear combination of basis elements (e.g. splines, orthogonal polynomials, etc.) and then estimating the coefficients in the expansion. We are also looking at polynomial chaos expansions for , and then estimating the coefficients.
Of primary interest to us is the estimation of the input to the system u, or the BAC/BrAC, from the output y, or TAC. Once the distribution of the random parameters has been estimated, this takes the form of a deconvolution problem. In [20] we use the results presented here together with the framework in [9] and [17] to do just that. We obtain an estimate of the input along with error bars or credible bands. Finally, we are looking at using the approach in [9] and [17] to control random parabolic systems, in particular, the computation of the feedback solution to the LQR and LQG problems for random distributed parameter systems.
Acknowledgements.
This research was supported by grants from the Alcoholic Beverage Medical Research Foundation and the National Institute on Alcohol Abuse and Alcoholism (NIAAA) (R21AA17711, R01AA026368–01, S.E.L. and I.G.R.) and (R01AA025969, C.E.F.).
References
- [1] Banks HT, Burns JA, and Cliff EM, Parameter estimation and identification for systems with delays, SIAM J. Contr. and Opt., Vol. 19, No. 6, November 1981, pp. 791–828.
- [2] Banks HT and Ito K, A unified framework for approximation in inverse problems for distributed parameter systems, Control Theory Advanced Technology, Vol. 4, No. 1, 1988, pp. 73–90.
- [3] Banks HT and Kunisch K, Estimation Techniques for Distributed Parameter Systems, Boston: Birkhauser, 1989.
- [4] Banks HT and Thompson C, Least squares estimation of probability measures in the Prohorov metric framework, CRSC-TR12–21, N. C. State University, 2012.
- [5] Dai Z, Rosen IG, Wang C, Barnett N, and Luczak SE, Using drinking data and pharmacokinetic modeling to calibrate transport model and blind deconvolution based data analysis software for transdermal alcohol biosensors, Math. Biosci. and Eng., 13(5), 2016, pp. 911–934.
- [6] Davidian M and Giltinan D, Nonlinear Models for Repeated Measurement Data, New York: Chapman and Hall, 1995.
- [7] Demidenko E, Mixed Models, Theory and Applications, Hoboken: John Wiley and Sons, 2004.
- [8] Fairbairn CE, Bresin K, Kang D, Rosen IG, Ariss T, Barnett NP, Luczak SE, and Eckland NS, A multimodal investigation of contextual effects on alcohol’s emotional rewards, 2017, submitted, in review.
- [9] Gittelson CJ, Andreev R, and Schwab C, Optimally adaptive Galerkin methods for random partial differential equations, Journal of Computational and Applied Mathematics, 263, 2014, pp. 189–201.
- [10] Grenier E, Louvet V, and Vigneaux P, Parameter estimation in non-linear mixed effects models with SAEM algorithm: extension from ODE to PDE, ESAIM: Math. Modelling and Num. Anal., 48(5), 2014, pp. 1303–1329.
- [11] Kato T, Perturbation Theory for Linear Operators, Springer, 1976.
- [12] Levi A and Rosen IG, A novel formulation of the adjoint method in the optimal design of quantum electronic devices, SIAM J. Ctrl and Opt., 48, 2010, pp. 3191–3223.
- [13] Lions JL, Optimal Control of Systems Governed by Partial Differential Equations, New York: Springer, 1971.
- [14] Mould DR and Upton RN, Basic concepts in population modeling, simulation, and model-based drug development, CPT: Pharmacomet. & Systems Pharm., 2012, 1, e6.
- [15] Pazy A, Semigroups of Linear Operators and Applications to PDEs, New York: Springer, 1983.
- [16] Rosen IG, Luczak SE, and Weiss J, Blind deconvolution for distributed parameter systems with unbounded input and output and determining blood alcohol concentration from transdermal biosensor data, Applied Mathematics and Computation, 231, 2014, pp. 357–376.
- [17] Schwab C and Gittelson CJ, Sparse tensor discretization of high-dimensional parametric and stochastic PDEs, in Acta Num., 20, Cambridge U. Press, 2011, pp. 291–467.
- [18] Sirlanci M, Luczak SE, and Rosen IG, Approximation and convergence in the estimation of random parameters in linear holomorphic semigroups generated by regularly dissipative operators, Proceedings of the 2017 American Control Conference, May 2017.
- [19] Sirlanci M and Rosen IG, Estimation of the distribution of random parameters in discrete time abstract parabolic systems with unbounded input and output: approximation and convergence, J. Math Anal and App, submitted, 2017. Retrieved from https://arxiv.org/pdf/1807.04904.pdf
- [20] Sirlanci M, Luczak SE, Fairbairn CE, Bresin K, Kang D, and Rosen IG, Deconvolving the input to random abstract parabolic systems; a population model-based approach to estimating blood/breath alcohol concentration from transdermal alcohol biosensor data, submitted, 2018. Retrieved from https://arxiv.org/pdf/1807.05088.pdf
- [21] Schultz MH, Spline Analysis, Englewood Cliffs, N.J.: Prentice Hall, 1973.
- [22] Tanabe H, Equations of Evolution (Vol. 6), Pitman Publishing, 1979.