A MEASURE-THEORETIC COMPUTATIONAL METHOD FOR INVERSE SENSITIVITY PROBLEMS I: METHOD AND ANALYSIS

J Breidt; T Butler; D Estep

doi:10.1137/100785946

Author manuscript; available in PMC: 2013 Apr 29.

Published in final edited form as: SIAM J Numer Anal. 2011 Sep 1;49(5):1836–1859. doi: 10.1137/100785946


PMCID: PMC3638864 NIHMSID: NIHMS391569 PMID: 23637467

Abstract

We consider the inverse sensitivity analysis problem of quantifying the uncertainty of inputs to a deterministic map given specified uncertainty in a linear functional of the output of the map. This is a version of the model calibration or parameter estimation problem for a deterministic map. We assume that the uncertainty in the quantity of interest is represented by a random variable with a given distribution, and we use the law of total probability to express the inverse problem for the corresponding probability measure on the input space. Assuming that the map from the input space to the quantity of interest is smooth, we solve the generally ill-posed inverse problem by using the implicit function theorem to derive a method for approximating the set-valued inverse that provides an approximate quotient space representation of the input space. We then derive an efficient computational approach to compute a measure theoretic approximation of the probability measure on the input space imparted by the approximate set-valued inverse that solves the inverse problem.

Keywords: adjoint problem, density estimation, inverse sensitivity analysis, model calibration, nonparametric density estimation, parameter estimation, sensitivity analysis, set-valued inverse

1. Introduction

We develop and analyze a numerical method to solve the inverse sensitivity analysis problem: Given a specified variation and/or uncertainty in the output of a smooth map, determine variations in the input parameters that produce the observed uncertainty. We formulate this inverse problem using probability to describe variation by assuming that the inputs and outputs are random variables. This inverse problem has an abstract interpretation in which the density is imposed on the output in order to observe the consequences for the inputs. It also has an experimental interpretation in which the model output matches observed values of an experiment and the imposed density is associated with the experimental data, i.e., reflecting the uncertainty in the data or arising as a consequence of experimental error.

To motivate this inverse sensitivity analysis problem, consider the situation of a manufacturer who will purchase a large number of metal plates of a given alloy and thickness that are to be used subsequently in a high temperature environment. In order to ensure the plates maintain integrity, the manufacturer specifies that a given heat load must be distributed quasi-uniformly after ten minutes of exposure, with some conditions on how much the temperature may vary through the plate. The plates are milled with variations in the purity of the alloy and the thickness of the plates, both of which affect the heat distribution under load. To check a batch of plates to see if it meets the requirements, the manufacturer tests the heat specification on a random sample of plates drawn from the batch. The random selection of samples, the variation in plate properties, and measurement error combined lead to a description of the test results as a random variable. After delivery, the manufacturer decides that knowing the statistics on the size of the plates and the composition of the alloy would be useful. The heat equation models the heat distribution under a given load once the conductivity determined by the alloy composition and the thickness of the plates are specified. The inverse sensitivity problem is to determine the distribution on the space of parameters consisting of the thickness and alloy purity from the distribution of the results of the heat experiments on the plates.

The probabilistic inverse problem can be described more precisely as follows. Given

a model M(Y, λ) with solution Y = G(λ) depending on parameters and data λ in a parameter space Λ ⊂ ℝ^d,
a linear functional q(λ) = q(Y(λ)) taking values in an output space 𝒟, and
an observed probability density ρ_𝒟(q(λ)) = ρ_𝒟(q(Y(λ))) on the output value q(λ),

determine

a probability density σ_Λ(λ) on the parameter space Λ that produces the observed density.

We assume the model M(Y, λ) depends smoothly on the inputs, so the map q(λ) is implicitly a smooth and deterministic function of λ.

There are several important issues associated with this problem. First, the parameter space is generally multidimensional while there is a single observation (or, at most, a low dimensional set of observations). So, the inverse problem is ill-posed in the sense that the inverse solution of the deterministic model is set-valued. Under the assumption of a smooth model, we address this issue by constructing a systematic method for approximating set-valued inverses. Second, we are particularly interested in models that are complicated and/or expensive to evaluate, e.g., requiring the solution of a differential equation, so that the map to the output is determined implicitly. We address this issue by using adjoint operators [22, 20, 6, 21, 23, 12, 13, 9, 10, 7, 11] to compute the required derivative information. Third, while probability densities describe random variables, the densities themselves are not random. Common approaches to approximating probability densities often use a random representation obtained by some variation of Monte Carlo sampling [14, 17, 18]; however, this is not a requirement. In particular, the approach described in this paper is not stochastic; rather, it is based on the simple approximation commonly used in measure theory.

In this paper, we present the basic method and analysis of a measure-theoretic computational approach for the probabilistic inverse sensitivity analysis problem. In [4], we present a numerical analysis of the discretization error that arises when evaluating the model by numerical solution and using a finite number of random samples to represent the distribution on the output quantity. In [5], we discuss the problem of dealing with multiple quantities of interest, which has application to data assimilation and “cascaded” uncertainty in operator decomposition solution of multiphysics problems.

This paper is structured as follows. In section 2, we formulate the probabilistic inverse problem that we solve and discuss the relation to a Bayesian inverse problem. In section 3.1, we deal with the set-valued nature of the inverse problem by introducing a theory of generalized contours and explain how the generalized contours can be approximated. In section 3.2, we develop a computational measure theoretic method for approximating the inverse parameter distribution using approximate generalized contours. In section 4, we apply the method to a variety of problems. Finally, section 5 summarizes the work.

2. Formulation of the probabilistic inverse problem

The inverse problem we study is the direct inversion of the forward stochastic sensitivity analysis problem for a deterministic model. We consider a deterministic operator q(λ) that maps values in a parameter space Λ to an output space 𝒟. We assume there is a parameter volume measure μ_Λ on Λ that determines the volume of sets in Λ. The volume measure depends on the units of measure used for the parameters and also reflects the structural dependency among the parameters, e.g., depending on whether or not μ_Λ is a product measure. The volume measure is specified as part of the model that defines the map q(λ) since the parameters must be explicitly defined in the physical model that determines q. We assume that μ_Λ is absolutely continuous with respect to the Lebesgue measure and the volume V of Λ is finite.

We first describe the forward stochastic sensitivity analysis for the deterministic map q(λ). We assume that a probability density σ_Λ(λ) is specified on the parameter space Λ. This density distinguishes the probability of different events in Λ, i.e., the probability of an event A in Λ, by which we mean a measurable set of values, is computed via

P (A) = \int_{A} σ_{Λ} (λ) d μ_{Λ} (λ) .

The deterministic model can be expressed in terms of a likelihood function L(q|λ) of the output q values given the input parameter values λ, where L(q|λ) = δ(q − q(λ)) is the unit mass distribution at q = q(λ). This implies the fundamental relationship

Law of Total Probability: ρ_{𝒟} (q | A) = \frac{\int_{A} L (q | λ) σ_{Λ} (λ) d μ_{Λ} (λ)}{\int_{A} σ_{Λ} (λ) d μ_{Λ} (λ)} .

(2.1)

This is a Fredholm integral equation of the first kind that determines a conditional probability density ρ_𝒟(q|A) on the output given that the parameters come from A. Thus, we may determine the conditional probability of event B ⊂ 𝒟 as

P (B | A) = \int_{B} ρ_{𝒟} (q | A) d μ_{𝒟} (q) = \frac{\int_{B} \int_{A} L (q | λ) σ_{Λ} (λ) d μ_{Λ} (λ) d μ_{𝒟} (q)}{\int_{A} σ_{Λ} (λ) d μ_{Λ} (λ)} .

For forward sensitivity analysis it is common to take A = Λ so that P(B|A) = P(B), and we arrive at the common form for the law of total probability given by

ρ_{𝒟} (q) = \int_{Λ} L (q | λ) σ_{Λ} (λ) d μ_{Λ} (λ) .

(2.2)

This describes an analogue of a Perron–Frobenius map where the deterministic map q(λ) defines a transformation of the density σ_Λ(λ) to ρ_𝒟(q). This forward sensitivity analysis problem is often solved using a Monte Carlo approach: Random parameter sample values λ are drawn from the distribution σ_Λ on the parameter space; corresponding values of q(λ) are computed; and these values are binned to produce an approximate probability distribution on the output.
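The Monte Carlo procedure just described can be sketched in a few lines. This is a minimal illustration, not the method of this paper: the map `q_sum`, the sampler `uniform_square`, and the function name `forward_monte_carlo` are hypothetical choices made only for the example, and the parameter density is assumed to be uniform on the unit square.

```python
import numpy as np

def forward_monte_carlo(q, sampler, n_samples=100_000, bins=50, seed=0):
    """Approximate the output density of q(lambda) by sampling and binning."""
    rng = np.random.default_rng(seed)
    lam = sampler(rng, n_samples)      # parameter samples, shape (n_samples, d)
    q_vals = q(lam)                    # one output value per sample
    # Normalized histogram: piecewise-constant estimate of the output density.
    density, edges = np.histogram(q_vals, bins=bins, density=True)
    return q_vals, density, edges

def q_sum(lam):
    """Hypothetical map for illustration: q(lambda) = lambda_1 + lambda_2."""
    return lam[:, 0] + lam[:, 1]

def uniform_square(rng, n):
    """Assumed parameter density: uniform on [0, 1]^2."""
    return rng.uniform(0.0, 1.0, size=(n, 2))

q_vals, density, edges = forward_monte_carlo(q_sum, uniform_square)
print(q_vals.mean(), q_vals.var())   # sample mean and variance of the output
```

For this choice the exact output density is triangular on [0, 2] with mean 1 and variance 1/6, so the printed sample statistics give a quick sanity check on the binning.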

The stochastic inverse sensitivity analysis problem that we study is the inversion of the Law of Total Probability (2.2).

We assume that an observed probability density ρ_𝒟(q(λ)) is given on the output value q(λ), and we seek to compute the corresponding parameter density σ_Λ(λ) that yields ρ_𝒟(q(λ)) via (2.2).

It is important to note that what we seek for the solution of the inverse problem is the actual probability density that can be used to compute the probability of events in the parameter space Λ. In other words, we seek to compute the inverse of the analogue of the Perron–Frobenius map between the densities on the input and output spaces. The purpose of this paper is to describe a method for solving the inverse problem by providing a way to approximate the probability of an arbitrary event in the input space. This can be used subsequently to generate an approximation of the inverse density and/or to compute any desired statistical moments of the inverse density.

We emphasize the fundamental role of the underlying parameter volume measure μ_Λ in defining the solution of the inverse problem. In particular, the a priori specification of μ_Λ imposes the structure of the measure on Λ, e.g., whether the measure on Λ is a product measure or not. In general, there are many combinations of σ_Λ and μ_Λ that can yield a given observed density on the output.

We provide a simple illustration of the inverse problem using the map

q (λ) = λ_{1} + λ_{2},

where λ₁, λ₂ are random variables. For the inverse problem, we specify that q(λ) has a N(0, 2/25) distribution and seek to determine the parameter distribution σ_Λ(λ) that yields the specified output density. This output distribution can be generated by choosing λ₁, λ₂ to be independent identically distributed N(0, 1/25) random variables; see Figure 2.1. As well, we could choose any bivariate normal density

(\begin{matrix} λ_{1} \\ λ_{2} \end{matrix}) \sim N ((\begin{matrix} - α \\ α \end{matrix}), τ^{2} (\begin{matrix} 1 & ϱ \\ ϱ & 1 \end{matrix})) with 2 τ^{2} (1 + ϱ) = \frac{2}{25}, ϱ \in [- 1, 1] .
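The non-uniqueness in this example can be checked directly, since the variance of λ₁ + λ₂ is 1ᵀΣ1 for a covariance matrix Σ. The following is a small sketch under that observation; the helper name `sum_variance` and the particular values of ϱ are illustrative assumptions. The value ϱ = −1 is left out because the sum then has zero variance and the constraint cannot be met.

```python
import numpy as np

target_var = 2.0 / 25.0
ones = np.ones(2)

def sum_variance(cov):
    """Variance of lambda_1 + lambda_2 for a given 2x2 covariance matrix."""
    return float(ones @ cov @ ones)

# Independent, identically distributed N(0, 1/25) parameters.
iid_var = sum_variance(np.eye(2) / 25.0)
print("iid:", iid_var)

# Correlated alternatives: choose rho, then solve
# 2 tau^2 (1 + rho) = 2/25 for tau^2 (illustrative rho values).
correlated = {}
for rho in (-0.5, 0.0, 0.9):
    tau2 = target_var / (2.0 * (1.0 + rho))
    cov = tau2 * np.array([[1.0, rho], [rho, 1.0]])
    correlated[rho] = sum_variance(cov)
    print("rho =", rho, "tau^2 =", tau2, "Var(sum) =", correlated[rho])
```

Every case yields the same output variance 2/25, and the means −α and α cancel in the sum, so all of these parameter distributions reproduce the imposed N(0, 2/25) output.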

Fig. 2.1 — Left: The N(0, 2/25) distribution imposed on the output λ₁ + λ₂. Right: The joint distribution of two independent N(0, 1/25) parameters λ₁ and λ₂. Summing these variables is one way to compute the imposed normal on the output quantity. Figures 2.2–2.3 show alternatives.

If we find a distribution on Λ that generates q(λ) according to a N(0, 2/25) distribution, then we accept this as a solution to the inverse problem. The choice of the underlying parameter volume measure μ_Λ is critical to this task. In Figures 2.1–2.3, we show five different probability densities σ_Λ(λ) that yield the identical N(0, 2/25) density on q(λ). Each of the five densities corresponds to a different underlying volume measure μ_Λ, as shown.

Fig. 2.3 — The joint distributions of parameters (λ₁, λ₂) sampled with respect to the density ρ_Λ(λ) and the corresponding volume measure presented in pairs of plots. Left two plots: The volume measure is uniform Lebesgue on the boundary. Right two plots: The volume measure is uniform Lebesgue on a nonconvex interior set.

The specification of μ_Λ has to do with how measurements in Λ are carried out and the relationships between the parameters. As noted, the volume measure should be specified as part of defining the model. In many situations involving deterministic models, the product Lebesgue measure appropriately scaled to account for units is the natural choice. But, this is not always the case. Continuing the motivating problem, as a first approximation, we might consider the thickness and alloy composition to be physically independent parameters and impose a product measure on the space formed by the two variables using independent normalized Lebesgue measures. A more realistic description will take into account the fact that the thickness of the plates indirectly depends on the alloy composition during the milling process. We can model the milling process to determine the thickness as an indirect function of the physically independent variables of pressure in the milling process and the alloy composition. The measure on the space consisting of the thickness and alloy composition is then determined by propagating the product measure imposed on the independent alloy composition and pressure variables through the milling model. The resulting measure on the space consisting of the alloy composition and thickness will not be a product measure.

The plots of inverse densities given in Figures 2.2–2.3 also illustrate the important point that injecting probability into the inverse problem by itself does not reduce the ill-posedness, even after specifying the parameter volume measure. The consequence of ill-posedness on the stochastic inverse problem is illustrated by the complex measure structure of the inverse probability densities in the plots. For example, these densities are not product measures. In general, it is not possible to determine densities for the individual parameters without further information. We can determine only a measure on the entire parameter space.

Fig. 2.2 — The joint distributions of parameters (λ₁, λ₂) sampled with respect to the density ρ_Λ(λ) and the corresponding volume measure presented in pairs of plots. Left two plots: The volume measure is uniform Lebesgue on Λ. Right two plots: The volume measure is uniform Lebesgue on a set with three distinct parts.

Comparison to a Bayesian inverse problem

There is another natural inverse problem associated with the Law of Total Probability (2.1) that is important in the case of a general likelihood function L(q|λ), not necessarily arising from a deterministic map. Namely, we may use Bayes’ theorem to invert the likelihood function to obtain the “posterior density” p(λ|q) given the “prior density” σ_Λ on the input space Λ and a “data density” ρ_𝒟 on the output space 𝒟. We emphasize that the solution of this Bayesian inverse problem is a conditional distribution. This is very natural when the map from the input to output space has been modeled statistically by specifying L(q|λ) given information about the statistical properties of the input parameters and output quantity, e.g., when the map is derived empirically, rather than from physical principles.

This Bayesian inverse problem is at the heart of Bayesian inference [26, 1, 19, 18]. In this approach, the inferential target is a single, unknown parameter (or parameter vector) λ. We are given data in the form of observations q₁,…, q_n, for which a typical assumption is conditional independence,

p (q_{1}, \dots, q_{n} | λ) = \prod_{i = 1}^{n} p (q_{i} | λ)

(2.3)

where {p(q_i|λ)} are conditional probability densities with respect to some appropriate measure, and are specified up to the value of λ. The right-hand side of (2.3) is the likelihood of the observations given the parameter. We are also given a prior distribution on λ that gives a probabilistic description of the uncertainty about the values of λ before any data are observed. This prior distribution is exactly σ_Λ(λ) in the notation used above. Bayesian inference then proceeds by using Bayes’ theorem to compute the posterior conditional distribution of λ given the observations q₁,…, q_n:

p (λ | q_{1}, \dots, q_{n}) \propto p (q_{1}, \dots, q_{n} | λ) σ_{Λ} (λ) = \prod_{i = 1}^{n} p (q_{i} | λ) σ_{Λ} (λ) .

(2.4)
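As a concrete instance of (2.4), the posterior for a scalar parameter can be tabulated on a grid. This is a sketch under stated assumptions that are not part of the paper: each p(q_i|λ) is taken to be Gaussian with mean λ and a known standard deviation `noise_std`, the prior is uniform on [−1, 1], and the observations are made-up numbers. The function name `grid_posterior` is hypothetical.

```python
import numpy as np

def grid_posterior(lam_grid, prior, observations, noise_std):
    """Posterior density on a uniform grid via (2.4), assuming Gaussian p(q_i | lambda)."""
    # Log-likelihood summed over conditionally independent observations, as in (2.3).
    resid = observations[None, :] - lam_grid[:, None]
    log_like = -0.5 * np.sum((resid / noise_std) ** 2, axis=1)
    # Subtract the maximum before exponentiating for numerical stability.
    unnorm = np.exp(log_like - log_like.max()) * prior
    dlam = lam_grid[1] - lam_grid[0]
    return unnorm / (unnorm.sum() * dlam)   # normalize so the Riemann sum is 1

lam_grid = np.linspace(-1.0, 1.0, 2001)
prior = np.full_like(lam_grid, 0.5)            # uniform prior density on [-1, 1]
obs = np.array([0.10, 0.25, 0.18, 0.22])       # made-up observations
post = grid_posterior(lam_grid, prior, obs, noise_std=0.1)

dlam = lam_grid[1] - lam_grid[0]
post_mean = np.sum(lam_grid * post) * dlam
print(post_mean, obs.mean())
```

With a flat prior and this likelihood, the posterior mean sits at the sample mean of the observations. Note that the output is a conditional density for λ given the data, with the prior supplied as an input.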

We could adopt a Bayesian approach to solve the inverse problem we study by modeling σ_Λ(λ) parametrically as σ_Λ(λ|θ) in terms of new (lower-dimensional) parameters θ. This is known as a mixture or hierarchical model. In Bayesian terminology, σ_Λ(λ|θ) is the prior while a new distribution σ_θ describing θ is the hyperprior. Assuming that the hyperprior is specified, we then compute the posterior distribution on θ given “data” from ρ_𝒟(q(λ)). Any desired inferences about the distribution of λ given θ can then be obtained from the posterior. The difficulty with this approach is specifying a reasonable conditional model, which is difficult to verify empirically.

The inverse problem solved in this paper shares some characteristics with the Bayesian inverse problem, but has fundamental differences as well. In the Bayesian problem, the inferential target is the parameter λ, and σ_Λ is given as prior information. The likelihood L(q|λ) typically involves a nontrivial stochastic structure and is not deterministic.

By contrast, in the inverse problem we solve, the inferential target is the distribution σ_Λ, which is not given as the prior. Further, our likelihood L(q|λ) is given by a deterministic map, which completely determines the set-valued inverse.

The choice of inverse problem to solve depends completely on the available information. In the case of a deterministic physics-based model, the unknowns and quantities subject to uncertainty are the data and parameter values that are input into the model and the observations that are supposed to match model output while the likelihood function determined by the map is completely trivial in a statistical/probabilistic sense. Based on the law of total probability, the inverse problem we solve is the direct inverse of the probabilistic forward sensitivity problem for a deterministic model.

3. Solving the inverse problem

As noted above, while probability densities describe the random nature of a random variable, the densities themselves are not random. While a common approach to compute a discrete approximation of a probability density employs random sampling, this is not necessary. In this paper, we describe a method for computing approximate probability densities that does not require random sampling. Our approach breaks the solution down into two stages:

Construct an approximate representation of the set-valued inverse solution of the deterministic model.
Use measure-theoretic computational methods to approximate the probability density (measure) structure on the parameter space that corresponds to the set-valued inverse and the observed output density.

These are independently interesting tasks.

We present a brief overview before providing the details. Under the assumption of a smooth map, if we are given a fixed output value q̄ ∈ 𝒟, then the implicit function theorem guarantees the existence of a (d − 1)-dimensional manifold in Λ that is mapped to q̄. Motivation comes from the two-dimensional case, λ = (λ₁, λ₂), where the manifolds are contours of the surface q(λ₁, λ₂) (left-hand illustration in Figure 3.1). Every point in Λ lies on a unique contour, so we may consider Λ as a set described by its contours. The set of (generalized) contours is an equivalence class in the input space, i.e., a quotient space representation of the input space. In Λ, there exist 1-dimensional curves transverse to the contours that intersect each contour once and only once (right-hand illustration in Figure 3.1). We can take one of these curves as the index for the set of contours. There is a bijection between the points on an index curve and the points in the range of the output q(Λ). Therefore, any measure posed on the range of the output imposes a measure on the index curve. Thus, the intersection of a contour with the index curve is a random variable with a distribution uniquely defined by the distribution of the output ρ_𝒟(q(λ)) (left-hand illustration in Figure 3.2). In other words, there exists a unique solution to the inverse sensitivity analysis problem in the set of the contours.

Fig. 3.1 — Left: Each observation value corresponds to a unique contour curve. Right: On the horizontal plane, we show a transverse parameterization. Each point on the transverse parameterization corresponds to a unique contour curve, so the transverse parameterization acts as an index for the space of contour curves. There is a unique map from the points in the interval containing the observed output values to the points on the transverse parameterization.

Fig. 3.2 — Left: We show a probability distribution imposed on the output values. A sample of output values drawn from this distribution corresponds to a unique sample of contour curves. Right: Plotted is a sample of contour lines in parameter space corresponding to a specified distribution on the output observation values along with three events. We specify the Lebesgue measure as the parameter volume measure. Event B has relatively low probability because while it has relatively large area, the probability of the contours is relatively low (visible because the density is sparse). Event A has intermediate probability because while the area of event A is relatively small, A contains contours with relatively high probability (which is visible because of the dense sample of contours). The probability of event C is largest because it contains the same high probability contours as A but has larger area.

However, determining the set of contours analytically is infeasible in practice. In [23], the forward sensitivity analysis problem defined by (2.2), where a given density σ_Λ(λ) is propagated through the output surface q(λ), is solved using a piecewise-linear tangent plane approximation to the output surface. This requires computations involving only inner products, which is cheap compared to the full model evaluation cost of q(λ) for each new value of λ. The derivatives of q(λ) are computed implicitly using adjoint methods. Motivated by this approach, we use a piecewise-linear tangent plane approximation to the output surface q(λ) to construct approximate contours and an approximate index set.

The next step is to determine the probability density on the parameter set that corresponds to the distribution on the transverse parameterization of the space of approximate contours. In order to assign a probability to a measurable set in Λ, we first recognize that such a set is defined by the contours it contains and the amount of each contour it contains (right-hand illustration in Figure 3.2). The parameter volume measure μ_Λ specified on Λ quantifies the amount of each contour contained in any given set. Combining the results of the generalized contours with such a measure, the monotone convergence theorem, and additivity properties of measures, we develop an algorithm to estimate the probability of any measurable set in Λ. This algorithm employs a piecewise constant approximation of measures that is commonly used in measure theory. This yields a direct computational method to approximate σ_Λ(λ).
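The idea in this paragraph can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the algorithm developed later: it takes Λ = [0, 1]² with Lebesgue volume, splits the output range into bands of contours, assigns each band its output probability, and distributes that probability within the band in proportion to volume, using a regular (non-random) grid of cells. The names `event_probability`, `triangular_cdf`, and the event A = [0, 0.5)² are hypothetical.

```python
import numpy as np

def event_probability(q, in_event, out_edges, out_probs, n_grid=400):
    """Piecewise-constant estimate of P(A) on [0, 1]^2 with Lebesgue volume."""
    # Midpoints of a regular n_grid x n_grid partition of the unit square.
    mid = (np.arange(n_grid) + 0.5) / n_grid
    l1, l2 = np.meshgrid(mid, mid, indexing="ij")
    # Index of the contour band (output interval) containing each cell midpoint.
    band = np.digitize(q(l1, l2), out_edges) - 1
    band = np.clip(band, 0, len(out_probs) - 1)
    # Cell counts approximate the volume of each band and of its part inside A.
    cells_in_band = np.bincount(band.ravel(), minlength=len(out_probs))
    cells_in_both = np.bincount(band[in_event(l1, l2)], minlength=len(out_probs))
    frac = np.zeros(len(out_probs))
    occupied = cells_in_band > 0
    frac[occupied] = cells_in_both[occupied] / cells_in_band[occupied]
    # P(A) ~ sum over bands of P(band) * volume fraction of the band inside A.
    return float(np.sum(out_probs * frac))

def triangular_cdf(t):
    """CDF of lambda_1 + lambda_2 when (lambda_1, lambda_2) is uniform on [0, 1]^2."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= 1.0, 0.5 * t**2, 1.0 - 0.5 * (2.0 - t) ** 2)

edges = np.linspace(0.0, 2.0, 21)
probs = np.diff(triangular_cdf(edges))      # output probability of each band
p_A = event_probability(lambda a, b: a + b,
                        lambda a, b: (a < 0.5) & (b < 0.5),
                        edges, probs)
print(p_A)
```

The output density here is chosen to be the one induced by the uniform density on the square, so the inverse density should be uniform and the estimate should be close to the area of A, namely 0.25. Replacing `probs` with bin probabilities from another output density changes the estimate without any random sampling.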

In the next two sections, we provide details of the two ingredients of the approximate solution method.

Remark 3.1. Many solution methods for both statistical and deterministic inverse problems deal with ill-posedness by introducing some form of regularization, either directly or by reposing the inverse problem as an optimization problem. Such methods avoid the need to deal with set-valued inverse solutions.

Remark 3.2. There are cases of interest, e.g., a parameter domain that contains a bifurcation point, for which the described method cannot be used in a straightforward fashion. We note that while an approach based on random sampling may be applied nominally to such a problem, the interpretation of the results is still problematic.

Remark 3.3. While the solution method for the inverse problem proposed here relies on derivatives of a quantity of interest, it is not dependent on how those derivatives are computed. Instead of an adjoint-based approach, the derivatives might be computed using (deterministic) forward sensitivity analysis that computes the derivatives directly along with the solution of the model. Yet another approach, presented, e.g., in [27], employs a stochastic spectral method to obtain a polynomial representation of q(λ), which is then used to compute gradients.

3.1. Determining the inverse of the deterministic model using generalized contours

We consider a finite dimensional map q from the space of parameters to the output defined implicitly by solving a finite dimensional nonlinear system of equations,

f (x; λ) = b,

(3.1)

where x ∈ ℝⁿ, the parameter λ ∈ Λ ⊂ ℝ^d (assuming that Λ is compact) is a random vector, and f : ℝ^{n+d} → ℝⁿ is assumed smooth in both variables. The goal is to compute a quantity of interest q(λ) = q(x(λ)) = 〈x, ψ〉, described as a linear functional of the solution x(λ). If x depends smoothly on λ, then the dependence of q on λ is also smooth.

This problem applies in particular to differential equations that depend on a finite set of parameters. For differential equations, we require the same assumptions as the standard existence and uniqueness theorems to guarantee the smoothness of q(λ). This is discussed in more detail in the second part of this paper [4].

For any q̄ ∈ q(Λ), we define q̌(λ) := q(λ) − q̄. By assumption, q̌(λ) : ℝ^d → ℝ is continuously differentiable and there exists λ̄ ∈ Λ such that q(λ̄) = q̄, which implies that q̌(λ̄) = 0. We are mainly interested in the case where the quantity of interest varies as the parameters vary, so we assume that ∂_{λ_d} q̌(λ̄) ≠ 0, i.e., there is at least one nontrivial partial derivative. We may relax the restriction ∂_{λ_d} q̌(λ̄) ≠ 0 at a finite number of points in Λ, where q(λ) possibly attains a local extreme value, and ignore this set of points when considering the generalized contours.

By the implicit function theorem, there exists an open set U_λ̄ ⊂ Λ^{d−1}, where Λ^{d−1} := {λ^{d−1} := (λ₁, …, λ_{d−1}) | λ = (λ₁, …, λ_d) ∈ Λ}, containing λ̄^{d−1}, an open set V_λ̄ ⊂ Λ_d, where Λ_d := {λ_d | λ ∈ Λ}, and a differentiable function g_λ̄ : U_λ̄ → V_λ̄ such that

{(λ^{d - 1}, g_{\bar{λ}} (λ^{d - 1}))} = {λ | q (λ) = \bar{q}} \cap (U_{\bar{λ}} \times V_{\bar{λ}}) .

(3.2)

Since the implicit function theorem is a local result, there may be additional points in Λ that map to q̄ but are not contained in the set defined by (3.2). Thus, given q̄ ∈ q(Λ), we choose a collection of sets {U_λ̄ × V_λ̄} = ∪_{α∈A} {U_{λ̄_α} × V_{λ̄_α}}, where ∪_{α∈A} {λ̄_α} is the set of all λ ∈ Λ such that q(λ) = q̄. Then, using the same notation as in (3.2), the function g_λ̄(λ^{d−1}) might be piecewise defined. The set in (3.2) is a (d−1)-dimensional manifold that is a natural inverse of q(λ) given q̄. We call this set the generalized contour.

Theorem 3.1. If we choose distinct q̄₁, q̄₂ ∈ q(Λ), then the generalized contours for q̄₁ and q̄₂ are unique and do not intersect.

Proof. The nonintersection property follows immediately from the fact that q(λ) is a function. Uniqueness follows immediately from the choice {U_λ̄ × V_λ̄} = ∪_{α∈A} {U_{λ̄_α} × V_{λ̄_α}}, where ∪_{α∈A} {λ̄_α} is the set of all λ ∈ Λ such that q(λ) = q̄ for a given value of q̄ ∈ q(Λ).

In two dimensions, the generalized contours are simply contours of the surface q(λ₁, λ₂). We denote a generalized contour for a specific quantity of interest q̄ as q⁻¹(q̄). Since q(λ) is smooth and Λ is compact, q(Λ) defines a compact interval of real numbers, I_q := [q_m, q_M] = q(Λ), where q_m and q_M are the absolute minimum and absolute maximum of q(λ), respectively. We redefine q(Λ) to be the open interval (q_m, q_M), which we also denote by I_q.

We next prove that there exist (possibly discontinuous) 1-dimensional curves that are transverse to the generalized contours and can be used to index the family of generalized contours. We call any curve that intersects each generalized contour once and only once a transverse parameterization (TP).

We give a constructive proof that is a useful algorithm. The algorithm produces discontinuous curves in Λ in general.

Theorem 3.2. Suppose f is smooth in (3.1) and q(λ) is a linear functional of the solution to (3.1). There exists a transverse parameterization for the set of generalized contours.

Proof. We construct the transverse curve from a finite number of connected curves. We fix ε > 0 and ε > δ > 0, and set I_{q,ε} = [q_m + ε, q_M − ε]. If Λ is compact, then the existence of transverse curves is guaranteed by the smoothness of q(λ). To construct a curve, we begin at a point γ_M ∈ Λ such that q(γ_M) = q_M − δ and follow the direction of the negative gradient until the curve either intersects the boundary or reaches a minimum or saddle; we denote that point by γ_m. By smoothness, this curve intersects exactly one contour for each value of q(λ) in (q(γ_m), q(γ_M)). If (q(γ_m), q(γ_M)) does not completely cover I_{q,ε}, then we select a point τ_m ∈ Λ such that q(τ_m) = q_m + δ and follow the direction of the gradient until the curve either intersects the boundary or reaches a maximum or saddle; we denote this point by τ_M. We now check whether (q(γ_m), q(γ_M)) ∪ (q(τ_m), q(τ_M)) covers I_{q,ε}. If so, then we eliminate any part of the second curve that gives an overlap with contours intersected by the first. Otherwise, we continue to create curves as above, trying to cover the output interval defined by (q(τ_M), q(γ_m)). This process produces a countable number of connected curves whose union forms a (possibly discontinuous) transverse curve through the generalized contours that corresponds to a countable open cover of I_{q,ε}, which is compact. Hence, there is a finite subcover of I_{q,ε}, which implies that the transverse parameterization can be constructed from a finite number of curves.

In practice, we construct the transverse curve to the generalized contours of I_q by initially following the first two steps above with ε = 0, i.e., locate γ_M ∈ Λ such that q(γ_M) = q_M and τ_m ∈ Λ such that q(τ_m) = q_m and construct the pieces of the transverse curve by following the negative and positive directions of the gradient, respectively. If we now take ε to be half the minimum of q(γ_M) − q(γ_m) and q(τ_M) − q(τ_m), then following the steps above, we construct a curve transverse to all the contours of I_q in a finite number of steps.
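One piece of such a transverse curve can be traced numerically by stepping along the negative gradient. The following is a minimal sketch, assuming a box-shaped domain, a fixed step length, and a user-supplied gradient; the function name `follow_negative_gradient` and the test map q(λ) = λ₁² + λ₂² are hypothetical choices for illustration only.

```python
import numpy as np

def follow_negative_gradient(q, grad_q, start, lower, upper,
                             step=1e-3, tol=1e-8, max_steps=100_000):
    """Trace one connected piece of a transverse curve by descending q from `start`."""
    path = [np.asarray(start, dtype=float)]
    for _ in range(max_steps):
        g = grad_q(path[-1])
        norm = np.linalg.norm(g)
        if norm < tol:                       # critical point (minimum or saddle)
            break
        nxt = path[-1] - step * g / norm     # fixed-length step along -grad q
        if np.any(nxt < lower) or np.any(nxt > upper):
            break                            # the curve reached the boundary
        if q(nxt) >= q(path[-1]):
            break                            # step overshot; q no longer decreases
        path.append(nxt)
    return np.array(path)

# Hypothetical smooth map on [0, 1]^2 with its maximum at (1, 1).
q = lambda lam: float(lam @ lam)
grad_q = lambda lam: 2.0 * lam

path = follow_negative_gradient(q, grad_q, start=[1.0, 1.0], lower=0.0, upper=1.0)
q_along = np.array([q(p) for p in path])
print(len(path), q_along[0], q_along[-1])
```

Because q decreases strictly along the traced points, each output value between the two endpoint values is met once on this piece, which is the indexing property required of a transverse parameterization. Starting from a minimizer and ascending would use the same routine with the sign of the gradient reversed.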

3.1.1. Approximating the set of generalized contours

Suppose that q is a linear function of λ, i.e., q(λ) = γ^⊤ λ for some γ ∈ ℝ^d (recall Λ ⊂ ℝ^d). Then for fixed q̄ ∈ q(Λ) we have (with the same conventions as above) U_λ̄, V_λ̄, and g_λ̄ : U_λ̄ → V_λ̄ such that {(λ^{d−1}, g_λ̄(λ^{d−1}))} is the generalized contour. In this case, assuming γ_d ≠ 0, we can write the function explicitly as g_λ̄(λ^{d−1}) = (q̄ − (γ^{d−1})^⊤ λ^{d−1})/γ_d. The generalized contour above is a (d − 1)-dimensional hyperplane, and we refer to this as a generalized linear contour.

We approximate generalized contours locally by generalized linear contours, and approximate a generalized contour by a generalized piecewise-linear contour. We use generalized piecewise-linear contours computed from a piecewise-linear tangent plane approximation to q(λ). If q is an affine map of λ, i.e., q(λ) = γ^⊤ λ + q₀ for some q₀ ∈ ℝ, then we use the function above with q̄ replaced by q̄ − q₀.
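As a small worked illustration of the explicit formula for a generalized linear contour, the sketch below solves γ^⊤ λ + q₀ = q̄ for the last coordinate λ_d. The function name `linear_contour` and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def linear_contour(gamma, q_bar, lam_head, q0=0.0):
    """Return lam_d such that gamma^T lam + q0 = q_bar, given the first
    d - 1 coordinates lam_head. Requires gamma_d != 0."""
    gamma = np.asarray(gamma, dtype=float)
    lam_head = np.asarray(lam_head, dtype=float)
    if gamma[-1] == 0.0:
        raise ValueError("gamma_d must be nonzero to solve for lam_d")
    return (q_bar - q0 - lam_head @ gamma[:-1]) / gamma[-1]

gamma = np.array([2.0, -1.0, 4.0])      # d = 3
lam_head = np.array([0.5, 1.5])         # (lam_1, lam_2)
lam_d = linear_contour(gamma, q_bar=3.0, lam_head=lam_head, q0=1.0)
lam = np.append(lam_head, lam_d)        # a point on the contour q = 3
```

The requirement γ_d ≠ 0 is the linear counterpart of the assumption ∂_λd q ≠ 0 used with the implicit function theorem.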

We obtain the derivative information required to compute the tangent plane approximations implicitly by introducing the adjoint operator. This approach is very useful when the forward map is complicated to evaluate, e.g., when it involves the solution of a differential equation. However, the derivative information can be obtained by any convenient method.

Local linearization of the linear functional

The goal is to approximate the map q(λ) with a piecewise-linear map q̂(λ) since it is possible to calculate the generalized contours for this approximate map.

Theorem 3.3. The generalized linear contours converge pointwise to the true contours locally in Λ.

Proof. Suppose we choose a reference parameter value λ = μ at which to solve

f (x; λ) = b

exactly. Call this reference solution y. Then according to Taylor’s theorem,

f (x; λ) = f (y; μ) + D_{x} f (y; μ) (x - y) + D_{λ} f (y; μ) (λ - μ) + ℛ,

where ℛ ~ O(‖x − y‖² + ‖λ − μ‖²), for |α| = 2. Here D_xf and D_λf denote the derivatives of f with respect to x and λ, respectively.

In order to compute the tangent plane approximation efficiently, we use the generalized Green’s vector φ that solves the adjoint to the linearized problem

A^{⊤} ϕ = ψ,

(3.3)

where A = D_xf (y; μ). Recall that q(λ) = 〈x, ψ〉, so by substitution of the above and standard linear algebra we arrive at

q (λ) = q (μ) - 〈 D_{λ} f (y; μ) (λ - μ), ϕ 〉 - 〈 ℛ, ϕ 〉 .

Neglecting the higher order term leads to an approximation of q by an affine map q̂. If we denote the generalized contour of q given q̄ by {(λ^d−1, g_λ̄ (λ^d−1))} and the generalized linear contour of q̂ given q̄ by {(λ^d−1, ĝ_λ̄ (λ^d−1))}, then at any λ^d−1 ∈ U_λ̄,

[g_{\bar{λ}} (λ^{d - 1}) - {\hat{g}}_{\bar{λ}} (λ^{d - 1})] [ϕ^{⊤} \partial_{λ d} f (y, μ)] = - 〈 ℛ, ϕ 〉 .

(3.4)

By assumption, ∂_λd q(λ) = −φ^⊤ ∂_λd f(y, μ) ≠ 0, so we rewrite (3.4) as

[g_{\bar{λ}} (λ^{d - 1}) - {\hat{g}}_{\bar{λ}} (λ^{d - 1})] = C 〈 ℛ, ϕ 〉,

where C⁻¹ = −φ^⊤ ∂_λd f(y, μ), is a nonzero constant determined entirely by the reference point (y, μ). Thus, if we define

‖ U_{\bar{λ}} ‖ = sup_{λ \in U_{\bar{λ}}} {‖ λ - \bar{μ} ‖}_{2},

where ‖ ‖₂ denotes the standard Euclidean norm, then as ‖U_λ̄‖ → 0, ‖ℛ‖₂ → 0, which implies that |g_λ̄(λ^d−1) − ĝ_λ̄ (λ^d−1)| → 0.

Global linearization of the linear functional

We extend the local linearization technique to obtain a global piecewise-linear approximation of the linear functional over all of Λ. We first define a partition of cells ${B_{i}}_{i = 1}^{M}$ of Λ. The geometry is immaterial, as long as we can integrate constant functions over the cells. We apply the local linearization technique described above for each cell, and defining

1_{B_{i}} (λ) : = \begin{cases} 1 & \text{if } λ \in B_{i}, \\ 0 & \text{if } λ \notin B_{i}, \end{cases}

we obtain a global piecewise-linear approximation q̂(λ) to q(λ) defined by

\hat{q} (λ) : = \sum_{i = 1}^{M} (q (μ_{i}) + 〈 \nabla q (μ_{i}), (λ - μ_{i}) 〉) 1_{B_{i}} (λ),

(3.5)

where μ_i is the reference parameter value chosen in cell B_i.

Theorem 3.4. As ‖B_i‖ → 0 (or as M → ∞ when the sample points are distributed uniformly), the generalized linear contour converges pointwise to the generalized contour.

Proof. For the finite system of nonlinear equations, we have

\nabla q (μ_{i}) = - ϕ_{i}^{⊤} D_{λ} f (y_{i}; μ_{i}),

where φ_i solves the linearized adjoint problem using the reference point (y_i, μ_i). If we let − 〈ℛ_i, φ_i〉 denote the higher-order terms neglected in the linearization of q(λ) in cell B_i, then we can write the error of the piecewise-linear approximation, e(λ) = q̂(λ) − q(λ), as

e (λ) = - \sum_{i = 1}^{M} 〈 ℛ_{i}, ϕ_{i} 〉 1_{B_{i}} (λ) .

The generalized linear contour of q̂ given q̄ is a collection of hyperplanes in Λ. Using the same notation as above,

| g_{\bar{λ}} (λ^{d - 1}) - {\hat{g}}_{\bar{λ}} (λ^{d - 1}) | \leq C \sum_{i = 1}^{M} | 〈 ℛ_{i}, ϕ_{i} 〉 |, \quad C^{- 1} = \min_{i} | ϕ_{i}^{⊤} \partial_{λ_{d}} f (y_{i}, μ_{i}) | .

This yields the convergence result.

The transverse parameterization (TP) for the generalized linear contours is constructed using q̂ in the same way as described in the proof of Theorem 3.2. Since q̂ is a piecewise-linear surface, the resulting TP is a piecewise-linear curve in Λ.

Examples

We illustrate the convergence of generalized linear contours to true contours in the two examples below.

In the first example, we suppose that $q (λ_{1}, λ_{2}) = λ_{1} λ_{2} exp [- (λ_{1}^{2} + 1.25 λ_{2}^{2} - 1)]$ over [0, 2] × [0, 2]. We approximate q over a uniform partition {B_i} of [0, 2] × [0, 2] into squares, and we linearize around the midpoint of each B_i to form q̂ in (3.5). We plot various contour curves and two TPs on each plot. The results are summarized in Figure 3.3.

Fig. 3.3 — Contours of q̂ using 5×5 cells (top left), 10×10 cells (top right), 25×25 cells (bottom left), and 50 × 50 cells (bottom right). The TP is created using the algorithm outlined in the proof of its existence and is denoted by the circle-dotted and plus-dotted lines. The circle-dotted line is constructed from the maximum of q(λ) and follows the negative direction of the gradient of q(λ), and the plus-dotted line is constructed from the minimum of q(λ) and follows the direction of the gradient.
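The convergence seen in Figure 3.3 can be checked numerically. The sketch below evaluates the piecewise-linear approximation (3.5) for this example on uniform n × n partitions, linearizing at the cell midpoints. It is only an illustration under stated assumptions: the analytic gradient stands in for the adjoint-based derivative, and the names `q_hat`, `pts`, and `errors` are hypothetical.

```python
import numpy as np

def q(l1, l2):
    return l1 * l2 * np.exp(-(l1**2 + 1.25 * l2**2 - 1.0))

def grad_q(l1, l2):
    e = np.exp(-(l1**2 + 1.25 * l2**2 - 1.0))
    return l2 * e * (1.0 - 2.0 * l1**2), l1 * e * (1.0 - 2.5 * l2**2)

def q_hat(l1, l2, n, lo=0.0, hi=2.0):
    """Evaluate (3.5) on an n-by-n uniform partition of [lo, hi]^2."""
    h = (hi - lo) / n
    # Index of the cell containing each point; points on the upper
    # boundary are assigned to the last cell.
    i = np.minimum(((l1 - lo) / h).astype(int), n - 1)
    j = np.minimum(((l2 - lo) / h).astype(int), n - 1)
    m1 = lo + (i + 0.5) * h          # cell midpoints = reference parameters
    m2 = lo + (j + 0.5) * h
    g1, g2 = grad_q(m1, m2)
    return q(m1, m2) + g1 * (l1 - m1) + g2 * (l2 - m2)

# Maximum error on a fixed set of evaluation points.
pts = np.linspace(0.0, 2.0, 201)
L1, L2 = np.meshgrid(pts, pts)
errors = {n: float(np.max(np.abs(q_hat(L1, L2, n) - q(L1, L2))))
          for n in (5, 10, 25, 50)}
for n, err in errors.items():
    print(f"{n:2d} x {n:2d} cells: max |q_hat - q| = {err:.2e}")
```

For a smooth q the error of a tangent plane approximation is second order in the cell diameter, so the printed errors should decrease roughly quadratically as the partition is refined.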

For a second example, we suppose q(λ₁, λ₂) = exp [cos(λ₁) + sin(λ₂)] on [−2π − 0.1, 2π + 0.1]². We proceed as above to obtain the numerical results summarized in Figure 3.4.

Fig. 3.4 — Contours of q̂ using 7×7 cells (top left), 10×10 cells (top right), 25×25 cells (bottom left), and 50 × 50 cells (bottom right). The TP is created using the algorithm outlined in the proof of its existence and is denoted by the square-dotted and circle-dotted lines. The square-dotted line is constructed from the maximum of q(λ) and follows the negative direction of the gradient of q(λ), and the circle-dotted line is constructed from the minimum of q(λ) and follows the direction of the gradient.

3.2. Computing the parameter probability density

We now explain how to use the unique solution to the inverse problem in the space of generalized contours to compute an approximation of the probability density σ_Λ on Λ. We first observe that if I = [q₁, q₂] ⊂ 𝒟 is an event with probability P(I) = P(q(λ) ∈ I), then this event corresponds to a measurable set in Λ that is defined as the set of all contours obtained by q⁻¹(I). From the basic assumptions of smoothness and the nonintersecting property of the contours, the set of all contours is a set in Λ that is contained between the two contours defined by q⁻¹(q₁) and q⁻¹(q₂) (or possibly one of these contours and the boundary of Λ). We assign this set the probability P(I). It follows immediately that we can define the inverse into the set of generalized contours for a given distribution of q(λ) uniquely.

Theorem 3.5. Suppose f is smooth in (3.1) and q(λ) is a linear functional of the solution to (3.1). If q(λ) is a random variable with distribution F_q(q(λ)), then for a fixed TP in Λ, the distribution of the intersections of the generalized contours on the TP, which is a random variable, is unique.

The probability of a measurable set in Λ is determined by the contours the set contains and the amount of each contour the set contains and the probabilities of those contours. The parameter volume measure μ_Λ determines the contours a given set contains and the amount of each contour the set contains.

3.2.1. Computational measure theory

The method we develop for computing an approximate probability distribution is based on constructions used in measure theory.

Theorem 3.6. Given a measurable set A ⊂ Λ, we can approximate P(A) using a simple function approximation to σ_Λ(λ), which requires only calculations of volumes in Λ.

The constructive proof below parallels Algorithm 1 for approximating the probability of a measurable set A ⊂ Λ.

Proof. For λ restricted between any two contours induced by a subinterval of a partition of 𝒟 as in Algorithm 1, q(λ) is approximately a uniformly distributed random variable. Suppose that ${q_{j}}_{j = 0}^{N}$ is a partition of 𝒟 such that q₀ < q₁ < … < q_N, and if E_j = [q_j−1, q_j ], then 𝒟 = ∪_jE_j. Let A_j = {λ | q(λ) ∈ E_j}. We assume that Λ = ∪_jA_j . The probability of A_j is given by

P (A_{j}) = \int_{A_{j}} σ_{Λ} (λ) d μ_{Λ} (λ) .

We can compute this probability because of the 1-1 correspondence between the contours and output values, i.e., P(A_j) = P(E_j) = ∫_Ej ρ_𝒟(q) dμ_𝒟(q). Therefore, we have a simple function approximation to σ_Λ(λ) given by

σ_{Λ} (λ) \approx σ_{Λ, N} (λ) = \sum_{j = 1}^{N} \frac{P (A_{j})}{μ_{Λ} (A_{j})} 1_{A_{j}} (λ) .

Algorithm 1. Approximate Parameter Probability Distribution Method

Fix a simple function approximation $ρ_{𝒟}^{(M)} (q)$ to ρ_𝒟(q) that induces a partition $\cup_{i = 1}^{N (M)} [q_{i - 1}, q_{i})$ of 𝒟, where $ρ_{𝒟}^{(M)} (q)$ is constant on each subinterval [q_i−1, q_i), i = 1, …, N(M)
The partition $\cup_{i = 1}^{N (M)} [q_{i - 1}, q_{i})$ induces a partition of Λ by generalized contours, and ${A_{j}}_{j = 1}^{N (M)}$ denotes this partition
Let P_j denote the probability of A_j, given by $\int_{[q_{j - 1}, q_{j})} ρ_{𝒟}^{(M)} (q) d μ_{𝒟} (q)$
Partition Λ with small cells ${b_{i}}_{i = 1}^{M'}$
for i = 1, …, M′ do
    for j = 1, …, N(M) do
        Calculate the ratio of the volume of b_i ∩ A_j to the volume of A_j, store in matrix V_ij
    end for
    Set P(b_i) equal to $\sum_{j = 1}^{N (M)} V_{i j} P_{j}$
end for
Given event A ⊂ Λ, estimate P(A) using

inner sums, i.e., sum of P(b_i) for all i ∈ I ⊂ {1, …, M′} such that b_i ⊂ A,
outer sums, i.e., sum of P(b_i) for all i ∈ I ⊂ {1, …, M′} such that b_i ∩ A ≠ ∅,
average of inner and outer sums, or
∫_A σ_Λ,M′ (λ) dμ_Λ(λ), where $σ_{Λ, M'} (λ) = \sum_{i = 1}^{M'} \frac{P (b_{i})}{μ_{Λ} (b_{i})} 1_{b_{i}} (λ)$ .

Given event A ⊂ Λ, we use the law of total probability to write

P (A) = \sum_{j = 1}^{N} P (A | A_{j}) P (A_{j}) .

Using the above simple function approximation to the parameter density, we have

P (A | A_{j}) = \frac{P (A \cap A_{j})}{P (A_{j})} = \frac{\int_{A \cap A_{j}} d μ_{Λ} (λ)}{\int_{A_{j}} d μ_{Λ} (λ)} = \frac{μ_{Λ} (A \cap A_{j})}{μ_{Λ} (A_{j})} .

Hence, the probability P(λ ∈ A| q(λ) ∈ E_j) = P(A|A_j) can be calculated from the volume measure on model space since it depends only on measurable sets in Λ if we use the approximation q(λ) ~ 𝒰 (E_j) for λ ∈ A_j. The value is the ratio of volume of A ∩ A_j to the volume of A_j. Since the density on data space is a nonnegative integrable function, there exists a sequence of simple functions ${ρ_{𝒟}^{(M)} (q)}_{M = 1}^{\infty}$ with

ρ_{𝒟}^{(M)} (q) = \sum_{k = 1}^{2^{2 M} + 1} \frac{k - 1}{2^{M}} 1_{I_{M, k}} (ρ_{𝒟} (q)),

and I_M,k = [(k − 1)/2^M, k/2^M]. We first observe that the partition {I_M,k} induces a partition {E_M,k} of 𝒟. Also, we observe that $ρ_{𝒟}^{(M)} (q) \to ρ_{𝒟} (q)$ in L¹ as M → ∞ by the monotone convergence theorem, and for any measurable set E ⊂ 𝒟,

\int_{E} ρ_{𝒟}^{(M)} (q) d μ_{𝒟} (q) = \sum_{k = 1}^{2^{2 M} + 1} \frac{k - 1}{2^{M}} μ_{𝒟} (E_{M, k} \cap E) \to P_{𝒟} (E) \text{ as } M \to \infty .
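The dyadic construction can be illustrated with a short numerical sketch. We assume, purely for illustration, that ρ_𝒟 is the standard normal density, and we interpret the simple function as ρ_𝒟 rounded down to a multiple of 2^−M and capped at 2^M, which is the standard construction. The names `rho_M` and `l1_err` are hypothetical, and the L¹ error is approximated by a midpoint rule on [−6, 6].

```python
import numpy as np

def rho(q):
    """Standard normal density, used here only as an example of rho_D."""
    return np.exp(-0.5 * q**2) / np.sqrt(2.0 * np.pi)

def rho_M(q, M):
    """Dyadic simple function: rho(q) rounded down to a multiple of 2^-M,
    capped at 2^M (assumed reading of the sum over k)."""
    return np.minimum(np.floor(rho(q) * 2.0**M) / 2.0**M, 2.0**M)

# Midpoint-rule approximation of the L1 error on [-6, 6].
n = 200_000
edges = np.linspace(-6.0, 6.0, n + 1)
mid = 0.5 * (edges[:-1] + edges[1:])
dq = edges[1] - edges[0]
l1_err = [float(np.sum(np.abs(rho(mid) - rho_M(mid, M))) * dq)
          for M in range(1, 9)]
```

Each ρ^(M) lies below ρ_𝒟 and the sequence is nondecreasing in M, which is the setting of the monotone convergence theorem. The pointwise error is below 2^−M wherever ρ_𝒟 < 2^M, so the computed L¹ errors are bounded by 12 · 2^−M on this interval.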

Thus, we can approximate the value of P(A|A_j) by the ratio of volume of A ∩ A_j to the volume of A_j obtained from the volume measure on model space if the induced partitions {A_j} come from a sufficiently fine partition {E_j} of data space so that the distribution of q(λ) for λ ∈ A_j is approximated by 𝒰(E_j).

Since P(A) = sup{P(K) : K ⊂ A, K compact} and P(A) = inf{P(U) : A ⊂ U, U open}, we can estimate P(A) using the inner and outer sums described by Algorithm 1.
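The constructive proof can be sketched in a few lines for a simple case. The following is a minimal illustration of Algorithm 1 under assumptions that are not taken from the text: Λ = [0, 1] × [0, 1] with Lebesgue volume measure, q(λ) = λ₁ + λ₂, an output density that is normal with mean 1 and standard deviation 0.2 restricted to 𝒟 = [0, 2], and volumes approximated by counting s × s midpoints in each small cell. All function and variable names are hypothetical.

```python
import math
import numpy as np

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of N(mean, sd^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def cell_probabilities(q, N=20, m=20, s=8, q_lo=0.0, q_hi=2.0,
                       mean=1.0, sd=0.2):
    """Return an (m, m) array of P(b_i) for a uniform partition of [0, 1]^2."""
    # P_j: probability of each output bin E_j = [q_{j-1}, q_j).
    q_edges = np.linspace(q_lo, q_hi, N + 1)
    cdf = np.array([normal_cdf(x, mean, sd) for x in q_edges])
    P_j = np.diff(cdf) / (cdf[-1] - cdf[0])

    # Midpoints of a fine grid: s sub-points per small cell and per axis.
    n_f = m * s
    x = (np.arange(n_f) + 0.5) / n_f
    L1, L2 = np.meshgrid(x, x, indexing="ij")
    j = np.clip(np.digitize(q(L1, L2), q_edges) - 1, 0, N - 1)  # band A_j
    c = np.arange(n_f) // s
    i = c[:, None] * m + c[None, :]                             # cell b_i

    # counts[i, j] is proportional to the volume of b_i intersected with A_j.
    counts = np.bincount((i * N + j).ravel(),
                         minlength=m * m * N).reshape(m * m, N)
    vol_A = counts.sum(axis=0)
    V = counts / np.where(vol_A > 0, vol_A, 1)   # V_ij = vol(b_i & A_j) / vol(A_j)
    return (V @ P_j).reshape(m, m)               # P(b_i) = sum_j V_ij P_j

P_b = cell_probabilities(lambda l1, l2: l1 + l2)

# Inner and outer sums for the event A = [a, 1] x [a, 1].
m = P_b.shape[0]
lo_edge = np.arange(m) / m
hi_edge = (np.arange(m) + 1) / m
a = 0.42
inner_1d = lo_edge >= a          # cells contained in A (per axis)
outer_1d = hi_edge > a           # cells intersecting A (per axis)
inner = float(P_b[np.ix_(inner_1d, inner_1d)].sum())
outer = float(P_b[np.ix_(outer_1d, outer_1d)].sum())
```

The inner and outer sums bracket the approximation of P(A), and the gap between them shrinks as the small cells are refined. Counting midpoints is only one convenient way to estimate the volumes; any quadrature for constant functions over the cells could be substituted.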

Remark 3.4. If the set A has not (yet) been specified, we may still carry out the first part of Algorithm 1 to obtain a discretized approximation of the measure P on model space.

Remark 3.5. The set of cells ${b_{i}}_{i = 1}^{M^{'}}$ in Algorithm 1 is introduced purely for computational purposes and is not necessary to the approximation of P(A). We choose ${b_{i}}_{i = 1}^{M^{'}}$ in order to approximate P(A), for any event A ⊂ Λ, without carrying out the calculations in the nested loops of Algorithm 1 for each new event. If we are interested only in one event, A ⊂ Λ, then we might skip the step of partitioning Λ by ${b_{i}}_{i = 1}^{M^{'}}$ and replace the step in the nested loop by the following: Calculate ratio of volume of A ∩ A_j to volume of A_j, store in vector V_j. We may then approximate P(A) by $\sum_{j = 1}^{N (M)} V_{j} P_{j}$.

Remark 3.6. Note that as we refine the partition {E_j} on the data space, which in turn refines the partition {A_j} on model space, we should consider refining the mesh that defines the partition {b_i} on model space. The reason is that we assign a probability P(b_i) to each cell b_i that in essence reapproximates the simple function approximation,

σ_{Λ} (λ) \approx σ_{Λ, N} (λ) = \sum_{j = 1}^{N} \frac{P (A_{j})}{μ_{Λ} (A_{j})} 1_{A_{j}} (λ),

by the new simple function

σ_{Λ} (λ) \approx σ_{Λ, M^{'}} (λ) = \sum_{i = 1}^{M^{'}} \frac{P (b_{i})}{μ_{Λ} (b_{i})} 1_{b_{i}} (λ) .

If the partition {b_i} remains fixed as the approximation of ρ_𝒟(q) by simple functions is refined by the partition {E_j}, then the representation of σ_Λ(λ) as a simple function converges with respect to the fixed {b_i}. When choosing {b_i}, we should consider that a cell b_i might be large relative to the A_j that it intersects, i.e., b_i might intersect many A_j. When this is the case, estimating the probability over b_i by a constant P(b_i) might not be an appropriate approximation. In general, it is not computationally demanding to estimate an appropriate size of the b_i.

Observations on simple function approximations

The use of simple function approximations of a probability density is sufficiently unusual in the context of stochastic analysis of differential equations as to justify comment. Simple function approximations form the basis for classic measure theory because they yield several benefits, including

Simple function approximations are widely applicable under minimal assumptions on the density being approximated. As the examples below suggest, probability densities solving inverse problems appear to be highly complex in general.
The convergence analysis for simple function approximations is also widely applicable. This contrasts with sampling techniques such as Markov chain Monte Carlo methods whose convergence properties are stochastic and can be highly sensitive to properties of the problem.

Though we have not exploited the fact in this paper, simple function approximations also offer significant benefits for stochastic sensitivity analysis of differential equations [12, 13, 9, 10, 7]. In particular, combining a simple function approximation with sensitivity derivatives of a quantity of interest with respect to parameters provides both a natural dimension reduction mechanism and the basis for adaptive sampling.

Of course, a significant issue with simple function approximations is the nominal dependence of accuracy on the dimension of the parameter space. This may be a consequence of the common approach of using hyper-rectangular cell discretizations of the underlying space combined with the unfortunate growth in diagonal dimension of hyper-rectangles as dimension increases, though we report on some inconclusive results of using radial basis functions in [12]. In our experience, the effects of dimension are nominal up to dimensions of 8–10, and we have effectively used the piecewise constant approximations to dimensions of order 15–18. We note that this is effective dimension. By exploiting dimension reduction, the nominal dimension of the parameter space may be higher.

4. Examples

We apply the new method to solve inverse problems associated with a variety of maps. We first consider three constrained geometric optimization problems. We then discuss examples involving a nonlinear ordinary differential equation and a nonlinear elliptic partial differential equation with two parameters. Finally, we discuss the determination of regions with high probability.

In the following examples, we have chosen the uniform Lebesgue measure for the parameter volume measure and often impose a normal distribution on the output quantity of interest. The first choice is made because it is commonly the (implicit) default, e.g., in Bayesian inference. The imposition of a normal distribution on the output is also a common choice. In our examples, it serves the purpose of illustrating the complex nature of the inverse probability measure that results even when a normal distribution has been imposed on the output. However, we emphasize that neither of these choices is important in terms of implementing the numerical solution method, which is readily applied for any distributions.

4.1. A 2-dimensional nonlinear function

We consider the map determined implicitly as the solution of the finite-dimensional nonlinear system of equations given by

\begin{matrix} λ_{1} x_{1}^{2} + x_{2}^{2} = 1, \\ x_{1}^{2} - λ_{2} x_{2}^{2} = 1, \end{matrix}

where λ₁ and λ₂ are the parameters. Geometrically, solutions x = (x₁, x₂)^T to the system represent intersections of the hyperbola and ellipse. The quantity of interest is the second component of the solution in the first-quadrant, i.e., q(λ) = q(x(λ)) = x₂=〈x, ψ〉, where ψ = (0, 1)^T. According to (3.3), the adjoint problem is

(\begin{matrix} 2 μ_{1} y_{1} & 2 y_{1} \\ 2 y_{2} & - 2 μ_{2} y_{2} \end{matrix}) ϕ = ψ,

where μ = (μ₁, μ₂)^T and y = (y₁, y₂)^T are the reference parameter and reference solution for the forward problem.

In order to create an interesting example, we choose $Λ = [.79, .99] \times [1 - 4.5 \sqrt{0.1}, 1 + 4.5 \sqrt{0.1}]$ based on a sensitivity analysis of the forward problem in [23]. We use six uniformly spaced mesh points in both the λ₁ and λ₂ directions of Λ to create cells ${B_{i}}_{i = 1}^{25}$ that partition Λ. We use the centroid of each cell as the reference parameter μ_i = (μ_1,i, μ_2,i)^T in that cell and solve the forward problem to obtain reference solutions y_i = (y_1,i, y_2,i)^T at these points, and then solve for the generalized Green’s vector φ_i = (φ_1,i, φ_2,i)^T at the reference point (y_i, μ_i). According to (3.5), we obtain a global piecewise-linear approximation q̂ to q defined as

\hat{q} (λ) : = \sum_{i = 1}^{25} (y_{2, i} - {(λ - μ_{i})}^{T} (\begin{matrix} y_{1, i}^{2} & 0 \\ 0 & - y_{2, i}^{2} \end{matrix}) ϕ_{i}) 1_{B_{i}} (λ) .
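The linearization in a single cell can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the forward problem is solved with a closed form obtained by eliminating x₁² from the two equations (our own derivation), the sign of the gradient follows the expansion q(λ) = q(μ) − 〈D_λ f(y; μ)(λ − μ), ϕ〉 from the proof of Theorem 3.3, and the result is compared with central finite differences. The names `solve_forward` and `adjoint_gradient` are hypothetical.

```python
import numpy as np

def solve_forward(lam):
    """First-quadrant solution of
         l1 * x1^2 + x2^2 = 1,   x1^2 - l2 * x2^2 = 1
    in closed form (valid when 0 < l1 < 1 and 1 + l1 * l2 > 0)."""
    l1, l2 = lam
    x2 = np.sqrt((1.0 - l1) / (1.0 + l1 * l2))
    x1 = np.sqrt(1.0 + l2 * x2**2)
    return np.array([x1, x2])

def adjoint_gradient(mu):
    """Value and gradient of q = x2 at mu from one forward and one adjoint solve."""
    y = solve_forward(mu)
    A = np.array([[2.0 * mu[0] * y[0], 2.0 * y[1]],      # A = D_x f(y; mu)
                  [2.0 * y[0], -2.0 * mu[1] * y[1]]])
    psi = np.array([0.0, 1.0])
    phi = np.linalg.solve(A.T, psi)                      # adjoint problem (3.3)
    D_lam = np.diag([y[0]**2, -y[1]**2])                 # D_lambda f(y; mu)
    return y[1], -D_lam.T @ phi, phi

mu = np.array([0.89, 1.0])            # a reference parameter inside Lambda
q_mu, grad, phi = adjoint_gradient(mu)

# Central finite differences of the forward map, for comparison only.
h = 1e-6
fd = np.array([(solve_forward(mu + h * e)[1] - solve_forward(mu - h * e)[1])
               / (2.0 * h) for e in np.eye(2)])
```

One adjoint solve yields the full gradient regardless of the number of parameters, whereas the finite difference check needs two forward solves per parameter.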

We assume that the output data is a random variable with normal distribution on the data space defined by q̂(Λ) (Figure 4.1). We assume μ_Λ is the Lebesgue measure. We implement Algorithm 1 to calculate P(b_i) for the small cells of a fine partition of Λ and determine the probabilities of events A ⊂ Λ. We plot the results in Figure 4.2.

Fig. 4.1 — Left: Uncertainty of output is modeled as a random variable with a normal distribution. Right: A plot of the map q : Λ → ℝ.

Fig. 4.2 — Illustration of an application of Algorithm 1. Left: We determine which contours are contained in an event A ⊂ Λ and how much of each contour is inside the event. Right: We estimate the probabilities of small cells contained in the event and use an inner and outer estimate to obtain an approximation of the probability of the event A.

4.1.1. A three-parameter geometric constrained optimization problem

The map to be inverted is determined by minimizing the distance to the point (1,−1, 1) among points constrained to lie on the surface g = 4, where

g (x_{1}, x_{2}, x_{3}; λ_{1}, λ_{2}, λ_{3}) = λ_{1} x_{1}^{2} + λ_{2} x_{2}^{2} + λ_{3} x_{3}^{2} .

Geometrically, the parameters determine the shape of the ellipsoid that defines the constraint. Using the method of Lagrange multipliers we set up a system of nonlinear equations with four state variables and three parameters. We take the quantity of interest as the first state variable, which geometrically is interpreted as the first spatial coordinate in the solution to the constrained minimization problem. We set Λ = [.35, .65] × [.28, .52] × [.42, .78] and construct a piecewise-linear approximation using 125 points in Λ. We assume a normal distribution on q(λ) and take the underlying parameter volume measure μ_Λ to be a normalized Lebesgue measure. We use 3375 small cells {b_i} in Algorithm 1. We plot the probabilities at the midpoint of each cell with the color of the point determined by the probability of the small cell in Figures 4.3–4.4.

Fig. 4.3 — We use 15×15×15 small cells in Algorithm 1. We plot the approximate distribution from several angles. Left: A 3-dimensional view. Right: The same 3-dimensional view rotated 90 degrees clockwise.

Fig. 4.4 — We use 15×15×15 small cells in Algorithm 1. We plot the approximate distribution from several angles. Left: The original 3-dimensional view rotated 180 degrees clockwise. Right: The original 3-dimensional view rotated 270 degrees clockwise.

4.1.2. A four-parameter geometric constrained optimization problem

The map to be inverted is determined by minimizing the distance to the point (5, 5, 5) among points constrained to lie on the intersection of the surfaces g = 1 and h = 0, where

\begin{matrix} g (x_{1}, x_{2}, x_{3}; λ_{1}, λ_{2}) = λ_{1} x_{1}^{2} + λ_{2} x_{2}^{2} + x_{3}^{2}, \\ h (x_{1}, x_{2}, x_{3}; λ_{3}, λ_{4}) = λ_{3} x_{1} + λ_{4} x_{2} + x_{3} . \end{matrix}

Geometrically, g = 1 defines a hyperboloid of one sheet and h = 0 defines a plane through the origin, and the intersection of the two constraints is a closed curve. Using the method of Lagrange multipliers we set up a system of nonlinear equations with five state variables and four parameters. We take the quantity of interest as the first state variable, which geometrically is interpreted as the first spatial coordinate in the solution to the constrained minimization problem. We set Λ = [1.4, 2.6] × [.7, 1.3] × [1.4, 2.6] × [.35, .65] and construct a piecewise-linear approximation using 750 points in Λ. We assume a normal distribution on q(λ) and take μ_Λ to be a normalized Lebesgue measure. We use 60750 small cells {b_i} in Algorithm 1. Displaying a 4-dimensional distribution is problematic. We plot “snapshots” of the approximated probability density for three fixed λ₄ values in Figure 4.5.

Fig. 4.5 — We use 15 × 15 × 15 × 18 small cells in Algorithm 1. We plot “snapshots” of the approximate probability distribution for three values of the fourth parameter. Left: The fourth parameter is set at its minimum value. Middle: The fourth parameter is set at its midpoint value. Right: The fourth parameter is set at its maximum value. Notice how the probabilities vary in space as we vary the fourth parameter.

4.1.3. A two-parameter ordinary differential equation

We now study the nonlinear ordinary differential equation

\begin{cases} \dot{x} = λ_{1} \sin (λ_{2} x), & 0 < t \leq T, \\ x (0) = 1 . \end{cases}

The linear functionals (quantities of interest, q(λ)) we study take the form

q (λ) = 〈 x (t), ψ (t) 〉 = \int_{0}^{T} (x (s; λ), ψ (s)) d s,

and we take the quantity of interest to be the average value of x(t) over the time interval [0, 2]. Thus, we set ψ(t) = 1_[0,2](t)/2, and the generalized Green’s function φ(t) solves the adjoint problem,

\begin{cases} - \dot{ϕ} (t) - A^{⊤} (t) ϕ (t) = ψ (t), & T > t \geq 0, \\ ϕ (T) = ψ (T), \end{cases}

where A(t) := f′ (y(t; μ)) is the Jacobian of f = λ₁ sin(λ₂x) evaluated at y(t; μ), μ is a reference parameter, and y(t; μ) is the solution to (4.1.3) for this reference parameter. Compare this to (3.3). Using substitution, integration by parts, and Taylor’s theorem, we arrive at a linear approximation to q(λ) for parameters near μ, and analogous to the finite dimensional case, we obtain a global piecewise-linear approximation to q(λ) over Λ = [.8, 1.2] × [.1, π − .1] shown in Figure 4.6.

Fig. 4.6 — Left: The global piecewise-linear approximation to q(λ) obtained using Algorithm 1. The cells in Λ illustrate the coarse discretization of this space for the forward problem of obtaining a piecewise-linear approximation and the circles in each cell indicate the reference parameter used to linearize q(λ) in that cell. We assume a normal distribution for q(λ) and use a grid of 40 × 40 small cells.

Remark 4.1. There can be substantial error in the reference solutions and gradients used when applying the method to differential equations whose solutions must be approximated numerically, and we study the effect of these errors in the second paper [4].

4.1.4. A two-parameter elliptic partial differential equation

We now study a nonlinear elliptic partial differential equation

\begin{cases} - Δ u = λ_{1} {(u - λ_{2})}^{2}, & (x, y) \in Ω = [0, 1] \times [0, 1], \\ u = 0, & (x, y) \in \partial Ω . \end{cases}

The quantities of interest, q(λ), take the form

q (λ) = 〈 u, ψ 〉 = \int_{Ω} u (x, y) ψ (x, y) d x d y,

and we take the quantity of interest to be the average value of u over Ω. Thus, we set ψ(x, y) = 1, and the generalized Green’s function φ(x, y) solves the adjoint problem,

\begin{cases} - Δ ϕ - A^{⊤} ϕ = ψ, & (x, y) \in Ω, \\ ϕ = 0, & (x, y) \in \partial Ω, \end{cases}

where A := f′ (w(x, y; μ); μ) is the Jacobian of f = λ₁ exp(λ₂u) evaluated at w(x, y; μ), μ is a reference parameter, and w(x, y; μ) is the solution to (4.1.4) for this reference parameter. Using substitution, the weak form of (4.1.4), and Taylor’s theorem, we arrive at a linear approximation to q(λ) for parameters near μ, and just as with the previous examples, we obtain a global piecewise-linear approximation to q(λ) over Λ = [.95, 1.05]×[−.1, .1] using Algorithm 1. We show the results in Figure 4.7.

Fig. 4.7 — Left: Global piecewise-linear approximation to q(λ) obtained using Algorithm 1. We used an 11×13 grid of coarse cells to discretize Λ and used the midpoint of each cell as the reference parameter in that cell. We assume a normal distribution of q(λ) and we use a 33×39 grid of small cells.

4.2. Determining regions of high probability

The new method can be applied to find regions of high probability. Consider q(λ) = λ₁ + λ₂, where Λ = [0, 1] × [0, 1]. Figure 4.8 shows the generalized contours for 500 samples of q(λ) taken from a N(0, 2/25) distribution along with the TP and the intersections of contours on the TP. Where the contours intersect the TP most densely corresponds to a region of high probability in the space of contours.

Fig. 4.8 — Left: Generalized contours from 500 samples of q(λ) = λ₁ + λ₂ generated from a N(0, 2/25) distribution. Middle: The TP intersects each contour once and goes from the minimum of q(λ) in the lower left corner to the maximum of q(λ) in the upper-right corner of the plot. Right: Intersections of contours on the TP are marked with a star and can be used to index the inverses and determine a unique distribution of the contours on the TP using any consistent indexing scheme.

We can locate regions of high probability by sorting through the probability of the fine cells {b_i}. We can rank order these cells and determine any cells of high probability. We can also determine regions of neighboring cells that all have relatively high probability. We illustrate using the four-parameter geometric constrained optimization problem in section 4.1.2. In Table 1, we list the ten small cells with highest probability. If we let the events {b_i} become small, under a smoothness assumption, the probabilities of these events are related to the maximum-likelihood estimate.

Table 1.

We indicate the location of the ten cells with the highest probabilities for the example in section 4.1.2. The first column gives the probability and the second column gives the dimensions and location of the cells. There are clearly two distinct regions for events with relatively high probability. In general, one can use this information to determine where the largest regions of highest probability are located in a high-dimensional parameter space.

P(b_i) order 10⁻⁴	b_i location

0.600381927	[2.44, 2.52] × [1.22, 1.26] × [2.04, 2.12] × [0.4, 0.4167]
0.600446977	[2.36, 2.44] × [1.06, 1.1] × [1.96, 2.04] × [0.4333, 0.45]
0.600462420	[2.44, 2.52] × [1.18, 1.22] × [2.04, 2.12] × [0.4333, 0.45]
0.600465732	[2.36, 2.44] × [0.98, 1.02] × [2.04, 2.12] × [0.4167, 0.4333]
0.600470136	[2.36, 2.44] × [1.06, 1.1] × [1.96, 2.04] × [0.4167, 0.4333]
0.600474821	[2.36, 2.44] × [1.26, 1.3] × [1.96, 2.04] × [0.4167, 0.4333]
0.600501752	[2.36, 2.44] × [0.98, 1.02] × [2.04, 2.12] × [0.4333, 0.45]
0.600463048	[1.4, 1.48] × [1.18, 1.22] × [1.64, 1.72] × [0.3833, 0.4]
0.600464252	[1.4, 1.48] × [1.18, 1.22] × [1.64, 1.72] × [0.35, 0.3667]
0.600468545	[1.4, 1.48] × [1.18, 1.22] × [1.64, 1.72] × [0.3667, 0.3833]
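Rank ordering the cells as in Table 1 is straightforward once the array of cell probabilities is available. The sketch below is a minimal illustration: the function name `top_cells` is hypothetical, and the small probability array is made up for illustration only and is unrelated to the values in Table 1.

```python
import numpy as np

def top_cells(P_b, edges, k=10):
    """Return the k cells of largest probability as (probability, box) pairs.

    P_b   : d-dimensional array of cell probabilities P(b_i)
    edges : list of d one-dimensional arrays of cell edges, one per parameter
    """
    flat = np.argsort(P_b, axis=None)[::-1][:k]      # k largest, descending
    ranked = []
    for idx in zip(*np.unravel_index(flat, P_b.shape)):
        box = [(float(e[i]), float(e[i + 1])) for e, i in zip(edges, idx)]
        ranked.append((float(P_b[idx]), box))
    return ranked

# Made-up probabilities on a 3 x 4 partition of [0, 1] x [0, 2].
P_b = np.array([[1.0, 2.0, 3.0, 4.0],
                [8.0, 7.0, 6.0, 5.0],
                [9.0, 10.0, 12.0, 11.0]]) / 78.0
edges = [np.linspace(0.0, 1.0, 4), np.linspace(0.0, 2.0, 5)]
ranked = top_cells(P_b, edges, k=3)
for prob, box in ranked:
    print(f"{prob:.4f}", box)
```

Neighboring cells that appear together near the top of such a ranking indicate a connected region of relatively high probability.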

5. Conclusion

We consider the probabilistic inverse sensitivity analysis problem: Given a specified uncertainty in the output of a map, determine variations in the parameters that produce the observed uncertainty. We formulate this inverse problem using the law of total probability. We describe and analyze a method for computing the approximate probability density that solves the inverse problem and does not require random sampling. Our approach breaks the solution down into two stages:

Construct an approximate representation of the set-valued inverse solution of the ill-posed deterministic inverse problem.
Approximate the density on the parameter space that corresponds to the set-valued inverse and the observed output density using a simple function representation.

We illustrate the method and several features using a variety of examples.

In [4] we present numerical analysis of discretization error, e.g., in evaluating the model by numerical solution and in finite sampling. In [5], we discuss the problem of dealing with multiple quantities of interest, which has application to data assimilation and “cascaded” uncertainty in operator decomposition solution of multiphysics problems.

Acknowledgments

The first author’s work was supported in part by the National Aeronautics and Space Administration, Earth Sciences Division (#NNX08AK08G), the National Science Foundation (SES-0922142), and the Joint NSF/NIGMS Initiative to Support Research in the Area of Mathematical Biology (#R01GM096192). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institute Of General Medical Sciences or the National Institutes of Health. The last author’s work was supported in part by the Defense Threat Reduction Agency (HDTRA1-09-1-0036), Department of Energy (DE-FG02-04ER25620, DE-FG02-05ER25699, DE-FC02-07ER54909, DE-SC0001724), Lawrence Livermore National Laboratory (B573139, B584647), the National Aeronautics and Space Administration (NNG04GH63G), the National Science Foundation (DMS-0107832, DMS-0715135, DGE-0221595003, MSPA-CSE-0434354, ECCS-0700559), Idaho National Laboratory (00069249), NSF/NIGMS (#R01GM096192), and the Sandia Corporation (PO299784).

This author’s work was supported in part by the Department of Energy (DE-FG02-05ER25699) and the National Science Foundation (DGE-0221595003, MSPACSE-0434354).

Contributor Information

J. Breidt, Email: jbreidt@stat.colostate.edu.

T. Butler, Email: tbutler@ices.utexas.edu.

D. Estep, Email: estep@math.colostate.edu.

REFERENCES

1.Bernardo JM. Reference posterior distributions for Bayesian inference. J. Roy. Statist. Soc. Ser. B. 1979;41:113–147. [Google Scholar]
2.Billingsley P. Probability and Measure. 3rd ed. New York: John Wiley & Sons; 1995. [Google Scholar]
3.Butler T. Ph.D. thesis. Fort Collins, CO: Department of Mathematics, Colorado State University; 2009. Computational Measure Theoretic Approach to Inverse Sensitivity Analysis: Methods and Analysis. [Google Scholar]
4. Butler T, Estep D. A measure-theoretic computational method for inverse sensitivity problems II: A posteriori error analysis. SIAM J. Numer. Anal. doi: 10.1137/100785946. Submitted.
5. Butler T, Estep D. A measure-theoretic computational method for inverse sensitivity problems III: Multiple output quantities of interest. In preparation, 2010.
6. Cacuci D. Sensitivity and Uncertainty Analysis: Theory. Vol. I. Boca Raton, FL: Chapman & Hall/CRC; 1997.
7. Estep D, Holst MJ, Målqvist A. Nonparametric density estimation for randomly perturbed elliptic problems III: Convergence, complexity, and generalizations. J. Appl. Math. Comput. To appear.
8. Estep D, Larson MG, Williams RD. Estimating the error of numerical solutions of systems of reaction-diffusion equations. Mem. Amer. Math. Soc. 2000;146 pp. viii+109.
9. Estep D, Målqvist A, Tavener S. Nonparametric density estimation for randomly perturbed elliptic problems. I: Computational methods, a posteriori analysis, and adaptive error control. SIAM J. Sci. Comput. 2009;31:2935–2959.
10. Estep D, Målqvist A, Tavener S. Nonparametric density estimation for randomly perturbed elliptic problems. II: Applications and adaptive modeling. Internat. J. Numer. Methods Engrg. 2009;80:846–867.
11. Estep D, Mckeown B, Neckels D, Sandelin J. GAASP: Globally Accurate Adaptive Sensitivity Package. 2006. Write to estep@math.colostate.edu for information.
12. Estep D, Neckels D. Fast and reliable methods for determining the evolution of uncertain parameters in differential equations. J. Comput. Phys. 2006;213:530–556.
13. Estep D, Neckels D. Fast methods for determining the evolution of uncertain parameters in reaction-diffusion equations. Comput. Methods Appl. Mech. Engrg. 2007;196:3967–3979.
14. Huelsenbeck JP, Larget B, Miller RE, Ronquist F. Potential applications and pitfalls of Bayesian inference of phylogeny. Syst. Biol. 2002;51:673–688. doi: 10.1080/10635150290102366.
15. Folland G. Real Analysis: Modern Techniques and Their Applications. 2nd ed. New York: John Wiley & Sons; 1999.
16. Gentle JE. Random Number Generation and Monte Carlo Methods. 2nd ed. New York: Springer; 2003.
17. Gilks WR, Richardson S, Spiegelhalter DJ. Markov Chain Monte Carlo in Practice. Boca Raton, FL: CRC Press; 1995.
18. Kaipio J, Somersalo E. Statistical and Computational Inverse Problems. New York: Springer-Verlag; 2005.
19. Knill DC, Richards W. Perception as Bayesian Inference. Cambridge, UK: Cambridge University Press; 1996.
20. Lanczos C. Linear Differential Operators. Mineola, NY: Dover Publications; 1997.
21. Marchuk GI. Adjoint Equations and Analysis of Complex Systems. Dordrecht: Kluwer Academic Publishers; 1995.
22. Marchuk GI, Agoshkov VI, Shutyaev VP. Adjoint Equations and Perturbation Algorithms in Nonlinear Problems. Boca Raton, FL: CRC Press; 1996.
23. Neckels D. Variational Methods for Uncertainty Quantification. Ph.D. thesis. Fort Collins, CO: Department of Mathematics, Colorado State University; 2005.
24. Robert CP, Casella G. Monte Carlo Statistical Methods. New York: Springer-Verlag; 2004.
25. Sandelin J. Global Estimate and Control of Model, Numerical, and Parameter Error. Ph.D. thesis. Fort Collins, CO: Department of Mathematics, Colorado State University; 2006.
26. Tarantola A. Inverse Problem Theory and Methods for Model Parameter Estimation. Philadelphia: SIAM; 2005.
27. Zabaras N, Ganapathysubramanian B. A scalable framework for the solution of stochastic inverse problems using a sparse grid collocation approach. J. Comput. Phys. 2008;227:4697–4735.
