In this example we tackle the problem of quantifying the uncertainty in the solution of an inverse problem governed by an elliptic PDE via the Bayesian inference framework. Hence, we state the inverse problem as a problem of statistical inference over the space of uncertain parameters, which are to be inferred from data and a physical model. The resulting solution to the statistical inverse problem is a posterior distribution that assigns to any candidate set of parameter fields our belief (expressed as a probability) that a member of this candidate set is the ``true'' parameter field that gave rise to the observed data.
For simplicity, in what follows we give finite-dimensional expressions (i.e., after discretization of the parameter space) for the Bayesian formulation of the inverse problem.
The posterior probability distribution combines the prior pdf $\pi_{\text{prior}}(\m)$ over the parameter space, which encodes any knowledge or assumptions about the parameter space that we may wish to impose before the data are considered, with a likelihood pdf $\pi_{\text{like}}(\data \; | \; \m)$, which explicitly represents the probability that a given set of parameters $\m$ might give rise to the observed data $\data \in \mathbb{R}^m$, namely:
$$ \begin{align} \pi_{\text{post}}(\m | \data) \propto \pi_{\text{prior}}(\m) \pi_{\text{like}}(\data | \m). \end{align} $$Note that infinite-dimensional analog of Bayes' formula requires the use Radon-Nikodym derivatives instead of probability density functions.
We consider a Gaussian prior with mean ${\vec m}_{\text{prior}}$ and covariance $\prcov$. The covariance is given by the discretization of the inverse of differential operator $\mathcal{A}^{-2} = (-\gamma \Delta + \delta I)^{-2}$, where $\gamma$, $\delta > 0$ control the correlation length and the variance of the prior operator. This choice of prior ensures that it is a trace-class operator, guaranteeing bounded pointwise variance and a well-posed infinite-dimensional Bayesian inverse problem.
Here ${\bf f}$ is the parameter-to-observable map that takes a parameter vector $\m$ and maps it to the space observation vector $\data$.
The mean of this posterior distribution, ${\vec \map}$, is the parameter vector maximizing the posterior, and is known as the maximum a posteriori (MAP) point. It can be found by minimizing the negative log of the posterior, which amounts to solving a deterministic inverse problem with appropriately weighted norms,
$$ \map := \underset{\m}{\arg \min} \; \mathcal{J}(\m) \;:=\; \Big( \frac{1}{2} \| {\bf f}(\m) - \data \|^2_{ {\bf \Gamma}_{\text{noise}}^{-1}} +\frac{1}{2} \| \m -\m_{\text prior} \|^2_{\prcov^{-1}} \Big). $$The posterior covariance matrix is then given by the inverse of the Hessian matrix of $\mathcal{J}$ at $\map$, namely
$$ \postcov = \left(\Hmisfit(\map) + \prcov^{-1} \right)^{-1} $$where ${\matrix \Lambda} = \diag(\lambda_i) \in \mathbb{R}^{n\times n}$ contains the generalized eigenvalues and the columns of ${\matrix V}\in \mathbb R^{n\times n}$ the generalized eigenvectors such that ${\matrix V}^T \prcov^{-1} {\matrix V} = {\matrix I}$.
When the generalized eigenvalues $\{\lambda_i\}$ decay rapidly, we can extract a low-rank approximation of $\Hmisfit$ by retaining only the $r$ largest eigenvalues and corresponding eigenvectors,
$$ \Hmisfit = \prcov^{-1} \Vr {\matrix{\Lambda}}_r \Vr^T \prcov^{-1}, $$Here, $\Vr \in \mathbb{R}^{n\times r}$ contains only the $r$ generalized eigenvectors of $\Hmisfit$ that correspond to the $r$ largest eigenvalues, which are assembled into the diagonal matrix ${\matrix{\Lambda}}_r = \diag (\lambda_i) \in \mathbb{R}^{r \times r}$.
Using the Sherman–Morrison–Woodbury formula, we write
$$ \begin{align} \notag \postcov = \left(\Hmisfit+ \prcov^{-1}\right)^{-1} = \prcov-\Vr {\matrix{D}}_r \Vr^T + \mathcal{O}\left(\sum_{i=r+1}^{n} \frac{\lambda_i}{\lambda_i + 1}\right), \end{align} $$where ${\matrix{D}}_r :=\diag(\lambda_i/(\lambda_i+1)) \in \mathbb{R}^{r\times r}$. The last term in this expression captures the error due to truncation in terms of the discarded eigenvalues; this provides a criterion for truncating the spectrum, namely that $r$ is chosen such that $\lambda_r$ is small relative to 1.
Therefore we can approximate the posterior covariance as
$$ \postcov \approx \prcov - \Vr {\matrix{D}}_r \Vr^T $$Let ${\bf x}$ be a sample for the prior distribution, i.e. ${\bf x} \sim \mathcal{N}({\bf 0}, \prcov)$, then, using the low rank approximation of the posterior covariance, we compute a sample ${\bf v} \sim \mathcal{N}({\bf 0}, \H^{-1})$ as
$$ {\bf v} = \big\{ \Vr \big[ ({\matrix{\Lambda}}_r + \Ir)^{-1/2} - \Ir \big] \Vr^T\prcov^{-1} + {\bf I} \big\} {\bf x} $$By the end of this notebook, you should be able to:
import dolfin as dl
import ufl
import math
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import sys
import os
sys.path.append( os.environ.get('HIPPYLIB_BASE_DIR', "../") )
from hippylib import *
import logging
logging.getLogger('FFC').setLevel(logging.WARNING)
logging.getLogger('UFL').setLevel(logging.WARNING)
dl.set_log_active(False)
np.random.seed(seed=1)
This function generates a random field with a prescribed anysotropic covariance function.
def true_model(prior):
noise = dl.Vector()
prior.init_vector(noise,"noise")
parRandom.normal(1., noise)
mtrue = dl.Vector()
prior.init_vector(mtrue, 0)
prior.sample(noise,mtrue)
return mtrue
We compute a two dimensional mesh of a unit square with nx by ny elements. We define a P2 finite element space for the state and adjoint variable and P1 for the parameter.
ndim = 2
nx = 64
ny = 64
mesh = dl.UnitSquareMesh(nx, ny)
Vh2 = dl.FunctionSpace(mesh, 'Lagrange', 2)
Vh1 = dl.FunctionSpace(mesh, 'Lagrange', 1)
Vh = [Vh2, Vh1, Vh2]
print( "Number of dofs: STATE={0}, PARAMETER={1}, ADJOINT={2}".format(
Vh[STATE].dim(), Vh[PARAMETER].dim(), Vh[ADJOINT].dim()) )
To set up the forward problem we use the PDEVariationalProblem
class, which requires the following inputs
Vh
pde_varf
bc
for the forward problem and bc0
for the adjoint and incremental problems.The PDEVariationalProblem
class offer the following functionality:
def u_boundary(x, on_boundary):
return on_boundary and ( x[1] < dl.DOLFIN_EPS or x[1] > 1.0 - dl.DOLFIN_EPS)
u_bdr = dl.Expression("x[1]", degree=1)
u_bdr0 = dl.Constant(0.0)
bc = dl.DirichletBC(Vh[STATE], u_bdr, u_boundary)
bc0 = dl.DirichletBC(Vh[STATE], u_bdr0, u_boundary)
f = dl.Constant(0.0)
def pde_varf(u,m,p):
return ufl.exp(m)*ufl.inner(ufl.grad(u), ufl.grad(p))*ufl.dx - f*p*ufl.dx
pde = PDEVariationalProblem(Vh, pde_varf, bc, bc0, is_fwd_linear=True)
To obtain the synthetic true paramter $m_{\rm true}$ we generate a realization from the prior distribution. Here we assume a Gaussian prior with zero average and covariance matrix $\mathcal{C} = \mathcal{A}^{-2}$. The action of $\mathcal{A}$ on a field $m$ is given by
$$ \mathcal{A}m = \left\{ \begin{array}{rl} \gamma \nabla \cdot \left( \Theta\nabla m\right)+ \delta m & \text{in } \Omega\\ \left( \Theta\, \nabla m\right) \cdot \boldsymbol{n} + \beta m & \text{on } \partial\Omega, \end{array} \right. $$where $\beta \propto \sqrt{\gamma\delta}$ is chosen to minimize boundary artifacts. Here $\Theta$ is an s.p.d. anisotropic tensor of the form
$$ \Theta = \begin{bmatrix} \theta_1 \sin(\alpha)^2 & (\theta_1-\theta_2) \sin(\alpha) \cos{\alpha} \\ (\theta_1-\theta_2) \sin(\alpha) \cos{\alpha} & \theta_2 \cos(\alpha)^2 \end{bmatrix}. $$gamma = .1
delta = .5
theta0 = 2.
theta1 = .5
alpha = math.pi/4
anis_diff = dl.CompiledExpression(ExpressionModule.AnisTensor2D(), degree = 1)
anis_diff.set(theta0, theta1, alpha)
prior = BiLaplacianPrior(Vh[PARAMETER], gamma, delta, anis_diff, robin_bc=True)
mtrue = true_model(prior)
print("Prior regularization: (delta_x - gamma*Laplacian)^order: delta={0}, gamma={1}, order={2}".format(delta, gamma,2))
objs = [dl.Function(Vh[PARAMETER],mtrue), dl.Function(Vh[PARAMETER],prior.mean)]
mytitles = ["True Parameter", "Prior mean"]
nb.multi1_plot(objs, mytitles)
plt.show()
To setup the observation operator $\mathcal{B}: \mathcal{V} \mapsto \mathbb{R}^{n_t}$, we generate $n_t$ (ntargets
in the code below) random locations where to evaluate the value of the state.
Under the assumption of Gaussian additive noise, the likelihood function $\pi_{\rm like}$ has the form
$$\pi_{\rm like}( \data \,| \, m ) \propto \exp\left( -\frac{1}{2}\|\mathcal{B}\,u(m) - \data \|^2_{\Gamma_{\rm noise}^{-1}}\right), $$where $u(m)$ denotes the solution of the forward model at a given parameter $m$.
The class PointwiseStateObservation
implements the evaluation of the log-likelihood function and of its partial derivatives w.r.t. the state $u$ and parameter $m$.
To generate the synthetic observation, we first solve the forward problem using the true parameter $m_{\rm true}$. Synthetic observations are obtained by perturbing the state variable at the observation points with a random Gaussian noise.
rel_noise
is the signal to noise ratio.
ntargets = 50
rel_noise = 0.01
#Targets only on the bottom
targets_x = np.random.uniform(0.1,0.9, [ntargets] )
targets_y = np.random.uniform(0.1,0.5, [ntargets] )
targets = np.zeros([ntargets, ndim])
targets[:,0] = targets_x
targets[:,1] = targets_y
#targets everywhere
#targets = np.random.uniform(0.1,0.9, [ntargets, ndim] )
print( "Number of observation points: {0}".format(ntargets) )
misfit = PointwiseStateObservation(Vh[STATE], targets)
utrue = pde.generate_state()
x = [utrue, mtrue, None]
pde.solveFwd(x[STATE], x)
misfit.B.mult(x[STATE], misfit.d)
MAX = misfit.d.norm("linf")
noise_std_dev = rel_noise * MAX
parRandom.normal_perturb(noise_std_dev, misfit.d)
misfit.noise_variance = noise_std_dev*noise_std_dev
vmax = max( utrue.max(), misfit.d.max() )
vmin = min( utrue.min(), misfit.d.min() )
plt.figure(figsize=(15,5))
nb.plot(dl.Function(Vh[STATE], utrue), mytitle="True State", subplot_loc=121, vmin=vmin, vmax=vmax)
nb.plot_pts(targets, misfit.d, mytitle="Observations", subplot_loc=122, vmin=vmin, vmax=vmax)
plt.show()
The model is defined by three component:
PDEVariationalProblem
pde
which provides methods for the solution of the forward problem, adjoint problem, and incremental forward and adjoint problems.Prior
prior
which provides methods to apply the regularization (precision) operator to a vector or to apply the prior covariance operator (i.e. to solve linear system with the regularization operator)Misfit
misfit
which provides methods to compute the cost functional and its partial derivatives with respect to the state and parameter variables.To test gradient and the Hessian of the model we use forward finite differences.
model = Model(pde, prior, misfit)
m0 = dl.interpolate(dl.Expression("sin(x[0])", degree=5), Vh[PARAMETER])
_ = modelVerify(model, m0.vector())
We used the globalized Newtown-CG method to compute the MAP point.
m = prior.mean.copy()
solver = ReducedSpaceNewtonCG(model)
solver.parameters["rel_tolerance"] = 1e-6
solver.parameters["abs_tolerance"] = 1e-12
solver.parameters["max_iter"] = 25
solver.parameters["GN_iter"] = 5
solver.parameters["globalization"] = "LS"
solver.parameters["LS"]["c_armijo"] = 1e-4
x = solver.solve([None, m, None])
if solver.converged:
print( "\nConverged in ", solver.it, " iterations.")
else:
print( "\nNot Converged")
print( "Termination reason: ", solver.termination_reasons[solver.reason] )
print( "Final gradient norm: ", solver.final_grad_norm )
print( "Final cost: ", solver.final_cost )
plt.figure(figsize=(15,5))
nb.plot(dl.Function(Vh[STATE], x[STATE]), subplot_loc=121,mytitle="State")
nb.plot(dl.Function(Vh[PARAMETER], x[PARAMETER]), subplot_loc=122,mytitle="Parameter")
plt.show()
We used the double pass algorithm to compute a low-rank decomposition of the Hessian Misfit. In particular, we solve
$$ \Hmisfit {\bf v}_i = \lambda_i \prcov^{-1} {\bf v}_i. $$The figure shows the largest k generalized eigenvectors of the Hessian misfit. The effective rank of the Hessian misfit is the number of eigenvalues above the red line ($y=1$). The effective rank is independent of the mesh size.
model.setPointForHessianEvaluations(x, gauss_newton_approx=False)
Hmisfit = ReducedHessian(model, misfit_only=True)
k = 50
p = 20
print( "Single/Double Pass Algorithm. Requested eigenvectors: {0}; Oversampling {1}.".format(k,p) )
Omega = MultiVector(x[PARAMETER], k+p)
parRandom.normal(1., Omega)
lmbda, V = doublePassG(Hmisfit, prior.R, prior.Rsolver, Omega, k)
posterior = GaussianLRPosterior(prior, lmbda, V)
posterior.mean = x[PARAMETER]
plt.plot(range(0,k), lmbda, 'b*', range(0,k+1), np.ones(k+1), '-r')
plt.yscale('log')
plt.xlabel('number')
plt.ylabel('eigenvalue')
nb.plot_eigenvectors(Vh[PARAMETER], V, mytitle="Eigenvector", which=[0,1,2,5,10,15])
compute_trace = True
if compute_trace:
post_tr, prior_tr, corr_tr = posterior.trace(method="Randomized", r=200)
print( "Posterior trace {0:5e}; Prior trace {1:5e}; Correction trace {2:5e}".format(post_tr, prior_tr, corr_tr) )
post_pw_variance, pr_pw_variance, corr_pw_variance = posterior.pointwise_variance(method="Randomized", r=200)
objs = [dl.Function(Vh[PARAMETER], pr_pw_variance),
dl.Function(Vh[PARAMETER], post_pw_variance)]
mytitles = ["Prior variance", "Posterior variance"]
nb.multi1_plot(objs, mytitles, logscale=False)
plt.show()
nsamples = 5
noise = dl.Vector()
posterior.init_vector(noise,"noise")
s_prior = dl.Function(Vh[PARAMETER], name="sample_prior")
s_post = dl.Function(Vh[PARAMETER], name="sample_post")
pr_max = 2.5*math.sqrt( pr_pw_variance.max() ) + prior.mean.max()
pr_min = -2.5*math.sqrt( pr_pw_variance.max() ) + prior.mean.min()
ps_max = 2.5*math.sqrt( post_pw_variance.max() ) + posterior.mean.max()
ps_min = -2.5*math.sqrt( post_pw_variance.max() ) + posterior.mean.min()
for i in range(nsamples):
parRandom.normal(1., noise)
posterior.sample(noise, s_prior.vector(), s_post.vector())
plt.figure(figsize=(15,5))
nb.plot(s_prior, subplot_loc=121,mytitle="Prior sample", vmin=pr_min, vmax=pr_max)
nb.plot(s_post, subplot_loc=122,mytitle="Posterior sample", vmin=ps_min, vmax=ps_max)
plt.show()
Copyright (c) 2016-2018, The University of Texas at Austin & University of California, Merced.
Copyright (c) 2019-2020, The University of Texas at Austin, University of California--Merced, Washington University in St. Louis.
Copyright (c) 2021-2023, The University of Texas at Austin & University of California, Merced.
All Rights reserved.
See file COPYRIGHT for details.
This file is part of the hIPPYlib library. For more information and source code availability see https://hippylib.github.io.
hIPPYlib is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License (as published by the Free Software Foundation) version 2.0 dated June 1991.