We consider the estimation of a coefficient in an elliptic partial differential equation as a first model problem. Depending on the interpretation of the unknowns and the type of measurements, this model problem arises, for instance, in inversion for groundwater flow or heat conductivity. It can also be interpreted as finding a membrane with a certain spatially varying stiffness.
Let $\Omega\subset\mathbb{R}^n$, $n\in\{1,2,3\}$ be an open, bounded domain and consider the following problem:
$$ \min_{m} \mathcal{J}(m):=\frac{1}{2}\int_{\Omega} (u-d)^2\, dx + \frac{\gamma}{2}\int_\Omega|\nabla m|^2\,dx, $$where $u$ is the solution of
$$ \begin{split} -\nabla\cdot(e^m \nabla u) &= f \quad\text{ in }\Omega,\\ u &= 0 \quad\text{ on }\partial\Omega. \end{split} $$Here $m \in \mathcal{M}:= L^{\infty}(\Omega) \cap H^1(\Omega)$ denotes the unknown coefficient field, $u \in \mathcal{V}:= H^1_0(\Omega)$ the state variable, $d$ the (possibly noisy) data, $f\in H^{-1}(\Omega)$ a given volume force, and $\gamma\ge 0$ the regularization parameter.
The weak form of the forward problem, obtained by multiplying the strong form by a test function $\tilde{p}\in\mathcal{V}$ and integrating by parts, reads: Find $u\in \mathcal{V}$ such that
$$ \int_{\Omega}e^m \nabla u \cdot \nabla \tilde{p} \, dx - \int_{\Omega} f \tilde{p} \,dx = 0, \text{ for all } \tilde{p}\in \mathcal{V}.$$To compute the gradient of $\mathcal{J}$ with respect to the parameter $m$, we use the (formal) Lagrangian approach. The Lagrangian functional $\mathscr{L}:\mathcal{V}\times\mathcal{M}\times\mathcal{V}\rightarrow \mathbb{R}$ is given by
$$ \mathscr{L}(u,m,p):= \frac{1}{2}\int_{\Omega}(u-d)^2 dx + \frac{\gamma}{2}\int_\Omega \nabla m \cdot \nabla m \, dx + \int_{\Omega} e^m\nabla u \cdot \nabla p \, dx - \int_{\Omega} f\,p\, dx. $$Then the gradient of the cost functional $\mathcal{J}(m)$ with respect to the parameter $m$, evaluated in an arbitrary direction $\tilde m$, is
$$ (\mathcal{G}(m), \tilde m) := \mathscr{L}_m(u,m,p)(\tilde{m}) = \gamma \int_\Omega \nabla m \cdot \nabla \tilde{m}\, dx + \int_\Omega \tilde{m}e^m\nabla u \cdot \nabla p\, dx \quad \forall \tilde{m} \in \mathcal{M}, $$where $u \in \mathcal{V}$ is the solution of the forward problem,
$$ (\mathscr{L}_p(u,m,p), \tilde{p}) := \int_{\Omega}e^m\nabla u \cdot \nabla \tilde{p}\, dx - \int_{\Omega} f\,\tilde{p}\, dx = 0 \quad \forall \tilde{p} \in \mathcal{V}, $$and $p \in \mathcal{V}$ is the solution of the adjoint problem,
$$ (\mathscr{L}_u(u,m,p), \tilde{u}) := \int_{\Omega} e^m\nabla p \cdot \nabla \tilde{u}\, dx + \int_{\Omega} (u-d)\tilde{u}\,dx = 0 \quad \forall \tilde{u} \in \mathcal{V}.$$Written in abstract form, the steepest descent method computes an update direction $\hat{m}_k$ in the direction of the negative gradient defined as
$$ \int_\Omega \hat{m}_k \tilde{m}\, dx = -\left(\mathcal{G}(m_k), \tilde m\right) \quad \forall \tilde{m} \in \mathcal{M}, $$where the evaluation of the gradient $\mathcal{G}(m_k)$ involves the solutions $u_k$ and $p_k$ of the forward and adjoint problems, respectively, for $m = m_k$.
Then we set $m_{k+1} = m_k + \alpha \hat{m}_k$, where the step length $\alpha$ is chosen to guarantee sufficient descent.
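To make the abstract method concrete before introducing the PDE pieces, the following is a minimal sketch of steepest descent with Armijo backtracking applied to a simple quadratic cost in $\mathbb{R}^2$ (plain NumPy; the matrix `A`, the vector `b`, and the helper `cost_and_grad` are illustrative stand-ins, not part of the inverse problem solved below). Its structure mirrors the optimization loop implemented later in this notebook.

import numpy as np

# illustrative quadratic cost J(m) = 0.5 m^T A m - b^T m with an SPD matrix A;
# its gradient is A m - b and its exact minimizer is A^{-1} b = (0.2, 0.4)
A = np.array([[3., 1.], [1., 2.]])
b = np.array([1., 1.])

def cost_and_grad(m):
    return 0.5 * m @ A @ m - b @ m, A @ m - b

m = np.zeros(2)
c_armijo = 1e-4
for k in range(100):
    J, g = cost_and_grad(m)
    if np.linalg.norm(g) < 1e-8:
        break
    # backtracking (Armijo) line search: halve alpha until sufficient descent holds
    alpha = 1.0
    while cost_and_grad(m - alpha * g)[0] > J - alpha * c_armijo * g @ g:
        alpha *= 0.5
    # steepest descent update
    m = m - alpha * g

print(k, m)   # m approaches the exact minimizer (0.2, 0.4)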
By the end of this notebook, you should be able to:

- set up the forward and adjoint Poisson problems and solve them with FEniCS;
- derive the weak form of the gradient of the cost functional using the Lagrangian formalism and implement it;
- verify the gradient with a finite difference check;
- solve the inverse problem with the steepest descent method with Armijo line search.
import matplotlib.pyplot as plt
%matplotlib inline
import dolfin as dl
import ufl
from hippylib import nb
import numpy as np
import logging
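# suppress FFC/UFL form compiler messages and dolfin log output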
logging.getLogger('FFC').setLevel(logging.WARNING)
logging.getLogger('UFL').setLevel(logging.WARNING)
dl.set_log_active(False)
np.random.seed(seed=1)
As in the introduction, the first thing we need to do is to set up the numerical model.
In this cell, we set the mesh `mesh`, and the finite element spaces `Vm` and `Vu` corresponding to the parameter space and the state/adjoint space, respectively. In particular, we use linear finite elements for the parameter space and quadratic elements for the state/adjoint space.
The true parameter `mtrue` is the finite element interpolant of the function
$$ m_{\rm true}(x) = \begin{cases} \ln 4 & \text{if } (x_1-0.5)^2 + (x_2-0.5)^2 < 0.2^2,\\ \ln 8 & \text{otherwise}. \end{cases} $$
The forcing term `f` and the boundary condition `u0` for the forward problem are
$$ f = 1 \; \forall {\bf x} \in \Omega, \quad\text{and}\quad u = 0 \; \forall {\bf x} \in \partial \Omega. $$
# create mesh and define function spaces
nx = 32
ny = 32
mesh = dl.UnitSquareMesh(nx, ny)
Vm = dl.FunctionSpace(mesh, 'Lagrange', 1)
Vu = dl.FunctionSpace(mesh, 'Lagrange', 2)
# define the true parameter field (used to generate the synthetic observations)
mtrue_str = 'std::log( 8. - 4.*(pow(x[0] - 0.5,2) + pow(x[1] - 0.5,2) < pow(0.2,2) ) )'
mtrue = dl.interpolate(dl.Expression(mtrue_str, degree=5), Vm)
# define functions for the state, parameter, and adjoint variables
u = dl.Function(Vu)
m = dl.Function(Vm)
p = dl.Function(Vu)
# define Trial and Test Functions
u_trial, m_trial, p_trial = dl.TrialFunction(Vu), dl.TrialFunction(Vm), dl.TrialFunction(Vu)
u_test, m_test, p_test = dl.TestFunction(Vu), dl.TestFunction(Vm), dl.TestFunction(Vu)
# initialize input functions
f = dl.Constant(1.0)
u0 = dl.Constant(0.0)
# plot
plt.figure(figsize=(15,5))
nb.plot(mesh, subplot_loc=121, mytitle="Mesh", show_axis='on')
nb.plot(mtrue, subplot_loc=122, mytitle="True parameter field")
plt.show()
# set up Dirichlet boundary conditions
def boundary(x, on_boundary):
    return on_boundary
bc_state = dl.DirichletBC(Vu, u0, boundary)
bc_adj = dl.DirichletBC(Vu, dl.Constant(0.), boundary)
To generate the synthetic observations we first solve the PDE for the state variable `utrue` corresponding to the true parameter `mtrue`.
Specifically, we solve the variational problem
Find $u\in \mathcal{V}$ such that
$$\underbrace{\int_\Omega e^{m_{\rm true}} \nabla u \cdot \nabla v \, dx}_{\; := \; a_{\rm true}} - \underbrace{\int_{\Omega} f\,v\,dx}_{\; := \;L_{\rm true}} = 0, \text{ for all } v\in \mathcal{V}.$$
Then we perturb the true state variable and write the observation `d` as
$$ d = u_{\rm true} + \eta, \qquad \eta \sim \mathcal{N}(0, \sigma^2), $$
where the standard deviation $\sigma$ is proportional to `noise_level`; specifically, $\sigma = $ `noise_level` $\cdot \|u_{\rm true}\|_\infty$.
# noise level
noise_level = 0.01
# weak form for setting up the synthetic observations
a_true = ufl.inner( ufl.exp(mtrue) * ufl.grad(u_trial), ufl.grad(u_test)) * ufl.dx
L_true = f * u_test * ufl.dx
# solve the forward/state problem to generate synthetic observations
A_true, b_true = dl.assemble_system(a_true, L_true, bc_state)
utrue = dl.Function(Vu)
dl.solve(A_true, utrue.vector(), b_true)
d = dl.Function(Vu)
d.assign(utrue)
# perturb the state solution to create synthetic measurements d:
# d = utrue + noise_level * ||utrue||_inf * random.normal
MAX = d.vector().norm("linf")
noise = dl.Vector()
A_true.init_vector(noise,1)
noise.set_local( noise_level * MAX * np.random.normal(0, 1, len(d.vector().get_local())) )
bc_adj.apply(noise)
d.vector().axpy(1., noise)
# plot
nb.multi1_plot([utrue, d], ["State solution with mtrue", "Synthetic observations"])
plt.show()
# Regularization parameter
gamma = 1e-9
# Define cost function
def cost(u, d, m, gamma):
    reg = 0.5*gamma * dl.assemble( ufl.inner(ufl.grad(m), ufl.grad(m))*ufl.dx )
    misfit = 0.5 * dl.assemble( (u-d)**2*ufl.dx )
    return [reg + misfit, misfit, reg]
Below we define the variational forms that appear in the state/adjoint equations and in the gradient evaluation. Specifically,

- `a_state`, `L_state` stand for the bilinear and linear form of the state equation, respectively;
- `a_adj`, `L_adj` stand for the bilinear and linear form of the adjoint equation, respectively;
- `grad_misfit`, `grad_reg` stand for the contributions to the gradient coming from the data misfit and the regularization, respectively.

We also build the mass matrix $M$ that is used to discretize the $L^2(\Omega)$ inner product.
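In the discrete setting, assembling `grad_misfit + grad_reg` produces the vector $G(m_k)$ whose entries are the gradient expression tested against the finite element basis functions $\{\phi_i\}$ of the parameter space `Vm`. The steepest descent direction $\hat m_k$ defined above is then recovered by solving the mass-matrix system
$$ M \hat{m}_k = -G(m_k), \qquad M_{ij} = \int_\Omega \phi_i\, \phi_j \, dx, $$
which is the role of the solve with $M$ in the optimization loop below.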
# weak form for setting up the state equation
a_state = ufl.inner( ufl.exp(m) * ufl.grad(u_trial), ufl.grad(u_test)) * ufl.dx
L_state = f * u_test * ufl.dx
# weak form for setting up the adjoint equations
a_adj = ufl.inner( ufl.exp(m) * ufl.grad(p_trial), ufl.grad(p_test) ) * ufl.dx
L_adj = - ufl.inner(u - d, p_test) * ufl.dx
# weak form for gradient
grad_misfit = ufl.inner(ufl.exp(m)*m_test*ufl.grad(u), ufl.grad(p)) * ufl.dx
grad_reg = dl.Constant(gamma)*ufl.inner(ufl.grad(m), ufl.grad(m_test))*ufl.dx
# Mass matrix in parameter space
Mvarf = ufl.inner(m_trial, m_test) * ufl.dx
M = dl.assemble(Mvarf)
We use a finite difference check to verify that our gradient derivation is correct. Specifically, we consider a function $ m_0\in \mathcal{M}$ and we verify that for an arbitrary direction $\tilde{m} \in \mathcal{M}$ we have $$ r := \left| \frac{ \mathcal{J}(m_0 + \varepsilon \tilde{m}) - \mathcal{J}(m_0)}{\varepsilon} - \left(\mathcal{G}(m_0), \tilde{m}\right)\right| = \mathcal{O}(\varepsilon).$$
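This first-order behavior of the error follows from a Taylor expansion of the cost functional: assuming $\mathcal{J}$ is twice differentiable around $m_0$,
$$ \mathcal{J}(m_0 + \varepsilon \tilde{m}) = \mathcal{J}(m_0) + \varepsilon \left(\mathcal{G}(m_0), \tilde{m}\right) + \frac{\varepsilon^2}{2}\, \mathcal{J}''(m_0)[\tilde{m},\tilde{m}] + o(\varepsilon^2), $$
so the finite difference quotient deviates from $\left(\mathcal{G}(m_0), \tilde{m}\right)$ by a term of order $\varepsilon$.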
In the figure below we show on a log-log scale the value of $r$ as a function of $\varepsilon$. We observe that $r$ decays linearly for a wide range of values of $\varepsilon$; however, we notice an increase in the error for extremely small values of $\varepsilon$ due to round-off errors in finite precision arithmetic.
m0 = dl.interpolate(dl.Constant(np.log(4.) ), Vm )
n_eps = 32
eps = np.power(2., -np.arange(n_eps))
err_grad = np.zeros(n_eps)
m.assign(m0)
#Solve the fwd problem and evaluate the cost functional
A, state_b = dl.assemble_system (a_state, L_state, bc_state)
dl.solve(A, u.vector(), state_b)
c0, _, _ = cost(u, d, m, gamma)
# Solve the adjoint problem and evaluate the gradient
adj_A, adjoint_RHS = dl.assemble_system(a_adj, L_adj, bc_adj)
dl.solve(adj_A, p.vector(), adjoint_RHS)
# evaluate the gradient
grad0 = dl.assemble(grad_misfit + grad_reg)
# Define an arbitrary direction mtilde to perform the check
mtilde = dl.Function(Vm).vector()
mtilde.set_local(np.random.randn(Vm.dim()))
mtilde.apply("")
mtilde_grad0 = grad0.inner(mtilde)
for i in range(n_eps):
    m.assign(m0)
    m.vector().axpy(eps[i], mtilde)
    A, state_b = dl.assemble_system(a_state, L_state, bc_state)
    dl.solve(A, u.vector(), state_b)
    cplus, _, _ = cost(u, d, m, gamma)
    err_grad[i] = abs( (cplus - c0)/eps[i] - mtilde_grad0 )
plt.figure()
plt.loglog(eps, err_grad, "-ob", label="Error Grad")
plt.loglog(eps, (.5*err_grad[0]/eps[0])*eps, "-.k", label="First Order")
plt.title("Finite difference check of the first variation (gradient)")
plt.xlabel("eps")
plt.ylabel("Error grad")
plt.legend(loc = "upper left")
plt.show()
We solve the state equation and compute the cost functional for the initial guess of the parameter `m0`.
m.assign(m0)
# solve state equation
A, state_b = dl.assemble_system (a_state, L_state, bc_state)
dl.solve(A, u.vector(), state_b)
# evaluate cost
[cost_old, misfit_old, reg_old] = cost(u, d, m, gamma)
# plot
plt.figure(figsize=(15,5))
nb.plot(m,subplot_loc=121, mytitle="m0", vmin=mtrue.vector().min(), vmax=mtrue.vector().max())
nb.plot(u,subplot_loc=122, mytitle="u(m0)")
plt.show()
We solve the constrained optimization problem using the steepest descent method with Armijo line search.
The stopping criterion is based on a relative reduction of the norm of the gradient (i.e. $\frac{\|g_{n}\|}{\|g_{0}\|} \leq \tau$).
The gradient is computed by solving the state and adjoint equations for the current parameter $m$, and then substituting the current state $u$, parameter $m$, and adjoint $p$ variables in the weak form expression of the gradient:
$$ (g, \tilde{m}) = \gamma(\nabla m, \nabla \tilde{m}) + (\tilde{m}\,e^m\nabla u, \nabla p).$$The Armijo line search uses backtracking to find a step length $\alpha$ that achieves a sufficient reduction of the cost functional; specifically, we halve $\alpha$ until
$$\mathcal{J}( m - \alpha g ) \leq \mathcal{J}(m) - \alpha\, c_{\rm armijo}\, (g,g). $$
# define parameters for the optimization
tol = 1e-4
maxiter = 1000
print_any = 10
plot_any = 50
c_armijo = 1e-5
# initialize iter counters
iter = 0
converged = False
# initializations
g = dl.Vector()
M.init_vector(g,0)
m_prev = dl.Function(Vm)
print( "Nit cost misfit reg ||grad|| alpha N backtrack" )
while iter < maxiter and not converged:

    # solve the adjoint problem
    adj_A, adjoint_RHS = dl.assemble_system(a_adj, L_adj, bc_adj)
    dl.solve(adj_A, p.vector(), adjoint_RHS)

    # evaluate the gradient
    MG = dl.assemble(grad_misfit + grad_reg)
    dl.solve(M, g, MG)

    # calculate the norm of the gradient
    grad_norm2 = g.inner(MG)
    gradnorm = np.sqrt(grad_norm2)
    if iter == 0:
        gradnorm0 = gradnorm

    # linesearch
    it_backtrack = 0
    m_prev.assign(m)
    alpha = 1.e5
    backtrack_converged = False

    for it_backtrack in range(20):
        m.vector().axpy(-alpha, g)

        # solve the forward problem
        state_A, state_b = dl.assemble_system(a_state, L_state, bc_state)
        dl.solve(state_A, u.vector(), state_b)

        # evaluate cost
        [cost_new, misfit_new, reg_new] = cost(u, d, m, gamma)

        # check if Armijo conditions are satisfied
        if cost_new < cost_old - alpha * c_armijo * grad_norm2:
            cost_old = cost_new
            backtrack_converged = True
            break
        else:
            alpha *= 0.5
            m.assign(m_prev)  # reset m

    if backtrack_converged == False:
        print( "Backtracking failed. A sufficient descent direction was not found" )
        converged = False
        break

    sp = ""
    if (iter % print_any) == 0:
        print( "%3d %1s %8.5e %1s %8.5e %1s %8.5e %1s %8.5e %1s %8.5e %1s %3d" % \
            (iter, sp, cost_new, sp, misfit_new, sp, reg_new, sp, \
            gradnorm, sp, alpha, sp, it_backtrack) )

    if (iter % plot_any) == 0:
        nb.multi1_plot([m, u, p], ["m", "u", "p"], same_colorbar=False)
        plt.show()

    # check for convergence
    if gradnorm < tol*gradnorm0 and iter > 0:
        converged = True
        print("Steepest descent converged in ", iter, " iterations")

    iter += 1

if not converged:
    print("Steepest descent did not converge in ", maxiter, " iterations")
nb.multi1_plot([mtrue, m], ["mtrue", "m"])
nb.multi1_plot([u,p], ["u","p"], same_colorbar=False)
plt.show()
Copyright © 2022, The University of Texas at Austin.
All Rights reserved. See file COPYRIGHT for details.
This file is part of `cvips_labs`, the teaching material for Computational and Variational Methods for Inverse Problems at The University of Texas at Austin. Please see https://hippylib.github.io/cvips_labs for more information and source code availability.
We would like to acknowledge the Advanced Cyberinfrastructure Coordination Ecosystem: Services & Support (ACCESS) program for providing cloud computing resources (Jetstream) for this course through allocation MTH230002. ACCESS is an advanced computing and data resource supported by the National Science Foundation and made possible through these lead institutions and their partners – Carnegie Mellon University; University of Colorado Boulder; University of Illinois at Urbana-Champaign; and State University of New York at Buffalo.