Coefficient field inversion in an elliptic partial differential equation
We consider the estimation of a coefficient in an elliptic partial differential equation as a model problem. Depending on the interpretation of the unknowns and the type of measurements, this model problem arises, for instance, in inversion for groundwater flow or heat conductivity. It can also be interpreted as finding a membrane with a certain spatially varying stiffness. Let , be an open, bounded domain and consider the following problem:
where is the solution of
Here the unknown coefficient field, denotes (possibly noisy) data, a given force, and the regularization parameter.
The variational (or weak) form of the state equation:
Find such that where is the space of functions vanishing on with square integrable derivatives. Here, denotes the -inner product, i.e, for scalar functions defined on we denote .
The Lagrangian functional , which we use as a tool to derive the optimality system, is given by
The Lagrange multiplier theory shows that, at a solution all variations of the Lagrangian functional with respect to all variables must vanish. These variations of with respect to in directions are given by
where the variations are taken from the same spaces as .
The gradient of the cost functional therefore is
Newton's method requires second-order variational derivatives of the Lagrangian . Written in abstract form, it computes an update direction from the following Newton step for the Lagrangian functional:
for all variations , where and denote the first and second variations of the Lagrangian. For the elliptic parameter inversion problem, this Newton step (written in variatonal form) is as follows: Find as the solution of the linear system
for all .
Discrete Newton system:
The discretized Newton step: denote the vectors corresponding to the discretization of the functions by and . Then, the discretization of the above system is given by the following symmetric linear system:
where , , , and are the components of the Hessian matrix of the Lagrangian, and are the Jacobian of the state equation with respect to the state and the control variables, respectively and , , and are the discrete gradients of the Lagrangian with respect to , and , respectively.
Reduced Hessian apply:
To eliminate the incremental state and adjoint variables, and , from the first and last equations we use
This results in the following reduced linear system for the Newton step
with the reduced Hessian applied to a vector given by
By the end of this notebook, you should be able to:
- solve the forward and adjoint Poisson equations
- understand the inverse method framework
- visualise and understand the results
- modify the problem and code
Mathematical tools used:
- Finite element method
- Derivation of gradiant and Hessian via the adjoint method
- inexact Newton-CG
- Armijo line search
List of software used:
- FEniCS, a parallel finite element element library for the discretization of partial differential equations
- PETSc, for scalable and efficient linear algebra operations and solvers
- Matplotlib, a python package used for plotting the results
- Numpy, a python package for linear algebra
from dolfin import * import sys sys.path.append( "../" ) from hippylib import * import numpy as np import time import logging import matplotlib.pyplot as plt %matplotlib inline import nb start = time.clock() logging.getLogger('FFC').setLevel(logging.WARNING) logging.getLogger('UFL').setLevel(logging.WARNING) set_log_active(False) np.random.seed(seed=1)
Model set up:
As in the introduction, the first thing we need to do is set up the numerical model. In this cell, we set the mesh, the finite element functions corresponding to state, adjoint and coefficient/gradient variables, and the corresponding test functions and the parameters for the optimization.
# create mesh and define function spaces nx = 64 ny = 64 mesh = UnitSquareMesh(nx, ny) Va = FunctionSpace(mesh, 'Lagrange', 1) Vu = FunctionSpace(mesh, 'Lagrange', 2) # The true and inverted parameter atrue = interpolate(Expression('log(2 + 7*(pow(pow(x - 0.5,2) + pow(x - 0.5,2),0.5) > 0.2))', degree=5),Va) a = interpolate(Expression("log(2.0)", degree=1),Va) # define function for state and adjoint u = Function(Vu) p = Function(Vu) # define Trial and Test Functions u_trial, p_trial, a_trial = TrialFunction(Vu), TrialFunction(Vu), TrialFunction(Va) u_test, p_test, a_test = TestFunction(Vu), TestFunction(Vu), TestFunction(Va) # initialize input functions f = Constant("1.0") u0 = Constant("0.0") # plot plt.figure(figsize=(15,5)) nb.plot(mesh,subplot_loc=121, mytitle="Mesh", show_axis='on') nb.plot(atrue,subplot_loc=122, mytitle="True parameter field") plt.show()
# set up dirichlet boundary conditions def boundary(x,on_boundary): return on_boundary bc_state = DirichletBC(Vu, u0, boundary) bc_adj = DirichletBC(Vu, Constant(0.), boundary)
Set up synthetic observations:
- Propose a coefficient field shown above
The weak form of the pde: Find such that .
Perturb the solution: , where
# noise level noise_level = 0.05 # weak form for setting up the synthetic observations a_goal = inner(exp(atrue) * nabla_grad(u_trial), nabla_grad(u_test)) * dx L_goal = f * u_test * dx # solve the forward/state problem to generate synthetic observations goal_A, goal_b = assemble_system(a_goal, L_goal, bc_state) utrue = Function(Vu) solve(goal_A, utrue.vector(), goal_b) ud = Function(Vu) ud.assign(utrue) # perturb state solution and create synthetic measurements ud # ud = u + ||u||/SNR * random.normal MAX = ud.vector().norm("linf") noise = Vector() goal_A.init_vector(noise,1) noise.set_local( noise_level * MAX * np.random.normal(0, 1, len(ud.vector().array())) ) bc_adj.apply(noise) ud.vector().axpy(1., noise) # plot nb.multi1_plot([utrue, ud], ["State solution with atrue", "Synthetic observations"]) plt.show()
The cost function evaluation:
In the code below, and are symmetric positive definite matrices that stem from finite element discretization of the misfit and regularization component of the cost functional, respectively.
# regularization parameter gamma = 1e-8 # weak for for setting up the misfit and regularization compoment of the cost W_equ = inner(u_trial, u_test) * dx R_equ = gamma * inner(nabla_grad(a_trial), nabla_grad(a_test)) * dx W = assemble(W_equ) R = assemble(R_equ) # refine cost function def cost(u, ud, a, W, R): diff = u.vector() - ud.vector() reg = 0.5 * a.vector().inner(R*a.vector() ) misfit = 0.5 * diff.inner(W * diff) return [reg + misfit, misfit, reg]
Setting up the state equations, right hand side for the adjoint and the necessary matrices:
# weak form for setting up the state equation a_state = inner(exp(a) * nabla_grad(u_trial), nabla_grad(u_test)) * dx L_state = f * u_test * dx # weak form for setting up the adjoint equation a_adj = inner(exp(a) * nabla_grad(p_trial), nabla_grad(p_test)) * dx L_adj = -inner(u - ud, p_test) * dx # weak form for setting up matrices Wua_equ = inner(exp(a) * a_trial * nabla_grad(p_test), nabla_grad(p)) * dx C_equ = inner(exp(a) * a_trial * nabla_grad(u), nabla_grad(u_test)) * dx Raa_equ = inner(exp(a) * a_trial * a_test * nabla_grad(u), nabla_grad(p)) * dx M_equ = inner(a_trial, a_test) * dx # assemble matrix M M = assemble(M_equ)
We solve the state equation and compute the cost functional for the initial guess of the parameter
# solve state equation state_A, state_b = assemble_system (a_state, L_state, bc_state) solve (state_A, u.vector(), state_b) # evaluate cost [cost_old, misfit_old, reg_old] = cost(u, ud, a, W, R) # plot plt.figure(figsize=(15,5)) nb.plot(a,subplot_loc=121, mytitle="a_ini", vmin=atrue.vector().min(), vmax=atrue.vector().max()) nb.plot(u,subplot_loc=122, mytitle="u(a_ini)") plt.show()
The reduced Hessian apply to a vector v:
Here we describe how to apply the reduced Hessian operator to a vector v. For an opportune choice of the regularization, the reduced Hessian operator evaluated in a neighborhood of the solution is positive define, whereas far from the solution the reduced Hessian may be indefinite. On the constrary, the Gauss-Newton approximation of the Hessian is always positive defined.
For this reason, it is beneficial to perform a few initial Gauss-Newton steps (5 in this particular example) to accelerate the convergence of the inexact Newton-CG algorithm.
The Hessian apply reads:
The Gauss-Newton Hessian apply is obtained by dropping the second derivatives operators , , and :
# Class HessianOperator to perform Hessian apply to a vector class HessianOperator(): cgiter = 0 def __init__(self, R, Raa, C, A, adj_A, W, Wua, use_gaussnewton=False): self.R = R self.Raa = Raa self.C = C self.A = A self.adj_A = adj_A self.W = W self.Wua = Wua self.use_gaussnewton = use_gaussnewton # incremental state self.du = Vector() self.A.init_vector(self.du,0) #incremental adjoint self.dp = Vector() self.adj_A.init_vector(self.dp,0) # auxiliary vectors self.CT_dp = Vector() self.C.init_vector(self.CT_dp, 1) self.Wua_du = Vector() self.Wua.init_vector(self.Wua_du, 1) def init_vector(self, v, dim): self.R.init_vector(v,dim) # Hessian performed on x, output as generic vector y def mult(self, v, y): self.cgiter += 1 y.zero() if self.use_gaussnewton: self.mult_GaussNewton(v,y) else: self.mult_Newton(v,y) # define (Gauss-Newton) Hessian apply H * v def mult_GaussNewton(self, v, y): #incremental forward rhs = -(self.C * v) bc_adj.apply(rhs) solve (self.A, self.du, rhs) #incremental adjoint rhs = - (self.W * self.du) bc_adj.apply(rhs) solve (self.adj_A, self.dp, rhs) # Reg/Prior term self.R.mult(v,y) # Misfit term self.C.transpmult(self.dp, self.CT_dp) y.axpy(1, self.CT_dp) # define (Newton) Hessian apply H * v def mult_Newton(self, v, y): #incremental forward rhs = -(self.C * v) bc_adj.apply(rhs) solve (self.A, self.du, rhs) #incremental adjoint rhs = -(self.W * self.du) - self.Wua * v bc_adj.apply(rhs) solve (self.adj_A, self.dp, rhs) #Reg/Prior term self.R.mult(v,y) y.axpy(1., Raa*v) #Misfit term self.C.transpmult(self.dp, self.CT_dp) y.axpy(1., self.CT_dp) self.Wua.transpmult(self.du, self.Wua_du) y.axpy(1., self.Wua_du)
The inexact Newton-CG optimization with Armijo line search:
We solve the constrained optimization problem using the inexact Newton-CG method with Armijo line search.
The stopping criterion is based on a relative reduction of the norm of the gradient (i.e. ).
First, we compute the gradient by solving the state and adjoint equation for the current parameter , and then substituing the current state , parameter and adjoint variables in the weak form expression of the gradient:
Then, we compute the Newton direction by iteratively solving . The Newton system is solved inexactly by early termination of conjugate gradient iterations via Eisenstat–Walker (to prevent oversolving) and Steihaug (to avoid negative curvature) criteria.
Finally, the Armijo line search uses backtracking to find such that a sufficient reduction in the cost functional is achieved. More specifically, we use backtracking to find such that:
# define parameters for the optimization tol = 1e-8 c = 1e-4 maxiter = 12 plot_on = False # initialize iter counters iter = 1 total_cg_iter = 0 converged = False # initializations g, a_delta = Vector(), Vector() R.init_vector(a_delta,0) R.init_vector(g,0) a_prev = Function(Va) print "Nit CGit cost misfit reg sqrt(-G*D) ||grad|| alpha tolcg" while iter < maxiter and not converged: # assemble matrix C C = assemble(C_equ) # solve the adoint problem adjoint_A, adjoint_RHS = assemble_system(a_adj, L_adj, bc_adj) solve(adjoint_A, p.vector(), adjoint_RHS) # assemble W_ua and R Wua = assemble (Wua_equ) Raa = assemble (Raa_equ) # evaluate the gradient CT_p = Vector() C.init_vector(CT_p,1) C.transpmult(p.vector(), CT_p) MG = CT_p + R * a.vector() solve(M, g, MG) # calculate the norm of the gradient grad2 = g.inner(MG) gradnorm = sqrt(grad2) # set the CG tolerance (use Eisenstat–Walker termination criterion) if iter == 1: gradnorm_ini = gradnorm tolcg = min(0.5, sqrt(gradnorm/gradnorm_ini)) # define the Hessian apply operator (with preconditioner) Hess_Apply = HessianOperator(R, Raa, C, state_A, adjoint_A, W, Wua, use_gaussnewton=(iter<6) ) P = R + gamma * M Psolver = PETScKrylovSolver("cg", amg_method()) Psolver.set_operator(P) solver = CGSolverSteihaug() solver.set_operator(Hess_Apply) solver.set_preconditioner(Psolver) solver.parameters["rel_tolerance"] = tolcg solver.parameters["zero_initial_guess"] = True solver.parameters["print_level"] = -1 # solve the Newton system H a_delta = - MG solver.solve(a_delta, -MG) total_cg_iter += Hess_Apply.cgiter # linesearch alpha = 1 descent = 0 no_backtrack = 0 a_prev.assign(a) while descent == 0 and no_backtrack < 10: a.vector().axpy(alpha, a_delta ) # solve the state/forward problem state_A, state_b = assemble_system(a_state, L_state, bc_state) solve(state_A, u.vector(), state_b) # evaluate cost [cost_new, misfit_new, reg_new] = cost(u, ud, a, W, R) # check if Armijo conditions are satisfied if cost_new < cost_old + alpha * c * MG.inner(a_delta): cost_old = cost_new descent = 1 else: no_backtrack += 1 alpha *= 0.5 a.assign(a_prev) # reset a # calculate sqrt(-G * D) graddir = sqrt(- MG.inner(a_delta) ) sp = "" print "%2d %2s %2d %3s %8.5e %1s %8.5e %1s %8.5e %1s %8.5e %1s %8.5e %1s %5.2f %1s %5.3e" % \ (iter, sp, Hess_Apply.cgiter, sp, cost_new, sp, misfit_new, sp, reg_new, sp, \ graddir, sp, gradnorm, sp, alpha, sp, tolcg) if plot_on: nb.multi1_plot([a,u,p], ["a","u","p"], same_colorbar=False) plt.show() # check for convergence if gradnorm < tol and iter > 1: converged = True print "Newton's method converged in ",iter," iterations" print "Total number of CG iterations: ", total_cg_iter iter += 1 if not converged: print "Newton's method did not converge in ", maxiter, " iterations" print "Time elapsed: ", time.clock()-start
Nit CGit cost misfit reg sqrt(-G*D) ||grad|| alpha tolcg 1 1 1.12708e-05 1.12708e-05 1.33979e-11 1.56540e-02 3.79427e-04 1.00 5.000e-01 2 1 7.79732e-07 7.79695e-07 3.67737e-11 4.68278e-03 5.35002e-05 1.00 3.755e-01 3 1 3.10620e-07 3.10571e-07 4.91259e-11 9.71633e-04 7.13895e-06 1.00 1.372e-01 4 5 1.92183e-07 1.62405e-07 2.97780e-08 4.51694e-04 1.00276e-06 1.00 5.141e-02 5 1 1.86913e-07 1.57119e-07 2.97941e-08 1.02668e-04 6.12750e-07 1.00 4.019e-02 6 12 1.80408e-07 1.37719e-07 4.26890e-08 1.15975e-04 2.24111e-07 1.00 2.430e-02 7 5 1.80331e-07 1.38935e-07 4.13963e-08 1.23223e-05 4.17399e-08 1.00 1.049e-02 8 15 1.80330e-07 1.39056e-07 4.12734e-08 1.74451e-06 3.43216e-09 1.00 3.008e-03 Newton's method converged in 8 iterations Total number of CG iterations: 41 Time elapsed: 9.28313
nb.multi1_plot([atrue, a], ["atrue", "a"]) nb.multi1_plot([u,p], ["u","p"], same_colorbar=False) plt.show()
Copyright (c) 2016, The University of Texas at Austin & University of California, Merced. All Rights reserved. See file COPYRIGHT for details.
This file is part of the hIPPYlib library. For more information and source code availability see https://hippylib.github.io.
hIPPYlib is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License (as published by the Free Software Foundation) version 2.0 dated June 1991.