Author manuscript; available in PMC: 2020 Sep 15.
Published in final edited form as: Comput Mech. 2019 Feb 6;64:717–739. doi: 10.1007/s00466-019-01678-3

Performance of preconditioned iterative linear solvers for cardiovascular simulations in rigid and deformable vessels

Jongmin Seo 1, Daniele E Schiavazzi 2, Alison L Marsden 3
PMCID: PMC6905469  NIHMSID: NIHMS1520943  PMID: 31827310

Abstract

Computing the solution of linear systems of equations is invariably the most time-consuming task in the numerical solution of PDEs in many fields of computational science. In this study, we focus on the numerical simulation of cardiovascular hemodynamics with rigid and deformable walls, discretized in space and time through the variational multiscale finite element method. We focus on three approaches: the problem-agnostic generalized minimum residual (GMRES) and stabilized bi-conjugate gradient (BICGS) methods, and a recently proposed, problem-specific, bi-partitioned (BIPN) method. We also perform a comparative analysis of several preconditioners, including diagonal, block-diagonal, incomplete factorization, multigrid, and resistance-based methods. Solver performance and matrix characteristics (diagonal dominance, symmetry, sparsity, bandwidth, and spectral properties) are first examined for an idealized cylindrical geometry with physiologic boundary conditions and then successively tested on several patient-specific anatomies representative of realistic cardiovascular simulation problems. Incomplete factorization preconditioners provide the best performance and results in terms of both strong and weak scalability. The BIPN method was found to outperform the other methods in patient-specific models with rigid walls. In models with deformable walls, BIPN was outperformed by BICG with diagonal and incomplete LU preconditioners.

Keywords: Cardiovascular simulation, Iterative linear solvers, Preconditioning, Fluid-structure interaction

1. Introduction

Cardiovascular simulations are increasingly used in clinical decision making, surgical planning, and medical device design. In this context, numerous modeling approaches have been proposed, ranging from lumped parameter descriptions of the circulatory system to fully three-dimensional patient-specific representations. Patient-specific models are generated through a pipeline progressing from segmentation of medical image data, to branch lofting, Boolean union, application of physiologic boundary conditions tuned to match patient data, and hemodynamics simulation. In diseased vessels, e.g., those characterized by localized stenosis or aneurysms, computational fluid dynamics (CFD) has been widely used to assess important clinical indicators, such as pressure drop or flow reduction [34]. Measures of shear stress on the vessel lumen have also been correlated with the risk of endothelial damage and thrombus formation [33, 26]. These quantities are determined by discretization in space and time of the incompressible Navier-Stokes equations. Multiscale models have been developed to simulate the local flow field in three-dimensional patient-specific anatomies, while accounting for the presence of the peripheral circulation through closed-loop circuit models providing time-dependent boundary conditions [18, 7, 23, 25]. In addition, several approaches for fluid-structure interaction (FSI) have been suggested to account for vessel wall deformability [13, 20, 27]. Recently, hemodynamic models have been used in the solution of complex problems in optimization [24, 22] and uncertainty quantification [29, 30, 32, 31, 36].

Efforts to improve realism in numerical simulations, however, often lead to an increase in computational cost. Implementation of the coupled-momentum method (CMM) for FSI roughly doubles the simulation run time compared to a rigid wall assumption [13], and the cost can be substantially higher for Arbitrary Lagrangian-Eulerian (ALE) FSI. On top of this, optimization and uncertainty quantification studies often require a large number of simulations to obtain converged solutions. These requirements all point to a pressing need to reduce computational cost to enable future integration of these tools in the clinical setting.

Preconditioned iterative approaches are widely used to solve linear systems, Ay = b, resulting from discretizations using variational multiscale finite element methods. However, few studies in the literature have examined in detail how linear solver performance depends on the properties of the coefficient matrix, to provide a concrete grounding for the choice and development of more efficient solvers. In addition, even fewer studies have carried out this analysis in the context of computational hemodynamics, i.e., the specific geometries, boundary conditions, mesh and material properties used to create numerical approximations of blood flow in rigid and deformable vessels. In this study, we investigate reductions in computational cost achievable through solving the discretized linear system efficiently, as this cost is well known to dominate the execution time. The objective of this study is to perform a systematic comparative analysis of existing linear solver strategies, linking their performance with the distributed coefficient matrix characteristics of cardiovascular modeling.

Krylov subspace based iterative solvers are typically preferred for the solution of large linear systems from CFD, due to their superior scalability and lower memory requirements compared to direct methods [4]. Popular Krylov subspace iterative solvers include the conjugate gradient method (CG) for symmetric positive definite (SPD) coefficient matrices, and the generalized minimum residual method (GMRES) or the bi-conjugate gradient stabilized method (BICGS) in the non-symmetric case. Alternatively, a recently proposed bi-partitioned linear solver (BIPN) [11] leverages the block structure of the coefficient matrix, separating contributions from the Navier-Stokes momentum and continuity equations. In BIPN, the coefficient matrix Af arising from the finite element spatial discretization and the time discretization of the Navier-Stokes equations consists of four blocks,

$$A_f = \begin{bmatrix} K & G \\ D & L \end{bmatrix}, \tag{1}$$

in which K and G stem from the momentum equation, and D and L stem from the continuity equation and stabilization. BIPN solves the K block using GMRES, while the remaining blocks are reduced to the Schur complement form $L - D K^{-1} G$, approximated by the SPD matrix $L_* + G_*^T G_*$, in which the star subscript indicates symmetric Jacobi scaling with the diagonals of K and L. The Schur complement system is solved with CG, and this CG solve takes more than 90% of the total compute time in benchmark testing [11].
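For reference, the block elimination underlying this bi-partitioned splitting is the standard Schur complement reduction. Writing the right-hand side in conformally partitioned generic blocks $b_1$ and $b_2$ (the momentum and continuity residuals introduced in Section 2), it reads

$$\begin{aligned} K\,y_1 + G\,y_2 &= b_1 \quad\Rightarrow\quad y_1 = K^{-1}\big(b_1 - G\,y_2\big),\\ D\,y_1 + L\,y_2 &= b_2 \quad\Rightarrow\quad \big(L - D K^{-1} G\big)\,y_2 = b_2 - D K^{-1} b_1, \end{aligned}$$

so that a GMRES solve on K and a CG solve on the (approximated) Schur complement together recover the full solution.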

It is also well known that preconditioning plays a key role in accelerating the convergence of Krylov subspace methods [37, 28] by transforming the original linear system Ay = b to $M^{-1} A y = M^{-1} b$ (left preconditioning), $A M^{-1} z = b$ with $y = M^{-1} z$ (right preconditioning), or $M_1^{-1} A M_2^{-1} z = M_1^{-1} b$ with $y = M_2^{-1} z$ (left and right preconditioning). In many cases, M is constructed so that $M^{-1}$ approximates $A^{-1}$. In general, an ideal preconditioner should be relatively cheap to apply and effective in reducing the overall solution time. In its simplest form, a left, right, or left-right Jacobi (diagonal) preconditioner is effective in shrinking the eigenvalue spectrum of diagonally dominant matrices. Preconditioners based instead on incomplete factorization (ILU) provide an approximate decomposition of the form M = LU, where the sparsity pattern of A is preserved in the factors. ILUT preconditioners are a slightly more general approach allowing for adjustable inclusion of fill-ins, but require the user to specify an additional threshold parameter. We note that the efficiency of an ILU preconditioner results from a trade-off between the fewer Krylov iterations needed for convergence and the cost of the incomplete factorization [28]. Application-specific preconditioners have also been proposed in cardiovascular hemodynamics to improve performance when the model outlets are coupled through a resistance boundary condition, an RCR circuit, or more general multi-domain configurations. In what follows, we refer to an in-house implementation of this class of preconditioners as the resistance-based preconditioner (RPC) [10, 11]. Additional preconditioning techniques for cardiovascular simulations with FSI are suggested in [21, 8]. Finally, algebraic multigrid preconditioners have also received significant recent interest [9].
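As a concrete illustration of how a factorization-based preconditioner enters a Krylov solve, the following sketch uses SciPy on a small synthetic non-symmetric system; it is a generic example under assumed parameters, not the Trilinos/svFSI configuration used in this study.

```python
# Illustrative sketch: ILU-preconditioned GMRES / BiCGSTAB with SciPy on a synthetic
# convection-diffusion-like tridiagonal matrix (stand-in for a general sparse system).
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
A = sp.diags([-1.2, 2.5, -0.8], [-1, 0, 1], shape=(n, n), format="csc")  # non-symmetric
b = np.ones(n)

# Incomplete LU factorization used as preconditioner M^{-1} ~ A^{-1}
ilu = spla.spilu(A, drop_tol=1e-4, fill_factor=10)
M = spla.LinearOperator(A.shape, matvec=ilu.solve)

x_gmres, info_gmres = spla.gmres(A, b, M=M, restart=200)   # restarted GMRES
x_bicgs, info_bicgs = spla.bicgstab(A, b, M=M)             # BiCGSTAB
print(info_gmres, info_bicgs, np.linalg.norm(A @ x_gmres - b))
```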

Despite the availability of open-source implementations of iterative solvers and preconditioners, few studies have systematically compared the performance of these solvers for cardiovascular models with rigid and deformable vessels. In addition, a thorough understanding of the factors affecting the performance of iterative linear solvers (e.g., diagonal dominance, condition number, sparsity pattern, symmetry, and positive-definiteness) is an important prerequisite for an optimal choice of solver and for the development of new algorithms with improved performance.

In the current study, we first compare the performance of various iterative linear solvers and preconditioners for an idealized flow through a cylindrical vessel with a resistance outflow boundary condition. We then test our findings using three representative patient-specific cardiovascular models. The Trilinos software library [14], developed at Sandia National Laboratories, is coupled with the SimVascular svFSI open source finite element code to provide linear solvers such as GMRES and BICGS, as well as a variety of preconditioners: diagonal (Diag), block-diagonal (BlockD), incomplete LU (ILU), thresholded incomplete LU (ILUT), incomplete Cholesky (IC), and algebraic multigrid (ML). We use the Krylov linear solvers and the Diag, BlockD, ILU, and ILUT preconditioners from the AztecOO package, while IC is provided by the IFPACK package. The block-diagonal preconditioner scales the block matrix via the Trilinos Epetra Vbr class. The incomplete factorization methods use additive Schwarz domain decomposition for parallelization. The Trilinos ML package for algebraic multigrid is applied through the AztecOO package. For detailed information on parallelization and preconditioning options, we refer readers to the Trilinos project [14]. BIPN and RPC are integrated and implemented directly in our flow solver, with source code available through the SimVascular open source project (www.simvascular.org) [38].

This paper is organized as follows. In section 2, we review the formulation of the coefficient matrices resulting from finite element weak forms in fluid and solid mechanics, before discussing the performance of various linear solvers and preconditioners on a simple pipe benchmark in section 3. In section 4 we report the results of strong and weak scaling for BIPN with RPC and BICG with ILU, while in section 5 we examine the properties of the coefficient matrix. The effect of preconditioning on these properties is reported in section 6. In section 7 we compare performance of linear solvers in patient-specific models. We draw conclusions and discuss future work in section 8.

2. Linear systems in cardiovascular simulation

We begin by introducing the space-time discretization of the equations governing fluid and solid mechanics following an Arbitrary-Lagrangian-Eulerian (ALE) description of the interaction between fluid and structure [15, 3, 11, 39]. These equations are discretized with a variational multiscale finite element method, and are provided in the svFSI solver of the SimVascular open source project [38].

2.1. Linear system for fluid mechanics

Consider a domain $\Omega_f \subset \mathbb{R}^3$, occupied by a Newtonian fluid whose evolution in space and time is modeled through the incompressible Navier-Stokes equations in ALE form,

$$\rho \left.\frac{\partial u}{\partial t}\right|_{\hat{x}} + \rho\, v \cdot \nabla u = \rho f + \nabla \cdot \sigma_f, \qquad \nabla \cdot u = 0 \quad \text{in } \Omega_f, \tag{2}$$

where ρ, u = u(x, t), and f are the fluid density, velocity vector, and body force, respectively. The fluid stress tensor is $\sigma_f = -p I + \mu (\nabla u + \nabla u^T) = -p I + \mu \nabla^s u$, μ is the dynamic viscosity, p = p(x, t) is the pressure, and $v = u - \hat{u}$ is the fluid velocity relative to the velocity $\hat{u}$ of the moving domain. Variables are interpolated in space at time $t_n$ as

$$w(x) = \sum_{a \in I_a} N_a(x)\, w_a, \qquad q(x) = \sum_{a \in I_a} N_a(x)\, q_a, \tag{3}$$
$$u(x, t = t_n) = u_n(x) = \sum_{a \in I_a} N_a(x)\, u_{a,n}, \qquad p(x, t = t_n) = p_n(x) = \sum_{a \in I_a} N_a(x)\, p_{a,n}, \tag{4}$$

in which $I_a$, $N_a$, $w_a$, $q_a$, $u_{a,n}$, and $p_{a,n}$ are the nodal connectivity set, the interpolation function at node a, the momentum and continuity test function weights, and the velocity and pressure at node a, respectively. In this study, we employ P1-P1 (linear and continuous) spatial approximations of the fluid velocity and pressure. We consider a stabilized finite element discretization based on the variational multiscale method [2, 11], leading to the weak-form momentum and continuity residuals of the Navier-Stokes equations

$$R_m^a(\dot{u}, u, v, p) = \sum_{e \in I_e} \int_{\Omega_e} \rho N_a \big( \dot{u} - f + (v + u_p) \cdot \nabla u \big)\, d\Omega + \sum_{e \in I_e} \int_{\Omega_e} (\nabla N_a)^T \cdot \big( -p I + \mu \nabla^s u + \rho \tau_B\, u_p \otimes (u_p \cdot \nabla u) - \rho\, u \otimes u_p + \rho \tau_C (\nabla \cdot u)\, I \big)\, d\Omega - \int_{\Gamma_h} N_a\, h\, d\Gamma, \tag{5}$$
$$R_c^a(\dot{u}, u, p) = \int_{\Omega} \big( N_a\, \nabla \cdot u - (\nabla N_a)^T u_p \big)\, d\Omega, \tag{6}$$

in which Rma and Rca are momentum and continuity residuals at node a, and h is the surface traction on the Neumann boundary Γh. The stabilization parameters are defined as

$$u_p = -\tau_M \left( \dot{u} + v \cdot \nabla u + \frac{1}{\rho} \nabla p - \frac{\mu}{\rho} \nabla^2 u - f \right), \qquad \tau_M = \left( \frac{4}{\Delta t^2} + v \cdot G v + C_I \left( \frac{\mu}{\rho} \right)^2 G : G \right)^{-1/2},$$
$$\tau_B = \left( u_p \cdot G\, u_p \right)^{-1/2}, \qquad \tau_C = \left( \tau_M\, g \cdot g \right)^{-1}, \qquad G_{ij} = \sum_{k=1}^{3} \frac{\partial \xi_k}{\partial x_i} \frac{\partial \xi_k}{\partial x_j}, \qquad g \cdot g = \sum_{i=1}^{3} g_i g_i, \qquad g_i = \sum_{k=1}^{3} \frac{\partial \xi_k}{\partial x_i}, \tag{7}$$

where CI is a constant set to 3, Δt is the time step size, and ξ represents natural coordinates. Integration in time is performed using the unconditionally stable, second order accurate generalized-α method [16], consisting of four steps: predictor, initiator, Newton-Raphson, and corrector. Initial values for accelerations, velocities and pressures at time tn+1 are set in the prediction step as

$$\dot{u}_{a,n+1} = \frac{\gamma - 1}{\gamma}\, \dot{u}_{a,n}, \qquad u_{a,n+1} = u_{a,n}, \qquad p_{a,n+1} = p_{a,n}, \tag{8}$$

where $\gamma = 0.5 + \alpha_m - \alpha_f$, $\alpha_m = 1/(1 + \rho_\infty)$, and $\alpha_f = (3 - \rho_\infty)/(2 + 2\rho_\infty)$ are the generalized-α method coefficients, and $\rho_\infty$ is the spectral radius, set to $\rho_\infty = 0.2$ in this study. In the initiator step, accelerations and velocities are computed at the intermediate stages $n + \alpha_m$ and $n + \alpha_f$,

$$\dot{u}_{a,n+\alpha_m} = (1 - \alpha_m)\, \dot{u}_{a,n} + \alpha_m\, \dot{u}_{a,n+1}, \qquad u_{a,n+\alpha_f} = (1 - \alpha_f)\, u_{a,n} + \alpha_f\, u_{a,n+1}. \tag{9}$$

A Newton-Raphson iteration is then performed based on Equations (5) and (6), using $\dot{u}_{n+\alpha_m}$, $u_{n+\alpha_f}$, and $p_{n+1}$ from (9), by solving a linear system of the form

$$K\, \Delta u + G\, \Delta p = R_m(\dot{u}_{n+\alpha_m}, u_{n+\alpha_f}, p_{n+1}), \qquad D\, \Delta u + L\, \Delta p = R_c(\dot{u}_{n+\alpha_m}, u_{n+\alpha_f}, p_{n+1}), \tag{10}$$

where the blocks K, G, D, and L partition the tangent coefficient matrix with blocks for nodes a and b equal to

$$K_{ab} \equiv \frac{\partial R_m^a}{\partial \Delta u_b}, \qquad G_{ab} \equiv \frac{\partial R_m^a}{\partial \Delta p_b}, \qquad D_{ab} \equiv \frac{\partial R_c^a}{\partial \Delta u_b}, \qquad L_{ab} \equiv \frac{\partial R_c^a}{\partial \Delta p_b}. \tag{11}$$

We re-write this linear system in matrix form as

$$A_f\, y = R_f, \tag{12}$$

where

$$A_f = \begin{bmatrix} K & G \\ D & L \end{bmatrix}, \qquad y = \begin{bmatrix} \Delta u \\ \Delta p \end{bmatrix}, \qquad R_f = \begin{bmatrix} R_m \\ R_c \end{bmatrix}, \tag{13}$$

with blocks K, G, D, L of size $(3N_{nd} \times 3N_{nd})$, $(3N_{nd} \times N_{nd})$, $(N_{nd} \times 3N_{nd})$, and $(N_{nd} \times N_{nd})$, respectively. Here $N_{nd}$ is the total number of nodes, while $\Delta u \in \mathbb{R}^{3N_{nd}}$ and $\Delta p \in \mathbb{R}^{N_{nd}}$ contain the nodal velocity and pressure increments. We note that the major focus of our study is on solving the linear system in equation (12). Once the momentum and continuity residual norms drop below a given tolerance, the unknowns at the next time step are determined through the corrections

$$\dot{u}_{a,n+1} \leftarrow \dot{u}_{a,n+1} + \Delta u_a, \qquad u_{a,n+1} \leftarrow u_{a,n+1} + \gamma \Delta t\, \Delta u_a, \qquad p_{a,n+1} \leftarrow p_{a,n+1} + \alpha_f \gamma \Delta t\, \Delta p_a, \tag{14}$$

for all $a \in I_a$. Finally, detailed expressions for each block of the coefficient matrix, obtained from Eq. (5), Eq. (6), Eq. (11), and Eq. (14), are

$$K_{ab} = \sum_{e \in I_e} \int_{\Omega_e} \Big[ \rho\, \alpha_m N_a N_b\, I_{ab} + \rho\, \tilde{\alpha}_f N_a\, (v + u_p) \cdot \nabla N_b\, I_{ab} + \mu\, \tilde{\alpha}_f \big( \nabla N_a \cdot \nabla N_b\, I_{ab} + \nabla N_b \otimes \nabla N_a \big) + \rho\, \tilde{\alpha}_f \tau_B\, (u_p \cdot \nabla N_a)(u_p \cdot \nabla N_b)\, I_{ab} + \rho\, \tau_M\, (u \cdot \nabla N_a) \big( \alpha_m N_b + \tilde{\alpha}_f\, u \cdot \nabla N_b \big) I_{ab} + \rho\, \tilde{\alpha}_f \tau_C\, \nabla N_a \otimes \nabla N_b \Big]\, d\Omega, \tag{15}$$
$$G_{ab} = \sum_{e \in I_e} \int_{\Omega_e} \Big[ -\tilde{\alpha}_f\, \nabla N_a\, N_b + \tilde{\alpha}_f \tau_M\, (u \cdot \nabla N_a)\, \nabla N_b \Big]\, d\Omega, \tag{16}$$
$$D_{ab} = \sum_{e \in I_e} \int_{\Omega_e} \Big[ \tilde{\alpha}_f\, N_a \nabla N_b + \tilde{\alpha}_f \tau_M\, (u \cdot \nabla N_b)\, \nabla N_a + \tau_M \alpha_m\, N_b \nabla N_a \Big]\, d\Omega, \tag{17}$$
$$L_{ab} = \sum_{e \in I_e} \int_{\Omega_e} \frac{\tilde{\alpha}_f \tau_M}{\rho}\, \nabla N_a \cdot \nabla N_b\, d\Omega, \tag{18}$$

in which $\tilde{\alpha}_f = \gamma \Delta t\, \alpha_f$ and $I_e$ is the set of elements containing nodes a and b.
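To make the time-integration flow above concrete, the following minimal sketch walks through the predictor, initiator, Newton-Raphson, and corrector steps of Eqs. (8), (9), and (14) for a scalar toy problem du/dt + cu = 0; the residual and tangent here are simple stand-ins for illustration, not the Navier-Stokes blocks of Eq. (13).

```python
# Minimal sketch of one generalized-alpha time step (predictor, initiator,
# Newton-Raphson, corrector) following Eqs. (8), (9), and (14), applied to the
# scalar toy problem du/dt + c*u = 0. The residual/tangent are stand-ins only.

rho_inf = 0.2                                    # spectral radius (0.2 in this study)
alpha_m = 1.0 / (1.0 + rho_inf)                  # coefficients as given in the text
alpha_f = (3.0 - rho_inf) / (2.0 + 2.0 * rho_inf)
gamma = 0.5 + alpha_m - alpha_f
dt, c = 1e-3, 2.0
u, udot = 1.0, -c * 1.0                          # consistent initial state and rate

for step in range(10):
    # predictor, Eq. (8)
    udot_new = (gamma - 1.0) / gamma * udot
    u_new = u
    for newton_iter in range(5):
        # initiator, Eq. (9): intermediate acceleration and velocity
        udot_am = (1.0 - alpha_m) * udot + alpha_m * udot_new
        u_af = (1.0 - alpha_f) * u + alpha_f * u_new
        R = udot_am + c * u_af                   # residual of du/dt + c*u = 0
        if abs(R) < 1e-12:
            break
        K = alpha_m + alpha_f * gamma * dt * c   # tangent dR/d(increment)
        du = -R / K                              # the "linear solve" (scalar here)
        # corrector, Eq. (14)
        udot_new += du
        u_new += gamma * dt * du
    u, udot = u_new, udot_new

print(u)  # decays toward exp(-c*t), as expected for this toy problem
```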

First we observe that the matrix K is diagonally dominant. Except for entries related to the stabilization terms, which are typically small, the most significant off-diagonal contribution is provided by the viscous term, which is also typically smaller than the acceleration and advection terms in cardiovascular flows. Second, the small magnitude of the stabilization terms suggests that G is similar to $-D^T$. We also observe that the matrices K and Af are non-symmetric, while L is symmetric and singular, since it has an identical structure to the matrices arising from the discretization of generalized Laplace operators. We also note that L is characterized by small entries compared to the other blocks, since it only consists of stabilization terms.

2.2. Linear system for solid mechanics

In the solid domain, we start by introducing measures of deformation induced by a displacement field $d = x - X$, i.e., the difference between the current and material configurations $x \in \mathbb{R}^3$ and $X \in \mathbb{R}^3$, respectively,

$$F = \nabla d + I, \qquad C = F^T F, \qquad E = \tfrac{1}{2}(C - I), \tag{19}$$

where F, C, E, represent the deformation gradient, the Cauchy-Green deformation tensor and the Green strain tensor. The Jacobian is also defined as J= det(F). We relate the second Piola-Kirchhoff stress tensor S with the Green strain tensor E through the Saint Venant-Kirchhoff hyperelastic constitutive model

$$S = \lambda\, \mathrm{tr}(E)\, I + 2 \mu E, \tag{20}$$

where $\lambda = \frac{\nu E_s}{(1+\nu)(1-2\nu)}$, $\mu = \frac{E_s}{2(1+\nu)}$, and $E_s$ and ν represent the Young's modulus and Poisson's ratio, respectively. The equilibrium equation is

$$\rho_s \frac{\partial u}{\partial t} = \rho_s f + \nabla \cdot \sigma_s \quad \text{in } \Omega_s, \tag{21}$$

where ρs and σs denote the density and solid stress tensor, respectively. This leads to the weak form

$$\int_{\Omega_s^0} \Big[ \rho_s^0\, w \cdot (\dot{u} - f) + \nabla w : P \Big]\, d\Omega = 0, \tag{22}$$

where P = FS is the first Piola-Kirchhoff stress tensor, w is a virtual displacement, and $\Omega_s^0$ is the solid domain in the reference configuration. Discretization of (22) leads to the residual

$$R_m^a(\dot{u}, d) = \int_{\Omega_s^0} \Big[ \rho_s^0 N_a\, (\dot{u} - f) + F S\, \nabla N_a \Big]\, d\Omega. \tag{23}$$

Using the generalized-α method, the displacements at time tn+1 are predicted as

$$d_{a,n+1} = d_{a,n} + u_{a,n+1}\, \Delta t + \frac{0.5\gamma - \beta}{\gamma - 1}\, \dot{u}_{a,n+1}\, \Delta t^2, \tag{24}$$

in which $\beta = \tfrac{1}{4}(1 + \alpha_f - \alpha_m)^2$. In the initiator step, the intermediate displacements are provided by

$$d_{a,n+\alpha_f} = (1 - \alpha_f)\, d_{a,n} + \alpha_f\, d_{a,n+1}. \tag{25}$$

Solving (23) with the Newton-Raphson method, we obtain the linear system

$$K_s\, \Delta d = R_m(\dot{u}_{n+\alpha_m}, d_{n+\alpha_f}), \tag{26}$$

where $K_{s,ab} \equiv \partial R_m^a / \partial \Delta d_b$, with tangent stiffness matrix

$$K_{s,ab} = \int_{\Omega_s^0} \Big[ \rho_s^0 \alpha_m N_a N_b\, I + \hat{\alpha}_f\, (S\, \nabla N_a \cdot \nabla N_b)\, I + \lambda \hat{\alpha}_f\, (F \nabla N_a) \otimes (F \nabla N_b) + \mu \hat{\alpha}_f\, (F \nabla N_b) \otimes (F \nabla N_a) + \mu \hat{\alpha}_f\, F F^T\, (\nabla N_a \cdot \nabla N_b) \Big]\, d\Omega, \tag{27}$$

and $\hat{\alpha}_f = \alpha_f \beta \Delta t^2$. At this point, d is corrected using

$$d_{a,n+1} \leftarrow d_{a,n+1} + \beta \Delta t^2\, \Delta d_a, \qquad a \in I_a. \tag{28}$$

Again, we note that the focus of this work is on solving the linear system in equation (26) together with solving equation (12) when including FSI. Finally, we observe that most terms in (27) (i.e., the third, fourth, and fifth terms) contribute to the off-diagonals of Ks and that the matrix Ks is symmetric.

3. Linear solver performance on pipe flow benchmark

All tests discussed in this study are performed using the svFSI finite element solver from the SimVascular open source project, leveraging the message passing interface (MPI) library and optimized to run efficiently on large computational clusters [12]. We note this implementation assigns fluid and solid elements to separate cores and assumes unique interface nodes, i.e., common nodes on the fluid and solid mesh are expected to match. An FSI mesh with matching interface nodes is generated through the freely-available Meshmixer software, with details reported in [39].

We consider a simple pipe benchmark whose size and boundary conditions are chosen to represent the ascending aorta of a healthy subject, with a 4 cm diameter, 30 cm length, and 0.2 cm wall thickness, assuming a thickness/radius ratio of 10%. The inflow is steady with a parabolic velocity profile and a mean flow rate of Q = 83 mL/s (5 L/min). A resistance boundary condition equal to R = 1600 g/cm4/s is applied at the outlet, producing a mean pressure of approximately 100 mmHg, typical of the systemic circulation of a healthy subject. Simulations are performed with rigid and deformable walls (see Figure 1), using 1,002,436 tetrahedral elements for the fluid domain and 192,668 tetrahedral elements for the wall, generated in SimVascular with the TetGen mesh generator plugin [38, 19]. We measure the wall clock time on the XSEDE Comet cluster for simulations consisting of 10 time steps of 1 millisecond each, using 38 and 48 cores for the rigid and deformable simulations, respectively. The XSEDE Comet cluster has 1,944 compute nodes with Intel Xeon E5-2680v3 processors (24 cores per node, 2.5 GHz clock speed, 960 GFlop/s peak performance per node, and 120 GB/s memory bandwidth). For more information about the Comet cluster, please refer to the XSEDE user portal. We use restarted GMRES with a restart number of 200, which showed superior performance compared to smaller restart numbers (see Appendix A). We set the ILUT drop tolerance to 10−2 and the fill-in level to 2; in our tests, changing the drop tolerance to 10−4 or 10−6 did not significantly change the linear solver performance reported here. For the multigrid preconditioner we selected a maximum of four levels, a Gauss-Seidel smoother, and a symmetric Gauss-Seidel subsmoother; we confirmed that this setting was superior to other choices of smoother and subsmoother (see Appendix B). Finally, since the node ordering affects the amount of fill-in produced by an ILU decomposition, we apply Reverse Cuthill-McKee (RCM) reordering prior to the incomplete factorization. RCM has been shown to be effective among many reordering schemes for the solution of non-symmetric sparse linear systems (see, e.g., [5]). In our tests, ILUT with RCM reordering provided superior performance compared to ILUT without reordering and ILUT with METIS reordering (see Appendix C).

Fig. 1:

Schematic representations and mesh for cylindrical pipe benchmark, (top) a rigid model with parabolic inflow and outlet resistance boundary condition, (bottom) an FSI model with same boundary conditions. For each model we show a magnified view of the tetrahedral finite element mesh. The fluid and solid domains are colored in red and gray, respectively.

3.1. Rigid wall benchmark

We plot the linear solver performance measured by wall clock time in Figure 2. In Table 2, we report the number of iterations and the portion of total compute time consumed by solving the linear system. Three iterative solver tolerances, ϵ = 10−3, 10−6, and 10−9, are tested and compared. Table 1 shows the effect of the tolerance on the velocities at t = 10 ms, suggesting that the velocity error norm is of the same order as the selected tolerance.

Fig. 2:

Compute times for linear solvers and preconditioners using a rigid pipe model with tolerances (top) ϵ = 10−3, (center) ϵ = 10−6, (bottom) ϵ = 10−9. For ϵ = 10−6, error bars are plotted by taking standard deviations from two repeated simulations. Differences between repeated simulations are caused by different computing nodes assigned by the scheduler on the Comet cluster.

Table 2:

The number of iterations and the portion of compute time consumed by the linear solver for the pipe benchmark problem with rigid walls. The number of linear solver iterations is counted for 10 time step calculations. tT is the total compute time, tLS is the compute time consumed by solving the linear system.

BIPN-RPC /ϵ = 10−3
Nbipn tLS/tT Ngmres tgmres/tLS Ncg tcg/tLS
129 89.5% 1475 5.3% 38582 77.6%
BIPN-RPC /ϵ = 10−6
Nbipn tLS/tT Ngmres tgmres /tLS Ncg tcg/tLS
847 95.1% 22699 17.25% 91470 55.7%
BIPN-RPC /ϵ = 10−9
Nbipn tLS/tT Ngmres tgmres/tLS Ncg tcg/tLS
1923 95.0% 247142 43.2% 791817 49.4%
ϵ = 10−3 GMRES BICGS
PC Ngmres tLS/tT Nbicgs tLS/tT
Diag 155544 98.8% N/A N/A
Block-D 147768 98.7% 30027 94.4%
ILU 2768 85.3% 3104 89.6%
ILUT 1493 88.0% 2360 92.1%
IC 3636 40.5% 3290 41.3%
ML 1080 51.9% 1054 66.7%
ϵ = 10−6 GMRES BICGS
PC Ngmres tLS/tT Nbicgs tLS/tT
Diag 492752 99.9% N/A N/A
Block-D 491185 99.9% 110237 96.3%
ILU 6632 89.6% 6770 91.7%
ILUT 3382 89.8% 4045 92.9%
IC 8191 54.3% 7710 51.3%
ML 2526 64.6% 2376 76.6%
ϵ = 10−9 GMRES BICGS
PC Ngmres tLS/tT Nbicgs tLS/tT
Diag 1252448 100% N/A N/A
Block-D 1251025 100% N/A N/A
ILU 10066 91.0% 9766 92.2%
ILUT 5132 90.6% 5468 93.0%
IC 13728 59.8% 11022 53.4%
ML 3992 69.6% 3711 80.4%

Table 1:

l2-norm of velocity errors for different residual norm tolerances. The solution error is obtained from nodal velocity errors after 10 time steps, using a reference simulation with a tight tolerance of ϵ = 10−12.

ϵ = 10−1 ϵ = 10−3 ϵ = 10−6 ϵ = 10−9
Err 0.2235 0.0028 1.618×10−7 1.107×10−9

Figure 2 shows that incomplete factorization preconditioners are fast and exhibit robust performance across all tolerances, irrespective of the underlying iterative linear solver. Despite similar performance for the ILU, ILUT and IC preconditioners across all cases, the slightly worse performance of ILUT with respect to ILU suggests that the run-time benefit of constructing a more accurate factor outweighs the savings in factorization cost achieved by dropping additional fill-ins. GMRES with diagonal preconditioners (either diagonal or block-diagonal) is significantly slower than the other schemes, particularly as the tolerance ϵ becomes smaller. This degrading performance of standard GMRES for cardiovascular modeling is consistent with previous studies showing that resistance boundary conditions are responsible for an increase in the condition number [10, 11]. While BICGS performs better than GMRES with diagonal preconditioners, its performance becomes increasingly unstable with smaller tolerance ϵ. Furthermore, while algebraic multigrid preconditioners are superior to diagonal preconditioning, they are inferior to BIPN or to GMRES/BICGS with ILU. Finally, the performance of BIPN with RPC preconditioning is comparable to ILU at the loosest tolerance (ϵ = 10−3), but degrades significantly as the tolerance value decreases.

From Table 2, we see that the time consumed by the linear solver constitutes the majority of the total compute time. In BIPN, the compute time of GMRES is significantly smaller than the compute time of CG. As the tolerance value decreases, the relative percentage of compute time for the GMRES solve becomes larger. All Trilinos preconditioners show larger iteration counts with decreasing tolerance. The relatively small percentage of linear solver compute time relative to total compute time with IC and ML implies that building these preconditioners is expensive. This suggests that storing and reusing a preconditioner over several time steps could increase efficiency.

3.2. Deformable wall benchmark (FSI)

We illustrate the compute times for the deformable wall case in Figure 3 and summarize the number of iterations and the percentage of compute time spent in the linear solvers in Table 3. For FSI simulations with a tolerance ϵ = 10−3, BIPN with RPC shows more than an 8-fold increase in compute time compared to the rigid wall case, which suggests the limitations of this approach for deformable walls. The increase in the number of GMRES iterations as the tolerance decreases is notably higher than in the rigid wall case, and the percentage of compute time for the GMRES solve in BIPN is significantly higher, about 80% of the total compute time, while the CG part makes up a smaller percentage. This suggests directions for future improvement of the GMRES part of BIPN for FSI simulations. Conversely, incomplete factorization preconditioners (for both GMRES and BICGS) exhibit good performance across all tolerances and a limited increase in compute time compared to the rigid case. Diagonal preconditioners show performance comparable to incomplete factorization schemes at large tolerances, but their performance degrades for smaller tolerances. Among all algorithms implemented in Trilinos, the algebraic multigrid preconditioner is the slowest, while BICG with ILU appears to be the best solution scheme overall.

Fig. 3:

Compute times for linear solvers and preconditioners using a deformable wall pipe model (FSI) with tolerances (top) ϵ = 10−3, (middle) ϵ = 10−6, (bottom) ϵ = 10−9. For ϵ = 10−6, error bars are plotted by taking standard deviations from two repeated simulations.

Table 3:

The number of iterations and the portion of compute time consumed by the linear solver for the pipe benchmark problem with deformable walls. The number of linear solver iterations is counted for 10 time step calculations. tT is the total compute time, tLS is the compute time consumed by solving the linear system.

BIPN-RPC /ϵ = 10−3
Nbipn tLS/tT Ngmres tgmres/tLS Ncg tCG/tLS
608 78.6% 117377 77% 64865 15.6%
BIPN-RPC /ϵ = 10−6
Nbipn tLS/tT Ngmres tgmres/tLS Ncg tCG/tLS
1580 83.9% 318272 79.4% 145183 13.4%
BIPN-RPC /ϵ = 10−9
Nbipn tLS/tT Ngmres tgmres/tLS Ncg tCG/tLS
14872 83.3% 4087258 79.6% 874018 11.0%
ϵ = 10−3 GMRES BICGS
PC Ngmres tLS/tT Nbicgs tLS/tT
Diag 11283 71.9% 12094 66.3%
Block-D 9127 65.1% 9438 59.9%
ILU 3900 61.2% 3655 64.2%
ILUT 3493 70.6% 3333 70.9%
IC 4089 17.2% 3873 20.02%
ML 3735 31.74% 3799 44.43%
ϵ = 10−6 GMRES BICGS
PC Ngmres tLS/tT Nbicgs tLS/tT
Diag 33455 86.2% 45623 85.3%
Block-D 26311 81.7% 29589 77.3%
ILU 7699 70.1% 7338 74.9%
ILUT 5904 75.1% 5537 77.1%
IC 8241 28.7% 7819 32.0%
ML 7109 49.9% 7610 63.9%
ϵ = 10−9 GMRES BICGS
PC Ngmres tLS/tT Nbicgs tLS/tT
Diag 71884 90.25% 100431 90.6%
Block-D 55392 88.2% 49695 82.1%
ILU 13801 75.1% 13093 80.0%
ILUT 9848 76.7% 9404 79.3%
IC 14615 42.2% 13858 43.4%
ML 12178 59.8% 9587 78.4%

4. Parallel scalability

Parallel scalability is investigated for two preconditioned linear solvers, BIPN-RPC and BICG-ILU, in terms of speedup (see, e.g., [6]), defined as the computing speed on multiple cores relative to a single-core calculation, i.e., Sp = T1/Tp, where Tp is the compute time on Np cores. Ideal strong scalability corresponds to a linear relation between Sp and Np; in practice, sublinear scaling is expected due to communication costs.
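As a minimal illustration of these definitions, the following snippet computes speedup and parallel efficiency from measured wall-clock times; the values below are hypothetical placeholders, not the measurements reported in this paper.

```python
# Minimal sketch: speedup S_p = T_1 / T_p and parallel efficiency S_p / N_p.
# The timing values are made-up placeholders for illustration only.
cores = [1, 2, 4, 8, 24, 48, 96]
times = [4000.0, 2050.0, 1060.0, 560.0, 205.0, 125.0, 90.0]   # seconds (hypothetical)
for Np, Tp in zip(cores, times):
    Sp = times[0] / Tp
    print(f"Np = {Np:3d}   Sp = {Sp:6.2f}   efficiency = {Sp / Np:.2f}")
```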

4.1. Strong scaling

In this section, we monitor the compute time for a model with a fixed number of degrees of freedom, while progressively increasing the number of cores. We first test strong scalability by varying the number of cores, using 1, 2, 4, 8, 24, 48, and 96 cores for the ≈1 million element mesh (Mesh2 in Table 4), as shown in Figure 4. We chose the number of cores as multiples of 24 to use all cores in any given node. We note, however, that one should use only between 2/3 and 3/4 of the total number of cores in a given node, since the local memory bandwidth is often a bottleneck, resulting in higher overhead and reduced speed improvement. In the rigid wall model, BIPN-RPC and BICG-ILU show similar performance across all core counts, as shown in Figure 4. The parallel speedup shows that both methods scale well up to 24 cores, i.e., about 40,000 elements per core, while their parallel performance is reduced when running on more than 48 cores. In Figure 4, the zig-zag behavior of BIPN-RPC (FSI) reveals excessive inter-core communication and memory references. In the FSI problem, the total compute time of BIPN-RPC is larger, and its speedup is worse than that of BICG-ILU. BICG-ILU shows consistently good scalability for both rigid and FSI models, whereas the scalability of BIPN-RPC degrades significantly in the FSI case.

Table 4:

Number of nodes, elements and non-zero entries in tangent stiffness matrix for selected meshes in scaling studies.

Fluid
# Nodes # Tetrahedra # Non-zeros
Mesh1 93,944 551,819 1,496,375
Mesh2 166,899 1,002,436 2,718,385
Mesh3 349,469 2,131,986 5,784,902
Mesh4 716,298 4,415,305 11,990,977
Mesh5 1,314,307 8,117,892 22,144,741
Structure
Total
# Nodes # Tetrahedra # Non-zeros
Mesh1 31,223 96,147 1,834,624
Mesh2 50,909 192,668 3,331,744
Mesh3 100,759 412,408 7,028,673
Mesh4 97,206 893,452 14,565,414
Mesh5 378,089 1,752,270 27,188,079

Fig. 4:

Strong scaling of BIPN-RPC and BICG-ILU for pipe benchmark with rigid and deformable walls on the 1M lumen mesh model (Mesh2). (top) compute time, Tp, versus number of cores, Np, (bottom) speedup, Sp = T1/Tp, versus Np.

We conduct an additional scaling study on a refined mesh with 8M elements (Mesh5 in Table 4). We use 24, 48, 96, 192, and 384 cores, taking the Np = 24 case as the reference for Sp. As shown in Figure 5, the BICG-ILU speedup scales almost linearly up to 192 cores, i.e., ~40,000 elements per core. On the 8M mesh, the compute time of BIPN-RPC becomes significantly larger due to poor weak scalability, as discussed in the next section.

Fig. 5:

Strong scaling of BIPN-RPC and BICG-ILU for pipe benchmark on the 8M lumen mesh model (Mesh5). (top) Tp versus Np, (bottom) Sp = T24/Tp versus Np.

4.2. Weak scaling

In this section, we solve models of increasing size and report the simulation time while keeping approximately the same number of elements per core. The number of nodes, elements, and non-zeros in the coefficient matrix is summarized for each mesh in Table 4. We use 20, 40, 80, 160, and 320 cores for meshes 1 to 5, resulting in approximately 27,000 elements and about 4,500 nodes per core.

BICG-ILU shows increased compute time but no significant change in scaling as the number of cores increases. For the rigid model, BIPN-RPC shows poor scalability as the number of cores exceeds 160. This explains why BIPN-RPC loses its performance advantage over BICG-ILU in the strong scalability study on the 8M mesh. Similar to the rigid case, BIPN-RPC shows a performance loss beyond 160 cores for the FSI model.

5. Characteristics of unpreconditioned matrices

In this section, we investigate the properties of the coefficient matrices discussed in section 2.1 to better understand the performance results. With reference to the pipe benchmark, we visualize the matrix sparsity pattern and investigate its properties including bandwidth, symmetry, diagonal dominance and spectrum. We also investigate the characteristics of both global and local matrices. The global matrix contains all mesh nodes from all cores, while a local matrix contains only a subset of the nodes in the global matrix assigned to a single core upon partitioning. We report single-core, local matrix characteristics with Nlnd ~ 5000 nodes to represent the typical case of distributed discretizations consisting of ~ 25000 tetrahedral elements per core. This way we focus on detailed local information in a region of specific interest (e.g. resistance boundary), and also calculate properties of the matrix such as eigenvalues and condition numbers in a cost-effective way. We discuss properties for two groups of coefficient matrices, i.e., matrices associated with fluid flow and matrices associated with solid mechanics.

5.1. Matrix properties for fluid flows in rigid vessels

Sparsity pattern.

The structure of Af is determined by the element nodal connectivity, i.e., the non-zero columns in the row of node a correspond to the nodes belonging to the element star associated with a. The star of elements associated with a given node a is the set of all elements connected to a. An example of the global sparsity pattern for the unstructured pipe benchmark mesh is reported in Figure 7. The node ordering starts from the outer surface and proceeds to the interior of the pipe, as seen from the reduced connectivity between the upper-left block and the remaining inner nodes. The density is less than 0.01 percent, i.e., the sparsity exceeds 99.99 percent (Table 4).

Fig. 7:

Sparsity pattern of a connectivity matrix for the 1M pipe mesh. Among all matrix elements, only non-zero values are colored in the plot. (top) The global sparsity pattern for the whole pipe model with Nnd = 166,899. (center) Reverse Cuthill-McKee reordering of the global connectivity matrix. The upper-right inset is a 100× magnification of the diagonal of the reordered matrix; the lower-left inset is a 10,000× magnification. For each row, the left exterior plot shows the maximum bandwidth in that row. (bottom left) Frequencies of the number of non-zero elements in a row (Nnnz). (bottom right) A local sparsity pattern from a core with Nlnd = 5014.

A Reverse Cuthill-McKee (RCM) bandwidth-minimizing permutation of the same matrix reveals the banded sparse structure illustrated in Figure 7. Quantitative estimates for the bandwidth and the number of non-zeros show that the bandwidth is approximately 1000, with a maximum of about 1500, and that most nodes are connected to 15-16 other nodes.
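The following sketch reproduces this kind of analysis on a synthetic sparse matrix using SciPy's RCM implementation; it is a generic illustration, not the reordering code used inside the solver.

```python
# Sketch: apply Reverse Cuthill-McKee reordering to a synthetic sparse matrix and
# compare the bandwidth before and after the permutation.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee

def bandwidth(A):
    """Maximum distance of a non-zero entry from the diagonal."""
    A = A.tocoo()
    return int(np.max(np.abs(A.row - A.col))) if A.nnz else 0

A = sp.random(2000, 2000, density=5e-3, format="csr", random_state=0)
A = (A + A.T + sp.identity(2000)).tocsr()        # symmetric pattern, non-zero diagonal

perm = reverse_cuthill_mckee(A, symmetric_mode=True)
A_rcm = A[perm, :][:, perm]                      # permute rows and columns
print(bandwidth(A), bandwidth(A_rcm))            # RCM typically reduces the bandwidth
```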

Diagonal dominance.

A closer look at the magnitudes of the entries of the global matrix Af reveals a clear block structure (Figure 8). The 4-by-4 block structure corresponds to the matrix blocks in equation (13). Diagonal entries are larger than the off-diagonals within the K block, and the magnitudes of the entries in the G, D, and L blocks are small compared to the diagonal of K. This relates to the dominant contribution of the acceleration and advection terms over the stabilization and viscous terms in (15) (see, e.g., [11]). The matrix Af is, however, not diagonally dominant, i.e., $|A_{ii}| < \sum_{j \neq i} |A_{ij}|$ for some rows. To show this, we quantified the relative magnitudes of off-diagonal and diagonal entries (Figure 9). Specifically, we counted the number of entries whose absolute magnitude is a given percentage of the associated diagonal value, showing that most off-diagonal values are less than 20% of the associated diagonal. To report a quantitative estimate of diagonal dominance, we measure the mean, over rows, of the ratio of the diagonal magnitude to the sum of the off-diagonal magnitudes,

$$D(K) = \frac{1}{N_r} \sum_{i=1}^{N_r} \left[ \frac{|K_{ii}|}{\sum_{j=1,\, j \neq i}^{N_c} |K_{ij}|} \right], \tag{29}$$

in which $N_r$ and $N_c$ are the number of rows and columns of K, respectively. D increases with diagonal dominance. For the global matrix, D(Af) = 0.514 and D(K) = 0.678.
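A minimal sketch of this metric for a sparse matrix is given below (the guard against rows with no off-diagonal entries is an added assumption of the sketch, not part of Eq. (29)).

```python
# Minimal sketch of the diagonal-dominance metric D of Eq. (29): the mean, over rows,
# of |A_ii| divided by the sum of the off-diagonal magnitudes in that row.
import numpy as np
import scipy.sparse as sp

def diag_dominance(A):
    A = sp.csr_matrix(A)
    diag = np.abs(A.diagonal())
    row_sums = np.asarray(abs(A).sum(axis=1)).ravel()
    offdiag = np.maximum(row_sums - diag, np.finfo(float).tiny)  # guard empty rows
    return float(np.mean(diag / offdiag))

A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(1000, 1000))
print(diag_dominance(A))   # ~2 for this diagonally dominant tridiagonal example
```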

Fig. 8:

A visual representation of the global sparse matrix Af with entries colored by absolute magnitude. (top) The full matrix Af, with the colorbar ranging from 0 to 10−3. (bottom) The decomposed matrix colored by the magnitude of each sub-block: for K (upper-left block) the colorbar ranges from 0 to 10−3; for G (upper-right block) and D (lower-left block) it ranges from 0 to 10−5; for L (lower-right block) it ranges from 0 to 10−6.

Fig. 9:

Measures of diagonal dominance. (top left) The magnitude of the diagonal value in each row of Af. (top right) The sum of the absolute magnitudes of the off-diagonal values in each row of Af. (bottom) Histogram of the number of matrix entries N(Aij) whose magnitude corresponds to a given percentage of the associated diagonal entry. Only entries larger than 1% of the diagonal entry are counted.

Symmetry.

We use an index, S, to quantify how close a matrix is to symmetric. We first obtain the off-diagonal part of A by subtracting its diagonal, $\tilde{A} = A - \mathrm{diag}(A)$. We then decompose $\tilde{A}$ into a symmetric part, $\tilde{A}_{sym} = (\tilde{A} + \tilde{A}^T)/2$, and a skew-symmetric part, $\tilde{A}_{skew} = (\tilde{A} - \tilde{A}^T)/2$. The index S is defined as

$$S(A) = \frac{\|\tilde{A}_{sym}\| - \|\tilde{A}_{skew}\|}{\|\tilde{A}_{sym}\| + \|\tilde{A}_{skew}\|}, \tag{30}$$

in which we use the matrix 2-norm for $\|\cdot\|$. The index equals −1 for a perfectly skew-symmetric matrix and 1 for a perfectly symmetric matrix. As shown in Section 2.1, the matrix is non-symmetric in the K, G, and D blocks due to stabilization and convective terms, with S(Af) equal to 0.9859; that is, Af is nearly symmetric in the analyzed regime (idealized aortic flow). Finally, L is a symmetric, positive semi-definite matrix with S(L) = 1.
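A minimal sketch of this index is shown below; note that the paper uses the 2-norm, while the sketch uses the Frobenius norm as a convenient stand-in for sparse input.

```python
# Minimal sketch of the symmetry index S of Eq. (30): split the off-diagonal part of A
# into symmetric and skew-symmetric components and compare their norms (Frobenius norm
# used here instead of the 2-norm of the paper).
import scipy.sparse as sp
from scipy.sparse.linalg import norm as spnorm

def symmetry_index(A):
    A = sp.csr_matrix(A, dtype=float)
    A_off = A - sp.diags(A.diagonal())
    A_sym = 0.5 * (A_off + A_off.T)
    A_skew = 0.5 * (A_off - A_off.T)
    ns, nk = spnorm(A_sym), spnorm(A_skew)
    return (ns - nk) / (ns + nk)

A = sp.random(500, 500, density=0.01, random_state=1)
print(symmetry_index(A + 0.9 * A.T))   # close to +1 for a nearly symmetric matrix
```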

Eigenvalues.

Spectral properties are widely used to characterize the convergence and robustness of iterative solvers. It is well known, for example, that the rate of convergence of CG depends on the spectral condition number of the SPD coefficient matrix. Although eigenvalues clustered around 1 lead to rapid convergence of iterative solvers for well-conditioned SPD matrices, the eigenvalues alone may not determine the convergence rate, and other matrix characteristics may also play a role [28]. Calculation of all eigenvalues (λi) of the global matrix for a typical cardiovascular model with on the order of 1 million mesh elements is prohibitively expensive. In this paper we therefore report the spectrum of local matrices instead of the global matrix. For a smaller system, we also demonstrate that the distribution of eigenvalues from the local matrices is a good approximation to the distribution of eigenvalues of the global matrix (see Appendix D).

In Figure 10, we plot the spectrum of local Af matrices from the pipe benchmark with rigid walls. The eigenvalues of Af are complex, with imaginary parts of small magnitude, up to O(10−8), while the magnitude of the real part ranges from O(10−9) to O(10−1). In Figure 10, there are three distinct groups of eigenvalues with different ranges of the real part: the first group contains eigenvalues with real part less than 10−5, and the second group contains eigenvalues with real part between 10−5 and 10−2. The spectra of K and L in Figure 10 show that the eigenvalues larger than 10−5 in Af are attributed to the K block, while the eigenvalues of L, G, and D are responsible for the group of smallest real eigenvalues in Af. We list several minimum and maximum eigenvalues of a local matrix without a resistance boundary condition (BC) in Table 5. The maximum eigenvalues of K and Af appear to be the same, suggesting that the K block dominates the upper portion of the spectrum, while the smallest eigenvalues are provided by the L, G, and D blocks. This suggests that the large condition number of Af, ~O(10^6) as obtained from MATLAB condest, relates to the inhomogeneous eigenvalue spectrum across the momentum, continuity, and coupling blocks. Additionally, the small condition number of K justifies the idea behind the BIPN approach, i.e., to solve the K block separately, while expressing the other blocks in Schur complement form [11]. L is singular, and thus has a zero eigenvalue and an extremely large condition number. Lastly, the resistance boundary condition is responsible for the third group, the few largest eigenvalues of order O(10−1) in Figure 10, as we discuss in the next section.
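For reference, extreme eigenvalues of a large sparse matrix can be estimated without computing the full spectrum; the sketch below uses SciPy's Arnoldi-based eigs on a synthetic operator and is a generic illustration, not the analysis pipeline applied to the svFSI matrices.

```python
# Sketch: estimate a few extreme eigenvalues of a large sparse non-symmetric matrix.
# which="LM" targets the largest-magnitude eigenvalues; sigma=0.0 switches to
# shift-invert mode to reach those of smallest magnitude.
import scipy.sparse as sp
from scipy.sparse.linalg import eigs

n = 3000
A = sp.diags([-1.0, 3.0, -0.5], [-1, 0, 1], shape=(n, n), format="csc")
lam_max = eigs(A, k=5, which="LM", return_eigenvectors=False)
lam_min = eigs(A, k=5, sigma=0.0, return_eigenvectors=False)   # shift-invert around 0
print(sorted(lam_max.real), sorted(lam_min.real))
```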

Fig. 10:

Spectrum of (top) local fluid matrices Af, (bottom left) L blocks, (bottom right) K blocks. All local eigenvalue spectra from 38 cores are plotted together in different colors.

Effects of the resistance boundary condition.

A resistance boundary condition perturbs the condition number of the coefficient matrix Af, and may be responsible for a significant increase in the solution time for the tangent linear system [10]. The boundary traction h is given in this case by

$$h(u, p, x, t) = -P_i\, n, \qquad x \in \Gamma_h, \tag{31}$$

in which Pi is the pressure at surface i, evaluated as Pi = Ri Qi, i.e., proportional to the flow rate Qi across the surface

$$Q_i(t) = \int_{\Gamma_i} u \cdot n\, d\Gamma, \tag{32}$$

through the prescribed resistance Ri. Thus, the contribution of the resistance boundary condition to the coefficient matrix is

$$K_{bc} = \sum_{i=1}^{n_{bc}} \tilde{R}_i\, S_i \otimes S_i, \qquad S_i = \int_{\Gamma_i} N_a\, n\, d\Gamma, \tag{33}$$

where $n_{bc}$ is the number of resistance boundaries and $\tilde{R}_i = \gamma \Delta t\, R_i$. $K_{bc}$ is finally added to the K sub-matrix, resulting in the coefficient matrix $\tilde{K} = K + K_{bc}$. Generalization from an outlet resistance to a coupled lumped parameter network model is accomplished using a slightly more general expression for $K_{bc}$, i.e.

$$K_{bc} = \sum_{k=1}^{n_{bc}} \sum_{l=1}^{n_{bc}} \gamma \Delta t\, M_{kl} \int_{\Gamma_k} N_a\, n_i\, d\Gamma \int_{\Gamma_l} N_b\, n_j\, d\Gamma, \qquad M_{kl} = \frac{\partial P_k^{n+1}}{\partial Q_l^{n+1}}, \tag{34}$$

where the resistance matrix Mkl is obtained by coupling pressures and flow rates at different outlets [25].

Addition of a resistance boundary condition alters the topology of the coefficient matrix due to the rank-one contribution $S_i \otimes S_i$. In practice, this couples all velocity degrees of freedom on a given outlet, significantly affecting the performance of matrix multiplication for the K block and the fill-in generated by its LU decomposition. Thus, the vector $S_i$ is stored separately, to improve the efficiency of matrix multiplication and for RPC preconditioning. Figure 8 shows how the global matrix entries are affected by the presence of a resistance boundary condition, i.e., large magnitude components arise in the z-directional velocity block (the lower-right block of K). To better highlight this effect, we show two local matrices with and without a resistance boundary condition in Figure 11. Addition of $K_{bc}$ increases the contribution of the off-diagonal entries, moving the matrix further away from diagonal dominance.

Fig. 11:

A visual representation of (top) a local matrix Af without resistance BC (bottom) a local matrix Af,res with a resistance BC, R=1600g/cm4/s. Matrix elements are colored by their absolute magnitude. For both figures color ranges from 0 to 10−3.

The resistance BC perturbs the spectrum and increases the condition number of Af. The few largest eigenvalues in the spectra of Af and K in Figure 10 are computed from local matrices in partitioned domains interfacing the resistance boundary. The change of the spectral properties of $\tilde{K}$ due to the rank-one contribution $K_{bc}$ (see, e.g., [11]) is quantified by the maximum and minimum eigenvalues reported in Table 6. The maximum eigenvalue of $\tilde{K}$ is significantly larger than that of K, leading to a ~O(10)-fold increase in the condition number of $\tilde{K}$ and increasing the spectral radius of the whole spectrum, as shown in Figure 10. Additionally, our tests confirm that the largest eigenvalue of Af increases linearly with the assigned resistance. Thus, in a more general case, we expect several large eigenvalues to be added to Af for models with multiple outlet resistances.

Table 6:

The 1-norm condition number estimates and five maximum and minimum eigenvalues (λi) of a local matrix A~f and K~ with a resistance BC.

A~f: Condition number=3.299×107
λi 1st 2nd 3rd 4th 5th
Max(×10−3) 86.525 3.253 3.244 3.239 3.185
Min(×10−8) 0.671 1.036 1.350 1.427 1.504
K~: Condition number=5.883×103
λi 1st 2nd 3rd 4th 5th
Max(×10−3) 86.525 3.253 3.244 3.239 3.185
Min(×10−5) 2.878 2.886 3.457 3.981 4.002

5.2. Matrix properties for fluid flow in deformable vessels

Sparsity pattern.

The global sparsity pattern for the FSI mesh is illustrated in Figure 12, where nodes on the solid-fluid interface are ordered first, followed by nodes in the fluid region next to the interface, and then nodes in the solid domain. As in Figure 7, when the connectivity matrix is reordered by RCM, the global sparsity pattern has a banded sparse structure, with a larger maximum bandwidth of ≈1750 compared to the rigid case, while the number of non-zeros per row is mostly clustered around 15 to 16, similar to the rigid case. The magnitudes of entries associated with solid nodes appear to be significantly larger than those in the fluid domain. For example, the magnitude of Ks is order one (Figure 13), whereas the magnitude of K is of order 10−3 (Figure 11). In what follows, we focus on the local matrix Ks from Eq. (27), since the characteristics of Af in FSI are similar to the rigid wall case.

Fig. 12:

(Top) Global sparsity pattern for the FSI pipe benchmark with Nnd = 198,128. (bottom) Visual representation of the entry magnitudes for the global matrix AFSI.

Fig. 13:

Sparsity pattern of a local Ks colored by the absolute magnitude of each entry, for Nlnd = 4892. (top) The raw matrix Ks in the solid domain, (center) Ks scaled by its diagonal, (bottom) a local K* in the fluid domain scaled by its diagonal.

Diagonal dominance.

The metric introduced above to quantify diagonal dominance drops to D(AFSI) = 0.4775 for the global matrix AFSI, i.e., the additional FSI terms reduce the diagonal dominance of the system. The magnitudes of a local Ks matrix before and after diagonal scaling are compared, in Figure 13, to a local K from the fluid domain. The diagonally scaled block Ks shows qualitatively that the off-diagonals in Ks are larger than those in K. The diagonal dominance metrics for the local blocks Ks and K are D(Ks) = 0.379 and D(K) = 0.678, respectively.

Symmetry and positive definiteness.

The symmetry metric for Ks is one, i.e., S(Ks) = 1, and Ks is positive definite, as expected from its construction and confirmed numerically.

Eigenvalues.

We calculate and plot the eigenvalue spectra of local matrices from the FSI benchmark in Figure 14. All eigenvalues of Ks are real due to the symmetry of Ks. As listed in Table 7, the magnitudes of the eigenvalues of Ks are significantly larger than those of Af.

Fig. 14:

Spectrum of (red) local Af in the fluid domain and (blue) Ks in the solid domain. All local eigenvalue spectra from 48 cores are plotted together.

Table 7:

The 1-norm condition number estimate and five maximum and minimum eigenvalues (λi) of the raw local Ks matrix.

Ks: Condition number=3.169×104
λi 1st 2nd 3rd 4th 5th
Max 7.706 7.469 7.389 7.325 7.294
Min(×10−3) 1.374 1.404 1.468 1.510 1.529

5.3. Discussion

Results from the previous sections suggest the following conclusions. First, the condition numbers of both the fluid and solid tangent matrices Af and Ks are large, and therefore preconditioning is necessary. Second, the fluid matrix Af is more diagonally dominant than the solid matrix Ks. This suggests that diagonal preconditioning is expected to be more effective for rigid wall simulations, while incomplete factorization preconditioners are expected to work better for fluid-structure interaction, consistent with the results obtained in the pipe benchmark. Third, resistance and coupled multidomain boundary conditions need special treatment in the preconditioner, due to their effect on the maximum eigenvalue and condition number.

6. Effect of preconditioning

In this section we investigate how the application of various preconditioners affects the spectral properties of the coefficient matrix in both the rigid and deformable case, by explicitly computing the preconditioned matrix $M_l^{-1} A M_r^{-1}$, where $M_l^{-1}$ is a left preconditioner and $M_r^{-1}$ is a right preconditioner.

Consider a left and right Jacobi preconditioning for Af:

$$W_m = \mathrm{diag}(K)^{-1/2}, \qquad W_c = \mathrm{diag}(L)^{-1/2},$$
$$K_* = W_m K W_m, \quad G_* = W_m G W_c, \quad D_* = W_c D W_m, \quad L_* = W_c L W_c, \quad \Delta u_* = W_m^{-1} \Delta u, \quad \Delta p_* = W_c^{-1} \Delta p, \tag{35}$$

resulting in the linear system $A_{f*}\, y_* = R_{f*}$. The spectral properties of the preconditioned matrices $A_{f*}$ and $K_{s*}$ are reported in Figure 15 and Table 8.
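The qualitative effect of this scaling can be reproduced on small dense stand-in blocks; the sketch below uses synthetic sizes and magnitudes chosen only to mimic the block structure noted in Section 2.1, not the actual svFSI matrices.

```python
# Sketch: symmetric Jacobi scaling of a block matrix [[K, G], [D, L]] following
# Eq. (35), comparing the condition number before and after the scaling.
# All blocks are synthetic stand-ins (assumed sizes and magnitudes).
import numpy as np

rng = np.random.default_rng(0)
n, m = 120, 40                                   # stand-in velocity/pressure sizes
K = np.diag(rng.uniform(1.0, 5.0, n)) + 0.02 * rng.standard_normal((n, n))
G = 1e-3 * rng.standard_normal((n, m))
D = -G.T + 1e-4 * rng.standard_normal((m, n))    # G ~ -D^T, as observed in Sec. 2.1
L = 1e-3 * (np.eye(m) + 0.02 * rng.standard_normal((m, m)))

A = np.block([[K, G], [D, L]])
Wm = np.diag(1.0 / np.sqrt(np.abs(np.diag(K))))
Wc = np.diag(1.0 / np.sqrt(np.abs(np.diag(L))))
W = np.block([[Wm, np.zeros((n, m))], [np.zeros((m, n)), Wc]])
A_star = W @ A @ W                               # K* = Wm K Wm, G* = Wm G Wc, etc.
print(np.linalg.cond(A), np.linalg.cond(A_star)) # scaling reduces the condition number
```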

Fig. 15:

Spectrum of (red) the preconditioned local Af in the fluid domain and (blue) the preconditioned Ks in the solid domain. All local eigenvalue spectra from 48 cores are plotted together.

Table 8:

The 1-norm condition number estimates and five maximum and minimum eigenvalues (λi) of the preconditioned local matrices Af*, K*, L* without resistance BC, and Ks*. The 5th maximum eigenvalue of the local Af* is complex, with imaginary part −3.257 × 10−5.

Af: Condition number=645
λi 1st 2nd 3rd 4th 5th
Max 2.271 2.248 2.248 2.247 2.244
Min(×10−2) 2.557 3.558 4.623 5.183 5.795
K*: Condition number=14
λi 1st 2nd 3rd 4th 5th
Max 2.249 2.248 2.248 2.244 2.244
Min 0.579 0.580 0.582 0.582 0.583
L*: Condition number=1.662 × 1018
λi 1st 2nd 3rd 4th 5th
Max 2.290 2.224 2.216 2.213 2.201
Min(×10−2) 0.000 0.200 0.555 0.798 1.019
Ks: Condition number=1.517×104
λi 1st 2nd 2rd 4th 5th
Max 3.567 3.556 3.214 3.181 3.161
Min(×10−3) 1.173 1.193 1.205 1.222 1.238

Table 8 shows that diagonal preconditioning is effective in improving the conditioning of Af, and particularly of K, without a resistance BC. The condition number of K* is reduced to ~10, and only a few linear solver iterations are expected to be sufficient to substantially reduce the residual. This again justifies the approach followed by the BIPN solver, where the linear system involving the momentum block K is solved separately, thus shifting the computational cost to the iterative solution of its Schur complement. The condition number of the approximated Schur complement block $L_* + G_*^T G_*$ is, in this example, equal to 120, consistent with previous findings in [11]. This also explains why ~80 percent of the total compute time in BIPN is dedicated to the solution of the Schur complement linear system. Conversely, symmetric Jacobi preconditioning does not significantly reduce the condition number of the solid block Ks. This is attributed to the presence of large off-diagonal values in Ks that are only marginally affected by diagonal preconditioning. As a result, the eigenvalues of the preconditioned Ks range from O(10−3) to O(1), while the eigenvalues of the preconditioned Af range from O(10−2) to O(1), as shown in Figure 15, with the exception of a few eigenvalues associated with the resistance BC. This, in turn, explains the superiority of incomplete factorization preconditioners for FSI simulations.

Application of a symmetric Jacobi preconditioner to a local coefficient matrix with a resistance boundary condition leads to the eigenvalues reported in Table 9. Despite a reduction of three orders of magnitude, the condition number is still one order of magnitude larger than in the case without a resistance BC. Note that the maximum eigenvalue of the preconditioned matrix is one order of magnitude larger than the second largest eigenvalue, consistent with previous observations. Thus, the RPC preconditioning proposed in [10] seeks a preconditioning matrix H such that $H \simeq \tilde{K}^{-1}$. The idea is to construct H by combining the diagonal of K with the resistance contributions stored in $S_j$ as

$$H = (K_d)^{-1} - \sum_{j=1}^{n_{bc}} \left[ \frac{\tilde{R}_j\, \big( (K_d)^{-1} S_j \big) \otimes \big( (K_d)^{-1} S_j \big)}{1 + \tilde{R}_j\, \big\| (K_d)^{-1/2} S_j \big\|^2} \right], \tag{36}$$

where $K_d = \mathrm{diag}(K)$. The preconditioned matrix $H\tilde{K}$ has a small condition number (Table 9) and smaller off-diagonal entries (Figure 16).
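The mechanism behind Eq. (36) can be illustrated with a small sketch based on the Sherman-Morrison formula; the sizes, resistances, and outlet supports below are toy assumptions, not the RPC implementation in svFSI. With a purely diagonal $K_d$ and outlet vectors with disjoint supports, as in this toy setup, H exactly inverts $K_d + \sum_j \tilde{R}_j S_j \otimes S_j$; in the solver, $K_d = \mathrm{diag}(K)$ only approximates K, so H acts as an approximate inverse used for preconditioning.

```python
# Sketch of the idea behind Eq. (36): apply H as a diagonal solve plus one rank-one
# Sherman-Morrison correction per resistance outlet. Toy sizes/values only.
import numpy as np

rng = np.random.default_rng(1)
n = 200
Kd = rng.uniform(1.0, 4.0, n)                    # diag(K), stored as a vector
R = np.array([1600.0, 800.0])                    # outlet resistances (toy stand-ins for R~_j)
S = np.zeros((2, n))
S[0, :20] = 0.1 * rng.standard_normal(20)        # each S_j supported on its own outlet nodes
S[1, 20:45] = 0.1 * rng.standard_normal(25)

def apply_H(r):
    """Apply H of Eq. (36): diagonal solve plus rank-one corrections per outlet."""
    x = r / Kd
    for Rj, Sj in zip(R, S):
        KinvS = Sj / Kd
        x -= Rj * (KinvS @ r) / (1.0 + Rj * (Sj @ KinvS)) * KinvS
    return x

# compare with a direct solve of the rank-updated matrix (exact in this toy setup)
Ktot = np.diag(Kd) + sum(Rj * np.outer(Sj, Sj) for Rj, Sj in zip(R, S))
r = rng.standard_normal(n)
print(np.linalg.norm(apply_H(r) - np.linalg.solve(Ktot, r)))   # ~ machine precision
```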

Table 9:

The 1-norm condition number estimates and five maximum and minimum eigenvalues (λi) of the preconditioned local matrix A~f, K~ and HK~.

A~f: Condition number=9243
λi 1st 2nd 3rd 4th 5th
Max 55.15 2.262 2.247 2.247 2.243
Min(×10−1) 0.232 0.358 0.467 0.492 3.99
K~: Condition number=1862
λi 1st 2nd 3rd 4th 5th
Max 55.16 2.248 2.247 2.244 2.241
Min 0.217 0.384 0.392 0.396 0.399
HK~: Condition number=76
λi 1st 2nd 3rd 4th 5th
Max 2.248 2.247 2.244 2.241 2.241
Min 0.217 0.384 0.392 0.396 0.399

Fig. 16:

Local sparse matrix structure of the preconditioned K block. (top) $\tilde{K}$ without a resistance BC, (center) $\tilde{K}$ with a resistance BC, (bottom) $H\tilde{K}$. The colorbar is the same for all figures.

7. Performance of linear solvers in patient-specific models

In this section we test the performance of the linear solvers on patient-specific cardiovascular models, in an effort to extrapolate the results obtained for the pipe benchmark to more realistic problems. We use three models with a wide range of boundary conditions (i.e., resistance, RCR, coronary, and closed-loop multidomain), with and without wall deformability, covering various patient-specific anatomies. All anatomic models were constructed from medical image data using SimVascular.

7.1. Pulmonary hypertension

The first model represents the left and right pulmonary arteries with associated branches and is used to investigate the effects of pulmonary hypertension (PH). The finite element mesh contains 3,223,072 tetrahedral elements to represent the pulmonary lumen, has rigid walls and 88 outlets with Windkessel (RCR) boundary conditions, prescribed through a coupled 0-D multi-domain approach [25, 40] (Figure 17). A pulsatile inflow waveform extracted from PC-MRI was imposed at the pulmonary artery inlet. This model is solved using a time step of 0.46 milliseconds and 120 cores (≈ 25,000 elements per core). The tolerance on the Newton-Raphson residual is set to ϵ = 10−4.

Fig. 17:

(Top) Patient-specific model for pulmonary hypertension with schematics of boundary conditions. The model is colored by instantaneous wall shear stress. (bottom) compute times for preconditioned iterative linear solvers.

We briefly report global matrix characteristics in the PH model. The diagonal dominance metric is D(Af) = 0.5598, and D(K) = 0.7397. The metric value for D(Af) is similar to values from the pipe model with rigid walls. The matrix is nearly-symmetric, S(Af) = 0.9903.

Results in Figure 17 compare the performance of diagonal, block-diagonal and ILU preconditioning. BIPN with RPC shows the best performance, followed by ILU-BICG. This is expected due to the large number of resistance boundary conditions (i.e., 88) at the model outlets. Diagonal preconditioners with GMRES instead perform poorly, consistent with our observations in the pipe benchmark.

7.2. Coronary artery bypass graft

Second, we consider a model of coronary artery bypass graft (CABG) surgery (see, e.g., [27, 36]) with 4,199,945 tetrahedral elements and rigid walls, coupled with a closed-loop 0D lumped parameter network (LPN) including heart, coronary, and systemic circulation models (Figure 18). Simulations were performed using 168 cores (~24,000 elements per core), with a time step of 0.87 milliseconds and a nonlinear iteration tolerance of ϵ = 10−3.

Fig. 18:

(Top) Patient-specific model for coronary bypass graft model with schematics of boundary conditions. The model is colored by instantaneous wall shear stress. (bottom) compute times for preconditioned iterative linear solvers.

The diagonal dominance metric for the global matrix in the CABG model is D(Af) = 0.5200, and D(K) = 0.6914. The matrix in the CABG model is less diagonally dominant than in the cylinder or pulmonary hypertension models. The matrix is also nearly symmetric, with S(Af) = 0.9938.

As expected, due to the presence of coupled multidomain boundary conditions [10], BIPN shows the best performance, followed closely by BICG with ILU, while GMRES with a diagonal preconditioner performs poorly. The performance of ILU relative to the diagonal preconditioner is better than in the previous models, consistent with the smaller diagonal dominance metric of the CABG model.

7.3. Left coronary

Next, we test the performance of the linear solvers on a left coronary model, extracted from the full coronary artery model used in Tran et al. [36]. The pulsatile flow waveform at the inlet of the left coronary branch was extracted from the full-model simulation and imposed at the inlet. The model has six outlets, each with an open-loop coronary outlet boundary condition [17]. All resistance and compliance values, as well as the pulsatile inflow waveform, were tuned to produce a normal physiologic response of the left coronary artery, following our prior work. We ran simulations with rigid and deformable walls using a lumen mesh with 486,066 tetrahedral elements and a vessel wall mesh with 206,369 tetrahedral elements, a time step of 1 millisecond, and a tolerance of ϵ = 10⁻⁴. We used 20 cores for the rigid wall simulation and 24 cores for the deformable wall simulation.

The diagonal dominance metrics for the left coronary model with rigid walls are D(Af) = 0.5238 and D(K) = 0.6972, similar to the CABG model. The matrix is nearly symmetric, S(Af) = 0.9855, although this is the least symmetric among all models considered.

For the ALE FSI model, the diagonal dominance is reduced to D(AFSI) = 0.4323 and D(K) = 0.5320. Note that the wall mesh contains roughly 40 percent as many elements as the fluid mesh, so the effect of adding the solid mechanics contribution to the linear system is more significant than in the pipe case, where the wall mesh contained only 20 percent as many elements. The symmetry metric is very close to one, S(AFSI) = 0.99998, since Ks is symmetric.

As shown in Figure 19, the performance results are consistent with our previous findings. RPC-BIPN is the fastest method for the rigid wall simulation. In FSI, the performance of BIPN is poor, while the diagonal and ILU preconditioners with GMRES perform better.

Fig. 20:

Compute times for preconditioned GMRES using (top) a rigid pipe model and (bottom) a pipe model with a deformable wall, for different GMRES restart numbers.

7.4. Discussion

From the performance results on the patient-specific models, we find that RPC-BIPN is the fastest method for rigid wall simulations with many resistance BCs, in agreement with the pipe model. ILU-BICG is only slightly slower than RPC-BIPN, while the standard diagonally scaled GMRES fails. The performance degradation of RPC-BIPN in FSI models is consistent with the pipe model and suggests the need for future improvements to BIPN for ALE FSI.

8. Summary and conclusions

In this paper we study the performance of preconditioned iterative linear solvers for cardiovascular simulations with rigid and deformable walls. To this end, we implement several iterative linear solvers (GMRES, BICGS, and BIPN) and preconditioners (diagonal, block-diagonal, ILU, ILUT, ML, and RPC) in a single flow solver. The standard iterative solvers and preconditioners are taken from the Trilinos library and compared against RPC-BIPN, implemented in our in-house solver. Simulation wall clock times are measured and compared on a benchmark pipe flow with a resistance BC.

ILU-preconditioned BICG provides the best overall performance in both rigid and deformable wall simulations. RPC-BIPN in the FSI simulation shows an approximately eight-fold increase in compute time compared to the rigid wall case. Strong and weak scaling results for ILU-BICG and RPC-BIPN are reported.

To better understand the observed performance, the characteristics of the left-hand-side matrix of the linear system are examined. We report sparsity patterns, diagonal dominance, symmetry, eigenvalues, and condition numbers for global and local matrices. The results show that the matrix has a narrow banded sparsity structure after reverse Cuthill-McKee (RCM) reordering. The matrix from the fluid domain has diagonal entries that are larger than the off-diagonal entries and is nearly symmetric. The eigenvalues and condition numbers of the fluid-domain matrix show that the K block has a significantly smaller condition number than the Af matrix, supporting the main premise of BIPN. The effect of preconditioning on the matrix characteristics is investigated by explicitly forming the preconditioned matrix. A diagonal preconditioner is shown to be effective in reducing the range of eigenvalues in the fluid domain, especially for the K matrix.
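
The 1-norm condition number estimates reported here (and in Table 5) can be obtained without explicitly inverting the matrix. The following SciPy sketch is not the solver's implementation; it simply combines a sparse LU factorization with a 1-norm estimator, and the stand-in matrix is an arbitrary shifted 1D Laplacian.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import splu, onenormest, LinearOperator

def cond1_estimate(A):
    """Estimate the 1-norm condition number ||A||_1 * ||A^{-1}||_1 of a
    sparse matrix, applying A^{-1} through a sparse LU factorization."""
    A = sp.csc_matrix(A)
    lu = splu(A)
    Ainv = LinearOperator(A.shape, dtype=A.dtype,
                          matvec=lambda x: lu.solve(x),
                          rmatvec=lambda x: lu.solve(x, trans='T'))
    return onenormest(A) * onenormest(Ainv)

# Stand-in example; in practice A would be a local block such as K or Af.
A = sp.diags([-1.0, 2.1, -1.0], [-1, 0, 1], shape=(500, 500))
print(f"estimated 1-norm condition number: {cond1_estimate(A):.3e}")
```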

Adding wall deformability to the fluid simulation increases the bandwidth of the matrix and decreases the relative magnitudes of the diagonal values compared to the off-diagonal values. Due to the reduction of diagonal dominance, a diagonal preconditioner does not significantly reduce the condition number of the original matrix.

The resistance boundary condition disturbs the sparsity and diagonal dominance of the original fluid matrix and produces an ill-conditioned system by adding an eigenvalue that is larger than the maximum eigenvalue of the matrix without the resistance BC. The resistance-based preconditioner reduces the condition number of the system with a resistance boundary condition by four orders of magnitude, whereas a diagonal preconditioner reduces it by only two orders of magnitude.
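
The mechanism can be seen in a small numerical caricature. The sketch below is not the discretized system and does not reproduce the quantitative reductions quoted above; it only illustrates, with arbitrary assumed values, how a rank-one resistance-like coupling R a aᵀ on a few hypothetical "outlet" degrees of freedom introduces a single large outlier eigenvalue and inflates the condition number of an otherwise well-conditioned block.

```python
import numpy as np

rng = np.random.default_rng(1)

# Well-conditioned SPD stand-in for the fluid momentum block K.
n = 200
B = rng.standard_normal((n, n))
K = B @ B.T / n + np.eye(n)

# Caricature of a resistance outlet coupling: a rank-one term R * a a^T
# acting on a small set of hypothetical "outlet" degrees of freedom.
R = 1.0e4
a = np.zeros(n)
a[:10] = 1.0
K_res = K + R * np.outer(a, a)

for name, M in (("K", K), ("K + R a a^T", K_res)):
    w = np.linalg.eigvalsh(M)  # both matrices are symmetric
    print(f"{name:>12s}: lambda_max = {w[-1]:.3e}, "
          f"lambda_min = {w[0]:.3e}, cond = {w[-1] / w[0]:.3e}")
```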

The performance of various preconditioned linear solvers is evaluated in four patient-specific models. In these models, RPC-BIPN performs best for rigid wall models with multiple resistance or coupled LPN outlet boundary conditions. In deformable wall simulations, RPC-BIPN shows significant performance degradation, and BICG with diagonal or ILU preconditioners achieves the best performance.

This study motivates several new research directions for the development of preconditioned linear solver strategies. The effectiveness of BIPN for the solution of fluid problems with rigid walls has been demonstrated in the current study. Our in-house code (RPC-BIPN) currently uses the bi-partitioned approach for ALE FSI, forming one linear system from the momentum equations of the fluid and solid domains together, and another from the continuity equation of the fluid domain. However, since the characteristics of the solid-domain matrix differ from those of the fluid domain, most notably in diagonal dominance, the linear systems from these two domains should be solved separately (i.e., tri-partitioning). The inefficiency of BIPN for FSI stems from the loss of diagonal dominance in the left-hand-side block K when the solid contribution is added. Since RPC is based on a simple diagonal preconditioner, solving the K system becomes less efficient. We suggest solving Ks separately with an incomplete Cholesky preconditioner, exploiting its symmetry, rather than with simple diagonal preconditioning. Exploration of this idea is the subject of future work.

Additionally, we point out that the Schur complement block in BIPN is not preconditioned. Since a major portion of the computational cost in BIPN is spent solving the Schur complement system [11], accelerating this solve with a suitable preconditioner could significantly reduce the compute time. Forming a preconditioner for the Schur complement block would require an efficient sparse matrix-matrix multiplication scheme as well as explicit formation of the Schur complement block. The open-source Trilinos library provides this capability along with various preconditioners, so combining a partitioning approach with Trilinos is expected to provide consistent performance in both rigid and deformable wall simulations of cardiovascular hemodynamics. Implementation and testing of this approach is left for future investigation. Testing of linear solver performance on more complex patient-specific disease cases with large wall deformations (e.g., aortic dissection) is warranted and would likely yield further insights. Future studies are also warranted to further assess solver performance and matrix characteristics, towards the development of new solver and preconditioner strategies.
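
One possible direction, sketched below under loose assumptions, is to approximate K⁻¹ by its inverse diagonal, assemble an approximate Schur complement explicitly with sparse matrix-matrix products, and factor it incompletely to obtain a preconditioner. The block names K, G, D, and L are generic placeholders rather than the solver's internal data structures, and SciPy stands in for the Trilinos kernels mentioned above.

```python
import scipy.sparse as sp
from scipy.sparse.linalg import spilu, LinearOperator

def approximate_schur_preconditioner(K, G, D, L):
    """Preconditioner for the Schur complement S = L - D K^{-1} G,
    built from the cheap approximation K^{-1} ~ diag(K)^{-1}."""
    inv_diag_K = sp.diags(1.0 / K.diagonal())
    S_approx = (L - D @ inv_diag_K @ G).tocsc()   # sparse mat-mat products
    ilu = spilu(S_approx, drop_tol=1e-4, fill_factor=10)
    return LinearOperator(S_approx.shape, matvec=ilu.solve, dtype=S_approx.dtype)

# Usage sketch (blocks extracted from the partitioned system):
# M = approximate_schur_preconditioner(K, G, D, L)
# x, info = scipy.sparse.linalg.gmres(S, b, M=M)   # S applied matrix-free
```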

Fig. 6:

Weak scaling of BICG-ILU and BIPN-RPC for pipe benchmark with rigid and deformable walls.

Fig. 19:

(Top) Patient-specific left coronary artery model with a schematic of the boundary conditions. The model is colored by instantaneous wall shear stress. (Center) Compute times for preconditioned iterative linear solvers in the rigid wall simulation. (Bottom) Compute times for preconditioned iterative linear solvers in the deformable wall simulation.

Fig. 22:

Compute times for linear solvers preconditioned with ML using (a) a rigid and (b) an FSI pipe model with tolerance ϵ = 10⁻³, for different subsmoothers. The Gauss-Seidel smoother is used. For the rigid wall model, 38 cores are used; for the FSI model, 48 cores are used.

Table 5:

The 1-norm condition number estimates and the five largest and smallest eigenvalues (λi) of a local matrix Af and of the blocks K and L, without a resistance BC.

Af: Condition number = 1.293×10⁶
λi            1st     2nd     3rd     4th     5th
Max (×10⁻³)   3.304   3.301   3.298   3.282   3.270
Min (×10⁻⁸)   0.748   1.042   1.348   1.524   1.672

K: Condition number = 163
λi            1st     2nd     3rd     4th     5th
Max (×10⁻³)   3.304   3.301   3.298   3.282   3.270
Min (×10⁻⁵)   4.615   4.621   5.958   6.381   6.614

L: Condition number = 1.555×10¹⁸
λi            1st     2nd     3rd     4th     5th
Max (×10⁻⁷)   9.053   8.180   8.085   8.083   7.983
Min (×10⁻⁹)   0.000   0.456   1.224   1.765   2.214

Acknowledgements

This work was supported by NIH grant NIH R01-EB018302, NSF SSI grants 1663671 and 1339824, and NSF CDSE CBET 1508794. This work used the Extreme Science and Engineering Discovery Environment (XSEDE) [35], which is supported by National Science Foundation grant number ACI-1548562. We thank Mahidhar Tatineni for assisting with building Trilinos on the Comet cluster, which was made possible through the XSEDE Extended Collaborative Support Service (ECSS) program [1]. The authors also thank Michael Saunders, Michael Heroux, Mahdi Esmaily, Ju Liu, and Vijay Vedula for fruitful discussions that helped in the preparation of this paper. The authors would like to thank the two anonymous reviewers whose comments greatly improved the completeness of the present study. We also acknowledge support from the open source SimVascular project at www.simvascular.org.

9. Appendix

A. GMRES restart

We tested different GMRES restart numbers on the pipe benchmark problems. In Figure 20, we plot compute times for preconditioned GMRES using a pipe model with rigid walls and a pipe model with a deformable wall. Our tests show that decreasing the restart number increases the compute time of the linear solver in the rigid pipe model, while the FSI model does not show a notable difference.
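
For readers reproducing this comparison outside the solver, most GMRES implementations expose the restart length directly. The sketch below uses SciPy on an arbitrary stand-in system (a 2D Laplacian with an ILU preconditioner), not the solver's matrices, simply to show where the parameter enters.

```python
import time
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import gmres, spilu, LinearOperator

# Stand-in sparse system, purely illustrative.
n = 100
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = sp.kronsum(T, T, format='csc')          # 2D Laplacian on an n-by-n grid
b = np.ones(A.shape[0])

ilu = spilu(A, drop_tol=1e-3, fill_factor=10)
M = LinearOperator(A.shape, matvec=ilu.solve, dtype=A.dtype)

for restart in (50, 100, 200):
    t0 = time.perf_counter()
    x, info = gmres(A, b, M=M, restart=restart, maxiter=5000)
    print(f"restart={restart:4d}  converged={info == 0}  "
          f"time={time.perf_counter() - t0:.3f} s")
```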

B. Choice of smoother and subsmoother for ML

In the ML package, multiple options are available for the smoother and the subsmoother. As shown in Figure 21, the Gauss-Seidel smoother performs best, outperforming the Chebyshev, symmetric Gauss-Seidel, and ILUT options. For the subsmoother, symmetric Gauss-Seidel performs best, outperforming Chebyshev and MLS (Figure 22).

Fig. 21:

Compute times for linear solvers preconditioned with ML using (a) a rigid and (b) an FSI pipe model with tolerance ϵ = 10⁻³, for different smoothers. The symmetric Gauss-Seidel subsmoother is used. For the rigid wall model, 38 cores are used; for the FSI model, 48 cores are used.

C. Effect of reordering in ILU

We evaluated and compared the compute times of the linear solvers with different reordering methods. RCM and METIS reordering for ILUT are applied via the Trilinos IFPACK package. We use a fill-in level of 2 and a dropping tolerance of 10⁻² for this test. Figure 23 shows the performance differences between ILUT with the different reordering schemes. From this test, we confirm that RCM is the fastest option, compared with METIS and no reordering. The superior performance of RCM is most notable when GMRES is used with ILUT.
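
A rough open-source analogue of this experiment (not the IFPACK code path) can be run with SciPy: permute a stand-in matrix with RCM, disable the internal column permutation in the incomplete factorization, and compare the fill in the resulting factors.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.csgraph import reverse_cuthill_mckee
from scipy.sparse.linalg import spilu

def ilut_fill(A, drop_tol=1e-3, fill_factor=10):
    """Number of nonzeros in the incomplete L and U factors of A."""
    ilu = spilu(sp.csc_matrix(A), drop_tol=drop_tol,
                fill_factor=fill_factor, permc_spec='NATURAL')
    return ilu.L.nnz + ilu.U.nnz

# Stand-in matrix: a 2D Laplacian whose rows and columns are scrambled, so
# that a bandwidth-reducing reordering has structure to recover.
n = 60
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n))
A = sp.kronsum(T, T, format='csr')
p = np.random.default_rng(0).permutation(A.shape[0])
A_scrambled = A[p, :][:, p]

perm = reverse_cuthill_mckee(A_scrambled, symmetric_mode=True)
A_rcm = A_scrambled[perm, :][:, perm]

print("incomplete-factor fill without reordering:", ilut_fill(A_scrambled))
print("incomplete-factor fill with RCM reordering:", ilut_fill(A_rcm))
```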

Fig. 23:

Fig. 23:

Compute times for linear solvers preconditioned with ILUT using a (a) rigid and (b) FSI pipe model with tolerance ϵ = 10−3, with different reorderings. For the rigid wall, 38 cores are used. For the FSI, 48 cores are used.

D. Eigenvalue spectra of the local and global matrices

In this section we compare the eigenvalue spectra of the local and global matrices and investigate how our analysis of local eigenvalues generalizes to the global matrix. We use a pipe model with the same dimensions as in Figure 1, meshed with 24,450 elements and Nnd = 5,462. We use one core to extract the global matrix and four cores to examine the local matrices. As shown in Figure 24, the eigenvalue distributions of the global and local matrices are similar. Although the eigenvalues of the global and local matrices are not exactly the same, the distribution of eigenvalues of the local matrices is a good approximation to the distribution of eigenvalues of the global matrix.
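
The comparison can be prototyped on a small stand-in operator. In the sketch below, contiguous diagonal blocks play the role of the "local" matrices, which is only an approximation: in the solver, the local matrices are the per-core assembled operators and also carry interface contributions.

```python
import numpy as np
import scipy.sparse as sp

# Stand-in "global" matrix: a nonsymmetric banded operator, loosely mimicking
# a convection-diffusion discretization (not the paper's assembled system).
n = 40
T = sp.diags([-1.2, 2.5, -0.8], [-1, 0, 1], shape=(n, n))
A = sp.kronsum(T, T).tocsr()
N = A.shape[0]

w_global = np.linalg.eigvals(A.toarray())
print(f"global : |lambda| in [{np.abs(w_global).min():.3e}, "
      f"{np.abs(w_global).max():.3e}]")

# Crude "local" matrices: one contiguous diagonal block per core.
n_cores = 4
for c, idx in enumerate(np.array_split(np.arange(N), n_cores)):
    w_local = np.linalg.eigvals(A[idx, :][:, idx].toarray())
    print(f"local {c}: |lambda| in [{np.abs(w_local).min():.3e}, "
          f"{np.abs(w_local).max():.3e}]")
```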

Fig. 24:

The spectrum of eigenvalues for a rigid pipe model with Neumann BC at the outlet. (top) eigenvalues obtained from four local matrices. Different colors are used to represent eigenvalues from different local matrices. (bottom) eigenvalues obtained from the global matrix.

Contributor Information

Jongmin Seo, Department of Pediatrics and Institute for Computational and Mathematical Engineering (ICME), Stanford University, Stanford, CA, USA, jongminseo@stanford.edu.

Daniele E. Schiavazzi, Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, IN, USA, dschiavazzi@nd.edu

Alison L. Marsden, Department of Pediatrics, Bioengineering and ICME, Stanford University, Stanford, CA, USA, amarsden@stanford.edu

References

1. High Performance Computer Applications: 6th International Conference, volume 595, Germany, 1 2016.
2. Bazilevs Y, Calo VM, Hughes TJR, and Zhang Y. Isogeometric fluid–structure interaction: theory, algorithms and computations. Comput. Mech, 43:3–37, 2008.
3. Bazilevs Y, Hsu MC, Benson DJ, Sankaran S, and Marsden AL. Computational fluid-structure interaction: methods and application to a total cavopulmonary connection. Comput. Mech, 45(1):77–89, December 2009.
4. Benzi M. Preconditioning techniques for large linear systems: A survey. Journal of Computational Physics, 182:418–477, 2002.
5. Benzi M, Szyld DB, and van Duin A. Orderings for incomplete factorization preconditioning of nonsymmetric problems. SIAM J. Sci. Comput, 20(5):1652–1670, 1999.
6. Chen W and Poirier B. Parallel implementation of efficient preconditioned linear solver for grid-based applications in chemical physics. I: Block Jacobi diagonalization. Journal of Computational Physics, pages 185–197, 2005.
7. Corsini C, Cosentino D, Pennati G, Dubini G, Hsia T-Y, and Migliavacca F. Multiscale models of the hybrid palliation for hypoplastic left heart syndrome. Journal of Biomechanics, 44:767–770, 2011.
8. Deparis S, Forti D, Grandperrin G, and Quarteroni A. FaCSI: A block parallel preconditioner for fluid-structure interaction in hemodynamics. Journal of Computational Physics, 327:700–718, 2016.
9. dos Santos RW, Plank G, Bauer S, and Vigmond EJ. Parallel multigrid preconditioner for the cardiac bidomain model. IEEE Transactions on Biomedical Engineering, 51(11):1960–1967, 2004.
10. Esmaily-Moghadam M, Bazilevs Y, and Marsden AL. A new preconditioning technique for implicitly coupled multidomain simulations with applications to hemodynamics. Comput. Mech, DOI 10.1007/s00466-013-0868-1, 2013.
11. Esmaily-Moghadam M, Bazilevs Y, and Marsden AL. A bi-partitioned iterative algorithm for solving linear systems arising from incompressible flow problems. Computer Methods in Applied Mechanics and Engineering, 286:40–62, 2015.
12. Esmaily-Moghadam M, Bazilevs Y, and Marsden AL. Impact of data distribution on the parallel performance of iterative linear solvers with emphasis on CFD of incompressible flows. Comput. Mech, 55:93–103, 2015.
13. Figueroa CA, Vignon-Clementel IE, Jansen KE, Hughes TJ, and Taylor CA. A coupled momentum method for modeling blood flow in three-dimensional deformable arteries. Comput. Meth. Appl. Mech. Engrg, 195(41-43):5685–5706, 2006.
14. Heroux MA, Bartlett RA, Howle VE, Hoekstra RJ, Hu JJ, Kolda TG, Lehoucq RB, Long KR, Pawlowski RP, Phipps ET, Salinger AG, Thornquist HK, Tuminaro RS, Willenbring JM, Williams A, and Stanley KS. An overview of the Trilinos project. ACM Trans. Math. Softw, 31(3):397–423, 2005.
15. Hughes TJ, Liu WK, and Zimmermann TK. Lagrangian-Eulerian finite element formulation for incompressible viscous flows. Comput. Meth. Appl. Mech. Engrg, 29(3):329–349, 1981.
16. Jansen KE, Whiting CH, and Hulbert GM. A generalized-α method for integrating the filtered Navier–Stokes equations with a stabilized finite element method. Comput. Meth. Appl. Mech. Engrg, 190(3-4):305–319, 2000.
17. Kim HJ, Vignon-Clementel IE, Figueroa CA, LaDisa JF, Jansen KE, Feinstein JA, and Taylor CA. On coupling a lumped parameter heart model and a three-dimensional finite element aorta model. Ann. Biomed. Eng, 37(11):2153–2169, 2009.
18. Lagana K, Dubini G, Migliavacca F, Pietrabissa R, Pennati G, Veneziani A, and Quarteroni A. Multiscale modelling as a tool to prescribe realistic boundary conditions for the study of surgical procedures. Biorheology, 39:359–364, 2002.
19. Lan H, Updegrove A, Wilson NM, Maher GD, Shadden SC, and Marsden AL. A re-engineered software interface and workflow for the open source SimVascular cardiovascular modeling package. Journal of Biomechanical Engineering, 140(2):024501:1–11, 2017.
20. Long CC. Fluid–structure interaction: Physiologic simulation of pulsatile ventricular assist devices using isogeometric analysis. PhD dissertation, University of California, San Diego, 2013.
21. Manguoglu M, Takizawa K, Sameh AH, and Tezduyar TE. Solution of linear systems in arterial fluid mechanics computations with boundary layer mesh refinement. Comput. Mech, 46:83–89, 2010.
22. Marsden AL. Optimization in cardiovascular modeling. Annual Review of Fluid Mechanics, 46:519–546, 2014.
23. Marsden AL and Esmaily-Moghadam M. Multiscale modeling of cardiovascular flows for clinical decision support. Applied Mechanics Reviews, 67:030804, 2015.
24. Marsden AL, Feinstein JA, and Taylor CA. A computational framework for derivative-free optimization of cardiovascular geometries. Comput. Meth. Appl. Mech. Engrg, 197:1890–1905, 2008.
25. Moghadam ME, Vignon-Clementel I, Figliola R, and Marsden A. A modular numerical method for implicit 0D/3D coupling in cardiovascular finite element simulations. Journal of Computational Physics, 244:63–79, 2013.
26. Nesbitt WS, Westein E, Tovar-Lopez FJ, Tolouei E, Mitchell A, Fu J, Carberry J, Fouras A, and Jackson SP. A shear gradient–dependent platelet aggregation mechanism drives thrombus formation. Nature Medicine, 15(6):665–673, 2009.
27. Ramachandra AB, Kahn AM, and Marsden AL. Patient-specific simulations reveal significant differences in mechanical stimuli in venous and arterial coronary grafts. J. Cardiovasc. Trans. Res, 2016.
28. Saad Y. Iterative Methods for Sparse Linear Systems. Society for Industrial and Applied Mathematics, ISBN 978-0-89871-534-7, 2003.
29. Sankaran S, Kim HJ, Choi G, and Taylor CA. Uncertainty quantification in coronary blood flow simulations: Impact of geometry, boundary conditions and blood viscosity. Journal of Biomechanics, 49:2540–2547, 2016.
30. Schiavazzi DE, Arbia G, Baker C, Hlavacek AM, Hsia TY, Marsden AL, Vignon-Clementel IE, and the Modeling of Congenital Hearts Alliance (MOCHA) Investigators. Uncertainty quantification in virtual surgery hemodynamics predictions for single ventricle palliation. Int. J. Numer. Meth. Biomed. Engng, e02737:1–25, 2016.
31. Schiavazzi DE, Baretta A, Pennati G, Hsia T-Y, and Marsden AL. Patient-specific parameter estimation in single-ventricle lumped circulation models under uncertainty. Int. J. Numer. Meth. Biomed. Engng, e02799, 2017.
32. Schiavazzi DE, Doostan A, Iaccarino G, and Marsden A. A generalized multi-resolution expansion for uncertainty propagation with application to cardiovascular modeling. Comput. Methods Appl. Mech. Engrg, 314:196–221, 2017.
33. Sengupta D, Kahn AM, Burns JC, Sankaran S, Shadden SC, and Marsden AL. Image-based modeling of hemodynamics in coronary artery aneurysms caused by Kawasaki disease. Biomech Model Mechanobiol, 11:915–932, 2012.
34. Taylor CA, Fonte TA, and Min JK. Computational fluid dynamics applied to cardiac computed tomography for noninvasive quantification of fractional flow reserve. Journal of the American College of Cardiology, 61(22):2233–2241, 2013.
35. Towns J, Cockerill T, Dahan M, Foster I, Gaither K, Grimshaw A, Hazlewood V, Lathrop S, Lifka D, Peterson GD, Roskies R, Scott JR, and Wilkins-Diehr N. XSEDE: Accelerating scientific discovery. Computing in Science and Engineering, 16(5):62–74, Sep-Oct 2014.
36. Tran JS, Schiavazzi DE, Ramachandra AB, Kahn AM, and Marsden AL. Automated tuning for parameter identification and uncertainty quantification in multiscale coronary simulations. Comput. Fluids, 142:128–138, 2017.
37. Trefethen LN and Bau D. Numerical Linear Algebra. SIAM, 1997.
38. Updegrove A, Wilson NM, Merkow J, Lan H, Marsden AL, and Shadden SC. SimVascular: An open source pipeline for cardiovascular simulation. Annals of Biomedical Engineering, 45(3):525–541, 2016.
39. Vedula V, Lee J, Xu H, Kuo C-CJ, Hsiai TK, and Marsden AL. A method to quantify mechanobiologic forces during zebrafish cardiac development using 4-D light sheet imaging and computational modeling. PLoS Comput Biol, 13(10):e1005828, 2017.
40. Yang W, Marsden AL, Ogawa MT, Sakarovitch C, Hall KK, Rabinovitch M, and Feinstein JA. Right ventricular stroke work correlates with outcomes in pediatric pulmonary arterial hypertension. Pulmonary Circulation, 2018.
