Author manuscript; available in PMC 2022 Jul 1.
Published in final edited form as: Comput Biol Med. 2021 May 21;134:104507. doi: 10.1016/j.compbiomed.2021.104507

Simulation of 3D centimeter-scale continuum tumor growth at sub-millimeter resolution via distributed computing

Dylan A Goodin 1, Hermann B Frieboes 1,2,3,*
PMCID: PMC8277490  NIHMSID: NIHMS1707108  PMID: 34157612

Abstract

Simulation of cm-scale tumor growth has generally been constrained by the computational cost of numerically solving the associated equations, with models limited to representing mm-scale or smaller tumors. While this work has proven useful to the study of small tumors and micro-metastases, a biologically relevant simulation of cm-scale masses, as would typically be detected and treated in patients, has remained an elusive goal. This study presents a distributed computing (parallelized) implementation of a mixture model of tumor growth to simulate 3D cm-scale vascularized tissue at sub-mm resolution. The numerical solving scheme utilizes a two-stage parallelization framework. The solution is written for GPU computation using the CUDA framework, which handles all multigrid-related computations, while the Message Passing Interface (MPI) handles distribution of information across multiple processes, freeing the program from the RAM and processing limitations of single systems. On each system, Nvidia's CUDA library allows for fast processing of model data using GPU-bound computing, requiring fewer systems overall. The results show that a combined MPI-CUDA implementation enables continuum modeling of cm-scale tumors at reasonable computational cost. Further work to calibrate model parameters to particular tumor conditions could enable simulation of patient-specific tumors for clinical application.

Keywords: 3D tumor model, cancer simulation, distributed computing, parallelized computing, continuum models, mixture models, CUDA, MPI, openMP


1. Introduction

Representation of tumor growth in clinically relevant contexts has generally been explored via three main types of models: continuum models that simulate tissue-scale behavior, discrete models that define individual cells and their interactions, and hybrid models utilizing a combination of both approaches. These efforts have traditionally been constrained by the computational cost of numerically solving the associated equations, with the results limited to representing mm-sized or smaller tumors. For discrete models the challenge has been to simulate billions of cells and their interactions, while for continuum models the cost of representing cm-scale domains becomes computationally prohibitive. In particular, models based on continuum mixture theory to simulate tumor growth have been developed [22, 23, 32, 43, 67] and analyzed [18, 21, 30], building upon earlier work representing tumor tissue as different phases of a mixture [1-3, 6, 7, 13-16, 19, 26, 27, 46-49, 52, 59, 60, 62, 63]. However, more complex continuum models have struggled to achieve high-performance simulations at patient-scale (cm) resolution.

Lorenzo et al. used a continuum two-phase model to simulate a prostate tumor of 2.66 cm³ volume from a CT scan [37]. Antonopoulos et al. represented a 4.2 cm³ domain for 3 simulated months at 2.2 mm³ resolution [5]. While both models reached cm scale, multispecies representation and vascularization were not incorporated. Wise et al. developed an adaptive multigrid framework for simulating a continuum multispecies tumor model using a single-core computer process, finding that the time required to simulate a single day of tumor evolution at 1×10⁻² days per time step increases from ~12 min during early time steps to ~400 min by the end of the simulation [66]. In [29], the model of [67] was coupled with a lattice-free random walk angiogenesis model [4, 39, 44, 45]. Recently, a mixture model with continuum 3D representation of tumor, vasculature, and extracellular matrix (ECM) was presented in [40, 41]. Open Multi-Processing (OpenMP) parallelization benefits were offset in [40] by increased model complexity: performance degraded from ~156 min per simulated day early in the run to ~280 min per simulated day at 1×10⁻² days per time step. In these models, coupling of tumor and vasculature in a biologically realistic 3D representation to simulate clinically relevant tumor growth incurs a high computational cost. Consequently, the numerical implementation to solve the coupled equations has hindered these models from reaching practical application, especially in terms of simulating patient tumor response to potential courses of treatment in a timely manner to drive clinical decision-making.

Outside the context of continuum models, several parallelized implementations have been developed over the past decade to improve performance. In [64], a tumor model parallelization saw a 5.2x performance increase over a single-process approach using eight processors. OpenMP implementations have improved tumor modeling performance, as shown in [31, 65]. An early effort at parallelizing a Cellular Potts model used the Message Passing Interface (MPI) but remained a 2D simulation [20]. Models have benefitted from multiple approaches, including an MPI-based parallel solver named NAStJA [10-12] and Compute Unified Device Architecture (CUDA) based solvers [56, 57]. A nearly 30x uplift over CPU-based implementations was seen with a CUDA-based solver in [56]. Likewise, cellular automata tumor modeling has benefited from CUDA and CPU-based parallelization approaches [53, 54, 58]. A tumor simulation using finite element methods leveraged an MPI framework to attain a ~4x performance improvement by spanning the simulation across 16 processes [24]. Performance gains for a finite-element method were also realized using Galois, a software package that employs an amorphous data-parallelism model [38]. Recently, a hybrid model was parallelized in a multi-GPU environment to simulate ~1 cm³ melanoma evolution [35]. Of note, Antonopoulos et al.'s continuum model in MATLAB emphasized macroscopic tumor phenomena, simulating a cubic domain of 4.2 cm side length at 2 mm³ resolution. By simulating fewer equations at a lower resolution than the model in [40], it was capable of simulating ~3 months of tumor evolution in 10-12 min [5].

Complementing these previous efforts, this study presents a distributed computing implementation of the mixture model in [40, 41] via a combined MPI-CUDA framework to simulate cm-scale vascularized 3D tumor tissue at sub-millimeter resolution.

2. Materials and Methods

2.1. Model of Tumor Growth

We fully parallelize the continuum 3D model presented in [40, 41], which used openMP. Briefly, the model simulates the evolution of a single tumor cell phenotype in an environment with host cells and ECM. Tumor tissue vies for resources against healthy tissue while balancing the need for nutrients, metabolites, and ionic species, including oxygen, carbon dioxide, lactate, bicarbonate, sodium ions, chloride, and H+ ions. Crowding in a limited tissue space is abstracted into solid mass pressure and pressure from surrounding fluids. These pressures drive velocity in the solid tissue mass and create a buildup of elastic energy on the surrounding ECM. Matrix degrading enzyme and myofibroblast concentrations increase due to remodeling of the surrounding ECM to compensate for strain induced by tumor growth.

During tumor growth, tissue distal from vasculature can be deprived of resources. The tumor releases angiogenic factors to encourage growth of surrounding vasculature towards hypoxic tissue. Increased vessel leakiness has been well documented for such relatively rapid vasculature changes; the body compensates for edema by increasing lymphatic growth [55]. Therefore, the model simulates lymphatic growth with terms independent of the vasculature, although the two are closely related mathematically and physiologically. However, vasculature effectiveness is limited physiologically by the diffusion rate of oxygen. Thus, interior hypoxic regions in sufficiently large tumors will operate at varying levels of anaerobic glycolysis, building up lactic acid. In a sufficiently hypoxic state, tumor cells become apoptotic or necrotic, represented as a dead cell volume fraction.

Numerically, this model is solved using a geometric multigrid solver. At its finest multigrid level, the solver uses evenly spaced points to define the model solution resolution. By increasing the number of points per side of the cubic domain while keeping a point-to-point distance <100 μm, sub-mm precision is retained while the domain size grows beyond a centimeter on a side. At each point of the cubic domain, a solution for the model variables is generated after each θ units of simulated time elapse. Key equations in nondimensionalized form and the numerical solver are summarized in the Supplement.
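As a concrete instance of this scaling, take the 50 μm point spacing and 512³ finest grid used for the largest runs reported below:

$$512 \times 50\,\mu\text{m} = 2.56\,\text{cm}$$

so the cubic domain exceeds a centimeter per side while the point-to-point distance stays well under 1 mm.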

2.2. Limitations of openMP-based solver

The openMP-based solver in [40, 41] has three limitations:

  1. When tested using 128³ grids, maximum performance was obtained using only 8 of 32 cores on a 32-core processor of the University of Louisville Cardinal Research Cluster (CRC), potentially due to insufficient memory bandwidth. Further testing on an AMD 2990WX exhibited more promising results, indicating that newer CPUs may benefit more from openMP. However, core-count limits would still constrain the gains.

  2. openMP is a shared-memory architecture that runs on non-distributed systems, limiting performance gains to a single PC, workstation, or High Performance Computing (HPC) node.

  3. Many PCs have insufficient RAM to hold larger tumor model spaces. Table 1 summarizes the expected RAM footprint for varying model sizes.

Table 1 –

Memory footprint for varying model sizes using the model in [40, 41].

                                                                          Level size
                                                                       256³       512³
    Points on a side                                                   130        258
    Maximum level size simulated (# points on a side)                  256        512
    Upper-bound RAM required per process, with eight processes on
      the finest level and an equal distribution of level data (GB)    3.3        25.5
    Total RAM required for a single process on the finest level (GB)   13.6       107.6
    Maximum spherical tumor diameter that could be simulated at
      50 μm point resolution (mm)                                      12.8       25.6

2.3. Distributed computing solution

To simulate tumors at patient-tissue cm scales, the model in [40, 41] requires sufficient computational resources to operate on a 512³ domain; according to Table 1, over 100 GB of RAM are necessary. Because single-socket computers do not typically possess this much RAM, a new solution generator is required for long-term parallel computing.

For this purpose, this study implements the numerical solving scheme of [41] using a two-stage parallelization framework. First, numerical computations were rewritten for GPU computation using CUDA. This framework handles all multigrid-related computations, including Gauss-Seidel red-black smoothing, restriction, prolongation, and error correction. Second, MPI handles the distribution of information across multiple processes, freeing the program from the RAM and processing limitations of single-system parallelization frameworks. On each system, Nvidia's CUDA library allows for faster processing of model data using GPU-bound computing, requiring fewer systems. Thus, the new model framework is a two-part MPI-CUDA model.
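As an illustration of the GPU stage, the sketch below shows one red-black Gauss-Seidel sweep for a generic 7-point stencil; this is a minimal stand-in for the full mixture-model operator, and the array layout (one-point boundary shell, linear indexing) is an assumption:

    // Minimal sketch of a red-black Gauss-Seidel sweep on the GPU for a
    // 7-point Poisson-like stencil (a stand-in for the full mixture-model
    // operator). Arrays use a one-point boundary shell: side length n + 2.
    __global__ void rb_smooth(double* u, const double* rhs,
                              int n, double h2, int color)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x + 1;
        int j = blockIdx.y * blockDim.y + threadIdx.y + 1;
        int k = blockIdx.z * blockDim.z + threadIdx.z + 1;
        if (i > n || j > n || k > n) return;
        if (((i + j + k) & 1) != color) return;   // update one color per sweep

        int s = n + 2;                            // padded side length
        long idx = ((long)k * s + j) * s + i;
        u[idx] = (u[idx - 1] + u[idx + 1] +                    // +/- i neighbors
                  u[idx - s] + u[idx + s] +                    // +/- j neighbors
                  u[idx - (long)s * s] + u[idx + (long)s * s]  // +/- k neighbors
                  - h2 * rhs[idx]) / 6.0;
    }

    // Host-side launch: one red sweep followed by one black sweep.
    void smooth_once(double* d_u, const double* d_rhs, int n, double h)
    {
        dim3 t(8, 8, 8), b((n + 7) / 8, (n + 7) / 8, (n + 7) / 8);
        rb_smooth<<<b, t>>>(d_u, d_rhs, n, h * h, 0);
        rb_smooth<<<b, t>>>(d_u, d_rhs, n, h * h, 1);
        cudaDeviceSynchronize();
    }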

The type of simulation considered here, generally known as HPC, requires consistent communication between multiple data processors. Architectures configured for Big Data, in which data processors are designed to perform tasks at a coarser resolution, are not ideal for datapoint-level communication [8]. Further, common Big Data platforms such as Hadoop and Apache Spark rely on disk-based queries or possess significantly more overhead than comparable MPI-based HPC implementations, making MPI a more viable distributed computing framework for our purposes [8, 17, 25, 51].

The overall algorithm in the MPI-CUDA tumor model is identical to the model in [40, 41], except that the conditions for block generation have changed. Under the previous framework, efficiency was defined as $\eta = \#\{\text{points in } F_\ell^{t,r-1}\} \,/\, \#\{\text{points in } B_{\ell+1}\}$, where $F_\ell^{t,r-1}$ is the set of all flagged points on level ℓ at time step $t$ and solver iteration $r-1$, and $B_{\ell+1}$ is the set of all points within blocks on level ℓ+1. To prolongate to a new level, $\eta$ had to be lower than a pre-defined cutoff efficiency. In the new framework, the decision process is simplified to an all-or-nothing behavior in which a single flagged point on ℓ causes the solver to operate over the entirety of the domain on level ℓ+1 (i.e., $\Omega_{\ell+1}$). This behavior can be interpreted as creating a block $B_{\ell+1}$ whose size is determined by prolongating the block $B_\ell = \Omega_\ell$ using the prolongation operator $P_{\ell+1}(X)$ for some set of points $X$ on ℓ. This decision can be summarized as $F_\ell^{t,r-1} \neq \emptyset \Rightarrow B_{\ell+1} = P_{\ell+1}(B_\ell) = \Omega_{\ell+1}$. Memory management is thus greatly simplified, since for a given time step the solver either finishes at level ℓ or processes all of level ℓ+1. Consequently, this decision also increases the workload on levels where only a subset of $\Omega_\ell$ requires smoothing.

While this method simplifies memory management, it can sacrifice solution accuracy. Residual error is calculated as:

$$\big\|R - L(\psi_\ell^{t,r})\big\|_{B_{\ell+1}} = \frac{1}{|B_{\ell+1}|}\sum_{p \in B_{\ell+1}}\sum_{v \in V}\left(R_{p,v} - L_{p,v}\right)^2 \tag{1}$$

where the RHS and LHS solutions are $R_{p,v}$ and $L_{p,v}$, respectively, for all points $p$ in block $B_{\ell+1}$ and variables $v$ in the set of all tumor model variables $V$. $R$ and $L$ are the RHS and LHS model terms on ℓ, respectively, and $\psi_\ell^{t,r}$ is the variable solution on ℓ at time step $t$ and solver iteration $r$. When the size of $B_{\ell+1}$ is not fit to the flagged points, sensitivity to local error is decreased; thus, the model error will be artificially high. This was corrected by redefining the set of points $p$ to be the set of all flagged points $F_\ell^{t,r-1}$:

$$\big\|R - L(\psi_\ell^{t,r})\big\|_{F_\ell^{t,r-1}} = \frac{1}{|F_\ell^{t,r-1}|}\sum_{p \in F_\ell^{t,r-1}}\sum_{v \in V}\left(R_{p,v} - L_{p,v}\right)^2 \tag{2}$$

This method allows for easy memory transfers from CPU to GPU while retaining solution accuracy.
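A host-side sketch of Eq. (2), assuming the per-point RHS and LHS values for every variable have been copied back from the GPUs; the container layout is illustrative:

    #include <cstddef>
    #include <vector>

    // Residual of Eq. (2): mean squared RHS-LHS mismatch over flagged points
    // only, summed across all model variables V.
    double residual_over_flagged(const std::vector<std::size_t>& flagged,
                                 const std::vector<std::vector<double>>& R,
                                 const std::vector<std::vector<double>>& L)
    {
        double sum = 0.0;
        for (std::size_t p : flagged)
            for (std::size_t v = 0; v < R[p].size(); ++v) {
                double d = R[p][v] - L[p][v];   // R_{p,v} - L_{p,v}
                sum += d * d;
            }
        return sum / static_cast<double>(flagged.size());  // 1 / |F|
    }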

2.4. Model architecture

The flow of information during execution differs from the previous architecture. The MPI implementation has two classes of processes:

  1. Administrative process (AdP). Responsibilities include construction of the model domain and decisions pertaining to solution convergence. Only one process is designated as the AdP within the MPI-CUDA runtime.

  2. General computation process (GCP). Each GCP takes up a non-overlapping cubic region of Ω and can operate on more than one level, as designated by the AdP at the start of model execution.

Algorithm 1 summarizes the process for any computation function X that is neither restriction nor prolongation. Before synchronization, each GCP must unload its corresponding GPU $P_m$ containing unsynchronized data before executing X on GPU $P_n$. Preceding execution of function X on level ℓ, all data across GCPs are synchronized to avoid race conditions. Of note, in Algorithm 1 the binding rules for GPUs $P_m$ and $P_n$ are left to the implementer. Ideally, processes are bound in a non-overlapping fashion to a single GPU; that is, two GCPs $g$ and $h$ are the same if and only if $m_g = n_g = m_h = n_h$. However, hardware limitations may require an overlapping allocation in which multiple MPI processes share GPU resources.

Algorithm 1 –

Run function X on GCP g on level ℓ.

RunFunction(X, g, ℓ, m, n) {
    Select GPU P_m
    If GPU P_m contains unloaded data addressed to GCP g {
        Unload Ω_ℓ data from P_m
        Synchronize Ω_ℓ with all GCPs on level ℓ
    }
    Load level ℓ data associated with GCP g onto GPU P_n
    Run X on P_n
}
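In C++/CUDA terms, Algorithm 1 might look as follows; the Gcp helper methods are hypothetical names standing in for the model's unload, MPI synchronization, and load steps:

    #include <cuda_runtime.h>

    // Hypothetical GCP interface; method names are illustrative only.
    struct Gcp {
        bool gpu_holds_unsynced_data(int gpu) const;
        void unload_from_gpu(int gpu);        // cudaMemcpy device -> host
        void synchronize_level(int level);    // MPI exchange across the level
        void load_to_gpu(int gpu, int level); // cudaMemcpy host -> device
        double* device_data(int gpu);
    };

    using Kernel = void (*)(double*);         // computation function X

    void run_function(Kernel X, Gcp& g, int level, int m, int n)
    {
        cudaSetDevice(m);                     // select GPU P_m
        if (g.gpu_holds_unsynced_data(m)) {
            g.unload_from_gpu(m);
            g.synchronize_level(level);       // sync Omega_level with all GCPs
        }
        cudaSetDevice(n);
        g.load_to_gpu(n, level);              // level data for GCP g onto P_n
        X(g.device_data(n));                  // run X on P_n
    }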

Processes are applied to levels sequentially, filling a single region of the model in the manner depicted by Figure 1, in which level ℓ, level ℓ+1, and level ℓ+2 operate over the same domain object, represented by the triangle. Level ℓ contains a single process. Adapting a method of hierarchical process filling proposed by [50], three additional processes are required to process level ℓ+1. All four processes, including region 1 on level ℓ+1, restrict to region 1. The same relationship exists between levels ℓ+2 and ℓ+1. One-eighth of the domain covered by a single GCP unit on level ℓ is retained locally, while the other seven parts of $\Omega_{\ell+1}$ are sent to seven other GCPs. Thus, the amount of work increases linearly with the number of levels, since processes on each level after and including level ℓ have the same domain size [50]. This also means that each GCP introduced on a previous level must operate on the final level ℓ_max. Scaling this approach to 3D, the total number of processes required is:

$$\text{Processes Required} = \begin{cases} 1, & n_0 > m_0 \\ 8^{(m_0 - n_0)}, & n_0 \leq m_0 \end{cases} \tag{3}$$

where $n_0, m_0 \in \mathbb{N}$, the finest level ℓ_max has $2^{m_0}$ points on a side, and each process holds $2^{n_0}$ points per side per level at maximum RAM usage. Thus, for $n_0 = m_0 - 1$, the number of processes is $8^{(m_0 - (m_0 - 1))} = 8$. Because a portion of the computational work remains on every finer level after a process is first introduced, processes are utilized to a greater degree than with a non-hierarchical filling method.
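For example, the 512³ runs described below have a finest level with $2^9$ points per side ($m_0 = 9$); with each GCP holding $2^7 = 128$ points per side ($n_0 = 7$), Eq. (3) gives

$$8^{(m_0 - n_0)} = 8^{(9-7)} = 64 \text{ GCPs},$$

matching the 64-GCP distribution used for the cm-scale simulations in Section 3.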

Figure 1 –

Multilevel nodal geometry on sequential levels ℓ, ℓ+1, and ℓ+2. Processes on levels ℓ, ℓ+1, and ℓ+2 operate over equally sized datasets regardless of whether they operate on $\Omega_\ell$, $\Omega_{\ell+1}$, or $\Omega_{\ell+2}$, respectively, because the simulated distance between points is halved on level ℓ+1 and halved again on level ℓ+2; the density of information thus keeps pace with the addition of more GCPs. This approach is extended in this work to a three-dimensional simulation domain. During restriction, GCPs locally restrict their domain data and consolidate their information in the −i and −j directions to nodes marked $G_\ell$ or $G_{\ell+1}$. Prolongation reverses this process by transferring $G_\ell$ data along the +i and +j directions.

At the beginning of model execution, a single AdP is designated. The AdP starts by defining process boundaries determined by the maximum-sized domain that each GCP can contain. To agree with the domain Ω, the cubic domain $\Omega_D$ for each GCP has a side length of $2^k$ points, where $k \in \mathbb{Z}^{+} \cup \{0\}$. The value of $k$ can be specified at runtime or derived empirically from hardware availability. The resulting size is the fundamental size for each GCP. Consequently, the coarsest level 0 may define a domain $\Omega_0$ that is larger than a single GCP.

For a process $n$ operating over a subset of $\Omega_\ell$, denoted $\Omega_\ell^n$, a set of GPUs is paired with process $n$ to process $\Omega_\ell^n$. For this study, we assume $\Omega_\ell^n$ is cubic. If required by hardware constraints, $\Omega_\ell^n$ is subdivided into subdomains $\omega_j$ that are sufficiently reduced to fit in GPU RAM. The subdomains have the following properties for $m$ subdomains on level ℓ:

  1. $\omega_j \subseteq \Omega_\ell^n,\ \forall j \in \{1, \ldots, m\}$

  2. $\omega_1 \cap \omega_2 \cap \ldots \cap \omega_m = \emptyset$

  3. $\omega_1 \cup \omega_2 \cup \ldots \cup \omega_m = \Omega_\ell^n$

  4. $\omega_j \neq \emptyset,\ \forall j \in \{1, \ldots, m\}$

If a single GPU has enough RAM to hold $\Omega_\ell^n$, then $m = 1$. Because of stencil operations, a one-point shell layer around each subdomain is required. Next, the GPUs receive relevant constant terms from the model, e.g., the point spacing on level ℓ and the domain dimensions. Finally, function X is called on all GPUs. After computation, data are unloaded as required to allow synchronization between all $\Omega_\ell^n$ on $\Omega_\ell$. Due to the memory transfers from GPU to CPU, this process constitutes the bulk of this method's overhead.
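A minimal sketch of such a subdivision, assuming slabs along the k-axis (any disjoint, covering split of $\Omega_\ell^n$ satisfying the four properties would serve), with the one-point shell added when each slab is loaded:

    #include <algorithm>
    #include <vector>

    struct Subdomain { int k_begin, k_end; };  // interior planes [k_begin, k_end)

    // Split a cubic process domain of `side` planes into GPU-sized slabs:
    // pairwise disjoint (property 2), covering (3), nonempty (4) subsets (1).
    std::vector<Subdomain> partition(int side, int max_planes_per_gpu)
    {
        std::vector<Subdomain> subs;
        for (int k = 0; k < side; k += max_planes_per_gpu)
            subs.push_back({k, std::min(k + max_planes_per_gpu, side)});
        return subs;  // load each slab plus one ghost plane per face for stencils
    }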

2.5. Data synchronization

When syncing data across GCPs, three vectors must be defined: (1) a syncing vector, $\vec{S}$; (2) a process vector, $\vec{N}$, that points from self to an adjacent GCP; and (3) a data vector, $\vec{D}$, for directing synchronized information to the correct cubic feature (i.e., face, edge, or corner). Because processes are arranged as cubes in a Cartesian grid, there are 26 possible syncing directions for each GCP. In graph-theoretic terms, each GCP forms a star graph $S_{26}$ with its neighbors. Any MPI send-and-receive operation is a two-step process, in which any link $(u, v_m)$ for $m \in \{1, 2, \ldots, 26\}$ from the center node $u$ of the star graph to vertex $v_m$ must be traversed in both directions. For maximum performance, a perfect matching is desirable, meaning that on level ℓ, half of the GCPs are sending data and half are receiving data during the synchronization command. In addition, at any given moment of synchronization, any chain of successive links on ℓ must be acyclic to prevent hanging. The synchronization process in this model therefore has two objectives: (1) creation of a unified timing structure that ensures synchronization across all nodes on level ℓ without the program hanging, and (2) derivation of $\vec{D}$ and $\vec{N}$ at each link of the star graph.

On each GCP, every value in a 3×3×3 syncing stencil is cycled through in a preset order. With the center cell of the stencil as the center of a GCP's domain, each stencil cell represents a cubic feature. A syncing vector $\vec{S}$ points from the origin to the cubic feature represented by an index of the stencil, representing a link on the star graph. The MPI synchronization commands used in this framework do not resume execution until the sending and receiving operation is completed. Thus, by cycling through all possible syncing vectors in a set order on all GCPs, every $\vec{S}$ at a given step of the syncing process will be parallel, ensuring that the vector field of all syncing vectors has zero curl and hence fulfilling objective 1. For a given $\vec{S}$, the GCPs send data in a checkerboard pattern, with one half of the GCPs acting as senders and the other half as receivers. For a sending GCP $s$ operating in $\Omega_\ell^s$ and a receiving GCP $r$ operating in $\Omega_\ell^r$, $s$ sends the cubic feature indicated by $\vec{S} = \vec{D}_s = \vec{N}_s$ to the receiver whose $\vec{D}_r = \vec{N}_r = -\vec{S}$. Then the sending and receiving roles are reversed, so that a cubic feature of $\Omega_\ell^r$ is sent to $\Omega_\ell^s$, giving both $s$ and $r$ the data required to update their respective cubic features. This process is repeated for all $\vec{S}$ so that any GCP $n$ on ℓ can perform stencil operations anywhere in $\Omega_\ell^n$.
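A sketch of one such pass with MPI, assuming a 3D Cartesian communicator and eliding the packing of the specific cubic feature selected by each $\vec{S}$; MPI_Sendrecv folds the paired send and receive of the two checkerboard halves into one call:

    #include <mpi.h>

    // One synchronization pass on level ell: cycle the 26 syncing vectors S
    // in a fixed global order shared by every GCP.
    void sync_level(MPI_Comm cart, double* feature_send, double* feature_recv,
                    int count)
    {
        int rank, dims[3], periods[3], coords[3];
        MPI_Comm_rank(cart, &rank);
        MPI_Cart_get(cart, 3, dims, periods, coords);

        for (int di = -1; di <= 1; ++di)
        for (int dj = -1; dj <= 1; ++dj)
        for (int dk = -1; dk <= 1; ++dk) {
            if (!di && !dj && !dk) continue;              // center of the stencil
            int nc[3] = { coords[0] + di, coords[1] + dj, coords[2] + dk };
            if (nc[0] < 0 || nc[0] >= dims[0] ||          // border of Omega:
                nc[1] < 0 || nc[1] >= dims[1] ||          // no neighbor along S
                nc[2] < 0 || nc[2] >= dims[2]) continue;
            int nbr;
            MPI_Cart_rank(cart, nc, &nbr);
            // exchange the cubic feature pointed to by S = (di, dj, dk)
            MPI_Sendrecv(feature_send, count, MPI_DOUBLE, nbr, 0,
                         feature_recv, count, MPI_DOUBLE, nbr, 0,
                         cart, MPI_STATUS_IGNORE);
        }
    }

Because every GCP walks the 26 offsets in the same global order, all active links stay parallel at each step, so no cyclic chain of sends can form (objective 1).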

While $\vec{D}$ and $\vec{N}$ are parallel to $\vec{S}$ for interior synchronization events, syncing events on the border of $\Omega_\ell$ involve cubic features that do not correspond to the syncing stencil. In these situations, the vectors $\vec{N}$ and $\vec{D}$ are derived from projections of $\vec{S}$, thus linking objective 2 to objective 1. This allows the model to consistently synchronize information across all GCPs on ℓ without interaction from the AdP and without forming cyclic subgraphs.

In the case of restriction, information must be consolidated from GCPs that exist on levels greater than or equal to ℓ+1 to GCPs that operate on both level ℓ and level ℓ+1. As represented in Figure 1, the filling method creates 2×2 squares of GCP domains. Each square contains a single GCP ($G_\ell$) whose operating domain spans partitions of both ℓ and ℓ+1. $G_\ell$'s domain is at the minimum $(i,j)$ corner of the 2×2 square. Restriction is performed locally on each GCP on ℓ+1, and the results are consolidated along the j-axis first, followed by the i-axis, at the corresponding $G_\ell$. For each 2×2 square, this process moves all restriction information to each $G_\ell$ while parallelizing the restriction process. Likewise, prolongation involves distributing level ℓ data to all the corresponding GCPs on level ℓ+1. The distribution reverses the consolidation by distributing first from $G_\ell$ along the i-axis and then the j-axis. Prolongation calculations are then done locally on all GCPs on level ℓ+1. On level ℓ+2, the restriction and prolongation processes scale to include nodes from both level ℓ ($G_\ell$) and level ℓ+1 ($G_{\ell+1}$). For this 3D model, the preceding restriction and prolongation processes were scaled to a 2×2×2 cube region for each $G_\ell$.

2.6. Performance timing

All timing results for the openMP vs. CUDA test and the MPI tests were obtained using time.h clock statements and operated on a reference homogeneous tumor shape with heterogeneous vasculature created for this model runtime. The computer used for comparing the openMP and MPI-CUDA frameworks has an AMD 2990WX 32-core processor, two Titan RTX GPUs with the computation load allocated to the non-display GPU, and 128 GB of DDR4 RAM at 2666 MHz. Both GPUs were set to WDDM mode. The CUDA test case consists of two MPI processes: one AdP and one GCP. When running on a single PC, negligible overhead occurs from process communication; thus, a two-process MPI-CUDA runtime is akin to a single-process CUDA task, differing only in slight overhead due to convergence decisions and process initialization steps. Furthermore, the MPI-CUDA runtime was configured to run on a single, non-display Titan RTX GPU. Parameters other than time step size and tolerance were the same as in [41]; the time step size and tolerance for the openMP and CUDA-only tests were θ = 5×10⁻³ and τ_max = 1×10⁻³, respectively. Lastly, each node of the University of Kentucky Lipscomb Compute Cluster (LCC) used for the MPI tests comprises two 20-core Intel Xeon 6230 processors, 192 GB of RAM, and four Nvidia V100 32 GB GPUs.
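For reference, a minimal version of such a measurement (the solver call is a hypothetical placeholder; clock() from time.h reports processor time, which tracks wall time for a single busy process):

    #include <stdio.h>
    #include <time.h>

    // Time one model step with time.h clock statements, as in the tests above.
    void timed_step(void (*run_time_step)(void))
    {
        clock_t t0 = clock();
        run_time_step();                                 // one solver time step
        clock_t t1 = clock();
        printf("time step: %.2f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    }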

2.7. Simulation of cm-scale tumors with sub-mm resolution

Two large tumors were simulated: (1) a ~1 cm tumor in a 256³ domain, and (2) a ~2 cm tumor in a 512³ domain. The two simulated tumors are identical in shape to the performance tests, being homogeneously defined with an initial volume fraction $\tilde{\phi}_V = 0.65$. The shape was defined using a combination of sinusoidal functions and bivariate normal distributions. The initial shape and other initial conditions are included in the Supplement. The domain size was derived from the diffusivity of oxygen (10⁻⁵ cm²/s [28, 42]). Both the 256³ and 512³ simulations operated at a resolution of 50 μm point spacing (1.25×10⁻⁴ mm³). Eight nodes on the LCC with eight CUDA-solving processes per node were used for both simulations. Screenshots were taken of the initial state, at 167 time steps into the simulation (5 simulated days), and at 267 time steps into the simulation (8 simulated days). Table 2 lists the computational solver parameters used for both domains.

Table 2 –

Computational parameters from [41] used for the tumor simulations. Initial values are set before model runtime. Values not listed are the same as in [41].

    Parameter        Description                                                   256³      512³
    ℓ_global         Finest level that always spans Ω                                    2
    ℓ_max            Finest grid used for Ω                                              4
    σ                Tolerance reduction factor from level ℓ to ℓ+1                      1.10
    θ                Time step size (days)                                               3×10⁻²
    γ                Cycle index (1 for V-cycle, 2 for W-cycle)                          1
    τ_max            Solution tolerance for level ℓ_max                                  2×10⁻³
    v0, v1, v2, vb   Preset numbers of smoothing steps                                   4, 2, 2, 2
    r_max            Maximum number of smoothing steps before a Divergence
                     Exception is raised                                          15        45
    C                Maximum gradient difference allowed for the Universal
                     Gradient Test in the FLAG routine on φ_V                           0.05
    d                Number of cells inward from the boundary included in
                     near-boundary extra smoothing steps (zero-indexed level ℓ)          2

To increase model metabolite stability in the charge-balance equation, sodium concentrations were introduced throughout the model domain. A small increase in the concentrations of carbon dioxide, lactate, bicarbonate, and H+ at the borders was applied to ensure convergence. As shown in Figure 7, the distance traveled by molecules before being absorbed by blood vasculature is significantly smaller than the distance between the tumor and the domain borders; this change therefore does not significantly affect the outcome by later points in the simulation. Before vasculature formation, the molecular species travel closer to the model borders, as noted in the smaller 256³ case. To more accurately simulate a larger tumor mass and to provide sufficient oxygen and glucose to the model domain, more mature vascularization was created at the model borders and within the domain. Biological variables and parameters for the long-term 256³ and 512³ simulations are in the Supplement.

Figure 7 –

Evolution of metabolism-related variables for the ~2 cm diameter tumor in the 512³ domain. (A) Glucose. (B) Lactate. (C) Hydrogen ion (H+). (D) Bicarbonate.

2.8. Statistics

All timing results were obtained for at least n=3 simulation runs, with error bars representing the 95% confidence interval (CI) of the average value.
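Assuming the usual t-based construction for small samples, the interval plotted for $n$ runs with sample mean $\bar{x}$ and sample standard deviation $s$ is:

$$\bar{x} \pm t_{0.975,\,n-1}\,\frac{s}{\sqrt{n}}, \qquad t_{0.975,\,2} \approx 4.30 \ \text{for } n = 3.$$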

3. Results

3.1. Comparison of openMP and single MPI-CUDA process: CUDA contribution to model speedup

The tumor simulated for the openMP vs. CUDA-only test occupied a 128³ domain with the initial condition shown in Supplementary Figure 1A. The domain was cubic with a 4 mm side length; thus, the simulation resolution was 31.25 μm (derived from the diffusivity of oxygen, 10⁻⁵ cm²/s [28, 42]). Parameter values were as in [40, 41]. Each coarser level ℓ−1 had twice the distance between points of its corresponding finer level ℓ. From Figure 2A, a 14.7x performance increase of CUDA over openMP was seen for the first time step. In the second time step, our CUDA framework was 7.9x faster than openMP. Due to corrections made in the MPI-CUDA framework, this approach converged in fewer cycles than the original openMP implementation. It is probable that convergence would improve in the openMP-based code if the flux term changes were applied. However, we verified the performance improvement by evaluating the time per smoothing step: Figure 2B shows a 10.7x improvement over the original openMP implementation. Further, because of the adaptive grid methods described in [41] that were used in the openMP multigrid algorithm, only a subset of the domain was solved over on the finest two levels of the simulation; thus, openMP spends more time doing less computational work than our CUDA framework. Finally, because the AMD 2990WX possesses 32 cores, it is difficult to find current single-socket computers capable of equivalent performance. Although the 2990WX's memory bandwidth may not feed all CPU cores effectively on a multi-die CPU, this distinction is unlikely to close the performance gap. Consequently, it is reasonable to expect the performance improvements of CUDA over openMP to scale across other platforms.

Figure 2 –

Performance comparison for the first two time steps between openMP and a single-process MPI-CUDA instance in a 128³ domain. (A) Execution time; (B) V-cycle solvers. Error bars represent 95% CI.

Model accuracy was ensured by comparing model output to the openMP numerical solutions in [40, 41]. To ensure model consistency across varying MPI process counts, the initial conditions and the end-state after two time steps were compared between separate runs using a SHA-256 hash. All algorithm behavior is represented in the first two time steps; thus, comparing the first two time steps is sufficient to confirm solution integrity. The hash was created from printouts of volume fractions, pressures, metabolites, growth factors, and other model variables. Matching hashes implied that the integrity of the solving process was not impacted during development or by varying process counts. This analysis also confirmed that MPI synchronization produced equivalent results for 1, 8, and 64 GCPs.
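A minimal sketch of this integrity check, assuming the printed model state is flattened into a single numeric buffer and using OpenSSL's one-shot SHA256 helper for illustration:

    #include <openssl/sha.h>
    #include <cstdio>
    #include <string>
    #include <vector>

    // Hash a flattened model state (volume fractions, pressures, metabolites,
    // ...) so that end-states from runs with different GCP counts can be
    // compared for bit-exact equality.
    std::string hash_state(const std::vector<double>& state)
    {
        unsigned char digest[SHA256_DIGEST_LENGTH];
        SHA256(reinterpret_cast<const unsigned char*>(state.data()),
               state.size() * sizeof(double), digest);
        char hex[2 * SHA256_DIGEST_LENGTH + 1];
        for (int i = 0; i < SHA256_DIGEST_LENGTH; ++i)
            std::snprintf(hex + 2 * i, 3, "%02x", digest[i]);
        return std::string(hex);   // identical strings => identical solutions
    }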

3.2. MPI Contribution to Model Speedup

To confirm that MPI increases performance over a single-GCP MPI-CUDA instance, the LCC was used. Using the same initial condition (Supplementary Figure 1B) and a resolution of 31.25 μm, the domain size was increased to a cube with 256³ interior points (258³ including border points); thus, the domain was cubic with an 8 mm side length. All computational parameter values differing from [40, 41] are in Table 3.

Table 3 –

Computational parameter values for the CUDA-MPI tests. All other parameters were retained from [40, 41].

    Computational   Description                                      openMP vs. MPI-CUDA   MPI-CUDA
    parameter                                                        single process        LCC
    n_max           Edge length of finest cubic domain Ω_ℓmax        128                   256
    Domain size     Side length of Ω_ℓmax (cm)                       0.4                   0.8
    σ               Tolerance reduction factor from level ℓ to ℓ+1                 4.0
    τ_max           Solution tolerance for level ℓ_max                             0.001

Triplicate tests were performed at three different numbers of MPI processes:

  1. Two-process MPI-CUDA case: one AdP and one GCP. This test was akin to the AMD 2990WX test and served as the baseline for cases 2 and 3.

  2. Nine-process MPI-CUDA case: one AdP and eight GCPs. For the eight-GCP setup, the finest model domain was split into octants, with each process operating over a single octant. In this setup, each restriction operation from ℓ_max to ℓ_max−1 maps eight GCPs to a single GCP for processing. One LCC node was used, with two MPI-CUDA processes assigned to each of the four available cluster GPUs.

  3. Sixty-five-process MPI-CUDA case: one AdP and 64 GCPs. For this setup, an extra layer of 64 GCPs is added at the finest level. Restricted information is sent to level ℓ_max−1, which is handled by eight of the 64 GCPs. Further restrictions to levels ℓ_max−2 and coarser behave identically to case 2. Four LCC nodes were used, with four MPI-CUDA processes assigned to each of the 16 available GPUs.

The single-process case used the same test case as the openMP vs. CUDA test but scaled to the larger domain. The eight-GCP and 64-GCP cases were distributed to one and four nodes, respectively. To measure the effect of process density per GPU on performance, the number of nodes was doubled for two groups, (1) 8 GCPs and one AdP and (2) 64 GCPs and one AdP; thus, the 8-GCP low-density test had two nodes with four GCPs per node, while the 64-GCP low-density test had eight nodes with eight GCPs per node.

Averaging the ratios of the timing results when moving from 8 GCPs to 64 GCPs for time steps 2 and 3 in Figure 3, a total improvement of 5.3x is observed. Multi-process allocation can bottleneck due to competition for memory bandwidth, slowing simulation speeds. In the eight-GCP case, four GPUs held two GCPs each; in the 64-GCP case, 16 GPUs held four processes each. To quantify the performance lift from redistributing processes across more GPUs, two extra runs were performed with eight GPUs running one process apiece, increasing performance by 1.2x over the original case. This decreased the performance benefit of switching to 64 processes from 2.8x in the four-GPU case to 2.4x in the eight-GPU case. Similarly to the 8-process case, moving to 64 processes at two processes per GPU (for a total of 32 GPUs) increased performance 1.3x over the 64-process runtime with four processes per GPU (16 GPUs). In total, the 64-GCP low-density distribution outperforms the CUDA-only distribution (1 GCP) by 6.7x. Combined with the gains of CUDA over openMP, the MPI-CUDA framework has the capacity to simulate larger tumor masses in a distributed manner at speeds not possible under the previous framework. Furthermore, larger-scale simulations benefit from increased resource availability, making 64 GCPs the selected distribution method for the 256³ and 512³ domain simulations. However, there are diminishing gains when scaling across more nodes, suggesting that this approach may scale weakly with problem size.

Figure 3 –

Mean performance per time step for MPI-CUDA processing a 256³ domain. Error bars represent 95% CI.

3.3. Simulation of cm-scale tumors

The 256³ simulation, run on 32 V100 GPUs, took about 2 h of real time to reach 5 simulated days, with an average time per step of 43.2 s. An additional 93.8 min were required to reach 8 simulated days. For the 512³ simulation, the same 32 V100 GPU setup took ~31.5 h to reach 5 simulated days at an average rate of 11.3 min per time step. An additional 26.5 h were required to reach 8 simulated days; here, the average rate increased to 15.9 min per time step. Figure 4 shows the 512³ domain simulation of the ~2 cm diameter tumor at 5 and 8 simulated days. Viable and dead tissues are evident. Pronounced release of tumor angiogenic factors (TAF) is triggered by hypoxia, which leads to angiogenesis and growth of blood vasculature (Figure 5). However, both blood and lymphatic vasculature concentrations decreased overall. Cellular respiration leveraged the increased oxygen supply, thereby raising the carbon dioxide concentration. ECM concentration remained relatively stable (Figure 6). As the tumor mass compressed internally, matrix degrading enzyme (MDE) concentration shifted away from the periphery, explaining the local ECM loss at the i=1.25 cm plane. The decline can also be attributed to the lower concentration of myofibroblasts in the inner tumor. Because myofibroblasts are created by the model within the ECM and become necrotic at low oxygen levels, their concentration remained relatively stable from 5 to 8 simulated days. Meanwhile, a layer of higher-viability tumor mass formed near vasculature in the peritumoral space. Negative pressure from tumor and ECM necrosis shifted the viable tumor layer towards interior regions, distancing this viable tissue from blood vasculature. This layer became necrotic and is present at both 5 and 8 simulated days. While tumor growth factor (TGF) concentration rose in the peritumoral range over the 3 simulated days, encouraging increased tumor proliferation at the periphery, MDE concentration decreased locally at the i=1.25 cm plane.

Figure 4 –

Evolution of the ~2 cm diameter tumor in the 512³ domain at (A) 5 and (B) 8 simulated days. Viable tissue, dead tissue, and tumor angiogenic factors (TAF) are shown (plane jk).

Figure 5 –

Tumor vessel evolution for the ~2 cm diameter tumor in the 512³ domain. (A) Blood vasculature. (B) Lymphatic vasculature. (C) Oxygen. (D) Carbon dioxide.

Figure 6 –

Tumor matrix evolution for the ~2 cm diameter tumor in the 512³ domain. (A) Extracellular matrix (ECM). (B) Matrix degrading enzymes (MDE). (C) Myofibroblasts. (D) Tumor growth factors (TGF).

The highly hypoxic nature of the tumor resulted in persistently high H+ and lactic acid concentrations from the bicarbonate buffer and anaerobic glycolysis (Figure 7). Glucose, necessary for both aerobic and anaerobic glycolysis, is scarcer in the internal tumor portions and continues to decrease in the peritumoral region over time. Carbon dioxide, formed by aerobic glycolysis, was consumed in part by the bicarbonate buffer, increasing bicarbonate prevalence.

Simulations of the ~1 cm tumor in the 256³ domain generally yielded similar results as the larger tumor in the 512³ domain using the same parameter values (Figure 8). After an initial drop in mass due to the lag in angiogenic response, both tumors assumed a growth pattern by 8 simulated days. Despite an overall decrease in density in the interior portions of the tumor, the 512³ $\tilde{\phi}_V = 0.1$ isosurface exhibited a growth rate of 2.6% volume per day. The more robust (blood and lymphatic) vascularization of the smaller tumor is evident earlier at this timepoint than in the larger tumor, as are higher TAF and MDE concentrations.

Figure 8 –

Rate of change of tumor variables (% change/day) at 8 simulated days for the (A) 256³ domain and (B) 512³ domain. From left to right: total tumor; viable tumor; dead tumor; TAF (tumor angiogenic factors); blood vasculature; lymphatic vasculature; O2 (oxygen); CO2 (carbon dioxide); ECM (extracellular matrix); myoFB (myofibroblasts); MDE (matrix degrading enzymes); TGF (tumor growth factors); GLC (glucose); LAC (lactate); H+ (hydrogen ion); HCO3 (bicarbonate).

4. Discussion

This study presents a distributed computing (parallelized) implementation of the mixture model in [40, 41] to simulate 3D continuum tumor growth at cm scale with sub-mm resolution. Compared to previous work, the model here accounts for a richer set of biological phenomena, simulating ECM-tumor interaction, blood and lymphatic vasculature evolution, metabolic consequences of anaerobic respiration, acidity induced via the bicarbonate buffer, and secretion of diffusible factors in response to hypoxic conditions. The results highlight how identical parameter sets perform in different domain sizes, 256³ and 512³, suggesting future work to fine-tune parameter sets best suited for large-scale growth.

The CUDA-MPI approach improves model performance over the previous openMP approach [41] by ~50x. This value comes from accumulating the benefits seen across the smoothing test of openMP vs. CUDA (Figure 2B) and the step-time durations of the 1-GCP vs. 64-GCP test (Figure 3). Using the 1-GCP vs. 64-GCP low-density comparison increases the total performance improvement to ~70x over the previous openMP approach. Because of differing testing approaches and non-equivalent hardware, these performance improvements cannot be directly compared to other tumor modeling approaches. Nevertheless, the cumulative improvements demonstrated here are in a league similar to those in [24, 56, 57], demonstrating the immediate benefits of using CUDA-MPI over openMP. To our knowledge, these results mark the first time a multigrid 3D continuum tumor model has been fully parallelized. Based on the performance metrics obtained, we anticipate future work to simulate tumor sizes as in [35]. These improvements are also consistent with Navier-Stokes multigrid simulations, where CPU-side parallelization across 64 processors saw a 50x improvement in speed [9]. A different Navier-Stokes solver using an MPI-CUDA framework achieved a 21x performance uplift over an 8-core Intel Xeon baseline using 2 GPUs and a 130x uplift using 128 GPUs [34]. Huang et al. created an MPI-CUDA framework implementing a sparse equations and least squares (LSQR) method for seismic tomography; they report a 37x uplift using 60 CPU cores relative to a single-core baseline, and using 60 GPUs nearly doubled performance over their 60-CPU results [33]. These values suggest that our model's performance benefits are on the same order of magnitude as similarly parallelized problems.

Continuum tumor mixture models, due to the numerous interwoven phenomena simulated, have many governing equations, leading to multiple variables and quantities to evaluate and compute. As such, the memory required per point in the model domain at level ℓ may be significantly higher than the raw variable count would suggest in the case of, e.g., the Navier-Stokes equations. RAM constraints on GPUs become increasingly difficult to navigate as biological precision and generalizability are pursued via more specialized model equations. While MPI can involve more GPUs and lower the per-GPU RAM requirements, we recognize that this study used Titan RTX and V100 GPUs, both of which possess over 20 GB of RAM. With current technology, GPUs with lower RAM capacity, individually and in aggregate, would have difficulty running highly detailed continuum tumor models, potentially relegating such models to high-end desktop PCs and, in the case of cm-scale modeling, to larger compute clusters.

While the parallelization performed on the tumor model is significant, further development is required before deployment in a clinical environment. First, a more finely tuned parameter set could help achieve persistently increasing intra-tumoral concentrations, such as of ECM, required to simulate particular cancer types at large scale. Second, this process structure has no fault tolerance: if a single GCP were to fail to respond, the program would exit without completing the model. Fault tolerance has already been implemented in Big Data cluster libraries such as Hadoop and Apache Spark [8, 51]; thus, a future implementation may draw from techniques used by these Big Data frameworks.

With the addition of multigrid technologies such as adaptive grid meshes, the computational workload would be reduced, increasing model performance. For some problem sizes, a CUDA-MPI framework may not be optimal due to the overhead of passing data to GPUs for processing. Indeed, an openMP/MPI framework has outperformed CUDA-MPI tasks when operating on a smaller mathematical model [36]. It is suspected that a tradeoff point exists, which could be a subject for future research. Evaluation of openMP-MPI vs. CUDA-MPI at varied grid levels may lead to further optimizations of mixed grid sizes.

Additionally, because of parallel synchronization constraints, the adaptive grid mesh method previously used in [40, 41] was discarded in favor of a modified residual calculation procedure, as detailed in Methods. An adaptive grid mesh implementation would require adaptive process assignments to subsets of non-global domains and, if implemented at the MPI level, may better utilize processing resources. In many cases, only a single V-cycle was required to converge to the selected tolerance. A minority of time steps, especially those directly after and including the initial time step, required more smoothing iterations to achieve tolerance; it is likely that using a different multigrid cycle, such as the F-cycle, would improve convergence performance in those cases [61]. Some other minor performance improvements could be made, such as consolidating the AdP with a GCP to reduce the process count by one. From a GPU standpoint, with RAM capacities on GPUs having increased significantly over the past half-decade, 256³ domains now and 512³ domains in the future will likely become entirely GPU-side computations within a couple of GPU generations. Further refinement could thus reduce MPI's contribution by removing most memory transfers.

Because of its low cost of failure and ideal reproducibility, in silico simulation of clinically relevant tumor-sized growth could help to analyze patient treatment, especially when coupled to tumor-specific parameters. The flexibility afforded by the parameters leveraged in this model may yield a platform for accommodating a wide range of characteristics, anticipating tumor evolution and forecasting potential patient outcomes. Further, discovering which model parameters influence positive clinical outcomes could reveal opportunities for novel clinical approaches and provide a basis for further exploration. A faster turnaround would offer a more responsive methodology for engaging with oncological hypothesis testing and for focusing in vitro and in vivo experimental effort.

With the complexity and scale of the model, the number of parameters makes assumptions inevitable. Akin to machine learning's reliance on extensive data acquisition for training sets, determining patient-specific parameter values will require integrating in silico evaluation with relevant clinical data. This requirement is exacerbated as more detailed biological phenomena are considered. For example, introducing immunotherapies and immuno-oncological interactions will require additional parameters, meaning that balancing performance with model complexity will continue to affect larger-scale continuum tumor modeling. Despite these limitations, this study presents a first step towards centimeter-scale 3D continuum tumor simulations with sub-millimeter resolution, with future work envisioned to move this approach closer to clinical application.

Supplementary Material

Supplementary File

Highlights.

  • Simulation of clinically-relevant tumor growth is constrained by computational cost

  • This study presents distributed computing solution of mixture model of tumor growth

  • MPI-CUDA implementation simulates 3D cm-scale vascularized tissue at sub-mm resolution

  • Results enable continuum modeling of cm-scale tumors at reasonable computational cost

  • Parameter calibration to patient conditions could enable clinically-viable simulation

Acknowledgements

HBF acknowledges partial support by National Institutes of Health/National Cancer Institute grant R15CA203605 and Department of Defense/U.S. Army Medical Research grant W81XWH2110012. This work was conducted in part using resources of the Research Computing group and the Cardinal Research Cluster (CRC) at the University of Louisville. The authors thank Harrison Simrall for assistance with the CRC and Steven Goodin for assistance with matrix unrolling techniques. The authors acknowledge the University of Kentucky Center for Computational Sciences and Information Technology Services Research Computing for support and use of the Lipscomb Compute Cluster (LCC) and associated computing resources; this material is based upon work supported by National Science Foundation grant 1925687.

Abbreviations

3D

Three-dimensional

AdP

Administrative process

CPU

Central processing unit

CT

Computed tomography

CUDA

Compute unified device architecture

ECM

Extracellular matrix

GCP

General computation process

GPU

Graphics processing unit

MDE

Matrix degrading enzyme

MPI

Message passing interface

openMP

Open multi-processing

PC

Personal computer

RAM

Random access memory

TAF

Tumor angiogenic factors

TGF

Tumor growth factors

Footnotes

Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

CONFLICT OF INTERESTS STATEMENT

The authors declare no known conflicts of interest.

Competing Interest: The authors have no competing interests to disclose.

References

  • 1. Ambrosi D, Duperray A, Peschetola V and Verdier C. Traction patterns of tumor cells. J Math Biol 58: 163–181, 2009.
  • 2. Ambrosi D and Preziosi L. Cell adhesion mechanisms and stress relaxation in the mechanics of tumours. Biomech Model Mechanobiol 8: 397–413, 2009.
  • 3. Ambrosi D and Preziosi L. On the closure of mass balance models for tumor growth. Math. Mod. Meth. Appl. Sci 12: 737–754, 2002.
  • 4. Anderson AR and Chaplain M. Continuous and discrete mathematical models of tumor-induced angiogenesis. Bulletin of Mathematical Biology 60: 857–899, 1998.
  • 5. Antonopoulos M, Dionysiou D, Stamatakos G and Uzunoglu N. Three-dimensional tumor growth in time-varying chemical fields: a modeling framework and theoretical study. BMC Bioinformatics 20: 442, 2019.
  • 6. Araujo R and McElwain D. A mixture theory for the genesis of residual stresses in growing tissues I: A general formulation. SIAM J. Appl. Math 65: 1261–1284, 2005.
  • 7. Araujo R and McElwain D. A mixture theory for the genesis of residual stresses in growing tissues II: Solutions to the biphasic equations for a multicell spheroid. SIAM J. Appl. Math 66: 447–467, 2005.
  • 8. Asaadi H, Khaldi D and Chapman B. A comparative survey of the HPC and big data paradigms: Analysis and experiments. In: Proceedings - IEEE International Conference on Cluster Computing (ICCC), 2016, pp. 423–432.
  • 9. Benedusi P, Hupp D, Arbenz P and Krause R. A parallel multigrid solver for time-periodic incompressible Navier–Stokes equations in 3D. In: Numerical Mathematics and Advanced Applications ENUMATH 2015, edited by Karasözen B, Manguoğlu M, Tezer-Sezgin M, Göktepe S and Uğur O. Springer, Cham, 2015, pp. 265–273.
  • 10. Berghoff M, Kondov I and Hotzer J. Massively parallel stencil code solver with autonomous adaptive block distribution. IEEE Transactions on Parallel and Distributed Systems 29: 2018.
  • 11. Berghoff M, Rosenbauer J and Schug A. Massively parallel large-scale multi-model simulation of tumor development. 2019.
  • 12. Berghoff M, Rosenbauer J and Schug A. Massively parallel large-scale multi-model simulation of tumour development including treatments. In: NIC Symposium 2020. John von Neumann-Institut für Computing, 2020.
  • 13. Breward CJ, Byrne HM and Lewis CE. A multiphase model describing vascular tumour growth. Bull Math Biol 65: 609–640, 2003.
  • 14. Breward CJ, Byrne HM and Lewis CE. The role of cell-cell interactions in a two-phase model for avascular tumour growth. J Math Biol 45: 125–152, 2002.
  • 15. Byrne H, King J, McElwain D and Preziosi L. A two-phase model of solid tumour growth. Appl. Math. Letters 16: 567–573, 2003.
  • 16. Byrne H and Preziosi L. Modelling solid tumour growth using the theory of mixtures. Math Med Biol 20: 341–366, 2003.
  • 17. Canon S, Gittens A, Racah E, Ringenburg M, Gerhardt L, Kottalam J, Liu J, Maschhoff K, Devarakonda A, Chhugani J, Sharma P, Yang J, Demmel J, Harrell J, Krishnamurthy V, Mahoney MW and Prabhat. Matrix factorizations at scale: A comparison of scientific data analytics in Spark and C+MPI using three case studies. In: 2016 IEEE International Conference on Big Data (Big Data), 2016, pp. 204–213.
  • 18. Cavaterra C, Rocca E and Wu H. Long-time dynamics and optimal control of a diffuse interface model for tumor growth. Applied Mathematics & Optimization, 2019.
  • 19. Chaplain MA, Graziano L and Preziosi L. Mathematical modelling of the loss of tissue compression responsiveness and its role in solid tumour development. Math Med Biol 23: 197–229, 2006.
  • 20. Chen N, Glazier JA, Izaguirre JA and Alber MS. A parallel implementation of the Cellular Potts Model for simulation of cell-based morphogenesis. Computer Physics Communications 176: 670–681, 2007.
  • 21. Colli P, Gilardi G, Rocca E and Sprekels J. Optimal distributed control of a diffuse interface model of tumor growth. Nonlinearity 30: 2518, 2017.
  • 22. Cristini V, Frieboes HB, Li X, Lowengrub J, Macklin P, Sanga S, Wise SM and Zheng X. Nonlinear modeling and simulation of tumor growth. In: Selected Topics in Cancer Modeling. Boston: Birkhäuser, 2008, pp. 1–69.
  • 23. Cristini V, Li X, Lowengrub JS and Wise SM. Nonlinear simulations of solid tumor growth using a mixture model: invasion and branching. J. Math. Biol 58: 2009.
  • 24. Dong S, Yan Y, Tang L, Meng J and Jiang Y. Simulation of 3D tumor cell growth using nonlinear finite element method. Computer Methods in Biomechanics and Biomedical Engineering 19: 807–818, 2016.
  • 25. Dongarra J, Tourancheau B, Kamburugamuve S, Wickramasinghe P, Ekanayake S and Fox GC. Anatomy of machine learning algorithm implementations in MPI, Spark, and Flink. The International Journal of High Performance Computing Applications 32: 61–73, 2018.
  • 26. Franks SJ, Byrne HM, King JR, Underwood JC and Lewis CE. Modelling the early growth of ductal carcinoma in situ of the breast. J Math Biol 47: 424–452, 2003.
  • 27. Franks SJ, Byrne HM, Mudhar HS, Underwood JC and Lewis CE. Mathematical modelling of comedo ductal carcinoma in situ of the breast. Math Med Biol 20: 277–308, 2003.
  • 28. Frieboes HB, Edgerton ME, Fruehauf JP, Rose FR, Worrall LK, Gatenby RA, Ferrari M and Cristini V. Prediction of drug response in breast cancer using integrative experimental/computational modeling. Cancer Research 69: 4484–4492, 2009.
  • 29. Frieboes HB, Jin F, Chuang Y-L, Wise SM, Lowengrub JS and Cristini V. Three-dimensional multispecies nonlinear tumor growth—II: tumor invasion and angiogenesis. Journal of Theoretical Biology 264: 1254–1278, 2010.
  • 30. Frigeri S, Grasselli M and Rocca E. On a diffuse interface model of tumour growth. European Journal of Applied Mathematics 26: 215–243, 2015.
  • 31. Ghaffarizadeh A, Friedman SH and Macklin P. BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations. Bioinformatics 32: 1256–1258, 2016.
  • 32. Hawkins-Daarud A, Prudhomme S, van der Zee KG and Oden JT. Bayesian calibration, validation, and uncertainty quantification of diffuse interface models of tumor growth. J. Math. Biol 67: 1457–1485, 2013.
  • 33. Huang H, Wang L, Lee EJ and Chen P. An MPI-CUDA implementation and optimization for parallel sparse equations and least squares (LSQR). Procedia Computer Science 9: 76–85, 2012.
  • 34. Jacobsen D, Thibault J and Senocak I. An MPI-CUDA implementation for massively parallel incompressible flow computations on multi-GPU clusters. In: 48th AIAA Aerospace Sciences Meeting Including the New Horizons Forum and Aerospace Exposition. Orlando, FL, 2010.
  • 35. Klusek A, Los M, Paszynski M and Dzwinel W. Efficient model of tumor dynamics simulated in multi-GPU environment. International Journal of High Performance Computing Applications 33: 489–506, 2019.
  • 36. Lončar V, Young-S LE, Škrbić S, Muruganandam P, Adhikari SK and Balaž A. OpenMP, OpenMP/MPI, and CUDA/MPI C programs for solving the time-dependent dipolar Gross-Pitaevskii equation. Computer Physics Communications 209: 190–196, 2016.
  • 37. Lorenzo G, Scott MA, Tew K, Hughes TJR, Zhang YJ, Liu L, Vilanova G and Gomez H. Tissue-scale, personalized modeling and simulation of prostate cancer growth. Proceedings of the National Academy of Sciences of the United States of America 113: E7663–E7671, 2016.
  • 38. Łoś M, Kłusek A, Hassaan MA, Pingali K, Dzwinel W and Paszyński M. Parallel fast isogeometric L2 projection solver with GALOIS system for 3D tumor growth simulations. Computer Methods in Applied Mechanics and Engineering 343: 1–22, 2019.
  • 39. McDougall SR, Anderson AR, Chaplain MA and Sherratt JA. Mathematical modelling of flow through vascular networks: implications for tumour-induced angiogenesis and chemotherapy strategies. Bull Math Biol 64: 673–702, 2002.
  • 40. Ng CF and Frieboes HB. Model of vascular desmoplastic multispecies tumor growth. Journal of Theoretical Biology 430: 245–282, 2017.
  • 41. Ng CF and Frieboes HB. Simulation of multispecies desmoplastic cancer growth via a fully adaptive non-linear full multigrid algorithm. Frontiers in Physiology 9: 2018.
  • 42. Nugent LJ and Jain RK. Extravascular diffusion in normal and neoplastic tissues. Cancer Research 44: 238–244, 1984.
  • 43. Oden JT, Hawkins A and Prudhomme S. General diffuse-interface theories and an approach to predictive tumor growth modeling. Math. Models Methods Appl. Sci 20: 477–517, 2010.
  • 44. Plank MJ and Sleeman BD. Lattice and non-lattice models of tumour angiogenesis. Bull Math Biol 66: 1785–1819, 2004.
  • 45. Plank MJ and Sleeman BD. A reinforced random walk model of tumour angiogenesis and anti-angiogenic strategies. Math Med Biol 20: 135–181, 2003.
  • 46. Please C, Pettet G and McElwain D. Avascular tumour dynamics and necrosis. Math. Models Appl. Sci 9: 569–579, 1999.
  • 47. Please C, Pettet G and McElwain D. A new approach to modeling the formation of necrotic regions in tumors. Appl. Math. Lett 11: 89–94, 1998.
  • 48. Preziosi L and Tosin A. Multiphase and multiscale trends in cancer modelling. Math. Model. Nat. Phenom 4: 1–11, 2009.
  • 49. Preziosi L and Tosin A. Multiphase modelling of tumour growth and extracellular matrix interaction: mathematical tools and applications. J Math Biol 58: 625–656, 2009.
  • 50. Reiter S, Vogel A, Heppner I, Rupp M and Wittum G. A massively parallel geometric multigrid solver on hierarchically distributed grids. Computing and Visualization in Science 16: 151–164, 2013.
  • 51. Reyes-Ortiz JL, Oneto L and Anguita D. Big data analytics in the cloud: Spark on Hadoop vs MPI/OpenMP on Beowulf. Procedia Computer Science 53: 121–130, 2015.
  • 52. Roose T, Netti PA, Munn LL, Boucher Y and Jain RK. Solid stress generated by spheroid growth estimated using a linear poroelasticity model. Microvasc Res 66: 204–212, 2003.
  • 53. Salguero AG, Tomeu-Hardasmal AJ and Capel MI. Dynamic load balancing strategy for parallel tumor growth simulations. Journal of Integrative Bioinformatics 16: 2019.
  • 54. Salguero AG, Tomeu AJ and Capel MI. Parallel cellular automaton tumor growth model. Advances in Intelligent Systems and Computing 803: 175–182, 2019.
  • 55. Swartz MA and Lund AW. Lymphatic and interstitial flow in the tumour microenvironment: linking mechanobiology with immunity. Nature Reviews Cancer 12: 210–219, 2012.
  • 56. Tapia JJ and D'Souza R. Data-parallel algorithms for large-scale real-time simulation of the cellular Potts model on graphics processing units. In: Proceedings - IEEE International Conference on Systems, Man and Cybernetics, 2009, pp. 1411–1418.
  • 57. Tapia JJ and D'Souza RM. Parallelizing the Cellular Potts Model on graphics processing units. Computer Physics Communications 182: 857–865, 2011.
  • 58. Tomeu AJ, Salguero AG and Capel MI. Speeding up tumor growth simulations using parallel programming and cellular automata. IEEE Latin America Transactions 14: 2016.
  • 59. Tosin A. Multiphase modeling and qualitative analysis of the growth of tumor cords. Networks Heterogen. Media 3: 43–84, 2008.
  • 60. Tracqui P. Biophysical models of tumor growth. Rep. Prog. Phys 72: 056701, 2009.
  • 61. Trottenberg U, Oosterlee CW and Schuller A. Multigrid. Elsevier, 2000.
  • 62. Ward JP and King JR. Mathematical modelling of avascular-tumour growth. IMA J Math Appl Med Biol 14: 39–69, 1997.
  • 63. Ward JP and King JR. Mathematical modelling of avascular-tumour growth. II: Modelling growth saturation. IMA J Math Appl Med Biol 16: 171–211, 1999.
  • 64. Wcisło R and Dzwinel W. Particle model of tumor growth and its parallel implementation. In: Parallel Processing and Applied Mathematics: 8th International Conference, PPAM 2009, Wroclaw, Poland, September 13-16, 2009, Revised Selected Papers, Part I. Berlin, Heidelberg: Springer, 2010, pp. 322–331.
  • 65. Wcisło R, Gosztyła P and Dzwinel W. N-body parallel model of tumor proliferation. In: Proceedings of the 2010 Summer Computer Simulation Conference, 2010, pp. 160–167.
  • 66. Wise SM, Lowengrub JS and Cristini V. An adaptive multigrid algorithm for simulating solid tumor growth using mixture models. Mathematical and Computer Modelling 53: 1–20, 2011.
  • 67. Wise SM, Lowengrub JS, Frieboes HB and Cristini V. Three-dimensional multispecies nonlinear tumor growth—I: model and numerical method. Journal of Theoretical Biology 253: 524–543, 2008.
