Abstract
High-fidelity numerical simulations produce massive amounts of data. Analyzing these numerical data sets as they are being generated provides useful insights into the processes underlying the modeled phenomenon. However, developing real-time in-situ visualization techniques to process large amounts of data can be challenging since the data do not fit on the GPU, thus requiring expensive CPU-GPU data copies. In this work, we present a scheduling scheme that achieves real-time simulation and interactivity through GPU hyper-tasking. Furthermore, CPU-GPU communications were minimized using an activity-aware technique that reduces redundant copies. Our simulation platform is capable of visualizing 1.7 billion protein data points in situ, with an average frame rate of 42.8 fps. This performance allows users to explore large data sets on a remote server with real-time interactivity as they are performing their simulations.
Keywords: high-performance agent-based modeling, real-time in situ visualization, heterogeneous platform scheduling, wound healing, vocal fold model
1. Introduction
The attempt to understand real-world phenomena through modeling has been ubiquitous. The prevalence of computing power has given researchers and scientists the tools to realize increasingly complex models. As models gain a higher degree of fidelity, the complexity and amount of data involved grow. To make sense of these models, the analysis and visualization of the output data as they are being generated become crucial. However, efficiently handling such amounts of data interactively requires a carefully designed mechanism that makes the most effective use of the available resources. The resulting mechanism should exhibit two desirable characteristics: (1) real-time performance and (2) a high level of interactivity, which together enable computational steering and interactive data exploration.
Graphics processing units (GPUs) are typically used for rendering as they are specialized accelerators designed for optimized graphics operations. Since GPUs were not designed to serve as stand-alone devices, they need a host, namely a central processing unit (CPU), to operate. This host-device setup is a common configuration in modern personal machines. This work, thus, focuses on a single-node platform configuration with a multi-core CPU and multiple GPUs.
The bottlenecks of applications using GPU accelerators are usually the communication between the host and the device, and the relatively limited amount of memory available on the GPUs. First, any data to be stored and used on the GPUs must be transferred from the CPU host through a peripheral component interconnect express (PCIe) bus, which is very slow (bandwidth of approximately 2.0 to 8.0 GB/s) compared to the GPU memory bandwidth (typically hundreds of GB/s). Second, GPU global memory is usually much smaller than that of the CPU host. Thus, in large-data applications, memory transfer overheads become formidable due to the on-demand data transfers from host to device.
While multiple simulation approaches apply well to biological applications, this work focuses on agent-based modeling (ABM) due to its bottom-up structure, which makes cell modeling and incremental knowledge aggregation relatively simple. The ABM approach has been widely used to quantitatively simulate complex dynamical systems [1]. Many researchers have gained insights into their systems of interest using ABMs [2], [3]. As model complexity grows, improvements to ABM through high-performance computing (HPC) techniques have been reported in a number of publications [4], [5], [6], [7], [8], [9]. Unfortunately, despite the significance of data visualization in output data interpretation, only a few of these works presented improvements to the visualization components. Our goal is to fill this gap by presenting a combination of techniques that enable effective and interactive in situ visualization of the simulated data concurrently with the simulation.
Our previous work [10] introduced a scheduling scheme to fully utilize the power of both the CPU and GPUs when performing simulation computations. Although the simulation computation outperformed that of other related work, the visualization interactivity was very low because the output data were distributed among the host and different devices. In this study, we addressed this shortcoming by restructuring the scheduling scheme to accommodate the visualization process, hyper-tasking the GPUs and allowing all GPUs to participate in the rendering process through compression of their local data. Furthermore, we developed a host-device data-transfer minimization technique to decrease the overhead of CPU-GPU copies of structural data.
2. Background
2.1. Software and Hardware Environment
Our base ABM framework was written in C++. The parallelization on multi-core CPUs was written with Open Multi-Processing (OpenMP), a portable application programming interface (API) for multi-threading on shared-memory platforms. The communication with GPUs was implemented using Compute Unified Device Architecture (CUDA) version 8.0.44. All graphics were implemented with legacy OpenGL (Open Graphics Library). The visualization was made in situ [11] using VirtualGL [12] and TurboVNC [13]. The pre-processing of data to predict the best parameters for our host-device data-transfer minimization algorithm was developed in Python 3.7.
The program was compiled using the Intel C++ compiler, icpc version 15.0.0, with the default release optimization level. The source code of the vocal fold ABMs along with all optimizations described in this work can be found at https://github.com/HPC-ABM/vf-version_7_3.
All results reported in this work were obtained from program execution and benchmark on a single compute node consisting of a 44-core Intel(R) Xeon(R) CPU E5–2699 v4 @ 2.20 GHz host with 128 GB of main memory and two accelerators, NVIDIA Tesla M40. Each NVIDIA Tesla M40 graphics card consists of 3072 CUDA cores and 22.4 GB of global memory with compute capability 5.2.
2.2. Agent-Based Modeling (ABM)
Agent-based modeling is a widely-used approach to quantitatively simulate complex dynamical systems [1]. Each ABM is defined by autonomous agents whose decisions and actions are governed by user-defined deterministic and stochastic rules. As opposed to equation-based approaches, ABM is decentralized. That is, the system behavior is determined by the collective properties derived from the individual agents in the system. Thus, ABM simulations aim to gain insight into complex systems by yielding emergent behaviors from a group of simple individuals. The basic components of ABM are:
Agent — Autonomous objects that perform actions and interact with other agents and their environment
Agent Rules — Set of pre-defined deterministic/stochastic behaviors of agents
ABM World — The environment that houses the agents. This space can be continuous or discrete
Agents can represent any individual from cities in economic models to inflammatory cells in systems biology models. These agents ‘think’ and make decisions based on the ABM rules. These decisions turn into actions that affect other agents and their environment (ABM world).
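The components above can be illustrated with a minimal sketch. The following Python toy model is purely illustrative (the class names, movement rule, and protein-deposit rule are hypothetical and not taken from the VF ABM code); it shows an agent acting on a discrete ABM world under one stochastic and one deterministic rule:

```python
import random

class Agent:
    """Autonomous object: decides and acts according to pre-defined rules."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def step(self, world, rng):
        # Stochastic rule: attempt to move to a random neighboring patch.
        dx, dy = rng.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
        nx, ny = self.x + dx, self.y + dy
        if 0 <= nx < world.width and 0 <= ny < world.height:
            self.x, self.y = nx, ny
        # Deterministic rule: deposit one unit of protein on the current patch.
        world.patches[self.y][self.x] += 1

class World:
    """Discrete ABM world housing the agents and per-patch quantities."""
    def __init__(self, width, height):
        self.width, self.height = width, height
        self.patches = [[0] * width for _ in range(height)]

rng = random.Random(42)            # seeded for reproducibility
world = World(10, 10)
agents = [Agent(5, 5) for _ in range(4)]
for _ in range(100):               # one iteration = every agent acts once
    for a in agents:
        a.step(world, rng)

total = sum(map(sum, world.patches))
print(total)                       # 4 agents x 100 iterations = 400 deposits
```

The system-level output (the protein distribution over the world) emerges from repeated application of these simple per-agent rules, which is the decentralized behavior described above.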
2.3. Case Study — Vocal Fold Tissue Inflammation and Repair
Inflammation and wound healing is a complex process that our body uses to restore damaged tissue. The process involves a number of interacting chemical, biological and mechanical phenomena [14]. The spatio-temporal scale of some of these phenomena vary substantially from others. Despite the tremendous efforts spent in various studies [14], understanding the precise underlying mechanisms of wound healing process still remains an open challenge. Computational models can serve as a tool to verify current understanding of the problem with the real phenomena. Interactive visualization concurrent to the execution of the simulation can help refine the agent-rules of the model.
Due to availability of experimental data [15], [16], the ABM rules, constants, and parameters used in the models were derived mainly from empirical vocal fold experiments and literature reviews [17], [18], [19], [20], [21], [22], [23], [24]. The main components of the vocal fold (VF) ABM are biological cells, chemical signals and extracellular matrix (ECM) proteins. The biological cells serve as the main actors of the model who communicate via chemical signals, and perform healing via secretion of ECM proteins. The roles of specific cell types and ECM proteins, along with their integration into the VF ABM is described in our previous work [10].
3. Related Work
3.1. HPC ABMs for Biological Applications
High-fidelity ABMs in biology (Bio-ABMs) often involve large amounts of data. Multiple high-performance computing (HPC) ABM tools have been developed to address the challenges in processing these large data sets. For example, FLAME [8] is an implementation of an ABM framework for parallel architectures based on stream X-machines. FLAME has been used to speed up the simulation of ecological systems in various fields including systems biology [25], and was further extended with GPU support [26]. SugarScape on steroids [4] is another example of ABM acceleration on GPU platforms. These tools have demonstrated their applicability to biological system simulations such as tissue wound and disease modeling [25], [27], [28]. In the realm of distributed computing, Repast HPC [5] was developed as an MPI extension to its predecessors, Repast and Repast Simphony [29]. Repast HPC was adopted to accelerate the simulation of bone tissue growth [6].
Due to the popularity of ABMs in biological applications, some HPC ABM tools have been developed specifically for them. An example is AgentCell, a Repast-based framework for single cells and bacterial populations [30]. The AgentCell framework provides support for running multiple non-interacting single-cell instances concurrently on massively parallel computers. Chaste [31] is an example of a bio-ABM framework utilizing HPC platforms via existing high-performance libraries such as PETSc and (par)METIS for parallel linear algebra and mesh distribution. More examples include HPC ABM frameworks for multi-core CPUs such as CompuCell3D [7], CellSys [32], and Morpheus [33]. In these frameworks, parallelization was performed with OpenMP to speed up the performance on single-node multi-core CPUs. In addition, other techniques have been proposed to accelerate specific biological models on multi-core CPUs or GPUs [34], [9], [35]. The aforementioned HPC ABM techniques target either CPUs or GPUs. The inability to exploit both CPUs and GPUs simultaneously results in sub-optimal resource utilization. Our previous work [36], [10] addressed this issue by assigning appropriate computational tasks to the CPU while it waits for the GPUs to complete their computations. However, these previous works focused only on the simulation computation, leaving visualization optimization an open problem.
3.2. 3D Visualization in Bio-ABMs
The challenges in developing HPC bio-ABMs have been relatively well explored. However, the same does not apply to the visualization aspect of ABM frameworks. Due to the scale of the data being generated, the conventional method of post hoc visualization has become increasingly infeasible. The post hoc method refers to a process where visualization is performed after the numerical simulation has completed: the output data are written to disk during the simulation and retrieved later for visualization. This method puts an enormous load on the disk and network. For this reason, another type of visualization, in situ visualization, has gained the interest of researchers [11]. In situ visualization allows the outputs to be analyzed on the same machine that produced them. The ability to perform on-site data analysis reduces the amount of data movement between the server and remote users. This property makes in situ visualization an ideal way to visualize simulations that produce large data sets such as ours. ParaView Catalyst [37] and the work reported in [38] are examples of libraries developed to enable in situ processing of simulation output on popular existing visualization frameworks such as ParaView [39] and VisIt [40]. A bitmap-based [41] and a quadtree-based [42] ABM approach were proposed to analyze the numerical output in situ and to reduce non-essential simulation data, respectively.
4. Scheduling
Simulation tasks are either computational or visualization tasks. In general, most simulation computational tasks can be further classified into two categories: coarse-grain and fine-grain. A task is labeled coarse-grain if it involves complex computations and accesses relatively small amounts of data. These coarse-grain tasks are suitable for CPUs. On the other hand, GPUs excel at executing tasks that involve simple computations applied to large amounts of data. These tasks are classified as fine-grain. The scheduling schemes described in our earlier work [36], [10] outperformed other related ABM works in terms of computational throughput. However, since the computational tasks were distributed among the host and different accelerators, visualization was performed at the end of each iteration after all data were copied back to the host and synchronized. The visualization frame rate was, thus, bounded by the execution time of the computation and synchronization of the whole iteration.
In this work, the scheduling scheme proposed in [10] was restructured to mitigate the bound on user-simulation interactivity by (1) scheduling GPU tasks at a finer level to enable (2) GPU hyper-tasking (GHT). The benefits of GHT are twofold. First, the GPUs responsible mainly for fine-grain computations (GPUcompute) can hyper-task between fine-grain computations and ray-casting (a visualization subtask). This allows these non-screen-attached GPUs to aid in visualizing their local data and minimizes peer communications. Second, GHT allows the rendering GPU (GPUvis) to lighten the computational loads of its peers by executing leftover fine-grain subtasks while maintaining a high level of visualization interactivity.
Fig. 1 demonstrates the workflow of the proposed scheduling scheme. This scheme was executed with p CPU threads, where p denotes the number of available CPU cores. A total of p − NGPU threads are used for coarse-grain computations. This part remains unchanged from our previous work [36], [10]. In this work, the remaining threads do not simply launch fine-grain tasks; rather, they communicate with the rendering GPU (GPUvis) and the compute GPUs (GPUcompute) to orchestrate GHT.
Fig. 1:
Diagram showing the workflow and scheduling decisions of different tasks on different devices. The coarse-grain tasks get executed on the host using p − NGPU threads, where p denotes the number of available CPU cores. The CPU host communicates with the rendering GPU (GPUvis) to send the data GPUvis needs to visualize. Between visualization frames, GPUvis assists with fine-grain computations at a subtask level to lighten the loads of the other GPUs (GPUcompute), since these compute GPUs also assist with volume-rendering visualization subtasks to minimize off-device data transfers.
The program uses NGPU − 1 threads for communications with the GPUs responsible mainly for fine-grain computations (GPUcompute). Each time a GPUcompute completes its fine-grain task, the GPU compresses the output and buffers it for visualization in the next iteration. In addition, these GPUs participate in visualizing their local data from the previous iteration when they finish a task by performing ray-casting locally. Each GPUcompute then sends the results as an OpenGL texture to GPUvis for rendering. This prevents the GPUs from having to send large 3D volumetric data to their peer; they only send ready-to-render 2D textures to GPUvis.
One thread is reserved for communication with GPUvis. The main responsibility of GPUvis is to visualize all outputs, including those residing on the CPU host and the other GPUs (GPUcompute). However, between visualization frames, GPUvis switches to compute mode to assist in fine-grain computations. Since the other compute GPUs spare some of their resources to assist with the visualization process to reduce the amount of data sent off-device, it is necessary for GPUvis to lighten the loads of the other GPUs whenever it can for optimal performance.
Our tested configuration consisted of one multi-core CPU and two GPUs. However, this scheduling scheme can be extended to work with more than two GPUs by executing the scheduling logic of GPUcompute shown in Fig. 1 on additional GPUs. The added GPUs will be beneficial if the execution time of the fine-grain tasks on GPUs exceeds that of the CPU coarse-grain tasks. This scheduling scheme, thus, scales out with additional GPUs as the amount of fine-grain tasks increases.
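The thread assignment and hyper-tasking logic described in this section can be sketched with ordinary CPU threads standing in for the GPU drivers. The following Python sketch is a simplified illustration only (the task names, queue-based work stealing, and thread counts are our assumptions for the example; the actual framework uses C++/OpenMP/CUDA):

```python
import queue
import threading

N_GPU = 2          # one rendering GPU (GPU_vis) + one compute GPU (GPU_compute)
P = 8              # number of available CPU cores (hypothetical)

fine_grain = queue.Queue()   # fine-grain subtasks any GPU may pick up
results = []
lock = threading.Lock()

def coarse_grain_worker(wid):
    # p - N_GPU threads run complex, small-data tasks directly on the CPU.
    with lock:
        results.append(("cpu", wid))

def gpu_compute_driver(gid):
    # Drains fine-grain subtasks, then ray-casts its local data to a 2D texture
    # so only a ready-to-render texture is sent to GPU_vis.
    while True:
        try:
            task = fine_grain.get_nowait()
        except queue.Empty:
            break
        with lock:
            results.append(("gpu_compute", task))
    with lock:
        results.append(("gpu_compute", "raycast_local"))

def gpu_vis_driver():
    # Renders a frame, then hyper-tasks: steals leftover fine-grain subtasks
    # to lighten the compute GPUs' loads between frames.
    with lock:
        results.append(("gpu_vis", "render_frame"))
    while True:
        try:
            task = fine_grain.get_nowait()
        except queue.Empty:
            break
        with lock:
            results.append(("gpu_vis", task))

for t in range(6):
    fine_grain.put(f"diffuse_chem_{t}")

threads = [threading.Thread(target=coarse_grain_worker, args=(i,))
           for i in range(P - N_GPU)]
threads += [threading.Thread(target=gpu_compute_driver, args=(g,))
            for g in range(N_GPU - 1)]
threads.append(threading.Thread(target=gpu_vis_driver))
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The split of the six fine-grain subtasks between the two GPU drivers varies from run to run, which mirrors the point of GHT: whichever device has spare capacity between frames absorbs the leftover work.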
5. Host-Device Data Transfer Minimization
So far, we have only discussed an efficient scheduling technique to compute and visualize the simulation outputs. However, the details of data movements between devices still need to be examined as off-device data copies are relatively time consuming. In this section, we will discuss the technique used to minimize the copies of redundant data.
5.1. Observations
For optimization purposes, we categorize a dynamic pattern of a data type by two properties. First, a pattern is defined by the distribution characteristics in a single iteration. For example, the distribution of a chemical concentration is typically smooth due to diffusion, whereas ECM protein and cell distributions exhibit some degree of discontinuity. Second, the dynamics are defined by how the distribution changes from one iteration to another, for example localized vs. distributed changes. Table 1 shows the categorization of dynamic patterns for the different types of output and which device they were computed on.
Table 1:
Data Dynamic Pattern Categories
| | Smooth | Discontinuous |
|---|---|---|
| Localized | 0—n/a | I—ECM Proteins |
| Distributed | II—Chemicals | III—Cells |
| Computed on | GPU | CPU |
Notice that none of the patterns observed in wound healing ABMs falls into category 0. Data types that are discontinuous and distributed fall into category III. Examples include mobile agents such as neutrophils, macrophages and fibroblasts, and non-mobile agents such as platelets. In this work we focused our optimization efforts only on protein (signaling and ECM) data, which fall into categories I and II. The directions and challenges of developing optimizations for category III will be discussed in Section 7.
Category I data are quantities that do not diffuse or spread; thus changes are typically local, with discontinuous distributions. These are mostly structural quantities. For example, ECM proteins form the underlying structure of the tissue and provide the building blocks of the overall structure, similar to bricks laid for a building. Once a brick is laid, it stays there until it is degraded or destroyed. In wound-healing models, these quantities are localized, and thus changes to them happen mostly in highly active areas such as wound sites.
In contrast, category II data diffuse or spread, so their distributions are typically smooth. Once mass gets deposited at a location, it undergoes immediate changes in concentration and reaches other areas. This category includes diffusible quantities such as chemicals, gas, and heat. The dynamics of these quantities are very distributed in nature: the changes keep happening and affect other areas until an equilibrium is reached. In wound healing models, however, inflammatory cells may not stop protein secretion until the wound is healed. Thus, an equilibrium may not be reached until the end of the simulation, resulting in constant changes over the whole simulation area.
Since the computations of diffusible quantities involve relatively large amounts of data, they were performed on GPUs. These output data are, thus, local to the GPUs. As mentioned in Section 4, to lower the amount of off-device data copies, these data were compressed with linear sampling and stored on the producer GPU so that each GPU can assist in ray-casting. Through the CUDA texture filter mode, cudaFilterModeLinear, a sample fetched from a CUDA texture is automatically interpolated. An experiment conducted to study the accuracy loss during compression showed that an acceptable 95% of the accuracy was maintained.
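The compression-plus-interpolated-fetch idea can be illustrated in one dimension. The sketch below is a simplified Python emulation of what the GPU texture unit does in hardware (the function names and the 4× factor are illustrative assumptions; the real framework relies on CUDA textures with cudaFilterModeLinear, not this code):

```python
def compress(field, factor):
    # Keep every `factor`-th sample; the GPU texture unit reconstructs
    # in-between values with linear filtering (cudaFilterModeLinear).
    return field[::factor]

def fetch_linear(samples, u):
    # Emulate a linearly filtered texture fetch at normalized coordinate u.
    pos = u * (len(samples) - 1)
    i = int(pos)
    frac = pos - i
    if i + 1 >= len(samples):
        return samples[-1]
    return samples[i] * (1 - frac) + samples[i + 1] * frac

field = [x * 0.5 for x in range(16)]   # a smooth chemical concentration ramp
small = compress(field, 4)             # 4x compression: [0.0, 2.0, 4.0, 6.0]
approx = fetch_linear(small, 0.5)      # interpolated value at the midpoint
```

Because diffusible (category II) fields are smooth, linear reconstruction of the compressed samples loses little accuracy, which is consistent with the ~95% accuracy retained in the experiment above.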
The computations of non-diffusible structural quantities (data category I) involve relatively small amounts of data, so they were performed on the CPU. Since these data are local to the CPU host, host-device data copies are inevitable during the visualization process.
5.2. Data Category I: Host-Device Activity-Aware Data Copy (HADC)
As discussed in the previous section, this type of data involves structural change, and the outputs reside on the CPU host. These data, thus, need to be copied from the CPU to GPUvis during visualization. Structural data change locally, i.e., a change in one location does not directly impact other locations. More specifically, in wound healing models, most activities occur in the damaged areas. Given these observations, we developed a Host-Device Activity-Aware Data Copy (HADC) technique to reduce the copying of redundant data.
5.2.1. HADC Algorithm
Fig. 2 demonstrates the workflow of HADC during the simulation. First, the ABM world is divided into smaller sub-volumes. In each iteration, when a patch gets modified, the program identifies the parent sub-volume of that patch. The algorithm then looks up the corresponding entry in the activity map and increments its activity index. During the data buffering stage, only the data in the sub-volumes that have gone through significant changes get copied to the GPU for visualization. In this way, the amount of redundant data copies is minimized, resulting in a faster buffering process.
Fig. 2:
An example sequence demonstrating the Host-Device Activity-Aware Data Copy (HADC) technique to minimize the copying of redundant category I data. First, the ABM world gets divided up into smaller sub-volumes. In each iteration, when a patch is modified, the parent sub-volume gets identified. The corresponding entry in the activity map gets incremented. Only the sub-volumes with significant changes get copied to the GPU for visualization.
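The activity-map bookkeeping can be sketched as follows. This is a simplified Python sketch, not the framework's actual C++ implementation; the sub-volume edge length, activity threshold, and function names are illustrative assumptions:

```python
SUB = 4        # sub-volume edge length (would come from the Section 5.2.2 heuristic)
THRESHOLD = 2  # minimum activity index for a sub-volume to be re-copied

def subvolume_of(x, y, z, dims_in_subvols):
    # Map a patch coordinate to the linear id of its parent sub-volume.
    sx, sy, sz = x // SUB, y // SUB, z // SUB
    nx, ny, _ = dims_in_subvols
    return sx + nx * (sy + ny * sz)

activity = {}  # activity map: sub-volume id -> activity index

def record_patch_update(x, y, z, dims):
    # Called whenever a patch is modified during an iteration.
    sid = subvolume_of(x, y, z, dims)
    activity[sid] = activity.get(sid, 0) + 1

def subvolumes_to_copy():
    # Buffering stage: only significantly changed sub-volumes go to the GPU.
    return sorted(s for s, n in activity.items() if n >= THRESHOLD)

dims = (4, 4, 4)  # a 16^3 world partitioned into 4x4x4 sub-volumes of edge 4
for _ in range(3):
    record_patch_update(1, 1, 1, dims)   # wound site: repeated activity
record_patch_update(15, 15, 15, dims)    # one-off change elsewhere
```

In this toy run, only the wound-site sub-volume crosses the threshold and would be copied to GPUvis; the sub-volume touched once stays on the host, which is exactly the redundant-copy saving HADC targets.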
5.2.2. HADC Sub-Volume Size Recommendation Heuristic
The size of the sub-volumes is a very important factor in determining the performance of HADC. If the size is too large, a large amount of unnecessary data copies will be performed. However, if the size is too small, since each small sub-volume is copied in a separate transaction, the latency of host-device copy operations will hurt the overall performance. To find an optimal partition, the program runs the model and logs the locations of activities in each iteration. Given M points (representing activity locations) in the log from the most active iteration, the goal is to find a partition of size w × h × d in which the M points appear in as few partitions as possible. In other words, the objective is to jointly minimize the partition size (space) and the number of communications (time) between host and device, where the two extremes are < O(WHD) space, O(1) time > and < O(1) space, O(M) time >.
To determine a good partition size, a modified octree algorithm is used. This heuristic consists of two passes: top-down and bottom-up. At the end of each pass, each node (partition) is labeled either white (insignificantly low density) or black (significantly high density). The first pass starts top-down from the whole simulation grid and splits on the density conditions given the parameters ρlow and ρhigh. Once the program is done splitting, i.e., all nodes satisfy one of the density conditions (node density < ρlow or > ρhigh), it returns a list of black node dimensions. Here, the smallest volume dimensions are picked to minimize the partition size. The program then executes the second pass in an attempt to combine consecutive partitions if and only if the density conditions still hold. This bottom-up pass further minimizes the number of host-device communications.
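A simplified one-dimensional sketch of the two passes is shown below. The threshold values, the binary (rather than octree) splitting, and the merge rule are illustrative assumptions; the actual heuristic operates on 3D octree nodes and also tracks node dimensions:

```python
RHO_LOW, RHO_HIGH = 0.1, 0.5   # density thresholds (tuning parameters)

def split_top_down(points, lo, hi, out):
    """Pass 1: recursively split until every node is clearly white or black."""
    size = hi - lo
    inside = [p for p in points if lo <= p < hi]
    density = len(inside) / size
    if density < RHO_LOW:                 # white: insignificantly low density
        return
    if density > RHO_HIGH or size == 1:   # black: significantly high density
        out.append((lo, hi))
        return
    mid = lo + size // 2
    split_top_down(inside, lo, mid, out)
    split_top_down(inside, mid, hi, out)

def merge_bottom_up(black):
    """Pass 2: merge consecutive black nodes to cut host-device transactions."""
    merged = []
    for lo, hi in sorted(black):
        if merged and merged[-1][1] == lo:
            merged[-1] = (merged[-1][0], hi)
        else:
            merged.append((lo, hi))
    return merged

# Activity locations clustered at a "wound site" within a grid of length 16.
activity = [2, 3, 3, 4, 4, 5]
black = []
split_top_down(activity, 0, 16, black)
partitions = merge_bottom_up(black)
```

With the clustered activity above, the top-down pass discards the empty half of the grid as white and keeps one dense black node, so a single host-device transaction would cover all activity.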
Given a wound healing model m = < wc, mc, ic, r >, where wc denotes the wound configuration (dimensions, wound location), mc denotes the model configuration (dimensions), ic denotes the initial conditions (patient's cytokine levels, treatment type, etc.) and r denotes the model rules, if we know the optimal sub-volume size of m, then we can choose a scaled sub-volume size for a model m′ = < wc′, mc, ic′, r >. For example, if the wound depth of m and m′ in the x-direction is 1 mm and 0.5 mm respectively, then we can scale the optimal sub-volume size of m in the x-direction by a factor of 0.5 and reuse it for m′. This way, once we have determined an optimal sub-volume size for model m, we can apply it to any model m′ defined as above.
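The per-axis scaling rule can be stated directly in code. This is a small illustrative helper (the function name and the rounding/clamping policy are our assumptions), not part of the published framework:

```python
def scaled_subvolume(sub_m, wound_m, wound_mprime):
    # Scale the tuned sub-volume of model m per axis by the wound-size ratio
    # between m' and m; clamp to at least one patch per axis.
    return tuple(max(1, round(s * wp / w))
                 for s, w, wp in zip(sub_m, wound_m, wound_mprime))

# Wound depth halves in x (1 mm -> 0.5 mm): the sub-volume x-edge scales by 0.5.
scaled_subvolume((8, 8, 8), (1.0, 2.0, 2.0), (0.5, 2.0, 2.0))
```

Here a sub-volume tuned as 8 × 8 × 8 patches for model m becomes 4 × 8 × 8 for the smaller-wound model m′, avoiding a rerun of the octree heuristic.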
6. Results
6.1. Model Configurations
We have developed VF ABM models for two different mammals: rat and human. As discussed in Section 2.3, the rat VF ABM serves as a test model due to its size and the availability of empirical data. The model configurations were determined based on empirical data and vocal fold literature reviews [17], [18], [19], [20], [22], [23], [24]. Table 2 summarizes the configurations of both models.
Table 2:
Summary of mammal vocal fold simulation configurations
| Unit | Rat | Human | |
|---|---|---|---|
| World Size | 3D Patches | 2.0 M | 153.9 M |
| ECM Data | Data Points | 6.0 M | 0.46 G |
| Chemical Data | Data Points | 16.0 M | 1.23 G |
| Platelets | Initial Number of Cells | 78.8 K | 34 M |
| Neutrophil | Initial Number of Cells | 517 | 1.72 M |
| Macrophage | Initial Number of Cells | 315 | 0.97 M |
| Fibroblast | Initial Number of Cells | 3.5 K | 12.20 M |
6.2. Performance Evaluation
To evaluate the quality of the sub-volume size recommendation made by the heuristic proposed in Section 5, an experiment was run with different HADC partition sizes for different model and wound configurations. As shown in Fig. 4, the 2-pass octree technique worked well for small and medium size wounds (sizes up to a fraction of the VF size). However, it did not work well for large wound sizes, as it recommended a sub-optimal partition. Thus, it is best for the modeler to use the heuristic to obtain the recommendation from a smaller-wound model m and scale the partition size for a model m′ with larger wounds.
Fig. 4:
Data copy time speedup using HADC. For each wound configuration, the model was run with different HADC partition sizes. The partition size that resulted in the best average data transfer time was used to compute the speedup, which was compared with the speedup using the partition size recommended by the heuristic described in Section 5. The 2-pass octree technique worked well for small and medium size wounds. In contrast, the scaled 2-pass octree, which uses the scaled recommendation from a smaller-wound model m for a model m′ with larger wounds, worked well for all wound sizes.
The original frame rate of the human VF visualization was bounded by the compute time of approximately 7 s per iteration [36], [44]. This resulted in poor interactivity due to the 0.14 fps frame rate. With the scheduling technique described in Section 4, the frame rate improved to 7.2 fps. By using the sub-volume size recommended by the heuristic proposed in Section 5 for the HADC optimization, the visualization performance improved significantly to 42.8 fps.
For performance evaluation purposes, the human VF ABM was compared against our previous work and other similar ABM work (Table 3). The bacteria-macrophage-antibiotic ABM was implemented with FLAME GPU [27]. FLAME GPU is a widely used modern HPC ABM framework and thus serves as a good performance standard. Additionally, we included a well-regarded high-performance visualization prototype, MegaMol, for comparison [45]. Although MegaMol is an atomic-level visualization prototype, this powerful visualization tool is particle-based. Since particle-based simulation engines offer frameworks that can be adapted to visualize cellular-level data, MegaMol is well suited to serve as our visualization performance comparison base. Our VF ABM is able to process data orders of magnitude larger than FLAME GPU at a similar frame rate. Furthermore, we are able to visualize 17× more data points than MegaMol at a 4.2× better frame rate. It is worth noting that MegaMol performs a more sophisticated rendering process than our framework. However, while our framework couples the visualization engine with grid-based real-time data generation, MegaMol does not support this feature [45].
Table 3:
Performance and scale comparison with existing biological visualization platforms
| #Data Points (×106) | In Situ Support | Frame Rate (fps) | Hardware | |
|---|---|---|---|---|
| MegaMol [45] | 100 | ✓ | 10 | Intel Core i7–2600 (16 GB RAM) NVIDIA GeForce GTX TITAN |
| 2D FLAME GPUa [27] | 0.48 | × | 33b | Intel Core i7 (8 GB RAM) NVIDIA 830M GeForce |
| 3D VF ABM (original [10]) | 0.46 | ✓ | 0.13 | Intel Xeon E5–2699 v4 (128 GB RAM) NVIDIA Tesla M40 |
| 3D VF ABM (optimized) | 1700 | ✓ | 42.8 | Intel Xeon E5–2699 v4 (128 GB RAM) NVIDIA Tesla M40 |
a. Bacteria-Macrophage-Antibiotic ABM
b. Derived frame rate
7. Conclusion
We presented a scheduling technique for high-performance 3D ABMs for wound healing applications using GPU hyper-tasking (GHT) to achieve a high level of simulation interactivity. To achieve optimal concurrency in this scheduling scheme, different tasks were assigned to different devices. This task assignment resulted in output data being distributed across the CPU host and multiple GPUs. Thus, a Host-Device Activity-Aware Data Copy (HADC) technique was proposed to minimize host-device data copies. The resulting framework was used to implement a VF tissue repair response to injuries. The in situ visualization of the ECM and signaling proteins in healing VF was capable of processing and rendering 1.7 billion data points at an average frame rate of 42.8 fps. Our ABM framework offers biomedical researchers a tool to efficiently explore large amounts of output data with a high level of user-simulation interactivity.
We are currently exploring verification techniques to quantitatively compare model-generated ECM images to fluorescence microscopy images obtained from real tissue samples. Further, we plan to develop optimization techniques for visualization of inflammatory cells. As cell populations exhibit both structural and mobility properties, capturing their dynamics efficiently through visualization is a challenging task. The goal would be to couple cell visualization with the current framework, while maintaining a high level of interactivity.
Fig. 3:
(a) Comparison of vocal fold images (collagen in red, elastin and cells in green) of real rat vocal folds (uninjured control on the left and scarred in the middle) [43] and a zoomed image obtained from the VF ABM simulation (right). (b) Visualization of human vocal fold ECM proteins and signaling proteins (chemical gradients in turquoise-pear) during the healing process. The ECM proteins include collagen (red), hyaluronic acid (blue), and elastin (green). The healing and elapsed time stats are displayed in the top-left and top-right corners of the screen, respectively. This image is the result of a transfer function that emphasizes newly deposited ECM proteins in the wound area and assigns low opacity to existing ECM proteins outside of the wound area.
Acknowledgment
The work is supported by the National Institute on Deafness and Other Communication Disorders of the National Institutes of Health under Grant No. R01DC005788, the Natural Sciences and Engineering Research Council of Canada under Grant No. RGPIN-2018-03843 and the Canadian Institutes of Health Research under Grant No. 388583. The authors gratefully acknowledge the support provided by the National Science Foundation under Grant No. CNS-1429404 (MRI Project).
The authors would like to thank Sujal Bista for guidance in developing the visualization component and UMIACS staff for assistance in VirtualGL configuration.
References
- [1].Macal CM, “Everything you need to know about agent-based modelling and simulation,” Journal of Simulation, vol. 10, no. 2, pp. 144–156, 2016. [Google Scholar]
- [2].Li N, Verdolini K, Clermont G, Mi Q, Rubinstein EN, Hebda PA, and Vodovotz Y, “A patient-specific in silico model of inflammation and healing tested in acute vocal fold injury,” PloS one, vol. 3, no. 7, p. e2789, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [3].Wall F, “Agent-based modeling in managerial science: an illustrative survey and study,” Review of Managerial Science, vol. 10, no. 1, pp. 135–193, 2016. [Google Scholar]
- [4].D’Souza RM, Lysenko M, and Rahmani K, “Sugarscape on steroids: simulating over a million agents at interactive rates,” in Proceedings of Agent2007 conference. Chicago, IL, 2007. [Google Scholar]
- [5].Collier N. and North M, “Parallel agent-based simulation with repast for high performance computing,” Simulation, vol. 89, no. 10, pp. 1215–1235, 2013. [Google Scholar]
- [6].Murphy JT, Bayrak ES, Ozturk MC, and Cinar A, “Simulating 3-d bone tissue growth using repast hpc: Initial simulation design and performance results,” in Winter Simulation Conference (WSC), 2016. IEEE, 2016, pp. 2087–2098. [Google Scholar]
- [7].Swat MH, Thomas GL, Belmonte JM, Shirinifard A, Hmeljak D, and Glazier JA, “Multi-scale modeling of tissues using compucell3d,” Methods in Cell Biology, vol. 110, p. 325, 2012. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [8].Coakley S, Gheorghe M, Holcombe M, Chin S, Worth D, and Greenough C, “Exploitation of high performance computing in the flame agent-based simulation framework,” in High Performance Computing and Communication & 2012 IEEE 9th International Conference on Embedded Software and Systems (HPCC-ICESS), 2012 IEEE 14th International Conference on. IEEE, 2012, pp. 538–545. [Google Scholar]
- [9].Cytowski M. and Szymanska Z, “Large-scale parallel simulations of 3d cell colony dynamics,” Computing in Science & Engineering, vol. 16, no. 5, pp. 86–95, 2014. [Google Scholar]
- [10].Seekhao N, Shung C, JaJa J, Mongeau L, and Li-Jessen NY, “High-performance agent-based modeling applied to vocal fold inflammation and repair,” Frontiers in Physiology, vol. 9, p. 304, 2018. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [11].Rivi M, Calori L, Muscianisi G, and Slavnic V, “In-situ visualization: State-of-the-art and some use cases,” PRACE White Paper, pp. 1–18, 2012. [Google Scholar]
- [12].The VirtualGL Project, “VirtualGL background,” http://www.virtualgl.org/About/Background, Tech. Rep., 2015.
- [13].Commander D, “User’s guide for TurboVNC 0.6,” 2009. Retrieved January 14, 2010. [Google Scholar]
- [14].Gottrup F, Ågren MS, and Karlsmark T, “Models for use in wound healing research: a survey focusing on in vitro and in vivo adult soft tissue,” Wound Repair and Regeneration, vol. 8, no. 2, pp. 83–96, 2000. [DOI] [PubMed] [Google Scholar]
- [15].Lim X, Tateya I, Tateya T, Muñoz-Del-Río A, and Bless DM, “Immediate inflammatory response and scar formation in wounded vocal folds,” Annals of Otology, Rhinology & Laryngology, vol. 115, no. 12, pp. 921–929, 2006. [DOI] [PubMed] [Google Scholar]
- [16].Welham NV, Lim X, Tateya I, and Bless DM, “Inflammatory factor profiles one hour following vocal fold injury.” The Annals of otology, rhinology, and laryngology, vol. 117, no. 2, pp. 145–152, 2008. [DOI] [PubMed] [Google Scholar]
- [17].Kurita S, “A comparative study of the layer structure of the vocal fold,” Vocal Fold Physiology, pp. 3–21, 1981. [Google Scholar]
- [18].Su M-C, Yeh T-H, Tan C-T, Lin C-D, Linne O-C, and Lee S-Y, “Measurement of adult vocal fold length,” The Journal of Laryngology & Otology, vol. 116, no. 6, pp. 447–449, 2002. [DOI] [PubMed] [Google Scholar]
- [19].Kutty JK and Webb K, “Tissue engineering therapies for the vocal fold lamina propria,” Tissue Engineering Part B: Reviews, vol. 15, no. 3, pp. 249–262, 2009. [DOI] [PubMed] [Google Scholar]
- [20].Prades J-M, Dumollard JM, Duband S, Timoshenko A, Richard C, Dubois MD, Martin C, and Peoc’h M, “Lamina propria of the human vocal fold: histomorphometric study of collagen fibers,” Surgical and Radiologic Anatomy, vol. 32, no. 4, pp. 377–382, 2010. [DOI] [PubMed] [Google Scholar]
- [21].Li NY, Vodovotz Y, Hebda PA, and Abbott KV, “Biosimulation of inflammation and healing in surgically injured vocal folds,” The Annals of otology, rhinology, and laryngology, vol. 119, no. 6, p. 412, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [22].Zörner S, Kaltenbacher M, and Döllinger M, “Investigation of prescribed movement in fluid–structure interaction simulation for the human phonation process,” Computers & fluids, vol. 86, pp. 133–140, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [23].Li NY, Heris HK, and Mongeau L, “Current understanding and future directions for vocal fold mechanobiology,” Journal of Cytology & Molecular Biology, vol. 1, no. 1, p. 001, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [24].Bhattacharya P. and Siegmund T, “A computational study of systemic hydration in vocal fold collision,” Computer methods in biomechanics and biomedical engineering, vol. 17, no. 16, pp. 1835–1852, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [25].Richmond P, Walker D, Coakley S, and Romano D, “High performance cellular level agent-based simulation with flame for the gpu,” Briefings in bioinformatics, vol. 11, no. 3, pp. 334–347, 2010. [DOI] [PubMed] [Google Scholar]
- [26].Richmond P. and Chimeh MK, “Flame gpu: Complex system simulation framework,” in High Performance Computing & Simulation (HPCS), 2017 International Conference on. IEEE, 2017, pp. 11–17. [Google Scholar]
- [27].de Paiva Oliveira A. and Richmond P, “Feasibility study of multiagent simulation at the cellular level with flame gpu.” in FLAIRS Conference, 2016, pp. 398–403. [Google Scholar]
- [28].Tamrakar S, Richmond P, and D’Souza RM, “Pi-flame: A parallel immune system simulator using the flame graphic processing unit environment,” Simulation, vol. 93, no. 1, pp. 69–84, 2017. [Google Scholar]
- [29].North MJ, Howe TR, Collier NT, and Vos JR, “The repast simphony runtime system,” in Proceedings of the agent 2005 conference on generative social processes, models, and mechanisms, vol. 10. ANL/DIS-06–1, co-sponsored by Argonne National Laboratory and The University of Chicago, 2005, pp. 13–15. [Google Scholar]
- [30].Emonet T, Macal CM, North MJ, Wickersham CE, and Cluzel P, “Agentcell: a digital single-cell assay for bacterial chemotaxis,” Bioinformatics, vol. 21, no. 11, pp. 2714–2721, 2005. [DOI] [PubMed] [Google Scholar]
- [31].Mirams GR, Arthurs CJ, Bernabeu MO, Bordas R, Cooper J, Corrias A, Davit Y, Dunn S-J, Fletcher AG, Harvey DG, et al. , “Chaste: an open source c++ library for computational physiology and biology,” PLoS Computational Biology, vol. 9, no. 3, p. e1002970, 2013. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [32].Hoehme S. and Drasdo D, “A cell-based simulation software for multi-cellular systems,” Bioinformatics, vol. 26, no. 20, pp. 2641–2642, 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [33].Starruß J, de Back W, Brusch L, and Deutsch A, “Morpheus: a user-friendly modeling environment for multiscale and multicellular systems biology,” Bioinformatics, vol. 30, no. 9, pp. 1331–1332, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [34].Falk M, Ott M, Ertl T, Klann M, and Koeppl H, “Parallelized agent-based simulation on cpu and graphics hardware for spatial and stochastic models in biology,” in Proceedings of the 9th International Conference on Computational Methods in Systems Biology. ACM, 2011, pp. 73–82. [Google Scholar]
- [35].Zhang L, Jiang B, Wu Y, Strouthos C, Sun PZ, Su J, and Zhou X, “Developing a multiscale, multi-resolution agent-based brain tumor model by graphics processing units,” Theoretical Biology and Medical Modelling, vol. 8, no. 1, p. 46, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [36].Seekhao N, Shung C, JaJa J, Mongeau L, and Li-Jessen NY, “Real-time agent-based modeling simulation with in-situ visualization of complex biological systems a case study on vocal fold inflammation and healing,” IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2016. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [37].Ayachit U, Bauer A, Geveci B, O’Leary P, Moreland K, Fabian N, and Mauldin J, “Paraview catalyst: Enabling in situ data analysis and visualization,” in Proceedings of the First Workshop on In Situ Infrastructures for Enabling Extreme-Scale Analysis and Visualization. ACM, 2015, pp. 25–29. [Google Scholar]
- [38].Kuhlen T, Pajarola R, and Zhou K, “Parallel in situ coupling of simulation with a fully featured visualization system,” 2011. [Google Scholar]
- [39].Henderson A, Ahrens J, Law C, et al. , The ParaView Guide. Kitware Clifton Park, NY, 2004. [Google Scholar]
- [40].Childs H, Brugger E, Bonnell K, Meredith J, Miller M, Whitlock B, and Max N, “A contract based system for large data visualization,” in Visualization, 2005. VIS 05. IEEE. IEEE, 2005, pp. 191–198. [Google Scholar]
- [41].Su Y, Wang Y, and Agrawal G, “In-situ bitmaps generation and efficient data analysis based on bitmaps,” in Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing. ACM, 2015, pp. 61–72. [Google Scholar]
- [42].Krekhov A, Grüninger J, Schlönvoigt R, and Krüger J, “Towards in situ visualization of extreme-scale, agent-based, worldwide disease-spreading simulations,” in SIGGRAPH Asia 2015 Visualization in High Performance Computing. ACM, 2015, p. 7. [Google Scholar]
- [43].Coppoolse JM, Van Kooten T, Heris HK, Mongeau L, Li NY, Thibeault SL, Pitaro J, Akinpelu O, and Daniel SJ, “An in vivo study of composite microgels based on hyaluronic acid and gelatin for the reconstruction of surgically injured rat vocal folds,” Journal of Speech, Language, and Hearing Research, vol. 57, no. 2, pp. S658–S673, 2014. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [44].Seekhao N, JaJa J, Mongeau L, and Li-Jessen NY, “In situ visualization for 3d agent-based vocal fold inflammation and repair simulation,” Supercomputing Frontiers and Innovations, vol. 4, no. 3, p. 68, 2017. [DOI] [PMC free article] [PubMed] [Google Scholar]
- [45].Grottel S, Krone M, Müller C, Reina G, and Ertl T, “Megamol prototyping framework for particle-based visualization,” IEEE transactions on visualization and computer graphics, vol. 21, no. 2, pp. 201–214, 2015. [DOI] [PubMed] [Google Scholar]