2020 May 26;12137:538–553. doi: 10.1007/978-3-030-50371-0_40

A Novel Metric to Evaluate In Situ Workflows

Tu Mai Anh Do 15, Loïc Pottier 15, Stephen Thomas 16, Rafael Ferreira da Silva 15, Michel A Cuendet 17, Harel Weinstein 17, Trilce Estrada 18, Michela Taufer 16, Ewa Deelman 15
Editors: Valeria V Krzhizhanovskaya 8, Gábor Závodszky 9, Michael H Lees 10, Jack J Dongarra 11, Peter M A Sloot 12, Sérgio Brissos 13, João Teixeira 14
PMCID: PMC7302250

Abstract

Performance evaluation is crucial to understanding the behavior of scientific workflows and to efficiently utilizing resources on high-performance computing architectures. In this study, we target an emerging type of workflow, called in situ workflows. Through an analysis of the state-of-the-art research on in situ workflows, we build a theoretical framework that helps characterize such workflows. We further propose a lightweight metric for assessing the resource usage efficiency of an in situ workflow execution. By applying this metric to a simple, yet representative, synthetic workflow, we explore two possible scenarios (Idle Simulation and Idle Analyzer) for the execution of real in situ workflows. Experimental results show no substantial performance difference between the in transit placement (analytics on dedicated nodes) and the helper-core configuration (analytics co-located with the simulation) on our target system.

Keywords: Scientific workflow, In situ model, Molecular dynamics, High-performance computing

Introduction

High-performance computing (HPC) is mainstream for enabling the execution of scientific workflows, which are composed of complex arrangements of computational tasks and ever-growing data movements between those tasks [14]. Traditionally, a workflow describes multiple computational tasks and represents data and control flow dependencies. The data produced by the scientific simulation are stored in persistent storage, and visualization or analytics are performed post hoc. This approach does not scale, mainly because scientific workflows are becoming increasingly compute- and data-intensive at extreme scale [13]. Storage bandwidth has also failed to keep pace with the rapid computational growth of modern processors, due to stagnating I/O advancements [7, 16]. This asymmetry between I/O and computing technologies, observed in contemporary and emerging computing platforms, prevents post hoc processing from handling the large volumes of data generated by large-scale simulations [2]. Therefore, storing the entire output of scientific simulations on disk causes major bottlenecks in workflow performance. To reduce this disparity, scientists have moved towards a new paradigm for scientific simulations called in situ, in which data are visualized and/or analyzed as they are generated [2]. This approach accelerates simulation I/O by bypassing the file system, pipelining the analysis, and thereby improving overall workflow performance [4].

An in situ workflow describes a scientific workflow with multiple components (simulations with different parameters, visualization, analytics, etc.) running concurrently [4, 16], potentially coordinating their executions using the same allocated resources to minimize the cost of data movement [2]. Data periodically produced by the main simulation are processed, analyzed, and visualized at runtime rather than post-processed on dedicated nodes. This approach offers many advantages for processing large volumes of simulated data and for efficiently utilizing computing resources. By co-locating the simulation and the analysis kernels, in situ solutions reduce the global I/O pressure and the data footprint in the system [7]. To fully benefit from these solutions, the simulation and the analysis components have to be effectively managed so that they do not slow each other down. Therefore, in this paper we study and characterize two specific categories of in situ workflows, namely helper-core and in transit workflows [2]. Specifically, in the helper-core workflow, the analysis component is placed on the same node as the simulation, while in the in transit workflow, simulation data is staged to a dedicated node where the analysis is allocated. In situ workflow systems must capture the individual behavior of multiple coupled components (i.e., concurrent executions of overlapping steps with inter-component data dependencies). Through the use of a theoretical framework, this paper aims to provide guidelines for the evaluation and characterization of in situ workflows. We target a widely-used class of workflows, namely large-scale molecular dynamics (MD) simulations. We argue that the proposed solutions and the lessons learned from the proposed synthetic in situ workflows can be directly translated to production in situ workflows. This work makes the following contributions:

  1. We discuss practical challenges in evaluating next-generation workflows;

  2. We define a non-exhaustive list of imperative metrics that need to be monitored for aiding the characterization of in situ workflows. We model a framework for in situ execution to formalize the iterative patterns in in situ workflows—we develop a lightweight approach that is beneficial when comparing the performance of configuration variations in an in situ system; and

  3. We provide insights into the behaviors of in situ workflows by applying the proposed metric in characterizing an MD workflow.

Background and Related Work

In Situ Workflow Monitoring. Many monitoring and performance profiling tools for HPC applications have been developed over the past decade [5, 12]. With the advent of in situ workflows [3], new monitoring and profiling approaches targeting tightly-coupled workflows have been studied. LDMS [1] is a loosely-integrated, scalable monitoring infrastructure targeting general large-scale applications; it delivers a low-overhead distributed solution, in contrast to TAU [12], which provides a deeper understanding of the application at a higher computational cost. SOS [17] provides a distributed monitoring platform, conceptually similar to LDMS but specifically designed for online in situ characterization of HPC applications. ADIOS [9], the next-generation I/O stack, is built on top of many in situ data transport layers, e.g., DataSpaces [6]. Savannah [7], a workflow orchestrator, has been leveraged to bundle a coupled simulation with two main simulations, multiple analysis kernels, and a visualization service [4]. These works mainly focus on providing monitoring schemes for in situ workflows. This paper instead proposes a novel method to extract useful knowledge from the captured performance data.

In Situ Data Management. FlexAnalytics [19] optimizes the performance of coupling simulations with in situ analytics by evaluating data compression and query over different I/O paths: memory-to-memory and memory-to-storage. A large-scale data staging implementation [18] over MPI-IO operations describes a way to couple with in situ analysis using a non-intrusive approach. The analytics accesses data staged to the local persistent storage of compute nodes to enhance data locality. Our work mainly focuses on in-memory staging and comprehensively characterizes memory-to-memory transfer using RDMA for both within a compute node (helper-core) and across nodes (in transit).

General In Situ Workflow Architecture

In Situ Architecture. In this work, we propose an in situ architecture that enables a variety of in situ placements to characterize the behavior of in situ couplings. Although we focus on a particular type of in situ workflows (composed of simulation and data analytics), our approach is broader and applicable to a variety of in situ components, for example, several simulations coupled together. The in situ workflow architecture (Fig. 1) features three main components:

  • A simulation component that performs MD computations and periodically generates data in the form of atomic coordinates.

  • A data transport layer (DTL) that is responsible for efficient data transfer.

  • An analyzer component that applies several analysis kernels to the data received periodically from the simulation component via the DTL.

Fig. 1. A general in situ workflow software architecture.

On the data path “simulation-to-analyzer”, (1) the ingester takes data from a given data source and stores them in the DTL, and (2) the retriever, conversely, gets data from the DTL to perform further operations. These two entry points allow us to abstract and detach complex I/O management from the application code. This approach enables more control over the in situ coupling and is less intrusive than many current approaches. The ingester synchronously inputs data from the simulation, sequentially taking turns with the simulation on the same resource. The ingester is a useful place to attach simple preprocessing tasks (e.g., data reduction, data compression). The architecture allows in situ execution with various placements of the retriever. A helper-core retriever is co-located with the ingester on a subset of cores of the node where the simulation is running; it asynchronously gets data from the DTL to perform an analysis. Since the retriever then shares the node with the simulation, the analysis should be lightweight to prevent simulation slowdown. An in transit retriever runs on dedicated resources (e.g., staging I/O nodes [19]), receives data from the DTL, and performs compute-intensive analysis tasks. We compare helper-core and in transit retrievers in detail in Sect. 6.

In Situ Workflow Metrics. To characterize in situ workflows, we have defined a foundational set of metrics (Table 1). As a first metric, it is natural to consider the makespan; we complement it with three metrics corresponding to the time spent in each component: the simulation, the analyzer, and the DTL. The periodic pattern enacted by in situ workflows may impose data dependencies between steps of coupled components, e.g., the analyzer may have to wait for data sent by the simulation to become available in the DTL for reading. Thus, we also monitor the idle time of each individual component. In this work, we use TAU to capture this information, and we focus on how to use these data to characterize in situ workflows.

Table 1.

Selected metrics for in situ workflow characterization

Name Definition Unit
T_total Total workflow execution time (makespan) s
T_S Total time spent in the simulation s
T_A Total time spent in the analysis s
T_DTL Total time spent in data transfers s
t(I^S) Idle time during simulation s
t(I^A) Idle time during analysis s
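Assuming per-stage timestamps are available (e.g., from TAU user-defined events), the totals and idle times of Table 1 can be derived from per-component busy intervals. A minimal sketch; the interval lists and the workflow span are hypothetical inputs, not the paper's instrumentation API:

```python
def total_busy(intervals):
    """Sum of (end - start) over a component's disjoint busy intervals."""
    return sum(end - start for start, end in intervals)

def idle_time(intervals, t_begin, t_end):
    """Idle time: wall-clock span of the run minus the component's busy time."""
    return (t_end - t_begin) - total_busy(intervals)

# Toy trace: a simulation busy on [0, 4] and [5, 9] within a 10 s run.
sim = [(0.0, 4.0), (5.0, 9.0)]
print(total_busy(sim))            # 8.0
print(idle_time(sim, 0.0, 10.0))  # 2.0
```

The same two helpers apply unchanged to the analyzer and the DTL intervals.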

In Situ Execution Model

Framework

In traditional workflows, the simulation and the post-processing analyzer are the typical components, and the post-processing follows the simulation sequentially. Let a stage be a part of a given component. The simulation component comprises the computational stage (S), which produces the data, and the I/O stage (W), which writes the produced data; the analytics component comprises the DTL stage (R), which reads the previously written data, and the analysis stage (A).

However, in situ workflows exhibit a periodic behavior: S, W, R, and A follow the same sequential order but, instead of operating on all the data at once, they operate iteratively on subsets of it. Here, such a subset is called a frame and can be seen as a snapshot of the simulation at a given time t. Let S_i, W_i, R_i, and A_i be, respectively, the simulation, write, read, and analysis stages at step i. In other words, S_i produces frame i, W_i writes the produced frame i into a buffer, R_i reads frame i, and A_i analyzes frame i. Note that an actual simulation computes for a given number of steps, but only a subset of these steps are output as frames and analyzed [15]. The frequency at which simulation steps are analyzed is defined by the stride. Let n be the total number of simulation steps, s the stride, and m the number of steps actually analyzed and output as frames; then m = n / s. However, since we model the in situ workflow itself and not only the simulation (i.e., the execution times of S_i, W_i, R_i, and A_i are known beforehand), the value of the stride does not impact our model. We set s = 1 (i.e., m = n), so every step always produces a frame.

Execution Constraints. To ensure work conservation, we require that the total time spent in the simulation be the sum over its per-step stages: T_S = Σ_{i=1}^{m} t(S_i), where m is the number of produced frames and t(X) denotes the execution time of stage X. We have identical constraints for W, R, and A. Similarly to the classic approach, we have the following precedence constraints for all i with 1 ≤ i ≤ m:

end(S_i) ≤ start(W_i),  end(W_i) ≤ start(R_i),  end(R_i) ≤ start(A_i)    (1)

Buffer Constraints. The pipelined design of in situ workflows introduces new constraints. We consider that a frame is analyzed right after it has been computed. This implies that, for any given step i, the stage W_{i+1} can start if, and only if, the stage R_i has completed. Formally, for all i with 1 ≤ i < m:

end(R_i) ≤ start(W_{i+1})    (2)

Equations 1 and 2 guarantee that we buffer at most one frame at a time (Fig. 2). Note that this constraint can be relaxed so that up to k frames can be buffered at the same time: end(R_i) ≤ start(W_{i+k}), where 1 ≤ i ≤ m − k and k ≥ 1. In this work, we only consider the case k = 1 (red arrows in Fig. 3).
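To make the interplay of the precedence and buffer constraints concrete, the schedule they induce can be simulated step by step. The sketch below is ours, not the paper's implementation: it assumes fixed stage durations tS, tW, tR, tA and advances one clock per component, with W_i additionally waiting for R_{i-1} (the single-frame buffer):

```python
def schedule(m, tS, tW, tR, tA):
    """Per-step (simulation idle, analyzer idle) and the makespan for m steps."""
    sim = ana = 0.0        # per-component clocks
    end_R_prev = 0.0       # end of R_{i-1}: W_i must wait for it (Eq. 2)
    idles = []
    for _ in range(m):
        s_end = sim + tS                  # S_i ends
        w_start = max(s_end, end_R_prev)  # simulation idles while buffer is full
        sim = w_start + tW                # W_i ends (Eq. 1)
        r_start = max(sim, ana)           # analyzer idles until the frame is ready
        end_R_prev = r_start + tR         # R_i ends, buffer freed
        idles.append((w_start - s_end, r_start - ana))
        ana = end_R_prev + tA             # A_i ends
    return idles, ana

# Analysis-heavy case: the analysis side (0.1 + 3.0 s) outweighs the
# simulation side (2.0 + 0.1 s), so the simulation ends up idling.
idles, makespan = schedule(50, 2.0, 0.1, 0.1, 3.0)
print(idles[-1])
```

After a few warm-up steps, the per-step simulation idle settles at (tR + tA) − (tS + tW) = 1.0 s and the analyzer idle at 0, i.e., the Idle Simulation scenario described below.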

Fig. 2. Two different execution scenarios for in situ workflow execution.

Fig. 3. Dependency constraints within and across in situ steps. (Color figure online)

Idle Stages. Due to the above constraints, the different stages are tightly coupled (i.e., W_{i+1} must wait for R_i, and R_i must wait for W_i, before starting). Therefore, idle periods can arise during the execution (i.e., either the simulation or the analytics must wait for the other component). We can characterize two scenarios, Idle Simulation and Idle Analyzer, in which idle time occurs. The former (Fig. 2(a)) occurs when analyzing a frame takes longer than a simulation cycle (i.e., t(R) + t(A) > t(S) + t(W)). The latter (Fig. 2(b)) occurs when the simulation takes longer (i.e., t(S) + t(W) > t(R) + t(A)). Figure 3 provides a detailed overview of the dependencies among the different stages. Note that the concept of the in situ step is defined and explained later in the paper.

Intuitively, we want to minimize the idle time on both sides. If idle time is absent, we reach the idle-free scenario: t(S) + t(W) = t(R) + t(A). To ease the characterization of these idle periods, we introduce two idle stages, one per component. Let I_i^S and I_i^A be, respectively, the idle time occurring in the simulation and in the analysis component at step i. These two stages represent the idle time in both components; the precedence constraint defined in Eq. 1 therefore becomes:

end(S_i) ≤ start(I_i^S),  end(I_i^S) ≤ start(W_i),  end(W_i) ≤ start(R_i),  end(I_i^A) ≤ start(R_i),  end(R_i) ≤ start(A_i)    (3)

Consistency Across Steps

This work is supported by the hypothesis that every execution of an in situ workflow under the above constraints reaches a consistent state after a finite number of warm-up steps. Thus, the time spent in each stage within an iteration can be considered constant over iterations. Formally, there exists j with 1 ≤ j ≤ m such that, for all i with j ≤ i ≤ m, we have t(X_i) = t(X_j) for every stage X among S, I^S, W, R, I^A, and A. This hypothesis is confirmed in Sect. 6, and in practice we observe that the cost of these non-consistent steps is negligible. Our experiments showed that, on average, j ≈ 3 for one hundred steps (m = 100). Therefore, we ignore the warm-up steps and consider j = 1. For the sake of simplicity, we generalize the in situ consistency behavior by denoting t(S) = t(S_i) for all i, and similarly for W, R, A, I^S, and I^A. This hypothesis allows us to predict the performance of a given in situ workflow by monitoring a subset of steps instead of the whole workflow. From the two constraints defined by Eq. 3 and Eq. 2, and our hypothesis, we define:

t(S) + t(I^S) + t(W) = t(R) + t(I^A) + t(A)    (4)

The Idle Simulation scenario corresponds to t(I^A) = 0, and the Idle Analyzer scenario to t(I^S) = 0. Let t(I) = t(I^S) + t(I^A) be the total idle time for an in situ step; using Eq. 4 we derive:

t(I) = | t(R) + t(A) − t(S) − t(W) |    (5)
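Writing t(S), t(W), t(R), t(A) for the steady-state stage times, the relation above reduces the per-step idle computation to a signed gap between the two components' busy times. A small helper illustrating this (the function name is ours):

```python
def step_idle(tS, tW, tR, tA):
    """Per-step (simulation idle, analyzer idle) at steady state."""
    gap = (tR + tA) - (tS + tW)   # analysis-side minus simulation-side busy time
    if gap > 0:
        return gap, 0.0           # Idle Simulation: analysis is the bottleneck
    return 0.0, -gap              # Idle Analyzer: simulation is the bottleneck
```

For example, step_idle(2.0, 0.1, 0.1, 3.0) yields one second of simulation idle per step, and swapping the two sides yields the symmetric Idle Analyzer case.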

In Situ Step

The challenge behind in situ workflow evaluation lies in collecting global information from multiple components (in our case, the simulation and the analytics) and using this information to derive meaningful characteristics of the execution. The complexity of this task grows with the number of steps the workflow runs and the number of components involved. By leveraging the consistency hypothesis in Eq. 4, we alleviate this cost by proposing a metric that does not require data from all steps. The keystone of our approach is the concept of the in situ step. Based on Eq. 3, the in situ step σ_i is the timespan between the beginning of S_i and the end of A_i. The in situ step concept lets us manipulate all the stages of a given step as one consistent task executing across components that can potentially run on different machines.

Different in situ steps overlap each other, so we need to distinguish the part of a step that overlaps with its successor (σ_i^{ov}) from the part that does not (σ_i^{nov}). Thus, σ_i = σ_i^{ov} + σ_i^{nov}. For example, in Fig. 2(a), to compute the time elapsed between the start of S_i and the end of A_{i+1}, we sum the two steps and remove the overlapped execution time σ_i^{ov}, obtaining σ_i + σ_{i+1} − σ_i^{ov}. This simple example gives the intuition behind the makespan computation in Sect. 4.4.

The consistency hypothesis ensures consistency across in situ steps. We denote by σ the consistent in situ step (i.e., σ = σ_i for every steady-state step i), and by σ^{ov} and σ^{nov}, respectively, the overlapped and non-overlapped parts of two consecutive in situ steps. Thus, σ = σ^{ov} + σ^{nov}. To calculate the makespan, we want to compute the non-overlapped step σ^{nov}. As shown in Fig. 2, the non-overlapped period σ^{nov} is the aggregation of all stages belonging to one single component in an in situ step: t(S) + t(I^S) + t(W) on the simulation side and t(R) + t(I^A) + t(A) on the analysis side. Thus, we have two scenarios: if t(S) + t(W) ≥ t(R) + t(A), then σ^{nov} = t(S) + t(W); otherwise σ^{nov} = t(R) + t(A). Hence, σ^{nov} = max(t(S) + t(W), t(R) + t(A)).

Makespan

A rough estimation of the makespan of such a workflow would be the sum of the execution times of all stages (i.e., the sum of the m in situ steps σ_i). But recall that in situ steps interleave with each other, so we need to subtract the overlapped parts:

T_total = Σ_{i=1}^{m} σ_i − (m − 1) σ^{ov} = m σ^{nov} + σ^{ov}    (6)

From Eq. 6, for m large enough, the term σ^{ov} becomes negligible. Since in situ workflows execute large numbers of iterations, T_total ≈ m σ^{nov}. This observation indicates that the non-overlapped part of an in situ step is enough to characterize a periodic in situ workflow. Using our framework and these observations, we define a metric to estimate the efficiency of a workflow.
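Under the consistency hypothesis, this makespan estimate becomes a single multiplication; the σ values would come from measuring one steady-state step. A sketch with illustrative numbers:

```python
def makespan(m, sigma_nov, sigma_ov):
    """Makespan of m interleaved in situ steps, consecutive steps sharing sigma_ov."""
    return m * sigma_nov + sigma_ov

# The overlapped term quickly becomes negligible as m grows:
exact = makespan(1000, 3.0, 2.0)   # includes the sigma_ov tail
approx = 1000 * 3.0                # m * sigma_nov only
print((exact - approx) / exact)    # well under 0.1% relative difference
```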

In Situ Efficiency

Based on our in situ execution model, we propose a novel metric to evaluate the resource usage efficiency E of an in situ workflow. We base it on the time wasted during execution, i.e., the idle times t(I^S) and t(I^A). This metric considers all the components (simulation and analysis) when evaluating in situ workflows:

E = 1 − ( t(I^S) + t(I^A) ) / σ^{nov}    (7)

This efficiency metric allows performance comparison between in situ runs with different configurations. By examining only one non-overlapped in situ step, we obtain a lightweight way to observe the behavior of multiple components running concurrently in an in situ workflow. Note that this model and the efficiency metric can easily be generalized to any number of components.
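One way to read the metric, under our assumption that E is one minus the fraction of a non-overlapped in situ step spent idle:

```python
def efficiency(idle_sim, idle_ana, sigma_nov):
    """Resource usage efficiency of one steady-state in situ step."""
    return 1.0 - (idle_sim + idle_ana) / sigma_nov

print(efficiency(0.0, 0.0, 3.1))   # 1.0 -- idle-free, the equilibrium point
print(efficiency(1.0, 0.0, 4.0))   # 0.75 -- an Idle Simulation case
```

E is maximal (1) exactly when both idle stages vanish, which is the idle-free scenario above.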

Molecular Dynamics Synthetic Workflow

MD is one of the most popular scientific applications executed on modern HPC systems. MD simulations reproduce the time evolution of molecular systems at a given temperature and pressure by iteratively computing inter-atomic forces and moving atoms over a short time step. The resulting trajectories allow scientists to understand molecular mechanisms and conformations. In particular, a trajectory is a series of frames, i.e., sets of atomic positions saved at fixed intervals of time. The stride is the number of time steps between frames considered for storage or further in situ analysis. For example, in our framework, for a simulation with 100 steps and a stride of 20, only 5 frames will be sent by the simulation to the analysis component. Since trajectories are high-dimensional objects, scientists use well-chosen collective variables (CVs) to capture important molecular motions. Technically, a CV is defined as a function of the atomic coordinates in one frame. An example of a complex CV that we use in this work is the Largest Eigenvalue of the Bipartite Matrix (LEBM). Given two amino-acid segments A and B, if d_ij is the Euclidean distance between Cα atoms i and j, then the symmetric bipartite matrix B(A, B) is defined by B_ij = d_ij if i ∈ A, j ∈ B or i ∈ B, j ∈ A, and 0 otherwise. Johnston et al. [8] showed that the largest eigenvalue of B(A, B) is an efficient proxy to monitor changes in the conformation of A relative to B. To study the complex coupling behavior between the MD simulation and the analysis component across a thoroughly explored parameter space, we have designed a synthetic in situ MD workflow (Fig. 4). The Synthetic Simulation component extracts frames from previously computed MD trajectories instead of performing an actual, compute-intensive MD simulation. This synthetic workflow is an implementation of the general and abstract software architecture proposed in Sect. 3.
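The LEBM computation is straightforward to prototype. The sketch below builds the bipartite distance matrix for two segments of Cα coordinates and extracts its largest eigenvalue by power iteration; it is a pure-Python stand-in for a linear-algebra library, and the coordinates are hypothetical:

```python
import math

def lebm(seg_a, seg_b, iters=200):
    """Largest Eigenvalue of the Bipartite Matrix for two segments.

    seg_a, seg_b: lists of (x, y, z) C-alpha coordinates.
    B[i][j] = d_ij when i and j come from different segments, 0 otherwise;
    the dominant eigenvalue is found by power iteration on B.
    """
    n, m = len(seg_a), len(seg_b)
    d = [[math.dist(p, q) for q in seg_b] for p in seg_a]
    v = [1.0] * (n + m)
    lam = 0.0
    for _ in range(iters):
        # Multiply by B: the A-block sees x_B, the B-block sees x_A.
        w_a = [sum(d[i][j] * v[n + j] for j in range(m)) for i in range(n)]
        w_b = [sum(d[i][j] * v[i] for i in range(n)) for j in range(m)]
        w = w_a + w_b
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    return lam

# Two one-atom "segments" 3 angstroms apart: B = [[0, 3], [3, 0]], eigenvalue 3.
print(round(lebm([(0, 0, 0)], [(3, 0, 0)]), 6))   # 3.0
```

Because the bipartite spectrum is symmetric (eigenvalues come in ± pairs), the norm ratio of the power iteration converges to the largest eigenvalue's magnitude, which is the LEBM.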

Fig. 4. Synthetic Workflow: the Extractor (1) sleeps during the emulated delay, then (2) extracts a snapshot of atomic states from existing trajectories and (3) stores it into a synthetic frame. The Ingestor (4) serializes the frame as a chunk and stages it in memory, then the Retriever (5) gets the chunk from the DTL and deserializes it into a frame.

Synthetic Simulation. The Synthetic Simulation emulates the process of a real MD simulation by extracting frames from trajectories previously generated by an MD simulation engine. It lets us tune and manage many simulation parameters (discussed in detail in Sect. 6), including the number of atoms and the stride, which helps the Synthetic Simulation mimic the behavior of a real MD simulation. Note that, since the Synthetic Simulation does not emulate the computational part of a real MD simulation, it only mimics the simulation's I/O behavior. We therefore define the emulated simulation delay, the period of time corresponding to the computation time of the real MD simulation. To estimate this delay for a given stride and number of atoms, we use recent benchmarking results from the literature obtained by running the well-known NAMD [10] and Gromacs [11] MD engines. We considered the benchmarked performance for five practical system sizes, 81K, 2M, and 12M atoms from Gromacs [11] and 21M and 224M atoms from NAMD [10], and interpolate to the simulation performance for the desired number of atoms (Fig. 5(a)). The interpolated value is then multiplied by the stride to obtain the delay (i.e., a function of both the number of atoms and the stride).

Fig. 5. MD benchmarking results from the literature obtained by using 512 NVIDIA K20X GPUs. The results are interpolated to obtain the (a) estimated performance and then combined with the stride to synthesize the (b) emulated simulation delay.

In Sect. 6, we run the synthetic workflow with 200K, 400K, 800K, 1.6M, 3.2M, and 6.4M protein atoms. Figure 5(b) shows the emulated simulation delay when varying the stride for different numbers of atoms. Varying the stride between 4,000 and 16,000 delivers a wide range of emulated simulation delays, up to 40 s.
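The delay-synthesis step can be sketched as a log–log interpolation over published benchmark points, scaled by the stride. The per-step costs below are placeholders for illustration, not the NAMD/Gromacs values used in the paper:

```python
import math

# (atoms, seconds per MD step) -- illustrative placeholders only.
BENCH = [(81_000, 0.002), (2_000_000, 0.05), (12_000_000, 0.3),
         (21_000_000, 0.55), (224_000_000, 6.0)]

def emulated_delay(num_atoms, stride):
    """Interpolate per-step cost in log-log space, then scale by the stride."""
    xs = [math.log(a) for a, _ in BENCH]
    ys = [math.log(t) for _, t in BENCH]
    x = min(max(math.log(num_atoms), xs[0]), xs[-1])  # clamp to benchmarked range
    for i in range(len(xs) - 1):
        if x <= xs[i + 1]:
            f = (x - xs[i]) / (xs[i + 1] - xs[i])
            return math.exp(ys[i] + f * (ys[i + 1] - ys[i])) * stride

print(round(emulated_delay(81_000, 1000), 3))   # 2.0, i.e. 0.002 s/step * 1000
```

Log-log interpolation is a natural choice here because both the system sizes and the per-step costs span several orders of magnitude.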

Data Transport Layer (DTL). The DTL leverages DataSpaces [6] to deploy an in-memory staging area for coupling data between the Synthetic Simulation and the Analyzer. DataSpaces follows the publish/subscribe model in terms of data flow, and the client/server paradigm in terms of control flow. The workflow system has to run a DataSpaces server, which manages data requests, keeps metadata, and creates in-memory data objects.

Analyzer. The Analyzer plays the role of the analytics component in the synthetic in situ workflow. More specifically, the Retriever subscribes to a chunk from the in-memory staging area and deserializes it into a frame. The MD Analytics then performs a given type of analysis on this frame. Recall that, in our model, only one frame at a time can be stored by the DTL (see Fig. 3). We leverage DataSpaces' built-in locks to ensure that a write to the in-memory staging area can only happen once the read of the previous step has completed (the constraint modeled by Eq. 2). Thus, the Analyzer is instructed by DataSpaces to wait for the next chunk to become available in the in-memory staging area. Once a chunk has been received and is being processed, the Synthetic Simulation can send another chunk to the Analyzer.
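The one-frame-at-a-time coupling enforced by these locks behaves like a bounded queue: a write blocks until the previous frame has been read. A threading sketch of that protocol (this mimics the constraint only; it is not the DataSpaces API):

```python
import queue
import threading

buf = queue.Queue(maxsize=1)   # staging area holding at most one chunk (k = 1)
results = []

def analyzer(n):
    for _ in range(n):
        frame = buf.get()           # Retriever blocks until a chunk is staged
        results.append(frame * 2)   # stand-in for the CV analysis

t = threading.Thread(target=analyzer, args=(3,))
t.start()
for i in range(3):
    buf.put(i)   # Ingestor: blocks while the buffer is still full
t.join()
print(results)   # [0, 2, 4]
```

Raising maxsize corresponds to relaxing the buffer constraint to k > 1 frames in flight.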

Experiments and Discussions

For our experiments we use Tellico (UTK), an IBM POWER9 system that includes four 32-core nodes (two of which we use as compute nodes) with 256 GB RAM each. The compute nodes are connected through an InfiniBand interconnect. We target three component placements: (1) helper-core, where the Synthetic Simulation, the DataSpaces server, and the Analyzer are co-located on the same compute node; (2) in transit S-S, where the Synthetic Simulation and the DataSpaces server are co-located on one node and the Analyzer runs on another node; and (3) in transit A-S, where the Analyzer and the DataSpaces server are co-located on one node and the Synthetic Simulation runs on a dedicated node. Note that the Synthetic Simulation runs on only one physical core, as it mimics the I/O behavior of a real simulation. The Analyzer, on the other hand, assigns bipartite matrices (see Sect. 5) to multiple processes, so the CV calculation benefits from parallel processing. Having experimented with different numbers of Analyzer processes, we fix that number at 16 (the number of cores of an IBM AC922) to attain good speedup and to fit the entire Analyzer on a compute node.

Table 2 describes the parameters used in the experiments. For the Synthetic Simulation, we study the impact of the number of atoms (the size of the system) and the stride (the frequency at which the Synthetic Simulation sends a frame to the Analyzer through the DTL). We consider a constant number of 100 frames to be analyzed, due to the time constraint and the consistency in behavior between in situ steps. For the DTL, we use the staging method DATASPACES for all the experiments. For the Analyzer, we choose to calculate a compute-intensive set of CVs (LEBM, Sect. 5) for each possible pair of non-overlapping segments of length 16. If there are n amino acids in the system, there are ⌊n/16⌋ segments, which amounts to C(⌊n/16⌋, 2) LEBM calculations, i.e., O(n²). To fairly relate the complexity of this analysis algorithm to the system size, we set the number of amino acids proportional to the number of atoms.

Table 2.

Parameters used in the experiments

Parameter Description Values used in the experiments
Synthetic simulation #atoms Number of atoms [200K, 400K, 800K, 1.6M, 3.2M, 6.4M]
#strides Stride [1000, 4000, 16000]
#frames Number of frames 100
Data transport layer SM Staging method DATASPACES
Analyzer CV Collective variable LEBM
lsegment Length of segment pairs 16

We leverage user-defined events to collect the proposed metrics using TAU [12] (Sect. 3). We focus on two different levels of information: the workflow level and the in situ step level (the time taken by the workflow to compute and analyze one frame). At the in situ step level, each value is averaged over three runs for each step. At the workflow level, each value is averaged over all in situ steps at steady state and then averaged over the three runs. We also report the standard deviation to assess the statistical significance of the results. There are thus two levels of statistical error: across the three trials at the in situ step level, and across the 94 steady-state in situ steps of each run (excluding the first three and last three steps) at the workflow level.
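The steady-state averaging described above can be reproduced in a few lines; the step-time list here is synthetic, with three warm-up and three wrap-up steps around 94 steady-state steps:

```python
import statistics

def steady_state_stats(per_step_times, skip=3):
    """Mean and stdev over in situ steps, excluding warm-up/wrap-up steps."""
    steady = per_step_times[skip:-skip]
    return statistics.mean(steady), statistics.stdev(steady)

times = [9.0, 5.2, 4.1] + [3.1] * 94 + [2.0, 1.0, 0.5]   # 100 toy step times
mean, std = steady_state_stats(times)
print(round(mean, 2), round(std, 2))   # 3.1 0.0
```

A near-zero steady-state deviation is what the consistency hypothesis of Sect. 4.2 predicts.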

Experimental Results

In Situ Step Characterization. We study the correlation of the individual stages in each in situ step. Due to lack of space, the discussion is limited to a subset of the parameter space that is representative of the two characterized idle-based scenarios. Figure 6 shows the execution time per step for each component while varying the stride. Confirming the consistency hypothesis across steps discussed in Sect. 4.2, we observe that the execution time per step is nearly constant, except for a few warm-up and wrap-up steps. Figure 6(a) falls under the Idle Simulation (IS) scenario, as the idle stage I^S appears only in the Synthetic Simulation step. Similarly, in Fig. 6(b) we observe the Idle Analyzer (IA) scenario because of the presence of I^A. These findings verify the existence of the two idle-based scenarios discussed in Sect. 4.1. Since the Synthetic Simulation and the Analyzer are nearly synchronized, we also note that the single-step execution times of the two components are equal to each other, confirming the property of in situ workflows stated in Eq. 4. Overall, we observe that the I/O stages (W and R) take an insignificant portion of time compared to the full step. This negligible overhead confirms the advantage of leveraging in-memory staging for exchanging frames between coupled components.

Fig. 6. Execution time per step for each component with a helper-core placement (all components on the same node). The Synthetic Simulation stages are on the left and the Analyzer stages are on the right (lower is better).

Execution Scenarios. We study the impact of the number of atoms, the stride, and the component placement on the performance of the entire workflow and of each component under the different scenarios. Figure 7 shows that the workflow execution follows our model (Fig. 2). While increasing the number of atoms, which increases both the simulation time and the chunk size, the total idle time t(I) decreases in the IS scenario and increases in the IA scenario. Every in situ step exhibits a similar pattern in which, at a certain system size, the workflow execution switches from one scenario to the other. We denote this point the equilibrium point. We notice that with a larger stride the equilibrium point occurs at a larger system size: the equilibrium point for stride 4,000 occurs at a smaller number of atoms than that for stride 16,000. In terms of component placement, at first glance there is no large difference in the total idle time of the three placements (see Sect. 6.1).

Fig. 7.

Detailed idle time for three component placements at different strides when varying the number of atoms (lower is better).

Estimated Makespan. The goal is to verify the assertion made by Eq. 6, which states that the makespan of an in situ workflow can be expressed simply as the product of the number of steps and the time of one step. A typical MD simulation can easily feature a very large number of in situ steps, so a metric that requires only a few steps to be accurate is attractive. Figure 8 demonstrates the strength of our approach in estimating the makespan (maximum error about 5%) from a few in situ steps, and thus the accuracy of our model. Since in situ workflows run for a large number of steps, monitoring the entire execution increases the pressure on the system and slows it down. Thus, in the absence of failures and external loads, looking at a single non-overlapped step yields a scalable, accurate, and lightweight approach.
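The estimation idea behind Eq. 6 can be illustrated with a few lines of code: time one step, multiply by the number of steps, and compare against the measured total. The per-step times below are hypothetical, standing in for the near-constant step times observed in Fig. 6.

```python
# Sketch of the makespan estimation behind Eq. 6: for a steady in situ
# workflow, makespan ~= (number of steps) x (time of one step), so a single
# non-overlapped step is enough to estimate the whole run.

def estimate_makespan(step_time, n_steps):
    """Eq. 6-style estimate: one step time scaled by the step count."""
    return step_time * n_steps

def relative_error(estimated, measured):
    """Relative estimation error against the measured makespan."""
    return abs(estimated - measured) / measured

# Hypothetical per-step times (s); nearly constant across steps.
measured_steps = [2.02, 1.98, 2.01, 2.00, 1.99]

estimate = estimate_makespan(measured_steps[0], len(measured_steps))
measured = sum(measured_steps)
print(relative_error(estimate, measured))  # small relative error (~1% here)
```

Because only the first step is timed, the cost of the estimate stays constant even when the real run spans orders of magnitude more steps, which is what makes the metric lightweight.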

Fig. 8.

Left: the makespan is estimated from the time of a single in situ step, using the helper-core component placement with a stride of 16,000; the yellow region represents the error. The ratio of estimated to measured makespan uses the second y-axis on the right (closer to 1 is better). Right: resource usage efficiency (higher is better). (Color figure online)

Resource Usage Efficiency. We use the efficiency metric E given by Eq. 7 to evaluate an in situ configuration, with the objective of proposing a metric that allows users to characterize in situ workflows. Figure 8 (right) shows that the component placements exhibit only small variations in resource efficiency. We infer that the placement of components is not a decisive factor in coupling performance, which is consistent with the finding in Sect. 6.1. The efficiency E increases up to a maximum in the IS scenario and decreases past this maximum in the IA scenario. Thus, an in situ run is most efficient at the equilibrium point, where neither component idles. If a run is less efficient and is classified as the IA scenario, there is freedom to perform additional analyses or to increase the complexity of the analysis algorithm; in the IS scenario, the simulation can instead afford a larger system size.
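The exact form of Eq. 7 is not reproduced in this excerpt; as an illustrative stand-in, assume efficiency is the fraction of a step spent on useful (non-idle) work, E = 1 − T_idle / T_step. Under that assumption E peaks at the equilibrium point, where neither component idles, mirroring the behavior described above. All numbers below are hypothetical.

```python
# Illustrative stand-in for the efficiency metric (NOT necessarily Eq. 7's
# exact form): the fraction of each in situ step spent on useful work.

def efficiency(step_time, idle_time):
    """E = 1 - T_idle / T_step; equals 1.0 when no component idles."""
    return 1.0 - idle_time / step_time

# Hypothetical (step_time, total_idle) pairs for increasing system sizes;
# the middle run sits at the equilibrium point (zero idle time).
runs = [(2.0, 0.6), (2.0, 0.2), (2.0, 0.0), (2.4, 0.3), (3.0, 0.9)]

print([round(efficiency(t, i), 2) for t, i in runs])  # peaks at the middle run
```

By this definition, a low-E run in the IA scenario signals headroom for heavier analysis, while a low-E run in the IS scenario signals headroom for a larger simulated system.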

Conclusions

In this study, we explored the challenges of evaluating next-generation in situ workflows. We provided an analysis of in situ workflows by identifying a set of metrics that should be monitored to assess the performance of these workflows on HPC architectures. We designed a lightweight metric based on behavior consistency across in situ steps under a constrained in situ execution model. We validated the usefulness of this metric with a set of experiments using an in situ MD synthetic workflow, comparing three different placements for the workflow components: a helper-core placement and two in transit placements in which the DTL server is co-located with different components. Future work will study models in which the constraints are relaxed, for example allowing the workflow to buffer multiple frames in memory instead of the current single frame. Another promising research line is to extend our theoretical framework to account for multiple analysis methods; in that case, the time taken by the analysis could vary depending on the method used.

Acknowledgments

This work is funded by NSF contracts #1741040, #1740990, and #1741057, and DOE contract #DE-SC0012636. We are grateful to IBM for the Shared University Research Award that supported the purchase of the IBM Power9 system used in this paper.

Contributor Information

Valeria V. Krzhizhanovskaya, Email: V.Krzhizhanovskaya@uva.nl

Gábor Závodszky, Email: G.Zavodszky@uva.nl

Michael H. Lees, Email: m.h.lees@uva.nl

Jack J. Dongarra, Email: dongarra@icl.utk.edu

Peter M. A. Sloot, Email: p.m.a.sloot@uva.nl

Sérgio Brissos, Email: sergio.brissos@intellegibilis.com

João Teixeira, Email: joao.teixeira@intellegibilis.com

Tu Mai Anh Do, Email: tudo@isi.edu

Loïc Pottier, Email: lpottier@isi.edu

Stephen Thomas, Email: sthoma99@utk.edu

Rafael Ferreira da Silva, Email: rafsilva@isi.edu

Michel A. Cuendet, Email: mac2109@med.cornell.edu

Harel Weinstein, Email: haw2002@med.cornell.edu

Trilce Estrada, Email: estrada@cs.unm.edu

Michela Taufer, Email: mtaufer@utk.edu

Ewa Deelman, Email: deelman@isi.edu

References

  • 1. Agelastos, A., et al.: LDMS: a scalable infrastructure for continuous monitoring of large scale computing systems and applications. In: SC 2014 (2014)
  • 2. ASCR Workshop on In Situ Data Management (2019)
  • 3. Bauer, A.C., et al.: In situ methods, infrastructures, and applications on high performance computing platforms. In: Computer Graphics Forum, vol. 35 (2016)
  • 4. Choi, J.Y., et al.: Coupling exascale multiphysics applications: methods and lessons learned. In: 2018 IEEE 14th International Conference on e-Science (e-Science) (2018)
  • 5. DeRose, L., Homer, B., Johnson, D., Kaufmann, S., Poxon, H.: Cray performance analysis tools. In: Resch, M., Keller, R., Himmler, V., Krammer, B., Schulz, A. (eds.) Tools for High Performance Computing, pp. 191–199. Springer, Heidelberg (2008)
  • 6. Docan, C., et al.: DataSpaces: an interaction and coordination framework for coupled simulation workflows. Cluster Comput. 15, 163–181 (2012). doi: 10.1007/s10586-011-0162-y
  • 7. Foster, I., et al.: Computing just what you need: online data analysis and reduction at extreme scales. In: Rivera, F.F., Pena, T.F., Cabaleiro, J.C., et al. (eds.) Euro-Par 2017: Parallel Processing, pp. 3–19. Springer, Cham (2017)
  • 8. Johnston, T., et al.: In situ data analytics and indexing of protein trajectories. J. Comput. Chem. 38, 1419–1430 (2017). doi: 10.1002/jcc.24729
  • 9. Lofstead, J.F., et al.: Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS). In: 6th International Workshop on Challenges of Large Applications in Distributed Environments (2008)
  • 10. NAMD performance. https://www.ks.uiuc.edu/Research/namd/benchmarks/
  • 11. Páll, S., Abraham, M.J., Kutzner, C., Hess, B., Lindahl, E.: Tackling exascale software challenges in molecular dynamics simulations with GROMACS. In: Markidis, S., Laure, E. (eds.) Solving Software Challenges for Exascale, pp. 3–27. Springer, Cham (2015)
  • 12. Shende, S.S., Malony, A.D.: The Tau parallel performance system. Int. J. High Perform. Comput. Appl. 20(2), 287–311 (2006). doi: 10.1177/1094342006064482
  • 13. Ferreira da Silva, R., et al.: A characterization of workflow management systems for extreme-scale applications. Future Gener. Comput. Syst. 75, 228–238 (2017). doi: 10.1016/j.future.2017.02.026
  • 14. Taylor, I.J., Deelman, E., Gannon, D.B., Shields, M. (eds.): Workflows for e-Science: Scientific Workflows for Grids. Springer, London (2007)
  • 15. Thomas, S., et al.: Characterizing in situ and in transit analytics of molecular dynamics simulations for next-generation supercomputers. In: 15th eScience (2019)
  • 16. Vetter, J.S., et al.: Extreme heterogeneity 2018 – productive computational science in the era of extreme heterogeneity: report for DOE ASCR workshop on extreme heterogeneity. Technical report, LBNL, Berkeley, CA, USA, December 2018
  • 17. Wood, C., et al.: A scalable observation system for introspection and in situ analytics. In: 2016 5th Workshop on Extreme-Scale Programming Tools (ESPT) (2016)
  • 18. Wozniak, J.M., et al.: Big data staging with MPI-IO for interactive x-ray science. In: 2014 IEEE/ACM International Symposium on Big Data Computing (2014)
  • 19. Zou, H., et al.: FlexAnalytics: a flexible data analytics framework for big data applications with I/O performance improvement. Big Data Res. 1, 4–13 (2014). doi: 10.1016/j.bdr.2014.07.001

Articles from Computational Science – ICCS 2020 are provided here courtesy of Nature Publishing Group
