
Table 2.

Definitions of parameters for the wall-time and resource-time model of the average analysis

Definition Description
#img The total number of images to be averaged.
#job The total number of map jobs into which the large dataset is split.
η The number of images per map task, i.e., the chunk size. It determines the total number of map jobs: #job = #img/η. This is the variable for which we seek an optimal value. We assume the split is not locality-weighted, i.e., all map tasks share the same value of η.
SizeBig The maximum input file size in the dataset; we use it for worst-case estimation.
SizeSmall The minimum input file size in the dataset; we use it to derive upper and lower bounds on η.
SizeGen The maximum size of an output file generated by the image-averaging software.
Bandwidth The network bandwidth of the cluster.
VdiscR, VdiscW The data read/write speed of the local hard drive.
#region The total number of regions of a table in the cluster.
mem The total memory of one machine. We assume all machines have the same amount of memory.
core The total number of CPU cores in the cluster.
α When map tasks generate intermediate results, part of the output is held in the in-memory buffer, while the rest is temporarily spilled to local disk and transferred to the reduce tasks' shuffle phase later. α is the unbuffered ratio of the map tasks' results, i.e., the fraction that cannot be held in memory because of the heap-size limit.
β An empirical parameter representing the ratio of rack-local map tasks in the Hadoop scenario, i.e., map tasks whose data is loaded/stored over the network. We empirically set its value to 0.9.
discR(x), discW(x) x is the size of a file. These functions give the time to read/write a file from/to local disk: discR(x) = x/VdiscR; discW(x) = x/VdiscW.
bdw(x) x is the size of a file. This function gives the time to transfer a file over the network: bdw(x) = x/Bandwidth.
avgANTS(η) Here x again denotes the size of a single file. We use ANTS AverageImages [24] to empirically test the average summary-statistics analysis and run several profiling experiments to model this image processing. It is hard to derive a single concrete model of ANTS averaging that matches all profiling results, but the run time grows much more slowly with the total size of the files being averaged (η·x) than with the chunk size η itself, and the best solution is to perform all of the averaging on one CPU. We use a worst-case model, avgANTS(η) = 0.4η + 5, to illustrate this process (see the sketch after this table).
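For concreteness, the following Python sketch encodes the definitions above (discR, discW, bdw, avgANTS, and #job = #img/η). All numeric parameter values are hypothetical placeholders, and the way they are combined into a per-map-task time at the end is only a plausible illustration, not the paper's full wall-time/resource-time model.

```python
import math

# Hypothetical dataset/cluster parameters (placeholders, not values from the paper).
NUM_IMG = 10000          # #img: total number of images to average
SIZE_BIG_MB = 64.0       # SizeBig: maximum input file size (MB), worst case
SIZE_GEN_MB = 64.0       # SizeGen: maximum output file size (MB)
BANDWIDTH_MB_S = 100.0   # Bandwidth: cluster network bandwidth (MB/s)
V_DISC_R_MB_S = 150.0    # VdiscR: local disk read speed (MB/s)
V_DISC_W_MB_S = 120.0    # VdiscW: local disk write speed (MB/s)

def disc_r(x_mb):
    """discR(x) = x / VdiscR: time (s) to read a file of size x from local disk."""
    return x_mb / V_DISC_R_MB_S

def disc_w(x_mb):
    """discW(x) = x / VdiscW: time (s) to write a file of size x to local disk."""
    return x_mb / V_DISC_W_MB_S

def bdw(x_mb):
    """bdw(x) = x / Bandwidth: time (s) to move a file of size x over the network."""
    return x_mb / BANDWIDTH_MB_S

def avg_ants(eta):
    """Worst-case empirical model of averaging eta images: avgANTS(eta) = 0.4*eta + 5 (s)."""
    return 0.4 * eta + 5.0

def num_jobs(eta):
    """#job = #img / eta (rounded up when eta does not divide #img evenly)."""
    return math.ceil(NUM_IMG / eta)

# Illustrative (assumed) per-map-task cost for a chunk size eta:
# read eta inputs, average them, then write and ship the generated output.
eta = 100
per_task = eta * disc_r(SIZE_BIG_MB) + avg_ants(eta) + disc_w(SIZE_GEN_MB) + bdw(SIZE_GEN_MB)
print(f"eta={eta}: #job={num_jobs(eta)}, per-map-task time ~ {per_task:.1f} s")
```

Varying eta in this sketch shows the trade-off the table sets up: larger chunks amortize the fixed cost in avgANTS(η) but reduce the number of map jobs available for parallel execution.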