| Symbol | Description |
| --- | --- |
| #img | The total number of images that need to be averaged. |
| #job | The total number of map jobs into which the large dataset is split. |
| η | The number of images per map task, i.e., the chunk size. It determines the total number of map jobs: #job = #img/η. This is the variable for which we seek an optimal value. We assume there is no locality weighting concern when splitting map tasks, so all map tasks share the same value of η (see the sketch following this table). |
| SizeBig | The maximum input file size in the dataset; used for worst-case estimation. |
| SizeSmall | The minimum input file size in the dataset; used to derive the upper and lower bounds of η. |
| SizeGen | The maximum output file size generated by the image-averaging software. |
| Bandwidth | The network bandwidth of the cluster. |
| VdiscR, VdiscW | The read / write speed of the local hard drive. |
| #region | The total number of regions of a table in the cluster. |
| mem | The total memory of one machine. We assume all machines have the same amount of memory. |
| core | The total number of CPU cores in the cluster. |
| α | When map tasks generate intermediate results, part of them are held in the in-memory buffer and the rest are spilled to local disk and later transferred to the reduce tasks during the shuffle phase. α is the unbuffered ratio, i.e., the fraction of map output that cannot be held in memory because of the heap-size limit. |
| β | An empirical parameter representing the ratio of rack-local map tasks in the Hadoop scenario, i.e., tasks whose data is loaded/stored over the network. We empirically measured its value as 0.9. |
| discR(x), discW(x) | x is the size of a file. These functions give the time to read / write a file of size x from / to the local disk: discR(x) = x/VdiscR and discW(x) = x/VdiscW. |
| bdw(x) | x is the size of a file. This function gives the time to transfer a file of size x over the network: bdw(x) = x/Bandwidth. |
| avgANTS(η) | The time to average η images of size x with ANTS AverageImages [24], which we test empirically for the average summary statistics analysis. We ran several profiling experiments to model this image processing, but it is hard to derive a single concrete model that matches all profiling results. We can show that the time to average the total input (of size η·x) grows much more slowly than the chunk size η itself, so the best strategy is to perform all of the averaging on one CPU. We adopt the worst-case model avgANTS(η) = 0.4η + 5 to characterize this process. |
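
To make the relationships among these symbols concrete, here is a minimal Python sketch (not from the paper) that encodes the helper functions and the #job = #img/η relation. All numeric parameter values are hypothetical placeholders chosen only for illustration, and the time unit for avgANTS is assumed to be seconds.

```python
# Minimal sketch of the notation above. All numeric values are hypothetical
# placeholders; they only illustrate the formulas
#   #job = #img/η, discR(x) = x/VdiscR, discW(x) = x/VdiscW,
#   bdw(x) = x/Bandwidth, avgANTS(η) = 0.4η + 5.
import math

# --- hypothetical cluster / dataset parameters ---
NUM_IMG   = 1000    # #img: total number of images to average
SIZE_BIG  = 512.0   # SizeBig: maximum input file size (MB), worst case
BANDWIDTH = 100.0   # Bandwidth: cluster network bandwidth (MB/s)
V_DISC_R  = 150.0   # VdiscR: local disk read speed (MB/s)
V_DISC_W  = 120.0   # VdiscW: local disk write speed (MB/s)
BETA      = 0.9     # β: empirical ratio of rack-local map tasks

def num_jobs(num_img: int, eta: int) -> int:
    """#job = #img/η (rounded up here so every image is covered)."""
    return math.ceil(num_img / eta)

def disc_read(x: float) -> float:
    """discR(x): time to read a file of size x from local disk."""
    return x / V_DISC_R

def disc_write(x: float) -> float:
    """discW(x): time to write a file of size x to local disk."""
    return x / V_DISC_W

def bdw(x: float) -> float:
    """bdw(x): time to transfer a file of size x over the network."""
    return x / BANDWIDTH

def avg_ants(eta: int) -> float:
    """avgANTS(η): worst-case model of ANTS AverageImages runtime."""
    return 0.4 * eta + 5

if __name__ == "__main__":
    eta = 50  # hypothetical chunk size
    print("map jobs:", num_jobs(NUM_IMG, eta))
    print("read one worst-case image (s):", disc_read(SIZE_BIG))
    print("network transfer, rack-local share (s):", BETA * bdw(SIZE_BIG))
    print("average one chunk (s):", avg_ants(eta))
```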