| Symbol | Description |
| --- | --- |
| #img | The total number of images that need to be averaged. |
| #job | The total number of map jobs into which the large dataset is split. |
| η | The number of images per map task, i.e., the chunk size. It determines the total number of map jobs: #job = #img/η. This is the variable for which we seek an optimal value. We assume there is no locality weighting concern when splitting map tasks, so all map tasks share the same value of η (see the sketch following this table). |
| SizeBig | The maximum input file size in the dataset; used for worst-case estimation. |
| SizeSmall | The minimum input file size in the dataset; used to derive the upper and lower bounds of η. |
| SizeGen | The maximum output file size generated by the image-averaging software. |
| Bandwidth | The network bandwidth of the cluster. |
| VdiscR, VdiscW | The read / write speed of the local hard drive. |
| #region | The total number of regions of a table in the cluster. |
| mem | The total memory of one machine. We assume all machines have the same amount of memory. |
| core | The total number of CPU cores in the cluster. |
| α | When map tasks generate intermediate results, part of them are held in the in-memory buffer and the rest are spilled to local disk and later transferred to the reduce tasks during the shuffle phase. α is the unbuffered ratio, i.e., the fraction of map output that cannot be held in memory because of the heap-size limit. |
| β | An empirical parameter representing the ratio of rack-local map tasks in the Hadoop scenario, i.e., tasks whose data is loaded/stored over the network. We empirically measured its value as 0.9. |
| discR(x), discW(x) | x is the size of a file. These functions give the time to read / write a file of size x from / to the local disk: discR(x) = x/VdiscR and discW(x) = x/VdiscW. |
| bdw(x) | x is the size of a file. This function gives the time to transfer a file of size x over the network: bdw(x) = x/Bandwidth. |
| avgANTS(η) | The time to average η images of size x with ANTS AverageImages [24], which we test empirically for the average summary statistics analysis. We ran several profiling experiments to model this image processing, but it is hard to derive a single concrete model that matches all profiling results. We can show that the time to average the total input (of size η·x) grows much more slowly than the chunk size η itself, so the best strategy is to perform all of the averaging on one CPU. We adopt the worst-case model avgANTS(η) = 0.4η + 5 to characterize this process. |
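
To make the relationships among these symbols concrete, here is a minimal Python sketch (not from the paper) that encodes the helper functions and the #job = #img/η relation. All numeric parameter values are hypothetical placeholders chosen only for illustration, and the time unit for avgANTS is assumed to be seconds.

```python
# Minimal sketch of the notation above. All numeric values are hypothetical
# placeholders; they only illustrate the formulas
#   #job = #img/η, discR(x) = x/VdiscR, discW(x) = x/VdiscW,
#   bdw(x) = x/Bandwidth, avgANTS(η) = 0.4η + 5.
import math

# --- hypothetical cluster / dataset parameters ---
NUM_IMG   = 1000    # #img: total number of images to average
SIZE_BIG  = 512.0   # SizeBig: maximum input file size (MB), worst case
BANDWIDTH = 100.0   # Bandwidth: cluster network bandwidth (MB/s)
V_DISC_R  = 150.0   # VdiscR: local disk read speed (MB/s)
V_DISC_W  = 120.0   # VdiscW: local disk write speed (MB/s)
BETA      = 0.9     # β: empirical ratio of rack-local map tasks

def num_jobs(num_img: int, eta: int) -> int:
    """#job = #img/η (rounded up here so every image is covered)."""
    return math.ceil(num_img / eta)

def disc_read(x: float) -> float:
    """discR(x): time to read a file of size x from local disk."""
    return x / V_DISC_R

def disc_write(x: float) -> float:
    """discW(x): time to write a file of size x to local disk."""
    return x / V_DISC_W

def bdw(x: float) -> float:
    """bdw(x): time to transfer a file of size x over the network."""
    return x / BANDWIDTH

def avg_ants(eta: int) -> float:
    """avgANTS(η): worst-case model of ANTS AverageImages runtime."""
    return 0.4 * eta + 5

if __name__ == "__main__":
    eta = 50  # hypothetical chunk size
    print("map jobs:", num_jobs(NUM_IMG, eta))
    print("read one worst-case image (s):", disc_read(SIZE_BIG))
    print("network transfer, rack-local share (s):", BETA * bdw(SIZE_BIG))
    print("average one chunk (s):", avg_ants(eta))
```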