Author manuscript; available in PMC: 2018 Jun 7.
Published in final edited form as: Proc SPIE Int Soc Opt Eng. 2018 Mar;10597:105790A. doi: 10.1117/12.2293694

Table 1.

HadoopBase-MIP system interface description

Operation Interface Parameter Description
Upload
  • -

    Table name: the table is created if it does not exist.

  • -

    Path to a text file containing <column family, qualifier> pairs, used to create or alter the table schema.

  • -

    Path to a text file containing one tuple per image: <system file path, unique file name, column family, qualifier>. The unique file name is used as the rowkey.

  • -

    Overwrite: Boolean value. It controls whether existing images are updated or duplicate uploads are skipped.

  • -

    Region split policy: default policy or hierarchical policy [2].

  • -

    Pre-split: Boolean value. Valid only when a new table is created.

  • -

    Path to a text file containing all rowkeys used to pre-split the table.
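
As a rough illustration of the Upload inputs listed above, the sketch below parses a manifest of image tuples and applies the Overwrite flag. The comma-separated layout, the names `ImageEntry`, `parse_manifest` and `plan_upload`, and the in-memory set of existing rowkeys are assumptions made for illustration; the actual HadoopBase-MIP file format and code are not given in this table.

```python
import csv
import io
from typing import NamedTuple


class ImageEntry(NamedTuple):
    path: str        # system file path of the image
    rowkey: str      # unique file name, used as the rowkey
    family: str      # column family
    qualifier: str   # column qualifier


def parse_manifest(text: str) -> list[ImageEntry]:
    """Parse one <path, unique name, family, qualifier> tuple per line (assumed CSV)."""
    entries = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # ignore blank lines
        if len(row) != 4:
            raise ValueError(f"expected 4 fields, got {len(row)}: {row}")
        entries.append(ImageEntry(*(field.strip() for field in row)))
    return entries


def plan_upload(entries, existing_rowkeys, overwrite):
    """Return the entries to write; skip rows already stored unless overwrite is set."""
    if overwrite:
        return list(entries)
    return [e for e in entries if e.rowkey not in existing_rowkeys]
```

With Overwrite disabled, an entry whose rowkey is already stored is dropped from the plan; with Overwrite enabled, every entry is written again.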

Retrieve
  • -

    Table name

  • -

    Rowkey: set this value if retrieval is image-based.

  • -

    Start rowkey and/or stop rowkey: set these to define a retrieval range. If both keys are empty, the retrieval covers the selected column across the whole table.

  • -

    Column family

  • -

    Column qualifier

  • -

    Path to a text file containing all destination paths for the retrieved data.

  • -

    Path to a text file containing all rowkeys to skip during retrieval.
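
The three retrieval modes above (single rowkey, start/stop range, whole table) and the skip list can be sketched over a sorted list of rowkeys. This is a minimal illustration only: `select_rowkeys` is a hypothetical helper, and the range is treated as start-inclusive, stop-exclusive, which is the default convention of HBase scans rather than something this table states.

```python
from bisect import bisect_left


def select_rowkeys(sorted_keys, rowkey=None, start=None, stop=None, skip=()):
    """Pick rowkeys for a retrieval request from a sorted list of stored keys.

    - rowkey set: image-based retrieval of a single row.
    - start and/or stop set: range retrieval, start inclusive, stop exclusive.
    - neither set: the whole table is covered.
    """
    skip = set(skip)
    if rowkey is not None:
        i = bisect_left(sorted_keys, rowkey)
        found = i < len(sorted_keys) and sorted_keys[i] == rowkey
        selected = [rowkey] if found else []
    else:
        # An empty or missing bound leaves that side of the range open.
        lo = bisect_left(sorted_keys, start) if start else 0
        hi = bisect_left(sorted_keys, stop) if stop else len(sorted_keys)
        selected = sorted_keys[lo:hi]
    return [k for k in selected if k not in skip]
```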

Delete
    All options are the same as for Retrieve, except that the text file of data retrieval destination paths is not used.
Monitor
    Hadoop's built-in job monitoring tool.
MapReduce Template
  • -

    Table name: two names, one for the source table from which data are read and one for the target table to which results are written.

  • -

    Column family: three values, one for the data query (see section 2.3), one for image data retrieval, and one for the target table.

  • -

    Column qualifier: three values, one corresponding to each column family.

  • -

    Path to a text file containing all start/stop rowkey pairs; each pair is the input for one map job.

  • -

    Path to a text file containing all rowkeys to skip during retrieval.

  • -

    Analysis level: three options, image-based [18], dataset-based [17], and large-dataset-based (see sections 2.2 and 2.3).
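
To make the shape of the MapReduce Template parameters concrete, the sketch below groups them into one validated configuration object and reads the start/stop pair file. Everything named here (`TemplateConfig`, `parse_row_pairs`, the role labels, the analysis-level strings, and the whitespace-separated pair format) is an assumption for illustration, not the HadoopBase-MIP interface itself.

```python
from dataclasses import dataclass

ANALYSIS_LEVELS = ("image", "dataset", "large_dataset")  # assumed labels


@dataclass(frozen=True)
class TemplateConfig:
    source_table: str   # table the map phase reads from
    target_table: str   # table results are written back to
    families: tuple     # (data query, image retrieval, target table)
    qualifiers: tuple   # one qualifier per family, in the same order
    level: str = "image"

    def __post_init__(self):
        if len(self.families) != 3 or len(self.qualifiers) != 3:
            raise ValueError("expected three column families and three qualifiers")
        if self.level not in ANALYSIS_LEVELS:
            raise ValueError(f"unknown analysis level: {self.level}")

    def columns(self):
        """Pair each column family with its qualifier, keyed by role."""
        roles = ("query", "image", "target")
        return dict(zip(roles, zip(self.families, self.qualifiers)))


def parse_row_pairs(text):
    """Read one whitespace-separated 'start stop' pair per line; each feeds one map job."""
    pairs = []
    for line in text.splitlines():
        if line.strip():
            start, stop = line.split()
            pairs.append((start, stop))
    return pairs
```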

Load Balancer
    The default built-in HBase balancer balances the total number of regions on each server. Our proposed load balancer uses offline greedy allocation. First, it finds all regions and images on each server; second, it moves images by region; finally, the fraction of data allocated to each machine matches that node's share of (number of CPUs × million instructions per second (MIPS)) across the cluster. MIPS is measured with Linux perf and varies with CPU type.
  • -

    Table name

  • -

    Column family

  • -

    Column qualifier
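
The proportional target used by the load balancer can be illustrated with a small greedy allocation. This is only a sketch under stated assumptions: it places individual images largest-first on the node furthest below its target and ignores region boundaries, whereas the proposed balancer moves images by region. The names `target_shares` and `greedy_allocate` and the example node figures are hypothetical.

```python
def target_shares(nodes):
    """nodes: {name: (cpu_count, mips)} -> {name: fraction of data to hold}."""
    power = {name: cpus * mips for name, (cpus, mips) in nodes.items()}
    total = sum(power.values())
    return {name: p / total for name, p in power.items()}


def greedy_allocate(image_sizes, nodes):
    """Assign images (largest first) to the node furthest below its target."""
    shares = target_shares(nodes)
    total_size = sum(image_sizes.values())
    assigned = {name: 0 for name in nodes}
    placement = {}
    for image, size in sorted(image_sizes.items(), key=lambda kv: -kv[1]):
        # Deficit = amount of data still missing to reach the proportional target.
        node = max(nodes, key=lambda n: shares[n] * total_size - assigned[n])
        placement[image] = node
        assigned[node] += size
    return placement, assigned


# Hypothetical cluster: (CPU count, MIPS per CPU as measured by perf).
nodes = {"fast": (8, 3000), "slow": (4, 2000)}
images = {f"img{i}": 10 for i in range(8)}
placement, assigned = greedy_allocate(images, nodes)
```

In this example the "fast" node has three times the CPU × MIPS product of the "slow" node, so it is targeted to hold three quarters of the data.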