2019 Mar 16;32(9):4417–4451. doi: 10.1007/s00521-019-04095-y

Table 1.

Big data analytics tools

| Platforms and tools | Description |
| --- | --- |
| Advanced data visualization (ADV) | Reduces the data-quality problems that can occur when medical data are retrieved for further analysis |
| Presto | Distributed SQL query engine used to analyze the huge amounts of data collected every day |
| The Hadoop Distributed File System (HDFS) | Provides the underlying storage for the Hadoop cluster and supports healthcare data analytics by dividing large amounts of data into smaller pieces and distributing them across multiple servers/nodes |
| MapReduce | Breaks a task into subtasks and gathers their outputs; efficient for large amounts of data |
| Mahout | An Apache project whose goal is to produce free implementations of distributed and scalable ML algorithms that support healthcare data analytics on Hadoop systems |
| Jaql | Functional, declarative query language designed to process large datasets. It facilitates parallel processing by converting high-level queries into low-level ones |
| Pig and Pig Latin | Designed to assimilate all types of data (structured, unstructured, etc.) |
| Avro | Facilitates data encoding and serialization, improving data structure by specifying data types, meaning, and schema |
| ZooKeeper | Provides a centralized infrastructure with various services, enabling synchronization across a cluster of servers |
| Hive | A runtime Hadoop support architecture that allows developers to write Hive Query Language (HQL) statements akin to typical SQL statements |
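The MapReduce entry in Table 1 describes breaking a task into subtasks and gathering their outputs. A minimal single-process sketch of that pattern in Python may make the idea concrete. It is an illustration only and does not use Hadoop's API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the sample diagnosis-code records are hypothetical and do not come from the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def map_phase(record: str) -> List[Tuple[str, int]]:
    """Subtask: emit a (code, 1) pair for each diagnosis code in one record."""
    # Assumption: a record is a comma-separated string of diagnosis codes.
    return [(code.strip().upper(), 1) for code in record.split(",") if code.strip()]


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key so each key can be reduced on its own."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Gather the outputs of the subtasks for one key into a single count."""
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce sequentially (a real cluster runs them in parallel)."""
    mapped = [pair for record in records for pair in map_phase(record)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input: one string of diagnosis codes per patient visit.
    visits = ["E11, I10", "I10", "e11, J45, I10"]
    print(map_reduce(visits))  # {'E11': 2, 'I10': 3, 'J45': 1}
```

In a distributed setting, each call to `map_phase` could run on the node that stores the corresponding data block, and each key's `reduce_phase` could run independently, which is what makes the pattern suitable for large datasets.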