Advanced data visualization (ADV)
ADV can reduce the quality problems that may arise when medical data are retrieved for further analysis.
Presto
A distributed SQL query engine used to analyze the huge amounts of data collected every day.
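A distributed engine can answer a query such as `SELECT ward, COUNT(*) FROM admissions GROUP BY ward` by having each worker aggregate only its own split of the rows. A coordinator then merges those partial results. The sketch below simulates this two-phase (partial, then final) aggregation in plain Python. It is not Presto's actual execution code, and the `admissions` table, the `ward` column, the split layout and the helper names are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

Row = Dict[str, str]

def partial_count(split: Iterable[Row], key: str) -> Counter:
    """Worker phase (hypothetical helper): count rows per group within one split only."""
    return Counter(row[key] for row in split)

def final_count(partials: List[Counter]) -> Dict[str, int]:
    """Coordinator phase (hypothetical helper): merge the per-split partial counts."""
    total: Counter = Counter()
    for part in partials:
        total.update(part)  # Counter.update adds counts rather than replacing them
    return dict(total)

# Hypothetical admissions table, already divided into three splits
splits = [
    [{"ward": "cardiology"}, {"ward": "oncology"}],
    [{"ward": "cardiology"}],
    [{"ward": "oncology"}, {"ward": "cardiology"}],
]

# Simulates: SELECT ward, COUNT(*) FROM admissions GROUP BY ward
partials = [partial_count(s, "ward") for s in splits]
print(final_count(partials))  # {'cardiology': 3, 'oncology': 2}
```

Only the small partial counts travel to the coordinator, not the raw rows. This is why partial aggregation scales to large datasets.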
The Hadoop Distributed File System (HDFS)
HDFS provides the underlying storage for a Hadoop cluster. It supports healthcare data analytics systems by dividing large datasets into smaller blocks and distributing them across multiple servers (nodes).
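The splitting and distribution step can be sketched as follows: a file's bytes are cut into fixed-size blocks, and each block is assigned to several distinct nodes so that losing one server does not lose data. Real HDFS defaults to much larger blocks (128 MB) and a replication factor of 3, with rack-aware placement decided by the NameNode. This sketch assumes tiny blocks and a simple round-robin placement instead, and the node names and helper functions are hypothetical.

```python
from typing import Dict, List

def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file's contents into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes in round-robin order.

    Simplified stand-in for NameNode placement decisions (no rack awareness).
    """
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement: Dict[int, List[str]] = {}
    for block_id in range(num_blocks):
        placement[block_id] = [nodes[(block_id + r) % len(nodes)] for r in range(replication)]
    return placement

# Hypothetical monitoring record and cluster
record = b"patient_id=42;hr=88;bp=120/80;spo2=97"
blocks = split_into_blocks(record, block_size=16)
layout = place_blocks(len(blocks), ["node1", "node2", "node3", "node4"], replication=3)
for block_id, block in enumerate(blocks):
    print(block_id, block, layout[block_id])
```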
MapReduce
Breaks a task into subtasks that run in parallel and then gathers their outputs, which makes it efficient for processing large amounts of data.
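The map, shuffle and reduce steps can be illustrated with the classic counting example, here applied to diagnosis codes. This is a single-process sketch of the programming model, not Hadoop's Java API. The function names are hypothetical, and the input records use sample ICD-10-style codes. On a cluster, the map calls and the per-key reduce calls would run as separate parallel tasks.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (diagnosis_code, 1) for every code in one record."""
    for code in record.split():
        yield code, 1

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single output."""
    return key, sum(values)

def map_reduce(records: List[str]) -> Dict[str, int]:
    intermediate = (pair for record in records for pair in map_phase(record))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

records = ["E11 I10", "I10 J45", "E11 I10"]
print(map_reduce(records))  # {'E11': 2, 'I10': 3, 'J45': 1}
```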
Mahout
An Apache project whose goal is to provide free implementations of distributed, scalable machine learning (ML) algorithms that support healthcare data analytics on Hadoop systems.
Jaql
A functional, declarative query language designed to process large datasets. It facilitates parallel processing by translating high-level queries into low-level ones.
Pig and Pig Latin
Pig and its scripting language, Pig Latin, are designed to handle all types of data (structured, unstructured, etc.).
Avro
Facilitates data encoding and serialization. It improves how data are structured by specifying data types, their meaning, and a schema.
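Avro schemas are declared in JSON: a `record` lists named fields, each with a type, and a JSON list such as `["null", "string"]` declares a union. The sketch below shows how such a schema pins down the structure of a record by checking values against it. The `VitalSign` schema and the `matches`/`validate` helpers are hypothetical. The sketch covers only primitive types and unions, and it performs no binary encoding. Actual serialization would use an Avro library such as the official `avro` package or `fastavro`.

```python
import json
from typing import Any, Dict

# Hypothetical Avro record schema for a vital-signs reading
SCHEMA = json.loads("""
{
  "type": "record",
  "name": "VitalSign",
  "fields": [
    {"name": "patient_id", "type": "long"},
    {"name": "heart_rate", "type": "int"},
    {"name": "temperature_c", "type": "double"},
    {"name": "note", "type": ["null", "string"]}
  ]
}
""")

def matches(value: Any, avro_type: Any) -> bool:
    """Check a Python value against a subset of Avro types (primitives and unions)."""
    if isinstance(avro_type, list):  # union: any branch may match
        return any(matches(value, t) for t in avro_type)
    if avro_type == "null":
        return value is None
    if avro_type == "boolean":
        return isinstance(value, bool)
    is_int = isinstance(value, int) and not isinstance(value, bool)
    if avro_type == "int":  # 32-bit signed
        return is_int and -2**31 <= value < 2**31
    if avro_type == "long":  # 64-bit signed
        return is_int and -2**63 <= value < 2**63
    if avro_type in ("float", "double"):  # this sketch also accepts Python ints here
        return is_int or isinstance(value, float)
    if avro_type == "string":
        return isinstance(value, str)
    raise NotImplementedError(f"type not covered by this sketch: {avro_type!r}")

def validate(record: Dict[str, Any], schema: Dict[str, Any]) -> bool:
    """True if the record has exactly the schema's fields, each of the declared type."""
    if set(record) != {f["name"] for f in schema["fields"]}:
        return False
    return all(matches(record[f["name"]], f["type"]) for f in schema["fields"])

good = {"patient_id": 42, "heart_rate": 88, "temperature_c": 36.8, "note": None}
bad = {"patient_id": 42, "heart_rate": "88", "temperature_c": 36.8, "note": None}
print(validate(good, SCHEMA), validate(bad, SCHEMA))  # True False
```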
ZooKeeper
Provides a centralized infrastructure with various services, offering synchronization across a cluster of servers.
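A common way to use this synchronization is ZooKeeper's documented lock recipe. Each server creates an ephemeral sequential node under a shared lock path, and the server that owns the lowest sequence number holds the lock. When that node is deleted, the lock passes to the next server, which also happens when the holder's session expires. The sketch below simulates the recipe in memory so that it runs without a ZooKeeper ensemble. The `LockNamespace` class and the server names are hypothetical. It omits the watches, sessions and znode naming that the real recipe relies on.

```python
from typing import List, Optional, Tuple

class LockNamespace:
    """Hypothetical in-memory stand-in for a ZooKeeper lock path such as /locks/patient-42.

    Mimics the lock recipe: every client creates a sequential node, and the
    client owning the lowest sequence number holds the lock.
    """

    def __init__(self) -> None:
        self._next_seq = 0
        self._nodes: List[Tuple[int, str]] = []  # (sequence_number, client_id)

    def create_sequential(self, client_id: str) -> int:
        """Create a node with the next sequence number and return that number."""
        seq = self._next_seq
        self._next_seq += 1
        self._nodes.append((seq, client_id))
        return seq

    def delete(self, seq: int) -> None:
        """Remove a node, as on release or when an ephemeral node's session ends."""
        self._nodes = [n for n in self._nodes if n[0] != seq]

    def holder(self) -> Optional[str]:
        """Return the client owning the lowest sequence number, if any."""
        return min(self._nodes)[1] if self._nodes else None

ns = LockNamespace()
a = ns.create_sequential("server-a")
b = ns.create_sequential("server-b")
print(ns.holder())  # server-a
ns.delete(a)        # server-a releases the lock
print(ns.holder())  # server-b
```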
Hive
Hive is a data warehouse layer that runs on Hadoop and lets users write Hive Query Language (HQL) statements similar to standard SQL statements.