2023 Jan 12;10:409. Originally published 2021 May 21. [Version 3] doi: 10.12688/f1000research.52791.3

Figure 2. A schematic representation of the relationship between Sherlock's two main components: the Query Engine (dockerized) and the Data Lake.

Left side: the core component of the platform, the Presto Query Engine. It allows the user to execute analytical SQL queries on the data files in the Data Lake (right side). Right side: the structure of our Data Lake, which is separated into four zones. Working from bottom to top: 1) The Raw Zone contains the raw data from the different external or bespoke databases in their original formats. 2) The Landing Zone holds the data in JSON Lines format, which is compatible with the Presto Query Engine. 3) The Master Zone holds the data in ORC, a common optimized and compressed format that enables faster query execution than the Landing Zone. 4) The Project Zone stores exclusively the data needed for specific/active projects.
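
As a rough illustration of the step from the Raw Zone to the Landing Zone, the sketch below converts a small tab-separated table into JSON Lines (one JSON object per line) using only the Python standard library. It is a minimal example under stated assumptions, not Sherlock's actual conversion code: the function name `tsv_to_json_lines`, the tab-separated input format, and the column names `source` and `target` are all hypothetical.

```python
import csv
import io
import json


def tsv_to_json_lines(raw_text: str) -> str:
    """Convert tab-separated text (header row first) to JSON Lines.

    Hypothetical helper for illustration only; real raw-zone files
    may use other formats and need format-specific parsing.
    """
    reader = csv.DictReader(io.StringIO(raw_text), delimiter="\t")
    # JSON Lines: one JSON object per line, with no enclosing array.
    return "\n".join(json.dumps(row) for row in reader)


# Hypothetical raw-zone content; the column names are illustrative.
raw = "source\ttarget\nA\tB\nB\tC\n"
landing = tsv_to_json_lines(raw)
print(landing)
```

Each output line is an independent JSON object, so a file in this form can be read record by record without loading the whole file into memory.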