Figure 1.
Schematic overview of the data flow in the extract-load-transform (ELT) process used to build the data warehouse from the ENA and ePMC datasets. ENA records are parsed (A1), filtered for valid country tags, and passed to the ePMC RESTful API to retrieve matching secondary publications (B1) by ENA accession or project accession number. Primary publications are linked to the ENA record (A2) by DOI, PMCID, or PMID. The resulting datasets are normalized into the tables ENA_SEQUENCES and PMC_REFERENCES and loaded into the data warehouse (A3, B2). This is complemented by a manually ingested list of the world's countries and economic groups, stored in the tables COUNTRIES and COUNTRY2GRP, respectively (C1). Finally, SQL queries are applied to generate charts and reports in the Web application.
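
The final reporting step can be illustrated with a minimal sketch, using an in-memory SQLite database in place of the actual data warehouse. Only the four table names come from the caption; every column name, the helper function names, and the sample rows are hypothetical placeholders chosen for illustration, and the real schema and queries may differ.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the four tables named in the caption (columns are assumed)."""
    conn.executescript(
        """
        CREATE TABLE ENA_SEQUENCES (
            accession TEXT PRIMARY KEY,   -- hypothetical column
            project_accession TEXT,       -- hypothetical column
            country TEXT                  -- hypothetical column
        );
        CREATE TABLE PMC_REFERENCES (
            accession TEXT,               -- hypothetical link to ENA_SEQUENCES
            ref_id TEXT,                  -- hypothetical publication identifier
            kind TEXT                     -- 'primary' or 'secondary' (assumed)
        );
        CREATE TABLE COUNTRIES (country TEXT PRIMARY KEY);
        CREATE TABLE COUNTRY2GRP (country TEXT, grp TEXT);
        """
    )


def load_example_rows(conn: sqlite3.Connection) -> None:
    """Insert made-up placeholder rows; these are not real ENA or ePMC data."""
    conn.executemany(
        "INSERT INTO ENA_SEQUENCES VALUES (?, ?, ?)",
        [("SEQ1", "PRJ1", "Aland"), ("SEQ2", "PRJ1", "Aland"), ("SEQ3", "PRJ2", "Borduria")],
    )
    conn.executemany(
        "INSERT INTO PMC_REFERENCES VALUES (?, ?, ?)",
        [("SEQ1", "REF1", "primary"), ("SEQ1", "REF2", "secondary"), ("SEQ3", "REF1", "secondary")],
    )
    conn.executemany("INSERT INTO COUNTRIES VALUES (?)", [("Aland",), ("Borduria",)])
    conn.executemany(
        "INSERT INTO COUNTRY2GRP VALUES (?, ?)",
        [("Aland", "GroupX"), ("Borduria", "GroupY")],
    )
    conn.commit()


def report_by_group(conn: sqlite3.Connection) -> list:
    """Count distinct sequences and linked publications per economic group."""
    query = """
        SELECT g.grp,
               COUNT(DISTINCT s.accession) AS n_sequences,
               COUNT(DISTINCT r.ref_id)    AS n_publications
        FROM ENA_SEQUENCES AS s
        JOIN COUNTRY2GRP AS g ON g.country = s.country
        LEFT JOIN PMC_REFERENCES AS r ON r.accession = s.accession
        GROUP BY g.grp
        ORDER BY g.grp
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_schema(connection)
    load_example_rows(connection)
    for row in report_by_group(connection):
        print(row)
```

The LEFT JOIN keeps sequences that have no linked publication, so they still count toward the per-group sequence total; whether the Web application's queries treat unlinked records this way is an assumption of this sketch.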