Table 1:
Source | Examples | Aspect of bigness¹ | Key technical issues | Typical uses |
---|---|---|---|---|
-omic/biological | Whole exome profiling, metabolomics | Wide | Lab effects, informatics pipeline | Etiologic research, screening |
Geospatial | Neighborhood characteristics | Wide | Spatial autocorrelation | Etiologic research, surveillance |
Electronic health records | Records of all patients with hypertension | Tall, often also wide | Data cleaning, natural language | Clinical research, surveillance |
Personal monitoring | Daily GPS records, Fitbit readings | Tall | Redundancy, inferring intentions | Etiologic research, potentially clinical decision-making |
Effluent data | Google search results, Reddit | Tall | Selection biases, natural language | Surveillance, screening, identifying hidden social networks |
¹ ‘Wide’ datasets have many columns; ‘tall’ datasets have many rows.