Structured data | Datasets that are usually represented as a table, where each row is one user's record and each column holds every user's value for the same attribute. | Survey, census, and health care cross-sectional data with demographic attributes (such as sex and age) and observations (such as medical conditions).
Set-valued (semistructured) data | Datasets where each user is associated with a collection of data points. Each data point typically contains information about "actions" or "events" relating to the user. The most prominent type of set-valued data is the time series, where each point includes both a timestamp and a description of an action or event. | Most types of behavioral data, such as location data (individual trajectories, with the time and location of each place visited, taken from GPS coordinates or cell towers), movies and videos watched online, and supermarket shopping data.
Unstructured data | Datasets that do not have a natural structured representation. | Text data (messages, tweets, letters), graphs (social networks), images and videos (face, body posture, fingerprint, iris).
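
To make the distinction between the three categories concrete, the following minimal Python sketch shows one way each kind of data might be held in memory. All class names, field names, and values (`SurveyRow`, `Event`, the user IDs, and the sample records) are hypothetical and chosen only for illustration; they do not come from any real dataset.

```python
from dataclasses import dataclass
from datetime import datetime


# Structured: one fixed-width row per user (hypothetical attributes).
@dataclass
class SurveyRow:
    user_id: str
    sex: str
    age: int
    condition: str


# Set-valued: each data point is a timestamped event (hypothetical fields).
@dataclass
class Event:
    timestamp: datetime
    description: str


# Structured data: every user has exactly one row with the same columns.
structured = [
    SurveyRow("u1", "F", 34, "asthma"),
    SurveyRow("u2", "M", 51, "none"),
]

# Set-valued data: each user maps to a variable-length collection of events.
set_valued = {
    "u1": [
        Event(datetime(2024, 1, 5, 8, 30), "cell tower A"),
        Event(datetime(2024, 1, 5, 9, 10), "cell tower B"),
    ],
    "u2": [Event(datetime(2024, 1, 5, 7, 45), "cell tower C")],
}

# Unstructured data: free-form content with no natural columns.
unstructured = {"u1": "Running late, see you at the station."}

# Every structured row has the same number of fields ...
row_widths = {len(vars(row)) for row in structured}
# ... whereas the number of events differs from user to user.
events_per_user = {uid: len(events) for uid, events in set_valued.items()}
```

In this sketch, the structured records all share one width, while the set-valued records vary in length per user; the unstructured entry has no fields to count at all.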