Table 1.
Dataset | # of samples | # of features | # of classes | Classification task |
---|---|---|---|---|
Classification of body sites | | | | |
Costello et al. (2009) Body Habitat (CBH) | 552 | 1454 | 6 | Classify body habitats: skin (357), oral cavity (46), external auditory canal (44), hair (14), nostril (46), feces (45) |
Costello et al. (2009) Skin Sites (CSS) | 357 | 600 | 12 | Classify skin sites: external nose (14), forehead (32), glans penis (8), labia minora (6), axilla (28), pinna (27), palm (64), palmar index finger (28), plantar foot (64), popliteal fossa (46), volar forearm (28), umbilicus (12) |
Human Microbiome Project (HMP) | 1025 | 323 | 5 | Classify 5 major body sites: anterior nares (269), buccal mucosa (312), stool (319), supragingival plaque (313), tongue dorsum (316) |
Classification of subjects | | | | |
Costello et al. (2009) Subject (CS) | 140 | 464 | 7 | Classify 7 subjects: (20, 20, 20, 20, 20, 20, 20) |
Fierer et al. (2010) Subject (FS) | 104 | 294 | 3 | Classify 3 subjects: (40, 33, 31) |
Fierer et al. (2010) Subject x Hand (FSH) | 98 | 294 | 6 | Classify by subject and left/right hand: (20, 18, 17, 14, 16, 13) |
Classification of disease states | | | | |
Inflammatory Bowel Disease (IBD) | 1025 | 1025 | 2 | Classify disease states: normal (500), IBD (500) |
Pei et al. (2013) Diagnosis (PDX) | 200 | 5955 | 4 | Classify disease states: normal (28), reflux esophagitis (36), Barrett’s esophagus (84), esophageal adenocarcinoma (52) |
We consider three categories of classification tasks: body sites, subjects, and disease states. The number of samples in each class is given in parentheses. The number of features equals the number of distinct OTUs (operational taxonomic units, i.e., microbes).
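As a minimal sketch of how the feature and class columns above relate to raw data, the snippet below builds a sample-by-OTU abundance matrix with one column per distinct OTU, so the feature count is simply the number of distinct OTUs observed. It also tallies samples per class, matching the counts in parentheses. The input layout (a list of `(sample_id, otu_id, count)` records) and the helper names `build_otu_matrix` and `class_counts` are hypothetical assumptions for illustration. They are not taken from the original datasets or pipeline.

```python
from collections import Counter, defaultdict


def build_otu_matrix(records):
    """Build a sample-by-OTU abundance matrix (hypothetical input format).

    `records` is assumed to be an iterable of (sample_id, otu_id, count)
    tuples. Each distinct OTU becomes one feature column, so the number of
    features equals the number of distinct OTUs.
    """
    records = list(records)
    samples = sorted({s for s, _, _ in records})
    otus = sorted({o for _, o, _ in records})
    col = {otu: j for j, otu in enumerate(otus)}  # OTU -> column index

    # One zero-initialised row per sample; counts are summed per (sample, OTU).
    rows = defaultdict(lambda: [0] * len(otus))
    for sample, otu, count in records:
        rows[sample][col[otu]] += count

    matrix = [rows[s] for s in samples]
    return samples, otus, matrix


def class_counts(labels):
    """Number of samples per class, as listed in parentheses in the table."""
    return dict(Counter(labels))


if __name__ == "__main__":
    # Toy records, not real data from any dataset in the table.
    toy = [("s1", "OTU_A", 5), ("s1", "OTU_B", 2),
           ("s2", "OTU_A", 1), ("s2", "OTU_C", 7)]
    samples, otus, matrix = build_otu_matrix(toy)
    print(len(samples), "samples x", len(otus), "features")
    print(class_counts(["stool", "tongue dorsum", "stool"]))
```

In this toy input the three distinct OTUs give three feature columns. A real table would typically be stored sparsely, since most samples contain only a small fraction of all OTUs.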