2019 Jun 20;20(Suppl 12):314. doi: 10.1186/s12859-019-2833-2

Table 1.

Real metagenomic data used in this paper

| Dataset | # of samples | # of features | # of classes | Classification task |
|---|---|---|---|---|
| Classification of body sites | | | | |
| Costello et al. (2009) Body Habitat (CBH) | 552 | 1454 | 6 | Classify body habitats: skin (357), oral cavity (46), external auditory canal (44), hair (14), nostril (46), feces (45) |
| Costello et al. (2009) Skin Sites (CSS) | 357 | 600 | 12 | Classify skin sites: external nose (14), forehead (32), glans penis (8), labia minora (6), axilla (28), pinna (27), palm (64), palmar index finger (28), plantar foot (64), popliteal fossa (46), volar forearm (28), umbilicus (12) |
| Human Microbiome Project (HMP) | 1025 | 323 | 5 | Classify 5 major body sites: anterior nares (269), buccal mucosa (312), stool (319), supragingival plaque (313), tongue dorsum (316) |
| Classification of subjects | | | | |
| Costello et al. (2009) Subject (CS) | 140 | 464 | 7 | Classify 7 subjects: (20, 20, 20, 20, 20, 20, 20) |
| Fierer et al. (2010) Subject (FS) | 104 | 294 | 3 | Classify 3 subjects: (40, 33, 31) |
| Fierer et al. (2010) Subject x Hand (FSH) | 98 | 294 | 6 | Classify by subject and left/right hand: (20, 18, 17, 14, 16, 13) |
| Classification of disease states | | | | |
| Inflammatory Bowel Disease (IBD) | 1025 | 1025 | 2 | Classify disease states: normal (500), IBD (500) |
| Pei et al. (2013) Diagnosis (PDX) | 200 | 5955 | 4 | Classify disease states: normal (28), reflux esophagitis (36), Barrett’s esophagus (84), esophageal adenocarcinoma (52) |

We consider three categories of classification task: body sites, subjects, and disease states. The number of samples in each class is given in parentheses. The number of features equals the number of distinct OTUs (i.e., microbes).
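As a reading aid for the per-class counts, the following minimal sketch encodes three rows of Table 1 (CBH, FS, PDX) and reports how imbalanced each task is. The dictionary layout and the helper name `majority_baseline` are illustrative assumptions and are not taken from the paper; the majority-class baseline is used here only as a simple way to summarize class imbalance.

```python
# Per-class sample counts copied from three rows of Table 1.
# The data layout and function name are hypothetical, for illustration only.
TABLE1_SUBSET = {
    "CBH": {
        "n_samples": 552,
        "class_counts": {
            "skin": 357, "oral cavity": 46, "external auditory canal": 44,
            "hair": 14, "nostril": 46, "feces": 45,
        },
    },
    "FS": {
        "n_samples": 104,
        "class_counts": {"subject 1": 40, "subject 2": 33, "subject 3": 31},
    },
    "PDX": {
        "n_samples": 200,
        "class_counts": {
            "normal": 28, "reflux esophagitis": 36,
            "Barrett's esophagus": 84, "esophageal adenocarcinoma": 52,
        },
    },
}


def majority_baseline(class_counts):
    """Accuracy obtained by always predicting the most frequent class."""
    total = sum(class_counts.values())
    return max(class_counts.values()) / total


for name, row in TABLE1_SUBSET.items():
    counts = row["class_counts"]
    # The counts in parentheses should add up to the row's sample total.
    assert sum(counts.values()) == row["n_samples"]
    print(f"{name}: {len(counts)} classes, "
          f"majority-class baseline = {majority_baseline(counts):.3f}")
```

A high baseline (as for CBH, where skin samples dominate) suggests that plain accuracy can look favorable even for a trivial classifier, so class-aware metrics may be more informative on such rows.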