2014 Jun 12;8(6):e2945. doi: 10.1371/journal.pntd.0002945

Figure 1. Classification tree for soil-transmitted helminth infection, with infection by any helminth (Ascaris lumbricoides, Trichuris trichiura, or hookworm spp.) considered to be STH positive.

Each internal node contains the name of the independent variable (IV) selected to partition the data and the number of observations in the node. These nodes are numbered 1–5; the fill color corresponds to the WASH characteristic of the IV, and the border color represents the level at which that characteristic was measured (e.g., school, home, or pupil). The branches emanating from each internal node are labeled with the value of the IV used to partition the data. The square boxes represent terminal nodes, numbered T1–T6; each contains the distribution of positive and negative cases at that node as well as the node's predicted status (“+” or “−”). Note that terminal node T4 is classified as STH positive (“+”) even though the majority of pupils in the node are negative, because of the 2:1 misclassification cost favoring sensitivity over specificity.

¹This variable started out as a 4-level ordinal variable for father's education, but because the optimal partition identified by the algorithm was “deceased” vs. “no education”, “primary only”, and “secondary or more”, it ended up as an indicator variable for father deceased.
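The labeling of T4 can be illustrated with a minimal sketch of cost-weighted node labeling. It assumes the 2:1 cost means that a false negative is penalized twice as heavily as a false positive, and that a node takes whichever label gives the lower total misclassification cost. The function name `predicted_label`, the tie-breaking rule, and the pupil counts are hypothetical and are not taken from the figure or from the authors' software.

```python
def predicted_label(n_pos: int, n_neg: int,
                    cost_fn: float = 2.0, cost_fp: float = 1.0) -> str:
    """Label a terminal node by minimum total misclassification cost.

    Assumption: cost_fn is the penalty for a missed positive (false negative)
    and cost_fp the penalty for a false alarm (false positive); 2:1 by default.
    """
    # Labeling the node "-" turns every positive pupil into a false negative.
    cost_if_negative = cost_fn * n_pos
    # Labeling the node "+" turns every negative pupil into a false positive.
    cost_if_positive = cost_fp * n_neg
    # Ties go to "-" here; this tie-breaking rule is an arbitrary choice.
    return "+" if cost_if_positive < cost_if_negative else "-"


# Hypothetical node: 40 positive and 60 negative pupils (negatives are the majority).
print(predicted_label(40, 60))            # 2:1 costs
print(predicted_label(40, 60, 1.0, 1.0))  # equal costs, i.e. simple majority vote
```

Under these assumptions a node is labeled positive whenever more than one third of its pupils are positive, which is how a node with a negative majority can still be predicted “+”.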