2020 Sep 16;3(9):e2012734. doi: 10.1001/jamanetworkopen.2020.12734

Table 1. Most Important Predictors by Category for the Random Forest Model.

| Data source | Variable | Aggregation: space | Aggregation: years | Aggregation: function | Importanceᵃ | No EBLL, mean (SD)ᵇ | EBLL, mean (SD)ᵇ |
|---|---|---|---|---|---|---|---|
| Blood lead levelsᶜ | Child mean BLL, μg/dL | Tract | 3 | Median | 1.00 | 1.4 (0.4) | 1.7 (0.4) |
| | Child mean BLL, μg/dL | Tract | 3 | Mean | 0.91 | 1.9 (0.6) | 2.3 (0.6) |
| | Child maximum BLL, μg/dL | Tract | 3 | Mean | 0.84 | 3.8 (1.1) | 4.5 (1.1) |
| | Child EBLL ≥6 μg/dL | Tract | 3 | Count | 0.81 | 16.0 (9.1) | 22.0 (9.3) |
| | Child mean BLL, μg/dL | Tract | 2 | Mean | 0.81 | 1.8 (0.5) | 2.2 (0.5) |
| Building characteristics | Residential value, $10⁵ | Block | NA | Mean | 0.52 | 0.6 (3.5) | 0.3 (1.6) |
| | Latitude, ° | Address | NA | NA | 0.47 | 41.9 (0.1) | 41.8 (0.1) |
| | Housing age, y | Block | NA | Mean | 0.47 | 80.6 (24.1) | 89.1 (19.9) |
| | Residential value, $10⁵ | Block | NA | Sum | 0.47 | 4.8 (11.1) | 3.2 (5.0) |
| | Rooms per unit, No. | Block | NA | Mean | 0.46 | 5.3 (1.1) | 5.2 (1.0) |
| American Community Survey | Medicaid insurance, No. | Tract | 5 | Percentage | 0.32 | 28.6 (14.1) | 34.6 (12.2) |
| | High school graduate, No. | Tract | 5 | Percentage | 0.30 | 14.0 (7.3) | 16.9 (6.9) |
| | Associate’s degree, No. | Tract | 5 | Percentage | 0.30 | 5.0 (2.8) | 4.8 (2.7) |
| | Employer insurance, No. | Tract | 5 | Percentage | 0.30 | 41.2 (17.0) | 33.9 (13.4) |
| | Bachelor’s degree, No. | Tract | 5 | Percentage | 0.30 | 13.7 (11.6) | 9.4 (8.1) |
| Investigations | Compliance, No. | Tract | 3 | Percentage | 0.27 | 40.0 (22.6) | 33.5 (18.4) |
| | Inspection, No. | Tract | 3 | Percentage | 0.27 | 58.4 (19.6) | 54.1 (16.9) |
| | Inspection, No. | Tract | 2 | Percentage | 0.25 | 58.4 (22.0) | 53.2 (19.4) |
| | Compliance, No. | Tract | 2 | Percentage | 0.24 | 37.8 (24.8) | 30.2 (20.4) |
| | Inspection interior hazard, No. | Tract | 3 | Percentage | 0.22 | 53.8 (32.8) | 62.6 (28.1) |
| Building permits and violations | Violations, No. | Address | All | Count | 0.09 | 2.7 (9.2) | 2.8 (8.8) |
| | Violations, No. | Address | 5 | Count | 0.09 | 2.4 (8.3) | 2.7 (8.4) |
| | Wall violations, No. | Address | All | Percentage | 0.08 | 8.6 (13.4) | 8.8 (11.8) |
| | Wall violations, No. | Address | 5 | Percentage | 0.08 | 8.5 (13.5) | 8.8 (11.9) |
| | Window violations, No. | Address | All | Percentage | 0.08 | 6.2 (11.1) | 8.0 (12.7) |

Abbreviations: BLL, blood lead level; EBLL, elevated BLL; NA, not applicable.

ᵃ Importance of a feature in the random forest model is measured as the mean reduction in error when a tree in the forest splits the data on that variable, rescaled here so that the maximum is 1.00.

ᵇ Means and SDs exclude missing values of each predictor.

ᶜ Elevated levels were defined as at least 6 μg/dL in venous or capillary samples.
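The Aggregation columns summarize each predictor over a spatial unit (space), a look-back window in years (years), and a summary function (function). The sketch below illustrates that idea for the blood lead predictors only, under stated assumptions: the record layout, the field names, the helper `blood_lead_features`, and the choice to count children (rather than tests) with any result of at least 6 μg/dL are all hypothetical and are not taken from the study. "Child mean BLL, Median" is read here as the median, across children in the tract, of each child's mean BLL in the window.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, median

EBLL_THRESHOLD = 6.0  # μg/dL, the EBLL cutoff stated in footnote c


@dataclass
class BloodTest:
    child_id: str  # hypothetical child identifier
    tract: str     # census tract of the child's address
    year: int      # year the venous or capillary sample was taken
    bll: float     # blood lead level, μg/dL


def blood_lead_features(tests, tract, as_of_year, years):
    """Tract-level blood lead predictors from tests in the `years` years before `as_of_year`."""
    per_child = defaultdict(list)
    for t in tests:
        # Keep tests in this tract whose year falls in [as_of_year - years, as_of_year).
        if t.tract == tract and as_of_year - years <= t.year < as_of_year:
            per_child[t.child_id].append(t.bll)
    if not per_child:
        return None  # no tests in the window: the predictor is missing
    child_mean = [mean(v) for v in per_child.values()]
    child_max = [max(v) for v in per_child.values()]
    return {
        "median_child_mean_bll": median(child_mean),  # Child mean BLL, Median
        "mean_child_mean_bll": mean(child_mean),      # Child mean BLL, Mean
        "mean_child_max_bll": mean(child_max),        # Child maximum BLL, Mean
        # Assumption: number of children with any test at or above the cutoff.
        "count_children_ebll": sum(1 for m in child_max if m >= EBLL_THRESHOLD),
    }


example = [
    BloodTest("a", "T1", 2013, 1.0),
    BloodTest("a", "T1", 2014, 3.0),
    BloodTest("b", "T1", 2014, 7.0),
    BloodTest("c", "T1", 2010, 9.0),  # outside the 3-year window
    BloodTest("d", "T2", 2014, 2.0),  # different tract
]
print(blood_lead_features(example, "T1", as_of_year=2015, years=3))
# {'median_child_mean_bll': 4.5, 'mean_child_mean_bll': 4.5,
#  'mean_child_max_bll': 5.0, 'count_children_ebll': 1}
```

Changing `years` from 3 to 2 gives the 2-year variant listed in the table; the other data sources would need their own spatial keys (block or address) and functions (sum, percentage).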
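Footnote a describes an impurity-based ("mean decrease in error") importance, rescaled so the top feature scores 1.00. The sketch below shows that calculation on a toy forest, assuming the error measure is Gini impurity of the binary EBLL outcome, which is a common choice but is not confirmed by the table; the feature names `bll` and `housing_age` and all helper functions are hypothetical.

```python
from collections import defaultdict


def gini(labels):
    """Gini impurity of a list of 0/1 outcomes (1 = EBLL, 0 = no EBLL)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_decrease(node, left, right, n_total):
    """Impurity decrease from one split, weighted by the share of samples at the node."""
    n = len(node)
    child = len(left) / n * gini(left) + len(right) / n * gini(right)
    return n / n_total * (gini(node) - child)


def rescaled_importance(splits_per_tree):
    """Mean impurity decrease per feature across trees, rescaled so the maximum is 1.00.

    splits_per_tree holds, for each tree, a list of (feature, decrease) pairs,
    one pair per split in that tree.
    """
    n_trees = len(splits_per_tree)
    totals = defaultdict(float)
    for splits in splits_per_tree:
        for feature, decrease in splits:
            totals[feature] += decrease
    means = {f: total / n_trees for f, total in totals.items()}
    top = max(means.values())
    return {f: m / top for f, m in means.items()}


# Toy forest of two one-split trees on 8 addresses (4 with EBLL).
root = [0, 0, 0, 1, 1, 1, 1, 0]
d1 = split_decrease(root, [0, 0, 0, 0], [1, 1, 1, 1], n_total=8)  # perfect split
d2 = split_decrease(root, [0, 0, 0, 1], [1, 1, 1, 0], n_total=8)  # partial split
importance = rescaled_importance([[("bll", d1)], [("housing_age", d2)]])
print(importance)  # {'bll': 1.0, 'housing_age': 0.25}
```

Libraries typically report a normalized version of this quantity; scikit-learn's `feature_importances_`, for instance, sums to 1, so dividing every value by the largest one would put it on the same 0 to 1.00 scale as the Importance column.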