Table 1. Most Important Predictors by Category for the Random Forest Model.
| Data source | Variable | Aggregation space | Aggregation years | Aggregation function | Importanceᵃ | No EBLL, mean (SD)ᵇ | EBLL, mean (SD)ᵇ |
|---|---|---|---|---|---|---|---|
| Blood lead levelsᶜ | Child mean BLL, μg/dL | Tract | 3 | Median | 1.00 | 1.4 (0.4) | 1.7 (0.4) |
| | Child mean BLL, μg/dL | Tract | 3 | Mean | 0.91 | 1.9 (0.6) | 2.3 (0.6) |
| | Child maximum BLL, μg/dL | Tract | 3 | Mean | 0.84 | 3.8 (1.1) | 4.5 (1.1) |
| | Child EBLL ≥6 μg/dL | Tract | 3 | Count | 0.81 | 16.0 (9.1) | 22.0 (9.3) |
| | Child mean BLL, μg/dL | Tract | 2 | Mean | 0.81 | 1.8 (0.5) | 2.2 (0.5) |
| Building characteristics | Residential value, $10⁵ | Block | NA | Mean | 0.52 | 0.6 (3.5) | 0.3 (1.6) |
| | Latitude, ° | Address | NA | NA | 0.47 | 41.9 (0.1) | 41.8 (0.1) |
| | Housing age, y | Block | NA | Mean | 0.47 | 80.6 (24.1) | 89.1 (19.9) |
| | Residential value, $10⁵ | Block | NA | Sum | 0.47 | 4.8 (11.1) | 3.2 (5.0) |
| | Rooms per unit, No. | Block | NA | Mean | 0.46 | 5.3 (1.1) | 5.2 (1.0) |
| American Community Survey | Medicaid insurance, No. | Tract | 5 | Percentage | 0.32 | 28.6 (14.1) | 34.6 (12.2) |
| | High school graduate, No. | Tract | 5 | Percentage | 0.30 | 14.0 (7.3) | 16.9 (6.9) |
| | Associate’s degree, No. | Tract | 5 | Percentage | 0.30 | 5.0 (2.8) | 4.8 (2.7) |
| | Employer insurance, No. | Tract | 5 | Percentage | 0.30 | 41.2 (17.0) | 33.9 (13.4) |
| | Bachelor’s degree, No. | Tract | 5 | Percentage | 0.30 | 13.7 (11.6) | 9.4 (8.1) |
| Investigations | Compliance, No. | Tract | 3 | Percentage | 0.27 | 40.0 (22.6) | 33.5 (18.4) |
| | Inspection, No. | Tract | 3 | Percentage | 0.27 | 58.4 (19.6) | 54.1 (16.9) |
| | Inspection, No. | Tract | 2 | Percentage | 0.25 | 58.4 (22.0) | 53.2 (19.4) |
| | Compliance, No. | Tract | 2 | Percentage | 0.24 | 37.8 (24.8) | 30.2 (20.4) |
| | Inspection interior hazard, No. | Tract | 3 | Percentage | 0.22 | 53.8 (32.8) | 62.6 (28.1) |
| Building permits and violations | Violations, No. | Address | All | Count | 0.09 | 2.7 (9.2) | 2.8 (8.8) |
| | Violations, No. | Address | 5 | Count | 0.09 | 2.4 (8.3) | 2.7 (8.4) |
| | Wall violations, No. | Address | All | Percentage | 0.08 | 8.6 (13.4) | 8.8 (11.8) |
| | Wall violations, No. | Address | 5 | Percentage | 0.08 | 8.5 (13.5) | 8.8 (11.9) |
| | Window violations, No. | Address | All | Percentage | 0.08 | 6.2 (11.1) | 8.0 (12.7) |

Abbreviations: BLL, blood lead level; EBLL, elevated BLL; NA, not applicable.
ᵃ Importance of a feature in the random forest model is measured as the mean reduction in error after a tree in the forest splits the data on that variable. Here it is rescaled to have a maximum of 1.00.
ᵇ Excludes missing predictors.
ᶜ Elevated levels were at least 6 μg/dL, from venous or capillary samples.
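
The rescaling described in the importance footnote can be illustrated with a minimal sketch: each raw importance is divided by the largest one, so the top-ranked predictor scores 1.00. The function name `rescale_importances`, the feature names, and the raw values below are hypothetical and are not taken from the study; the study's own software and raw importance values are not given in this table.

```python
def rescale_importances(raw):
    """Divide every raw importance by the largest, so the maximum becomes 1.0."""
    top = max(raw.values())
    if top <= 0:
        raise ValueError("at least one raw importance must be positive")
    return {name: value / top for name, value in raw.items()}


# Hypothetical raw mean-reduction-in-error values (illustrative only).
raw_importances = {
    "tract_child_mean_bll_3y_median": 0.050,
    "block_residential_value_mean": 0.025,
    "tract_medicaid_pct_5y": 0.010,
}

rescaled = rescale_importances(raw_importances)

# Print predictors from most to least important on the rescaled 0-1 scale.
for name, score in sorted(rescaled.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.2f}")
```

Because only the ratio to the top predictor is kept, rescaled values are comparable within one model but not across models fitted separately.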