2020 Jan 10;128(1):017008. doi: 10.1289/EHP4817

Table 2.

Accuracy of k-means cluster analysis for subgrouping homes by region in the Household Exposure Study (HES; Massachusetts and California) and the Green Housing Study (GHS; Boston, Massachusetts, and Cincinnati, Ohio), using concentrations of chemicals detected in at least 10% of residential indoor air or dust samples.

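| Study | Homes (n) | Sample matrix | Chemicals in study^a (n) | Chemicals in cluster analysis^b (n) | Reporting limits | Accuracy^c (%) | Adjusted Rand index^d |
|-------|-----------|---------------|--------------------------|-------------------------------------|------------------|----------------|-----------------------|
| HES   | 122       | Air           | 24                       | 13                                  | Original^e       | 98.4           | 0.93                  |
| HES   | 122       | Air           | 24                       | 13                                  | Censored^f       | 92.6           | 0.72                  |
| HES   | 120       | Dust          | 44                       | 25                                  | Original         | 96.7           | 0.87                  |
| HES   | 120       | Dust          | 44                       | 18                                  | Censored         | 55.8           | 0                     |
| GHS   | 77^g      | Air           | 35                       | 28                                  | Original         | 80.5           | 0.36                  |
| GHS   | 77^g      | Air           | 35                       | 28                                  | Constant^h       | 80.5           | 0.36                  |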
^a Number of chemicals measured in the same medium in all homes in each cluster analysis.

^b Number of chemicals detected in at least 10% of homes given the reporting limits used in each analysis.

^c Number of homes correctly grouped by region using k-means clustering divided by the total number of homes analyzed.

^d The adjusted Rand index measures similarity between the two clusters identified by k-means analysis and the two true regional subgroups in the data. It has an expected value of zero for random clusters and a maximum value of 1 in the case of perfect agreement.
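
The two agreement measures defined in footnotes c and d can be illustrated with a short sketch. This is not the authors' code: the function names are hypothetical, and the accuracy function assumes that each cluster is mapped to the region that maximizes the number of correctly grouped homes, which the footnote does not state explicitly. The adjusted Rand index follows the standard pair-counting formula.

```python
from collections import Counter
from itertools import permutations
from math import comb


def clustering_accuracy(true_labels, cluster_labels):
    """Fraction of homes correctly grouped, under the best cluster-to-region mapping.

    Assumption: cluster ids are arbitrary, so every assignment of clusters
    to regions is tried and the best one is kept.
    """
    regions = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    best = 0
    for perm in permutations(regions, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best = max(best, hits)
    return best / len(true_labels)


def adjusted_rand_index(true_labels, cluster_labels):
    """Pair-counting adjusted Rand index between two partitions."""
    n = len(true_labels)
    cells = Counter(zip(true_labels, cluster_labels))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(true_labels).values())
    sum_cols = sum(comb(c, 2) for c in Counter(cluster_labels).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case (e.g., both partitions are a single group).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy data (made up for illustration): 8 homes, one misclassified.
truth = ["MA"] * 4 + ["CA"] * 4
found = [0, 0, 0, 1, 1, 1, 1, 1]
accuracy = clustering_accuracy(truth, found)
ari = adjusted_rand_index(truth, found)
```

Under this definition of accuracy, 120 of 122 homes correctly grouped would give the 98.4% shown in the first row. Accuracy and the adjusted Rand index can diverge because the latter corrects for chance agreement.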

^e In analyses using the original reporting limits, concentrations that were not detected were substituted with the sample-specific reporting limit (SSRL). We calculated the SSRL as the method reporting limit (MRL) divided by the sample-specific volume of air or sample-specific mass of dust.

^f In analyses with censored reporting limits, we identified the most frequent (modal) MRL reported at each site (in the case of ties, we used the lower value). We defined MRLcensor as the higher of the two modal MRLs and calculated censored sample-specific reporting limits (SSRLcensor) as MRLcensor divided by the sample-specific volume of air or sample-specific mass of dust. For all records where the original SSRL or the detected or estimated concentration was lower than SSRLcensor, the concentration was substituted with SSRLcensor.
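
The censoring procedure in footnote f can be sketched for a single chemical as follows. This is a minimal illustration, not the authors' implementation: the record layout (keys `site`, `mrl`, `amount`, `value`) and the function names are assumptions, where `amount` stands for the sample-specific air volume or dust mass and `value` holds either the detected or estimated concentration or, for nondetects, the original SSRL.

```python
from collections import Counter


def modal_mrl(mrls):
    """Most frequent MRL; ties are resolved toward the lower value."""
    counts = Counter(mrls)
    top = max(counts.values())
    return min(v for v, c in counts.items() if c == top)


def censor_records(records):
    """Apply the censored reporting limit to one chemical's records.

    records: list of dicts with hypothetical keys
      site   - sampling site
      mrl    - method reporting limit for that sample
      amount - sample-specific air volume or dust mass
      value  - detected/estimated concentration, or the original SSRL
    """
    mrls_by_site = {}
    for r in records:
        mrls_by_site.setdefault(r["site"], []).append(r["mrl"])
    # MRLcensor is the higher of the site-level modal MRLs.
    mrl_censor = max(modal_mrl(m) for m in mrls_by_site.values())
    censored = []
    for r in records:
        ssrl_censor = mrl_censor / r["amount"]
        # Anything below SSRLcensor is raised to SSRLcensor.
        censored.append({**r, "value": max(r["value"], ssrl_censor)})
    return censored


# Toy records (made up for illustration).
records = [
    {"site": "MA", "mrl": 1.0, "amount": 2.0, "value": 0.5},
    {"site": "MA", "mrl": 1.0, "amount": 2.0, "value": 3.0},
    {"site": "MA", "mrl": 2.0, "amount": 2.0, "value": 1.0},
    {"site": "CA", "mrl": 4.0, "amount": 2.0, "value": 2.5},
    {"site": "CA", "mrl": 4.0, "amount": 2.0, "value": 2.0},
    {"site": "CA", "mrl": 2.0, "amount": 2.0, "value": 1.0},
]
censored_values = [r["value"] for r in censor_records(records)]
```

Raising every site to the less sensitive reporting limit in this way reduces the number of chemicals that meet the 10% detection threshold, which is consistent with the drop from 25 to 18 chemicals in the HES dust rows.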

^g Cluster analysis was performed on 77 homes comprising 105 samples. A total of 49 homes were sampled once, and 28 homes were sampled twice approximately six months apart. For homes sampled twice, we used the average exposure for each chemical.
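
The per-home averaging described in footnote g might look like the sketch below. The input layout (pairs of a home identifier and a chemical-to-concentration mapping) and the function name are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def average_by_home(samples):
    """Collapse repeated samples into one record per home.

    samples: iterable of (home_id, {chemical: concentration}) pairs
    (a hypothetical layout). Homes sampled once are returned unchanged.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for home_id, concentrations in samples:
        for chemical, value in concentrations.items():
            grouped[home_id][chemical].append(value)
    return {
        home: {chem: mean(vals) for chem, vals in chems.items()}
        for home, chems in grouped.items()
    }


# Toy data (made up): home h1 sampled twice, home h2 sampled once.
samples = [
    ("h1", {"chem_x": 1.0, "chem_y": 4.0}),
    ("h1", {"chem_x": 3.0, "chem_y": 6.0}),
    ("h2", {"chem_x": 5.0, "chem_y": 2.0}),
]
per_home = average_by_home(samples)
```

The counts in the footnote are internally consistent: 49 + 28 = 77 homes, and 49 + 2 × 28 = 105 samples.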

^h Nondetects were substituted with the MRL divided by the median volume of air across Boston, Massachusetts, and Cincinnati, Ohio.
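
The constant reporting limit in footnote h differs from the sample-specific limit in footnote e only in the denominator: one median air volume is used for every sample. A minimal sketch follows, assuming nondetects are marked with `None` and using hypothetical function names.

```python
from statistics import median


def constant_reporting_limit(mrl, volumes):
    """MRL divided by the median air volume across all samples."""
    return mrl / median(volumes)


def substitute_nondetects(values, limit):
    """Replace nondetects (marked None, an assumption) with the given limit."""
    return [limit if v is None else v for v in values]


# Toy data (made up): air volumes pooled across both cities.
volumes = [1.0, 2.0, 4.0]
limit = constant_reporting_limit(1.0, volumes)
filled = substitute_nondetects([None, 0.8, None], limit)
```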