Public Health Action. 2014 Mar 21;4(1):47–52. doi: 10.5588/pha.13.0098

Using tuberculosis patient characteristics to predict future cases with matching genotype results

J E Oeltmann, E S Click, P K Moonan
PMCID: PMC4479101  PMID: 26423761

Abstract

Setting: United States.

Background: It is unknown whether tuberculosis (TB) case or patient characteristics can predict the likelihood of future related TB cases.

Objective: To estimate the likelihood of future related cases, i.e., cases with matching TB genotypes in the same county diagnosed within the 2 years following the year of reporting of each included case.

Design: We considered all TB cases with genotyping results reported in the United States during 2004–2010. Predictive scores were calculated based on patient characteristics by dividing the number of patients who were not the last case in a county-level TB genotype cluster by the total number of patients.

Results: Overall, there was a 30.8% chance that a future related case would be detected during the 2 years following the report year of any given case. Future related cases were detected in 34.7% of instances following the diagnosis of smear-positive cases, 51.9% of instances following the diagnosis of a homeless patient and 45.2% of instances following the diagnosis of a patient who reported substance abuse. Predictive scores varied by race (from 13.9% for White patients to 43.8% for Native Hawaiian patients) and by age group (from 13.1% for patients aged ⩾65 years to 43% for those aged 0–4 years), and were higher for US-born patients.

Conclusions: Behavioral and sociodemographic factors can help predict the likelihood of future related cases and can be used to prioritize contact investigations.

Keywords: tuberculosis, genotype, transmission


Tuberculosis (TB) genotyping can help identify TB cases that may be attributable to recent transmission. When Mycobacterium tuberculosis isolates cluster by genotype (i.e., ⩾2 patients with M. tuberculosis isolates with matching genotype results), they often represent recent transmission, particularly when the patients reside in similar communities or regions.1 Combining routinely collected surveillance and genotyping data can identify ongoing transmission within a community.2 Using a national genotyping surveillance database, we characterized TB patients who represented the first or index case diagnosed within a localized TB genotype cluster of cases.

The objective of this analysis was to identify patient characteristics associated with index case status, and to assess whether the patient characteristics of known cases can be used to predict the likelihood of future genotype-clustered cases in the same county, hereafter called future related cases. This information could help those involved in local TB control to prioritize and enhance contact investigations of cases most likely to be the first of one or more future related cases.

METHODS

Data for this analysis came from incident TB cases reported to the National Tuberculosis Surveillance System (NTSS) of the Centers for Disease Control and Prevention (CDC) from 50 states and the District of Columbia. The NTSS includes demographic, behavioral and clinical information for each reported patient.3 We obtained genotyping data from the CDC's National Tuberculosis Genotyping Service (NTGS), consisting of spacer oligonucleotide typing and 12-locus mycobacterial interspersed repetitive unit-variable number of tandem repeats (MIRU-VNTR) typing data.4 NTSS and NTGS are linked at the patient level. We considered all TB cases reported between 1 January 2004 and 31 December 2010 with genotyping results and linked NTSS data. From this data set, two subsets of data were created: one to assess factors associated with index-case status, and another to calculate predictive scores based on reported case characteristics. Further details regarding data restrictions for each of the two subsets of data are provided below.

Factors associated with index patients

Genotype clusters were defined as ⩾2 cases with matching genotyping (spoligotype and 12-locus MIRU-VNTR) results reported within the same county. Index cases were the first case reported in a cluster, based on treatment start date. Secondary cases were the remaining clustered cases that occurred after the treatment start date of the index case. Index cases were included only if they began treatment on or after 1 January 2006, and no cases with a matching genotype had been observed in the county in the previous 2 years. This restriction was created to reduce misclassification of secondary cases as index cases. Unique cases were cases that were not clustered. Unique cases began treatment between 1 January 2006 and 31 December 2008 and were excluded if one or more cases with matching genotype results were observed during the 2 years before (2004–2005) or after (2009–2010) the 3-year inclusion period. These exclusion criteria were created to reduce misclassification of secondary or index cases as unique cases.
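The case definitions above can be expressed as a classification rule. The following is a minimal Python sketch, not the authors' code: it assumes each case carries a county, a combined spoligotype/MIRU-VNTR genotype key and a treatment start date (the `Case` fields and the `classify` function are hypothetical), approximates "the previous 2 years" as 730 days, and simplifies how clusters crossing the inclusion boundaries were handled.

```python
from dataclasses import dataclass
from datetime import date, timedelta

LOOKBACK = timedelta(days=730)        # assumption: "previous 2 years" taken as 730 days
INDEX_START = date(2006, 1, 1)        # index cases began treatment on or after this date
UNIQUE_WINDOW = (date(2006, 1, 1), date(2008, 12, 31))


@dataclass(frozen=True)
class Case:
    case_id: str
    county: str
    genotype: str            # hypothetical key: spoligotype + 12-locus MIRU-VNTR result
    treatment_start: date


def classify(case, all_cases):
    """Return 'index', 'secondary', 'unique' or 'excluded' for one case (simplified)."""
    matches = [c for c in all_cases
               if c.case_id != case.case_id
               and c.county == case.county
               and c.genotype == case.genotype]
    if not matches:
        # Not clustered anywhere in the 2004-2010 data: unique only inside the 3-year window
        start, end = UNIQUE_WINDOW
        return "unique" if start <= case.treatment_start <= end else "excluded"
    # Cases with the same treatment start date count as neither earlier nor later
    earlier = [c for c in matches if c.treatment_start < case.treatment_start]
    later = [c for c in matches if c.treatment_start > case.treatment_start]
    recent_prior = any(case.treatment_start - c.treatment_start <= LOOKBACK for c in earlier)
    if later and not recent_prior:
        # First case of a cluster; before 2006 the 2-year look-back cannot be checked
        return "index" if case.treatment_start >= INDEX_START else "excluded"
    return "secondary"


# Hypothetical cases
cases = [
    Case("A", "X", "g1", date(2006, 3, 1)),
    Case("B", "X", "g1", date(2007, 1, 15)),
    Case("C", "X", "g2", date(2007, 5, 1)),
    Case("D", "Y", "g1", date(2005, 6, 1)),
    Case("E", "Y", "g1", date(2006, 2, 1)),
    Case("F", "Z", "g3", date(2009, 4, 1)),
]
labels = {c.case_id: classify(c, cases) for c in cases}
print(labels)
```

In this example, A opens a cluster and becomes the index case, B follows it as a secondary case, C is a unique case, and D (first in its cluster but starting treatment in 2005) and F (unclustered but outside 2006–2008) are excluded.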

We calculated the distribution of demographic, clinical and behavioral characteristics among index cases and unique cases and assessed differences using Pearson's χ² test. Specific characteristics (independent variables) examined were sex, age, race/ethnicity, country of origin, substance abuse, history of homelessness, incarceration at the time of diagnosis, sputum smear status, site of disease and human immunodeficiency virus (HIV) status. In the NTSS, excess alcohol use, injection drug use, and non-injection drug use are defined as positive if reported within the year before TB diagnosis. We defined substance abuse as having any one of these three behaviors reported.
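For a binary characteristic, the substance abuse definition and the χ² comparison of index and unique cases can be illustrated as follows. This is a hedged sketch: the function names and counts are hypothetical, the test shown is Pearson's χ² without continuity correction, and the closed-form p-value holds only for 1 degree of freedom. Multi-category variables such as race/ethnicity would need an r × c table with (r − 1)(c − 1) degrees of freedom.

```python
import math


def substance_abuse(excess_alcohol, injection_drug_use, noninjection_drug_use):
    """Composite flag: any one of the three behaviors reported in the year before diagnosis."""
    return excess_alcohol or injection_drug_use or noninjection_drug_use


def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table
        [[a, b],   with the characteristic: index cases, unique cases
         [c, d]]   without it:              index cases, unique cases
    Returns (statistic, p_value) with 1 degree of freedom."""
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square with 1 df: P(Z**2 > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: 100 patients with substance abuse (30 index, 70 unique)
# and 200 without it (20 index, 180 unique).
stat, p = pearson_chi2_2x2(30, 70, 20, 180)
print(f"chi2 = {stat:.2f}, p = {p:.2g}, significant at 0.01: {p < 0.01}")
```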

We used multivariable logistic regression to assess the association between index case status (dependent variable; index cases vs. unique cases) and the independent variables listed above. A significance level of 0.01 was used for all statistical tests.
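To show how exponentiated logistic regression coefficients yield odds ratios, the sketch below fits a model by Newton-Raphson in plain Python. It is illustrative only (the software used for the analysis is not described), and the counts are hypothetical. With a single binary covariate, the fitted odds ratio equals the crude 2 × 2 odds ratio; adding further columns, such as smear status or site of disease, would give adjusted odds ratios.

```python
import math


def _solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_logistic(X, y, iterations=25):
    """Maximum-likelihood logistic regression by Newton-Raphson.

    X: covariate rows, each starting with 1.0 for the intercept.
    y: 0/1 outcomes (1 = index case, 0 = unique case).
    Returns the coefficients; exp(coefficient) is an odds ratio.
    """
    p = len(X[0])
    beta = [0.0] * p
    for _ in range(iterations):
        score = [0.0] * p                      # gradient of the log-likelihood
        info = [[0.0] * p for _ in range(p)]   # Fisher information X'WX
        for xi, yi in zip(X, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
            w = mu * (1.0 - mu)
            for j in range(p):
                score[j] += (yi - mu) * xi[j]
                for k in range(p):
                    info[j][k] += w * xi[j] * xi[k]
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


# Hypothetical data: 100 patients with a characteristic (30 index, 70 unique)
# and 200 without it (20 index, 180 unique).
X, y = [], []
for has_trait, n_index, n_unique in [(1.0, 30, 70), (0.0, 20, 180)]:
    X += [[1.0, has_trait]] * (n_index + n_unique)
    y += [1] * n_index + [0] * n_unique

beta = fit_logistic(X, y)
odds_ratio = math.exp(beta[1])
print(f"OR = {odds_ratio:.3f}")
```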

Predictive scores of patient characteristics

To estimate how well individual patient characteristics predict one or more future related cases, we calculated predictive scores for patient and clinical characteristics for both US-born and foreign-born patients. For any given case, a future related case was a case with matching genotyping results diagnosed in the same county during the 2 years following the report year of the included case. All cases in a cluster were categorized as either the last case diagnosed in the cluster or not the last case diagnosed in the cluster. Cases reported in 2006, 2007 and 2008 were included in this analysis. Cases reported in 2006 were classified as the last case in a cluster or not based on data from 2006, 2007 and 2008; cases reported in 2007 were classified based on data from 2007, 2008 and 2009; and cases reported in 2008 were classified based on data from 2008, 2009 and 2010. Therefore, for each included case, there were at least 2 years of follow-up time during which another case with matching genotype results could have been diagnosed within the same county; this interval is referred to as the follow-up period. A predictive score was calculated for each sociodemographic, behavioral and clinical characteristic. Predictive scores for all cases and for cases with individual characteristics were estimated by dividing the number of cases that were not the last case in a county-based cluster by the total number of cases (unique cases + clustered cases). This proportion represents the percentage of instances in which, given a specific characteristic, a future related case was diagnosed.
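The predictive score calculation can be sketched as follows. This minimal Python example assumes hypothetical `Case` records with a county, a genotype key, a report year and a diagnosis date used to order cases within a cluster. A case counts as "not the last" if a later case with the same county and genotype is reported no more than 2 years after its report year. Unique cases never have such a later case, so they contribute to the denominator only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Case:
    case_id: str
    county: str
    genotype: str        # hypothetical combined spoligotype + MIRU-VNTR key
    report_year: int
    diagnosed: date      # hypothetical field used to order cases within a cluster
    homeless: bool = False


FOLLOW_UP_YEARS = 2
INCLUDED_YEARS = range(2006, 2009)  # report years 2006, 2007 and 2008


def is_not_last(case, all_cases):
    """True if a matching case (same county and genotype) is diagnosed later and
    reported no more than FOLLOW_UP_YEARS after this case's report year."""
    horizon = case.report_year + FOLLOW_UP_YEARS
    return any(
        other.case_id != case.case_id
        and other.county == case.county
        and other.genotype == case.genotype
        and other.diagnosed > case.diagnosed
        and other.report_year <= horizon
        for other in all_cases
    )


def predictive_score(all_cases, has_characteristic=lambda c: True):
    """Percentage of included cases (unique + clustered) that were not the last case."""
    included = [c for c in all_cases
                if c.report_year in INCLUDED_YEARS and has_characteristic(c)]
    if not included:
        raise ValueError("no included cases with this characteristic")
    not_last = sum(is_not_last(c, all_cases) for c in included)
    return 100.0 * not_last / len(included)


# Hypothetical cases
cases = [
    Case("c1", "X", "g1", 2006, date(2006, 2, 1), homeless=True),
    Case("c2", "X", "g1", 2008, date(2008, 6, 1)),
    Case("c3", "X", "g2", 2006, date(2006, 5, 1)),
    Case("c4", "X", "g2", 2009, date(2009, 3, 1)),  # outside c3's follow-up period
    Case("c5", "Y", "g1", 2007, date(2007, 3, 1), homeless=True),
    Case("c6", "Y", "g1", 2007, date(2007, 9, 1)),
]
overall = predictive_score(cases)
homeless_score = predictive_score(cases, lambda c: c.homeless)
print(overall, homeless_score)
```

Here two of the five included cases (c1 and c5) are followed by a related case within the follow-up period, giving an overall score of 40%. Case c4 is used only as follow-up data for c3 because it was reported in 2009.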

Approval by an institutional review board was not required because data were collected and analyzed for this project as part of routine TB surveillance; the project was therefore not considered research involving human subjects.

RESULTS

Factors associated with index patients

Of 51 527 reported culture-positive cases with genotyping data, 2918 were eligible index cases and 13 612 were eligible unique cases; 16 485 were considered secondary cases and were not included in this portion of the analysis. A total of 18 512 were excluded from this analysis to avoid potential misclassification as a unique (n = 16 432) or index case (n = 2080) (Table 1).
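The counts above partition the 51 527 genotyped cases exactly, and index plus unique cases give the 16 530 cases summarized in Table 2. A quick arithmetic check using only the reported numbers:

```python
# Counts of culture-positive cases with genotyping data, as reported in the text.
total_genotyped = 51_527
index_cases = 2_918
unique_cases = 13_612
secondary_cases = 16_485
excluded_possible_unique = 16_432
excluded_possible_index = 2_080

excluded = excluded_possible_unique + excluded_possible_index
analyzed = index_cases + unique_cases   # population of Table 2

assert excluded == 18_512
assert analyzed == 16_530
assert index_cases + unique_cases + secondary_cases + excluded == total_genotyped
print(f"index + unique = {analyzed}, excluded = {excluded}")
```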

TABLE 1.

Number of cases eligible and ineligible for analysis of factors associated with index case status by year

[Table 1 image not reproduced]

Distributions of all patient and disease characteristics differed significantly between index and unique cases (Table 2). Larger proportions of index cases were male, were born in the United States, were substance users, were homeless and had a history of incarceration. With the exception of Asian and White patients, all racial/ethnic groups made up larger proportions of index cases than of unique cases.

TABLE 2.

Demographic, social and clinical characteristics of index and unique TB cases (n = 16 530)

[Table 2 image not reproduced]

Crude and adjusted odds ratios (ORs) and 99% confidence intervals (CIs) for the association between patient and disease characteristics and being an index case relative to a unique case are presented in Table 3. After controlling for disease characteristics known to be related to TB transmissibility (sputum smear status, site of disease), the odds of being an index case were significantly higher for all younger age groups relative to the oldest patients, for racial/ethnic minority patients, for patients born in the United States and for substance users.
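Crude ORs such as those in Table 3 can be computed from a 2 × 2 table of index versus unique cases by characteristic. The sketch below assumes the Woolf (log) method for the 99% CI, since the interval method is not stated; the counts in the example are hypothetical.

```python
import math
from statistics import NormalDist


def crude_odds_ratio(a, b, c, d, level=0.99):
    """Crude OR of being an index (vs. unique) case for a binary characteristic.

    a: index cases with the characteristic    b: unique cases with it
    c: index cases without it                 d: unique cases without it
    The CI uses the Woolf (log) method, an assumption for this sketch.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("the Woolf interval requires all four cells to be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 2.576 for a 99% CI
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


or_, lower, upper = crude_odds_ratio(30, 70, 20, 180)   # hypothetical counts
print(f"crude OR = {or_:.2f} (99% CI {lower:.2f}-{upper:.2f})")
```

A 99% CI that excludes 1 corresponds approximately to significance at the 0.01 level used in this analysis.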

TABLE 3.

Crude and adjusted associations between demographic, social and clinical characteristics and index case status

[Table 3 image not reproduced]

Predictive scores of patient characteristics

During the period from 1 January 2006 to 31 December 2008, 8364 (30.8%) of 27 146 total eligible cases were not the last clustered case reported in the same county during the follow-up period. This suggests that there is roughly a 30% chance that a related case will be diagnosed in the same county following the diagnosis of any given case. Predictive scores for specific patient characteristics are provided in Table 4. In general, predictive scores were higher among US-born patients, males and minorities. Among US-born patients, predictive scores for age groups exhibited the widest range of values (from 49.8% for the youngest patients to 18.1% for the oldest). Following the diagnosis of TB among the youngest patients (age 0–4 years), a future related TB case was thus identified 49.8% of the time, whereas following the diagnosis of TB among the oldest patients (⩾65 years), a future related case was identified only 18.1% of the time. Among foreign-born patients, diagnosis of TB among the youngest patients was the least predictive of a future related case (predictive score = 11.8%).

TABLE 4.

Predictive scores of demographic, social and clinical characteristics to predict future related TB cases

[Table 4 image not reproduced]

Future related cases were detected 51.9% of the time following the diagnosis of a homeless patient, 45.2% of the time following the diagnosis of a patient who reported substance use, and 38.8% of the time following the diagnosis of TB in patients within a correctional facility (Table 4). As expected, sputum smear-positive disease had a higher predictive score than sputum smear-negative disease (34.7% vs. 28.9%), and pulmonary disease had a higher predictive score than extrapulmonary disease (33.0% vs. 22.1%).
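To make the comparison between social and disease characteristics concrete, the reported predictive scores can be expressed relative to the overall score of 8364/27 146 ≈ 30.8%. This ratio is an illustrative summary, not a measure used in the analysis; the scores themselves are those reported in the text.

```python
# Overall predictive score and selected characteristic-specific scores from the text.
overall = 100 * 8364 / 27146          # about 30.8%
reported_scores = {
    "homeless": 51.9,
    "substance use": 45.2,
    "diagnosed in a correctional facility": 38.8,
    "sputum smear-positive": 34.7,
    "pulmonary disease": 33.0,
    "sputum smear-negative": 28.9,
    "extrapulmonary disease": 22.1,
}
# Ratio of each score to the overall score (illustrative summary only)
ratios = {name: score / overall for name, score in reported_scores.items()}
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:38s} {reported_scores[name]:5.1f}%  {ratio:.2f} x overall")
```

Homelessness (about 1.7 times the overall score) and substance use (about 1.5 times) stand well above smear-positive disease (about 1.1 times).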

DISCUSSION

The odds of being an index case relative to a unique case were influenced by patient characteristics. As hypothesized, characteristics associated with index cases also had the highest predictive scores for diagnosis of future related cases. We believe that characteristics associated with index cases, when noted, can serve as predictors of future related cases, and that prioritizing patients with these characteristics for contact investigations could result in a more efficient use of resources. In theory, these future related cases should be identified during contact investigations around the reported case; ideally, they would be identified early enough to prevent progression to active TB disease. This information can therefore be used to prioritize and enhance contact investigations around the patients most likely to be followed by later diagnosed cases with matching genotypes within the same county. Three characteristics stood out as the most predictive of future related cases: age ⩽4 years, homelessness and substance abuse. Following the diagnosis of a patient with any one of these characteristics, there was roughly a 50% chance that a future related case would be diagnosed.

An index case is not necessarily the source of a chain of TB transmission. Because TB surveillance data do not capture interpersonal connections between patients, we were not able to determine the true source patient who infected the later diagnosed patients. Index patients are simply the first reported among a series of cases, defined here as cases with matching genotypes and county of residence. For example, the odds of being an index case were highest for children aged 0–4 years, as young children with TB tend to be sentinels for recent transmission rather than the actual source.5 However, when a childhood case (aged 0–4 years) was diagnosed in a foreign-born patient, there was only an 11.8% chance of detecting a future related case. We believe this is because foreign-born children with TB who are living in the United States most likely acquired their infection before immigrating to the United States, and if the source of their infection is still overseas, it would not be captured in the NTSS.

While index cases may not always be source cases, our findings are similar to those from studies that assessed risk factors for genotype clustering,6 recent transmission7 and involvement in outbreaks,8 reinforcing the notion that index cases characterized by these risk factors may often be the source case for future cases. We found only one study that assessed characteristics associated with the generation of secondary cases using epidemiologic but not genotyping data. Rodrigo et al. reported similar results associated with substance use, suggesting that substance users are more likely to be associated with secondary cases.9

Previous studies have described the characteristics of the first two patients within a genotype cluster. In New York City, the infectiousness of the first two patients was associated with greater cluster growth,10 and in The Netherlands, an interval of <3 months between the diagnoses of the first two patients, one or both patients being young (age <35 years), both patients residing in an urban setting and both patients being from sub-Saharan Africa were associated with greater cluster growth.11 While our results share some common findings, they are difficult to compare with these previous studies because of differing case definitions and methodology. We chose to focus on the first case, rather than the first two cases, within a cluster because we aimed to provide those involved in TB control with a simple, rapid method to evaluate the likelihood of secondary cases. In our opinion, at the time of diagnosis of a patient with any of the significant risk factors identified in this analysis, TB control authorities can expect an increased likelihood of future related cases and can immediately plan to increase case-finding efforts.

The reasons why some TB strains circulate locally while others do not are not completely understood. However, some, if not all, of these reasons might lie in the characteristics of the persons infected rather than in the strain itself. A noteworthy finding was that a history of homelessness or substance use had a considerably higher predictive score (51.9% and 45.2%) than did sputum smear-positive or pulmonary disease (34.7% and 33.0%), the two primary indicators used to initiate a contact investigation.5 We therefore believe that some social and behavioral factors are in fact more important than disease characteristics (smear and chest radiograph results) for determining the likelihood of the spread of TB disease, and should be considered when prioritizing cases for contact investigation.

Lack of access to routine health care, social gathering behaviors and inadequate TB control measures among substance users may explain the increased likelihood of secondary cases following the diagnosis of TB among substance users or the homeless.12,13 Similarly, lack of access to routine care may also explain the increased likelihood of secondary cases following the diagnosis of TB among minorities,14 as the longer a TB case remains undiagnosed, the greater the chances of continued transmission. It is also possible that some cases of TB among those without routine access to care were never diagnosed and reported. If so, our estimates of the association between substance use, homelessness, minority race and clustering would be biased toward the null and the true association between these factors and clustering may therefore be even stronger.

Some additional limitations should be considered. First, because submission of isolates for genotyping is not mandatory, not all reported culture-positive cases are represented. Second, in the absence of epidemiologic links between patients, county-based genotype clustering serves only as a proxy for recent TB transmission. Third, cluster size is time-dependent. Had we used a longer follow-up period to define clusters, cluster sizes could only have increased, but we do not believe this would substantially alter our main results regarding factors related to index cases or the relative differences in predictive scores associated with specific characteristics. Furthermore, predictive scores would be higher if applied to a community with a higher prevalence of TB transmission. Finally, as behavioral data were self-reported, their validity is not known.

CONCLUSION

We were able to use demographic, clinical and behavioral data to identify cases that were associated with an increased risk of future related cases within a county. After controlling for HIV, sputum smear status and site of disease, we found increased odds of related secondary cases associated with young age, minority race/ethnicity, birth in the United States, homelessness and substance use. Enhanced contact investigations following TB diagnosis in patients with any of these characteristics may help to prevent future cases.

Acknowledgments

The authors are grateful to National TB Genotyping Service scientists at the California Department of Health Services, Sacramento, CA, and the Michigan Department of Community Health, Lansing, MI, USA, the local and state TB program and laboratory personnel who participate in surveillance and genotyping activities, and the Surveillance, Epidemiology, and Outbreak Investigations Branch, Division of TB Elimination, The National Center for HIV/AIDS, Viral Hepatitis, STD, and TB Prevention, Centers for Disease Control and Prevention, Atlanta, GA, USA. The authors thank T Navin and J Grant for their helpful review.

References

1. Barnes P F, Cave M D. Molecular epidemiology of tuberculosis. N Engl J Med. 2003;349:1149–1156. doi: 10.1056/NEJMra021964.
2. Moonan P K, Ghosh S, Oeltmann J E, Kammerer J S, Cowan L S, Navin T R. Estimating recent transmission of M. tuberculosis in the United States based on genotyping and geospatial scanning. Emerg Infect Dis. 2012;18:458–465. doi: 10.3201/eid1803.111107.
3. Centers for Disease Control and Prevention. Reported tuberculosis in the United States, 2012. Atlanta, GA, USA: US Department of Health and Human Services, CDC; 2013.
4. Centers for Disease Control and Prevention. Notice to readers: new CDC program for rapid genotyping of Mycobacterium tuberculosis isolates. MMWR Morb Mortal Wkly Rep. 2005;54:47.
5. National Tuberculosis Controllers Association; Centers for Disease Control and Prevention (CDC). Guidelines for the investigation of contacts of persons with infectious tuberculosis. Recommendations from the National Tuberculosis Controllers Association and CDC. MMWR Recomm Rep. 2005;54(RR-15):1–47.
6. Fok A, Numata Y, Schulzer M, FitzGerald M J. Risk factors for clustering of tuberculosis cases: a systematic review of population-based molecular epidemiology studies. Int J Tuberc Lung Dis. 2008;12:480–492.
7. Nava-Aguilera E, Andersson N, Harris E et al. Risk factors associated with recent transmission of tuberculosis: systematic review and meta-analysis. Int J Tuberc Lung Dis. 2009;13:17–26.
8. Mitruka K, Oeltmann J E, Ijaz K, Haddad M B. Tuberculosis outbreak investigations in the United States, 2002–2008. Emerg Infect Dis. 2011;17:425–431. doi: 10.3201/eid1703.101550.
9. Rodrigo T, Caylà J A, García de Olalla P et al. Characteristics of tuberculosis patients who generate secondary cases. Int J Tuberc Lung Dis. 1997;1:352–357.
10. Driver C R, Macaraig M, McElroy P D et al. Which patients' factors predict the rate of growth of Mycobacterium tuberculosis clusters in an urban community? Am J Epidemiol. 2006;164:21–31. doi: 10.1093/aje/kwj153.
11. Kik S V, Verver S, van Soolingen D et al. Tuberculosis outbreaks predicted by characteristics of first patients in a DNA fingerprint cluster. Am J Respir Crit Care Med. 2008;178:96–104. doi: 10.1164/rccm.200708-1256OC.
12. Lofy K H, McElroy P D, Lake L et al. Outbreak of tuberculosis in a homeless population involving multiple sites of transmission. Int J Tuberc Lung Dis. 2006;10:68.
13. Oeltmann J E, Kammerer J S, Pevzner E S, Moonan P K. Tuberculosis and substance abuse in the United States, 1997–2006. Arch Intern Med. 2009;169:189–197. doi: 10.1001/archinternmed.2008.535.
14. Smedley B D, Stith A Y, Nelson A R, editors. Unequal treatment: confronting racial and ethnic disparities in health care. Washington, DC, USA: National Academies Press; 2003.
