Abstract
Surveillance systems of contagious diseases record information on cases to monitor incidence of disease and to evaluate effectiveness of interventions. These systems focus on a well-defined population; a key question is whether observed cases are infected through local transmission within the population or whether cases are the result of importation of infection into the population. Local spread of infection calls for different intervention measures than importation of infection. Besides standardized information on time of symptom onset and location of cases, pathogen genotyping or sequencing offers essential information to address this question. Here we introduce a method that takes full advantage of both the genetic and epidemiological data to distinguish local transmission from importation of infection, by comparing inter-case distances in temporal, spatial and genetic data. Cases that are part of a local transmission chain will have shorter distances between their geographical locations, shorter durations between their times of symptom onset and shorter genetic distances between their pathogen sequences as compared to cases that are due to importation. In contrast to generic clustering algorithms, the proposed method explicitly accounts for the fact that during local transmission of a contagious disease the cases are caused by other cases. No pathogen-specific assumptions are needed due to the use of ordinal distances, which allow for direct comparison between the disparate data types. Using simulations, we test the performance of the method in identifying local transmission of disease in large datasets, and assess how sensitivity and specificity change with varying size of local transmission chains and varying overall disease incidence.
Introduction
An essential question in contagious disease surveillance settings is whether cases result from local transmission within a population or from importation of infection from outside the population. This distinction is of importance, as interrupting local transmission calls for different interventions than stopping importation of infection. Unfortunately, distinguishing cases related through local transmission from cases that result from importation of infection is difficult.
When occurrences of a contagious disease are monitored and stored in a standardized way, statistical algorithms can aid in identification of local spread of the disease. For example, drug-resistant pathogens found in hospitals either are due to nosocomial transmission or are brought into the hospital by the patient. The former can be identified using surveillance data by assessing the number of cases in a fixed time period [1]; this identification is essential in optimizing hospital control measures.
Genetic sequence data of pathogens provide an informative data source for distinguishing between local transmission and importation. Sampled pathogens are now routinely genotyped or sequenced in many settings, offering the potential to distinguish cases that were infected in a local transmission chain from those that were infected elsewhere by evaluating small genetic differences between sampled pathogens. However, existing algorithms to find clusters of related cases in large datasets focus only on temporal data [2]–[8] or on spatiotemporal data [9]–[11]. Although genetic data are already being used to distinguish between different strains of the same species [1], [12], the full potential offered by these data has so far not been utilized [13].
In outbreaks of contagious diseases, cases are caused by other cases. This property results in clusters of cases due to local transmission of contagious diseases having a different mathematical structure in space and time than clusters of cases due to non-contagious diseases. Clusters due to contagious disease tend to have a more chain-like shape (figure 1). However, existing clustering algorithms that focus on spatiotemporal data do not account for this property, as they often have not been developed specifically for contagious diseases.
Here, we present a method that identifies locally infected cases from a dataset containing genetic, temporal and geographical data. For each pair of cases, we assess the distance between these cases with respect to their locations, their times of symptom onset, and the genetic sequences of the pathogens isolated from the cases. For a pair of cases not related through local transmission, we expect the distances in the separate data types to be independent. For cases that are part of a local transmission chain, we expect the distances between these cases to be small for each of the separate data types. We employ a form of hierarchical clustering that uses an ordinal distance between cases based on their genetic, temporal and geographical distances, and that reflects the fact that for contagious diseases, cases are caused by other cases. Clusters of cases resulting from local transmission are identified by testing whether they have smaller pairwise distances than would be expected under a null hypothesis of independence between the location, time of symptoms onset, and pathogen sequence of cases. As the purpose of the present paper is to introduce and explain the methodology, and to illustrate its use and limitations, we use simulated datasets. To test the ability of the method to detect local transmission of a contagious disease, by assessing the sensitivity and false positive rate of assigning cases to local transmission clusters.
Methods
We consider a contagious disease surveillance dataset that consists of a large number of cases. We assume that for each case we know the date of symptom onset or sampling date, the geographical location, and the genetic type or sequence of the pathogen. Some of the cases might be infected within the time and region of the study, while others are infected elsewhere. Our objective will be to identify transmission clusters; sets of cases related through a local transmission chain.
It is infeasible to consider every possible subset of cases in the dataset as a possible transmission cluster, because the number of subsets grows exponentially with the number of cases. We adopt a hierarchical clustering approach; here the dataset is sequentially divided into subsets of increasing size, yielding a tree-like structure, or dendogram (figure 2). The subsets encountered in this way are the most plausible local transmission clusters.
To perform hierarchical clustering one needs a measure of dissimilarity between sets of cases. We construct such a measure using both a measure of dissimilarity between individual cases and a linkage criterion that gives the similarity of two subsets as a function of the pairwise dissimilarities of their elements.
Linkage criterion
We use single linkage clustering [14]–[16], the oldest and arguably simplest linkage criterion, which states that the distance between two sets of cases is the minimum of the pairwise inter-element distances. A commonly cited drawback of this criterion is that it tends to create chain-like clusters, with a high average distance within the cluster. Contagious disease epidemiology seems one of the few settings where this property is actually an advantage. Since contagious disease cases are caused by other cases the distance between two sets is well described by be the smallest pairwise distance and chain-like clusters are very plausible; for instance outbreaks spreading to another location, or viruses mutating in a certain direction over time.
Pairwise dissimilarity
It is not immediately obvious how a dissimilarity should be defined between two individual cases. We have to combine a temporal, a geographical and a genetic distance, comparing days with kilometers and mutations. Furthermore, the absolute values of these distances are not directly informative. First, because we assume no knowledge on pathogen characteristics we cannot interpret any absolute value. Second, because many cases in one geographical, temporal or genetic region might be the result of a high population density, seasonality or higher pathogen fitness, respectively, rather than of local transmission. The relevant notion of dissimilarity between two cases is therefore not an absolute distance, but the number of other cases found in between the two cases [17], i.e. closer to both the two cases than the two cases are to each other. For example, two cases living a kilometer apart are more likely to be related when this is in a rural area than in a large city, two cases infected at the same day are more likely to be related when they are infected during an off-season than during an epidemic, and two identical strains are more likely to be related when this is a rare sequence than when this sequence is ubiquitous. This relevant notion of number of cases in between two cases is not pathogen specific and allows for combination of the three disparate data types. We define the dissimilarity di for a given data type i between two cases a and b as (figure 3):
(1) |
where |.| denotes the number of elements of a set, ∧ the logical AND operator, Di the absolute distance for data type i (time, location or genetic) and the ‘−1’ ensures di(a,a) = 0.
Under our null hypothesis of all cases being unrelated, the dissimilarities in the different data types are independent. In contrast, dissimilarities between two cases infected in the same local transmission chain will be small for each of the data types. We obtain the full dissimilarity d between two points a and b as the expected number of cases in between them under the null hypothesis; the product of the data type specific dissimilarities (figure 3)
(2) |
When the data are continuous (i.e. all observed values are unique) it is possible to analytically obtain the full distribution for d under the null hypothesis. For instance, dtime is distributed as the absolute value of the difference of two independent random variables following a discrete uniform distribution on [1,N], with N the number of cases. When cases are infected locally there will be more small dissimilarities than under the null hypothesis. When a data type is discrete (as genetic data always are) several cases can have identical values for this data type. In this case we propose an extension to di(a,b) in which the dissimilarity between two cases is the expected number of cases in between them if this identical value was due to small measurement error (Appendix S1).
Finding putative transmission clusters
We want to assess for a given subset S of the dataset D whether it is a local transmission cluster, i.e. whether the cases in S are closer together than would be expected under the null hypothesis. Define the unique ‘weakest link’ l(S) of a cluster as the largest dissimilarity in the minimal spanning tree of S; the larger the dissimilarity the less likely it is that all cases in the cluster are part of a local transmission chain. l(S) increases with S; we therefore compare l(S) to the value we would expect under the null hypothesis for a cluster of at least this size. We call S a putative transmission cluster (PTC) if the probability of observing a cluster of at least this size with weakest link at most l(S) under the null hypothesis is less than 0.001. This probability can be obtained by permuting the dataset (see Appendix S1 for details). The upper bound for the probability (here 0.001) should be small but other than that is arbitrary, and could be changed depending on the application.
It is important to note that the tests applied to each of the clusters encountered in the hierarchical clustering scheme are not independent. For example, if a set of ten cases is found to be a PTC then the set of eleven cases constructed by adding one random nearby case will probably also be a PTC. This dependence is inherent to clustering algorithms and not necessarily a problem, as finding the cluster is usually more important than uniquely identifying all the cases that belong to it. It could, however, lead to a high false positive rate when assessing whether cases are correctly assigned to a PTC.
Testing on simulated datasets
To gauge the strengths and limitations of the algorithm, we tested it on simulated datasets where we know precisely which cases were part of a local transmission chain and which were import. The performance of any clustering algorithm depends on how strongly the clusters are separated in the data. For instance, if a dataset consists of several outbreaks clearly distinguishable in time and place, we expect an algorithm to do well. On the other hand, when cases belonging to one outbreak can be found throughout the spatiotemporal and genetic space, any algorithm will struggle to identify the outbreak. The simulations are thus focused on the intermediate region, where clustered cases are not easily distinguishable based on separate data types, but are still close enough that the combined information from the data types yields enough information for clustering.
We use two measures of the performance of the proposed method. First, we take the percentage of locally infected cases correctly assigned to a PTC. Second, we take the percentage of imported cases incorrectly assigned to a PTC. The former is a measure of the sensitivity, the latter of the false positive rate. If sensitivity is high whilst the false positive rate remains low, the method could be suitable for use in outbreak detection. As we assess performance of the method at the case level, while statistical tests are performed at the cluster level, the false positive rate is not guaranteed to be beneath the p-value of 0.001 used. If the false positive rate becomes too large the method would be unsuitable for outbreak detection, as too many false alarms would be given, but might still be useful in assessing properties of locally infected cases. We performed simulations under different incidence rates and with different sizes of the transmission clusters.
Each of the simulated datasets consisted of many unrelated cases and a small number of local transmission clusters, containing in expectation ten percent of the total number of cases for each of the simulations. All of the cases have a time, position and genotype associated with them. The geographical position of an imported case A is given by x A∼(Uniform(0,100), Uniform(0,100)), its time of sampling by tA∼Uniform(0,100), and its genotype genA is represented by a string of 8 random bits (each can be 0 or 1 with equal probability). Locally infected cases were simulated with infectors chosen as specified below. An infected case B and its infecting case A are related as follows; xB∼xA+(Normal(0,4), Normal(0,4)), tB∼tA+Exponential(1), and genB is generated from genA by flipping each bit with probability 1/16.
The absolute spatial distance between two cases was taken to be the usual Euclidean distance, with distances in both dimensions the minimum of |x1−x2| and 100−|x1−x2|. This makes the geographical region a torus, ensuring all clusters are fully observed. The absolute genetic distance was calculated as the number of different bits, leading to an expected genetic distance of 0.5 between infector and infected. The absolute temporal distance was the absolute value of the time difference. From these absolute distances dissimilarities were calculated for each data type, and combined into pairwise dissimilarities using equation (2).
In a first scenario all locally infected cases belong to the same outbreak, with an index case randomly chosen from the unrelated cases. In a second scenario we generated smaller transmission clusters, by letting 1/9 of all cases generate secondary cases according to a geometric distribution with mean R = 0.5. These were themselves equally infectious, yielding transmission clusters of expected size 2. In the epidemiological literature this scenario is known as ‘stuttering transmission chains’ [18]; small outbreaks occur but large outbreaks do not, since the mean number of secondary cases per infectious case, or effective reproduction number, R, is smaller than one. In a third scenario we generated even more and smaller outbreaks, with each case generating new cases according to a geometric distribution with mean R = 0.1, yielding a dataset with many very small transmission clusters of expected size 1.11. For all of these scenarios, we performed simulations with an initial number of unrelated cases of 90, 450 or 900, representing different incidence rates, yielding nine simulation scenarios in total. The expected total number of cases for these simulations are thus 100, 500 and 1000. For each of these nine scenarios, we simulated 100 datasets, and applied the methodology to identify significant clusters.
The results of the clustering algorithm depend on how related infector-infected pairs are in each of the data types. When this relation is stronger, we expect clustering results to improve. We therefore performed additional simulations where the distances between infector-infected pairs are smaller (Appendix S1).
Many actual surveillance datasets face the problem of missing or unobserved cases. This is similar to a scenario where all cases are observed, but the relation between infector-infected pairs is weakened. To illustrate this, we performed analyses on simulated datasets from which we randomly discarded 20% of all cases (Appendix S1).
Results
Figure 4 gives a graphical representation of a typical simulated dataset with small outbreaks for each of the separate data types. Figure S1 gives the same information for a simulated large outbreak (see Supporting Information for corresponding simulated datasets). Under the chosen parameter settings, identifying the different clusters by only looking at the separate data types is challenging.
The results for finding local transmission for all simulation scenarios are given in figure 5 and table 1. For each simulation scenario, we report the distribution and median of the percentage of cases assigned to a putative transmission cluster (PTC), both for locally infected and for imported cases. For all scenarios, the percentage of locally infected assigned to a PTC is higher than the percentage of import cases assigned to a PTC. This means that for all scenarios the three data types, when combined, provide sufficient statistical signal to identify local transmission.
Table 1. Median of sensitivity/false positive rate of assigning locally infected cases to a putative transmission cluster for simulated datasets.
High incidence | Low incidence | Very low incidence | |
Large outbreak | 1.00/0.16 | 1.00/0.06 | 1.00/0.01 |
Small outbreaks | 0.49/0.01 | 0.58/0.01 | 0.60/0.00 |
Very small outbreaks | 0.06/0.00 | 0.08/0.00 | 0.11/0.00 |
In general, the method assigns outbreak cases to PTC's more often when transmission clusters are larger. This is no surprise as the strength of the statistical signal increases with outbreak size. This higher sensitivity comes at the cost of a lower specificity; when assessing the large outbreaks of expected size 100 the median false positive rate is 0.16, while it is near 0 for most other scenarios. The false positives here are cases that are, coincidentally, close to the actual cluster; the number of such cases increases with the size of the outbreak.
The method has a lower sensitivity and specificity when the incidence rate is higher. This is because there are more unrelated cases per unit of space, time and genetics, while the absolute inter-outbreak case distances remain the same. Therefore, the ordinal distance between outbreak cases becomes larger when incidence rates are higher, which makes it harder to identify transmission clusters (figure S3, table S2). When inter-outbreak case distances become smaller, outbreaks are easier to detect (see Appendix S1, figure S2, table S1).
Discussion
We have presented a method to identify transmission chains of locally infected contagious disease cases in large databases containing temporal, geographical and genetic data. The method does not require assumptions on population at risk or pathogen-specific properties. The method is novel in explicitly incorporating the genetic distances measured between sampled pathogens, and in accounting for the chain-like structure of transmission chains of contagious diseases.
Several methods to find locally infected cases in large datasets have been published and some are commonly used in epidemiological investigations [1], [9], [12], [13]. However, many of these methods were not explicitly developed for analysis of contagious diseases, and ignore the fundamental characteristic of contagious diseases: that each infected case can itself be a source for new cases. The method presented here does take this into account by focusing on the distances between cases, rather than on the number of cases in a particular region of space-time. This latter approach is suitable when cases are caused by one common source, rather than by the cases themselves.
The ability of the method to correctly cluster together cases belonging to the same transmission chain depends on how ‘close’ these cases are to each other in time, space and genetics, relative to non-related cases. This depends both on the properties of the pathogen studied, the incidence of disease, and the size of the study region and duration. For example, a dataset resulting from a study period of one year on a pathogen with an average serial interval of half a week would for our method, due to the ordinal distances used, be equivalent to a dataset resulting from a study period of ten years on a pathogen with a serial interval of five weeks and ten times lower incidence. Shorter serial intervals, lower incidence rates and longer study periods allow for more accurate identification of outbreak cases.
In our simulations, we have taken the Euclidean distance between geographical locations. However, this is not always the most relevant distance metric. For example, people are more likely to travel between densely populated areas than between sparsely populated areas [19]. Thus, when a study region encompasses both urban and rural areas, more relevant measures of distance could be given by mobility patterns [20] or road distances [21]. Note that no extra correction is needed to adjust for the higher urban population densities leading to more cases; this is taken into account by the ordinal distance used.
The non-parametric method introduced is able to identify locally infected cases when little is known about the pathogen studied. When precise pathogen-specific information or information on population at risk is available, a more precise description of the system can be given. More specific methods can be used and information can be obtained from the data types separately, which should lead to better identification of transmission clusters. The non-parametric method could still be of value in such as a scenario, as it provides a simple first-try approach: results can be compared to those obtained from an analysis that uses more information, and assumptions made about pathogen characteristics and population at risk can be tested.
Here we performed validation of the method using simulated datasets, in which the origin of cases is known. Results obtained give confidence the method can be sensibly applied to actual surveillance datasets. Examples of existing large molecular epidemiological databases include VNTR typing datasets of tuberculosis [22], spa typing datasets of MRSA [23] and short read sequencing datasets of hepatitis B, hepatitis A and norovirus [24]–[26]. Note that the relevant spatial information differs for these datasets; we might focus on place of residence for tuberculosis, but on hospital, ward or even bed for MRSA. Future work will have to focus on applying the method presented here to such datasets.
The method presented has some drawbacks. First, the null hypothesis states that all cases in the dataset are independent. This leads to a bias when many transmission clusters are present; as the locally infected cases cluster together, the remaining independent cases will themselves lie closer together (in ordinal distance) than under the null hypothesis. Thus, especially when a high percentage of cases are locally infected, the statistical test could overestimate the percentage of clustered cases. This overestimation might be alleviated if prior information on the percentage of cases locally infected were available. Second, we assumed independence between the different data types for unrelated cases. These data are never truly independent, as all cases belong to the same phylogenetic tree acting over a long time scale. The local outbreaks we are interested in can be seen as local tips of these large trees. Whether the data can be approximated as being independent depends on the spatial dynamics of the pathogen, the evolutionary time separating sampled pathogens and the size of the region studied. For example, the approximation might be valid when studying MRSA at the hospital level, but not at the level of a continent as geographical structure can be seen in genotypes sampled [23]. Third, we have not taken into account the boundaries of our datasets, i.e. the edges of the geographical area and time window studied. This might decrease the sensitivity of finding clusters near the start or end of the study period, and should be addressed when the method is applied prospectively.
Results of clustering methods such as the one presented are important in epidemiological investigations for a number of reasons. First, they provide a measure of how much local transmission takes place. For example, if many putative transmission clusters are found in a hospital setting, infection control measures have to be intensified. Second, the algorithm can be used as a tool to find transmission clusters in large databases that can then be further investigated, removing the need for a detailed analysis by hand of the complete database. Third, properties of clustered cases can be compared to non-clustered cases. This will, for example, allow researchers to test whether patients of a particular age are more prone to transmit disease, or whether certain genotypes are more likely to spread in a hospital setting. These applications differ in their requirements on the sensitivity and specificity of the algorithm, where generally the second application will have the most stringent, and the third application the most relaxed requirements.
With the decreasing cost of sequencing and genotyping techniques, the availability of genetic data continues to grow. In particular, in many surveillance settings large molecular epidemiological databases have been set up. As the size and complexity of these databases grows, we can only expect the usefulness of automated methods such as these to assist in answering public health questions will grow concordantly.
Supporting Information
Funding Statement
The authors have no support or funding to report.
References
- 1. Huang SS, Yokoe DS, Stelling J, Placzek H, Kulldorff M, et al. (2010) Automated detection of infectious disease outbreaks in hospitals: a retrospective cohort study. PLoS Med 7: e1000238. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 2. Farrington CP, Andrews NJ, Beal AD, Catchpole MA (1996) A Statistical Algorithm for the Early Detection of Outbreaks of Infectious Disease. J R Statist Soc A 159: 547–563. [Google Scholar]
- 3. Hutwagner LC, Maloney EK, Bean NH, Slutsker L, Martin SM (1997) Using laboratory-based surveillance data for prevention: an algorithm for detecting Salmonella outbreaks. Emerg Infect Dis 3: 395–400. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 4. Le Strat Y, Carrat F (1999) Monitoring epidemiologic surveillance data using hidden Markov models. Stat Med 18: 3463–3478. [DOI] [PubMed] [Google Scholar]
- 5. Murphy SP, Burkom H (2008) Recombinant temporal aberration detection algorithms for enhanced biosurveillance. J Am Med Inform Assoc 15: 77–86. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 6. Nobre FF, Stroup DF (1994) A monitoring system to detect changes in public health surveillance data. Int J Epidemiol 23: 408–418. [DOI] [PubMed] [Google Scholar]
- 7. Stern L, Lightfoot D (1999) Automated outbreak detection: a quantitative retrospective analysis. Epidemiol Infect 122: 103–110. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 8. Stroup DF, Williamson GD, Herndon JL, Karon JM (1989) Detection of aberrations in the occurrence of notifiable diseases surveillance data. Stat Med 8: 323–329 discussion 331-322. [DOI] [PubMed] [Google Scholar]
- 9. Kulldorff M, Heffernan R, Hartman J, Assuncao R, Mostashari F (2005) A space-time permutation scan statistic for disease outbreak detection. PLoS Med 2: e59. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 10. Watkins RE, Eagleson S, Veenendaal B, Wright G, Plant AJ (2009) Disease surveillance using a hidden Markov model. BMC Med Inform Decis Mak 9: 39. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 11. Hossain MM, Lawson AB (2010) Space-time Bayesian small area disease risk models: development and evaluation with a focus on cluster detection. Environ Ecol Stat 17: 73–95. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 12. Stelling J, Yih WK, Galas M, Kulldorff M, Pichel M, et al. (2010) Automated use of WHONET and SaTScan to detect outbreaks of Shigella spp. using antimicrobial resistance phenotypes. Epidemiol Infect 138: 873–883. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 13. O'Brien TF, Stelling J (2011) Integrated Multilevel Surveillance of the World's Infecting Microbes and Their Resistance to Antimicrobial Agents. Clin Microbiol Rev 24: 281–295. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 14. Florek K, Lukaszewicz J, Perkal J, Steinhaus H, Zubrzycki S (1951) K. Florek, J. Lukaszewicz, J. Perkal, H. Steinhaus, S. Zubrzycki, Sur la liaison et la division des points d'un ensemble fini. Colloquium Mathematicum 2: 282–285. [Google Scholar]
- 15. McQuitty LL (1957) Elementary linkage analysis for isolating orthogonal and oblique types and typal relevancies. Educ Psychol Measmt 17: 207–222. [Google Scholar]
- 16. Sneath PHA (1957) The application of computers to taxonomy. J gen Microbiol 17: 201–226. [DOI] [PubMed] [Google Scholar]
- 17. Jacquez GM (1996) A k nearest neighbour test for space-time interaction. Stat Med 15: 1935–1949. [DOI] [PubMed] [Google Scholar]
- 18. Lloyd-Smith JO, George D, Pepin KM, Pitzer VE, Pulliam JR, et al. (2009) Epidemic dynamics at the human-animal interface. Science 326: 1362–1367. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 19. Xia Y, Bjornstad ON, Grenfell BT (2004) Measles metapopulation dynamics: a gravity model for epidemiological coupling and dynamics. Am Nat 164: 267–281. [DOI] [PubMed] [Google Scholar]
- 20. Balcan D, Colizza V, Goncalves B, Hu H, Ramasco JJ, et al. (2009) Multiscale mobility networks and the spatial spreading of infectious diseases. Proc Natl Acad Sci U S A 106: 21484–21489. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 21. Talbi C, Lemey P, Suchard MA, Abdelatif E, Elharrak M, et al. (2010) Phylodynamics and human-mediated dispersal of a zoonotic virus. PLoS Pathog 6: e1001166. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 22. de Beer JL, van Ingen J, de Vries G, Erkens C, Sebek M, et al. (2013) Comparative study of IS6110 RFLP and VNTR typing of Mycobacterium tuberculosis in the Netherlands, based on a five year nationwide survey. J Clin Microbiol [DOI] [PMC free article] [PubMed] [Google Scholar]
- 23. Grundmann H, Aanensen DM, van den Wijngaard CC, Spratt BG, Harmsen D, et al. (2010) Geographic distribution of Staphylococcus aureus causing invasive infections in Europe: a molecular-epidemiological analysis. PLoS Med 7: e1000215. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 24. Verhoef L, Kouyos RD, Vennema H, Kroneman A, Siebenga J, et al. (2011) An integrated approach to identifying international foodborne norovirus outbreaks. Emerg Infect Dis 17: 412–418. [DOI] [PMC free article] [PubMed] [Google Scholar]
- 25. van Ballegooijen WM, van Houdt R, Bruisten SM, Boot HJ, Coutinho RA, et al. (2009) Molecular sequence data of hepatitis B virus and genetic diversity after vaccination. Am J Epidemiol 170: 1455–1463. [DOI] [PubMed] [Google Scholar]
- 26. Mukomolov S, Kontio M, Zheleznova N, Jokinen S, Sinayskaya E, et al. (2012) Increased circulation of hepatitis A virus genotype IIIA over the last decade in St Petersburg, Russia. J Med Virol 84: 1528–1534. [DOI] [PubMed] [Google Scholar]
Associated Data
This section collects any data citations, data availability statements, or supplementary materials included in this article.