Reassortment Networks and the Evolution of Pandemic H1N1 Swine-Origin Influenza

Shahid H Bokhari; Laura W Pomeroy; Daniel A Janies

doi:10.1109/TCBB.2011.95

2011 Jun 6;9(1):214–227. doi: 10.1109/TCBB.2011.95


PMCID: PMC7186031 PMID: 22076498

Abstract

Prior research developed Reassortment Networks to reconstruct the evolution of segmented viruses under both reassortment and mutation. We report their application to the swine-origin pandemic H1N1 virus (S-OIV). A database of all influenza A viruses, for which complete genome sequences were available in Genbank by October 2009, was created and dynamic programming was used to compute distances between all corresponding segments. A reassortment network was created to obtain the minimum cost evolutionary paths from all viruses to the exemplar S-OIV A/California/04/2009. This analysis took 35 hours on the Cray Extreme Multithreading (XMT) supercomputer, which has special hardware to permit efficient parallelization. Six specific H1N1/H1N2 bottleneck viruses were identified that almost always lie on minimum cost paths to S-OIV. We conjecture that these viruses are crucial to S-OIV evolution and worthy of careful study from a molecular biology viewpoint. In phylogenetics, ancestors are typically medians that have no functional constraints. In our method, ancestors are not inferred, but rather chosen from previously observed viruses along a path of mutation and reassortment leading to the target virus. This specificity and functional constraint render our results actionable for further experiments in vitro and in vivo.

Keywords: Cray XMT, graph theory, influenza, multithreading, networks, pandemic, reassortment, shortest paths, S-OIV, swine flu

1. Introduction

A newly emergent strain of influenza subtype H1N1, termed the swine origin influenza virus (S-OIV), was first detected during an outbreak in Mexico and the southwestern USA in spring of 2009 [1], [2]. The disease was next observed in other parts of the USA in late March 2009 and the virus was first isolated in mid April 2009 [3], [4]. Throughout the spring, the epidemic spread worldwide and the World Health Organization (WHO) declared an influenza pandemic, phase 6, on 11 June 2009 [5]. By 25 April 2010, 17,919 deaths attributable to H1N1 S-OIV were reported with confirmed cases in more than 214 countries [6].

Influenza is a negative sense single-stranded RNA virus in the family Orthomyxoviridae. It is maintained in migratory waterfowl and can infect many species including humans, birds, pigs, horses, and other animals. The virus genome contains eight segments that code for 10 or 11 proteins. Three segments encode the polymerase complex: basic polymerase 2 (PB2), basic polymerase 1 (PB1), and the acidic protein (PA). The nucleoprotein segment (NP) encodes a protein that binds to viral RNA. The matrix segment (MP) encodes two proteins: a structural component of the viral capsid and a membrane ion channel. The nonstructural segment (NS) encodes a protein essential for cellular RNA processing and transport. Two other segments, hemagglutinin (HA) and neuraminidase (NA), encode viral surface glycoproteins responsible for host cell entry and exit, respectively [7], [8]. The hemagglutinin and neuraminidase genes determine the viral subtype, designated H$x$N$y$, where $x$ is one of 16 known hemagglutinin subtypes [9] and $y$ is one of nine known neuraminidase subtypes [7].

When two different influenza viruses infect the same host cell, novel combinations of the eight genomic segments can create a novel virus through a process known as reassortment [7], [8], a form of horizontal gene transfer (HGT). Often, the host population is immunologically naïve to the new viruses created through reassortment, which can lead to worldwide pandemics as observed in 1957 with the H2N2 pandemic, in 1968 with the H3N2 pandemic, and in 2009 with the H1N1 pandemic [8], [10]. The role of reassortment in influenza evolution is also discussed in [11] and [12].

Reassortment events leading to novel influenza viruses are poorly understood, greatly underestimated, and thus are an area of continuing research. Reassortment events are often identified by visual comparisons of incongruence among phylogenetic trees [13]. Some algorithms have been developed to infer reassortment events through statistical techniques in the absence of phylogenetics [14], [15]. Prior research in using phylogenetic networks to analyze situations where HGT occurs has been done by Huson and Bryant [16], and Makarenkov and Legendre [17]. An extensive survey of combinatorial methods for phylogenetic networks appears in [18].

In this paper, we build on previous work by Bokhari and Janies [19] to explore the evolution of H1N1 S-OIV pandemic viruses using a reassortment network. This paper reports multidisciplinary research, involving several diverse fields. As such, we must necessarily cover a wide range of issues in order to provide a clear understanding of our methods and results. It is important to describe many of the low-level implementation details to allow other researchers to appreciate the complexities of this project and pursue its applications and extensions. In particular, we cover the details of data gathering and processing that were necessary to create the database that is at the center of this work. The issues of parallel computation, both on a distributed memory commodity cluster as well as on a shared-memory massively multithreaded supercomputer are also covered, as the reassortment algorithm could not be run in an acceptable amount of time without these. Finally, the presentation of output as “in-trees” that capture both mutation and reassortment is important for interpretation and evaluation of our results.

The main contributions of this paper are:

1. Implementation of a parallel version of the Bokhari and Janies [19] reassortment algorithm.
2. Testing of this algorithm on all 5,016 fully sequenced influenza A genomes, as of late 2009.
3. Validation of the reassortments reconstructed by the network algorithm against prior results by other researchers.
4. Identification of six “bottleneck viruses” that lie on almost every evolutionary path to the 2009 S-OIV set of viruses. These possibly played a significant role in the S-OIV pandemic and are worthy of further study in a molecular biology context.

We start the paper with an overview of the workflow of our research in Section 2, which is followed by a discussion of the architecture of the Cray XMT supercomputer. In Section 3, we review existing models for S-OIV reassortment. The main results of our paper appear in Sections 4 and 5. We conclude with a discussion of our results and suggestions for future research in Section 6.

2. Overview of Workflow

There are three main components of our research (Fig. 1). To start with, virus names, dates, locations, subtypes, and nucleotide sequences were downloaded from the National Center for Biotechnology Information (www.ncbi.nlm.nih.gov). Next, pairwise distances between all pairs of corresponding sequences were computed using dynamic programming. Finally, the distances were used to create a Reassortment Network and to compute the minimum cost paths from all viruses to a given target virus. (A minimum cost path, or shortest path, from node $u$ to node $v$ has the shortest distance over all paths from $u$ to $v$.) These shortest paths are used to create the in-tree that has, in general, all known viruses as its leaves and the given target virus as its root.

2.1. Creating the Virus Database

Although the RNA sequences of influenza viruses are available in the public databases www.ncbi.nlm.nih.gov/genomes/FLU [20] and www.ncbi.nlm.nih.gov/sites/batchentrez, extracting this information in a form that could be used for our study was a challenging problem that required significant manual curation.

We need, for each virus, a unique strain-specific identifier (a TaxID), strain name (e.g., A/Canada-NS/RV1535/2009(H1N1)), subtype (H1N1 in this case), year (2009), and RNA sequences for each of the eight segments. Acquiring this information is complicated by the following issues:

• Many segment sequences are repeated in the database due to redundant effort.
• The year entry in the database is often not standardized (e.g., 1979-chicken instead of 1979).
• Sequences include those isolated from laboratory adapted and vaccine strains that have to be excluded by inspection.
• Segment types are part of name strings that also include place names. For example, “PA” could be a location (a state of the US) or the segment type.
• Segment types are labeled in numerous ways, e.g., polymerase 2, polymerase basic protein 2, segment 1, Sequence 1, P2, PB2, pb2.

These issues make it difficult to fully automate the process of extracting information from the database and require significant manual effort. Full details of the steps required to acquire this data are given in the Supplemental Material, Appendix E, which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.95. Our final curated list of viruses is available in the supplemental material as http://bmi.osu.edu/~shahid/SOIV/CuratedList.txt.
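As an illustration of the kind of cleanup these issues call for, the following is a minimal sketch, not the curation procedure described in Appendix E. The alias table contains only the label variants quoted above, and the function names `normalize_segment_label` and `normalize_year` are hypothetical. Ambiguous labels such as “PA” are deliberately left unresolved so that they can be inspected by hand.

```python
import re

# Hypothetical alias table built only from the label variants quoted in the
# text; a real curation pass would need many more entries and manual review.
SEGMENT_ALIASES = {
    "pb2": "PB2",
    "p2": "PB2",
    "polymerase 2": "PB2",
    "polymerase basic protein 2": "PB2",
    "segment 1": "PB2",
    "sequence 1": "PB2",
}


def normalize_segment_label(label):
    """Map a free-text segment label to a canonical name, or None if unknown."""
    key = re.sub(r"\s+", " ", label.strip().lower())
    return SEGMENT_ALIASES.get(key)


def normalize_year(field):
    """Extract a four-digit year from entries such as '1979-chicken'."""
    match = re.search(r"(?<!\d)(1[89]\d\d|20\d\d)(?!\d)", field)
    return int(match.group(1)) if match else None


print(normalize_segment_label("Sequence 1"))  # PB2
print(normalize_segment_label("PA"))          # None (left for manual review)
print(normalize_year("1979-chicken"))         # 1979
```

Returning `None` instead of guessing keeps the automated step conservative, which matches the need for manual curation noted above.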

2.2. Notation (Adapted from [19])

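$n$	The number of viruses.
$m$	The number of stages of evolution.
$S$	The number of segments in each virus.
$v_i^s$	Segment number $s$ of virus $i$.
$d^s_{ij}$	The distance between segments $s$ of viruses $i$ and $j$.
$D_{ij}$	The distance between viruses $i$ and $j$: $D_{ij} = \sum_{s=1}^{S} d^s_{ij}$.
$r^s_{ij}$	A reassorted virus where segment $s$ of virus $j$ replaces segment $s$ of virus $i$.
$D_{(i,j,s),k}$	The distance between a reassorted virus $r^s_{ij}$ and virus $k$: $D_{(i,j,s),k} = D_{ik} - d^s_{ik} + d^s_{jk}$.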

2.3. Computing Intersegment Distances

Weighted edit distances for every pair of corresponding segments were computed using standard dynamic programming [21, Section 11.5], as detailed in [19]. We use “end-space free” alignments [21, Section 11.6.4]. Such alignments ignore any missing bases at the ends of the input strings, which are due to sequencing artifacts. When computing distances, we needed to account for symbols other than A, C, T, G that represent ambiguous bases [22]. This issue is described in detail in the online Supplemental Material, Appendix F.
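The idea of ignoring unmatched ends can be sketched with a small dynamic program. The version below is a simplified, one-sided variant and is not the weighting scheme of [19]: it assumes unit mismatch and gap costs, treats ambiguity codes as ordinary characters, and lets only the unaligned ends of the longer string go free (a fully symmetric minimum-cost version would trivially return zero by sliding the strings apart). The function name `end_free_distance` is hypothetical.

```python
def end_free_distance(a, b, mismatch=1, gap=1):
    """Edit distance in which unaligned ends of the longer string cost nothing."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    # prev[j] is the cost of aligning the first i characters of `short` so that
    # the alignment ends at position j of `long_`. Row 0 is all zeros because
    # leading characters of `long_` may be skipped for free.
    prev = [0] * (len(long_) + 1)
    for i in range(1, len(short) + 1):
        cur = [i * gap] + [0] * len(long_)
        for j in range(1, len(long_) + 1):
            sub = 0 if short[i - 1] == long_[j - 1] else mismatch
            cur[j] = min(prev[j - 1] + sub,   # match or substitution
                         prev[j] + gap,       # character of `short` against a gap
                         cur[j - 1] + gap)    # character of `long_` against a gap
        prev = cur
    # Trailing characters of `long_` are also free: take the best end position.
    return min(prev)


print(end_free_distance("CGT", "ACGTA"))   # 0: missing ends are ignored
print(end_free_distance("ACGT", "AGGT"))   # 1: one substitution
```

Only two rows of the matrix are kept at a time, which is sufficient when only the distance, and not the alignment itself, is required.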

The number of fully sequenced (all eight segments known) influenza A viruses in www.ncbi.nlm.nih.gov/genomes/FLU as of 15 October 2009 was 5,016. Thus, we needed to carry out $8 \times \binom{5016}{2} \approx 101$ million alignments. The memory requirements in this case are modest, as an influenza segment is at most a few thousand bases long, implying dynamic programming matrices with a few million entries. Furthermore, the alignments are completely independent and thus trivial to parallelize.
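The size of this workload, and the fact that it splits into independent pieces, can be checked with a few lines. This is only a sketch: `chunk_pairs` and its round-robin assignment are hypothetical and do not reproduce the workload division described in Appendix G.

```python
from itertools import combinations
from math import comb

N_VIRUSES = 5016   # fully sequenced genomes, from the text
N_SEGMENTS = 8     # segments per influenza A genome


def alignment_count(n_viruses=N_VIRUSES, n_segments=N_SEGMENTS):
    """One alignment per segment for every unordered pair of viruses."""
    return n_segments * comb(n_viruses, 2)


def chunk_pairs(n_viruses, n_chunks):
    """Deal the unordered virus pairs round-robin into n_chunks work lists."""
    chunks = [[] for _ in range(n_chunks)]
    for index, pair in enumerate(combinations(range(n_viruses), 2)):
        chunks[index % n_chunks].append(pair)
    return chunks


print(alignment_count())                       # 100620960
print([len(c) for c in chunk_pairs(6, 4)])     # [4, 4, 4, 3]
```

Because no alignment depends on any other, each work list can be handed to a separate processor without communication between them.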

We initially used the resources of Ohio Supercomputer Center (OSC, www.osc.edu), which has several thousand processors. The computation of all distances took 12-24 hours on this shared system, where we had to compete with hundreds of other users.

During the course of our research, our department (Biomedical Informatics) acquired its own cluster, called “Bucki,” with 576 AMD Opteron 2378 (2.4 GHz) processors that we could use in dedicated mode. The time on this system was reduced to 90 minutes by finely dividing the workload, as detailed in the online Supplemental Material, Appendix G.

2.4. The Reassortment Network

We now turn to the actual creation and solution of reassortment networks. As the theory underlying these networks has been discussed in detail by Bokhari and Janies [19], we present only a brief sketch.

A reassortment network representing $m$ stages of evolution has layers $0, 1, \ldots, 2m$. Even layers represent viruses and odd layers represent events, which can be stasis, mutation, or reassortment. Edges extend only from layer $l$ to layer $l+1$ (resulting in a multipartite graph). A path from a virus $i$ in the first layer to a virus $j$ in the last layer represents a series of stasis, mutation, or reassortment events that transform $i$ to $j$.

Edges in the reassortment network have weights on them. The weighting scheme is designed such that the cost of the path between virus $i$ in layer 0 and virus $j$ in layer $2m$ equals the sum of the costs of the mutation and reassortment events that transform $i$ into $j$ (stasis costs between identical segments are, of course, zero).

The cost of a mutation of virus $i$ into virus $k$ is the sum of the edit costs of the individual segments. When virus $i$ reassorts to obtain a segment $s$ from virus $j$ to become virus $k$, the reassortment cost is the sum of the segment edit distances between virus $i$ (with segment $s$ replaced by the corresponding segment of virus $j$) and virus $k$. These concepts are clarified in Figs. 12 and 13 in the online Supplemental Material, Appendix D.
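These two costs can be written down directly. The sketch below uses hypothetical function names and, as sample data, the segment distances to A/California/04/2009 listed in Table 2 for A/swine/Shanghai/1/2007 and A/swine/HongKong/NS29/2009. It does not include any additional terms the full network may apply.

```python
# Segment order as in the header of Table 2.
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS"]


def mutation_cost(seg_dist_i_to_k):
    """Cost of virus i mutating into virus k: the sum of segment distances."""
    return sum(seg_dist_i_to_k)


def reassortment_cost(seg_dist_i_to_k, seg_dist_j_to_k, segment):
    """Cost when virus i takes `segment` from virus j and becomes virus k."""
    s = SEGMENTS.index(segment)
    replaced = list(seg_dist_i_to_k)
    replaced[s] = seg_dist_j_to_k[s]   # the donated segment now comes from j
    return sum(replaced)


# Segment distances to A/California/04/2009, copied from Table 2.
shanghai_to_target = [126, 143, 123, 202, 66, 1041, 144, 56]
ns29_to_target = [472, 410, 397, 601, 331, 160, 53, 192]

print(mutation_cost(shanghai_to_target))                            # 1901
print(reassortment_cost(shanghai_to_target, ns29_to_target, "NA"))  # 1020
```

Replacing the single distant NA segment roughly halves the total distance, which is why the reassortment path is cheaper than the direct mutation in this example.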

2.5. Motivation for Reassortment Algorithm

Our reassortment network is a layered graph in which alternating layers of nodes correspond to viruses and reassortment events between pairs of viruses. Edges in this graph correspond to transitions between viruses and reassortment events, with edge weights corresponding to the costs of transitions. These costs are the sums of segment distances between pairs of viruses.

Paths in a reassortment network correspond to evolutionary changes and the lengths of the paths correspond to the sums of the edge weights in the paths. Lower path lengths correspond to smaller sums of distances between the corresponding sequence of viruses. It follows that the shortest path indicates the minimum cost sequence of mutation and/or reassortment events required to transform one virus into another.

2.6. Finding Shortest Paths

Once an $m$-stage reassortment network for $n$ viruses with $S$ segments each has been set up, the shortest evolutionary path between any two viruses can be found in time polynomial in $m$, $n$, and $S$ [19]. This time suffices to find shortest paths from all viruses in layer 0 to a given target virus in the last layer, $2m$. These paths constitute the in-tree for the target virus. For our problem, with $n$ = 5,016, $S$ = 8, and $m$ = 32, this represents a very large number of computational steps, and massive parallelism is required to achieve reasonable run times, as discussed below.
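The layer-by-layer labeling can be illustrated on a toy instance. The following is a compact serial sketch, not the parallel implementation used in this work: it folds each event layer into the loop body, works backward from the target, and assumes that a donor must differ from both the recipient and the resulting virus. The function name `in_tree_costs`, the `overhead` parameter, and the three-virus data are hypothetical. Its nested loops do work proportional to the number of stages times the number of segments times the cube of the number of viruses; this is a property of the sketch and not necessarily the bound given in [19].

```python
import math


def in_tree_costs(seg, target, stages, overhead=0):
    """best[i] = minimum cost of turning virus i into `target` within `stages`.

    seg[i][k] is the list of segment distances between viruses i and k.
    """
    n = len(seg)
    n_seg = len(seg[0][0])
    total = [[sum(seg[i][k]) for k in range(n)] for i in range(n)]
    best = [math.inf] * n
    best[target] = 0
    for _ in range(stages):
        nxt = list(best)                    # stasis: stay unchanged at zero cost
        for i in range(n):
            for k in range(n):              # k is the virus produced by the event
                if best[k] == math.inf:
                    continue
                # Mutation of virus i into virus k.
                nxt[i] = min(nxt[i], total[i][k] + best[k])
                # Reassortment: i takes segment s from donor j, then becomes k.
                for j in range(n):
                    if j == i or j == k:
                        continue
                    for s in range(n_seg):
                        cost = total[i][k] - seg[i][k][s] + seg[j][k][s] + overhead
                        nxt[i] = min(nxt[i], cost + best[k])
        best = nxt
    return best


# Toy data: three viruses with two segments each; virus 2 is the target.
zero = [0, 0]
seg = [
    [zero, [30, 30], [1, 50]],
    [[30, 30], zero, [40, 2]],
    [[1, 50], [40, 2], zero],
]
print(in_tree_costs(seg, target=2, stages=1))  # [3, 3, 0]
```

In the toy data, virus 0 is close to the target in its first segment and virus 1 in its second, so each reaches the target most cheaply by borrowing the other's good segment (cost 3) and not by direct mutation (costs 51 and 42). Recording which choice attains each minimum would yield the in-tree itself.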

2.7. The Cray XMT

The algorithm for finding shortest paths in the reassortment network is difficult to parallelize on conventional cluster machines because

1. the reassortment network is a graph that has little locality and thus requires long-range communications between graph nodes,
2. there is a strict precedence relationship on the order in which the nodes are labeled, that is, from layer 0 to the last layer,
3. if layers are assigned to individual processors, then all processors whose layers are currently not being updated will be idle, and
4. if layers are partitioned over processors, then there will be need for frequent and heavy interprocessor communications.

The Cray Extreme Multithreading (XMT) supercomputer is the latest in a family of machines that originated in the Tera [23] and subsequently evolved into the MTA (Multithreaded Architecture) [24]. Reviews of the architecture of this machine and its applications are available in [25] and [26].

The key features of this machine include

1. hardware support for 128 threads per processor,
2. zero overhead switching between threads,
3. large uniformly accessible shared memory,
4. extremely fine grained synchronization using Full/Empty bits with individual 64-bit words,
5. powerful interconnect, and
6. powerful compiler and performance analysis tools.

These features of the XMT allow straightforward and efficient parallelization of many important problems, especially those in bioinformatics, as surveyed in [26]. In particular, the XMT is the only machine that can efficiently parallelize large-scale reassortment networks.

The largest XMT that is available to researchers as of 2010 is the 128 processor, 1 Terabyte machine at the Center for Adaptive Supercomputer Software (CASS) in the Pacific Northwest National Laboratory (PNL).

2.8. Parallelization of Algorithm

The XMT allows the parallelization of ordinary C code with some loop restructuring and the judicious addition of pragmas (compiler directives in code). The use of machine specific synchronization is occasionally required.

As an example, Fig. 2 describes the routine that labels event nodes from virus nodes when finding shortest paths in a reassortment network. The #pragmas assure the compiler that the loops that follow are safely parallelizable. Only the i and k loops need to be parallelized, since the number of viruses is about 5,000 and this results in enough parallelism (about 25 million computations) to keep the machine busy. Thus, the s loop is not parallelized. The readfe(.) function (wait until full, then read and set empty) locks its argument and the writeef(.) function (wait until empty, then write and set full) unlocks it; the arguments are individual words, so multiple threads can safely and correctly update them. Although expressed as functions, these are really machine operations that are executed in one clock cycle, leading to very efficient, fine grained synchronization. This behavior is possible because the Cray XMT supercomputer has special synchronization bits with each memory location. An illustration of synchronization is provided in the online Supplemental Material, Appendix H. Notice how variables are declared only in the scopes where they are used. This ensures that if a loop is parallelized, the corresponding thread will have an independent copy of the variable, thus eliminating contention.
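The full/empty semantics can be emulated in software to show how a read-modify-write is protected. The sketch below is only an emulation with ordinary Python threads and a condition variable; the class `FullEmptyWord` and the helper `update_min` are hypothetical, and nothing here reflects the single-cycle hardware operations of the XMT.

```python
import threading


class FullEmptyWord:
    """Software emulation of a memory word with a full/empty bit."""

    def __init__(self, value):
        self._value = value
        self._full = True
        self._cond = threading.Condition()

    def readfe(self):
        """Wait until full, then read the value and set the word empty."""
        with self._cond:
            while not self._full:
                self._cond.wait()
            self._full = False
            self._cond.notify_all()
            return self._value

    def writeef(self, value):
        """Wait until empty, then write the value and set the word full."""
        with self._cond:
            while self._full:
                self._cond.wait()
            self._value = value
            self._full = True
            self._cond.notify_all()


def update_min(word, candidate):
    """Lower the stored label to `candidate` if it is smaller (atomic update)."""
    current = word.readfe()                 # word is now empty: others must wait
    word.writeef(min(current, candidate))   # word is full again: others proceed


label = FullEmptyWord(10**9)
threads = [threading.Thread(target=update_min, args=(label, c))
           for c in range(100, 0, -1)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(label.readfe())  # 1
```

Between `readfe` and `writeef` the word is empty, so any other thread attempting `readfe` on it waits; this is what makes the minimum update safe when many threads label the same node.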

2.9. Implementation Details

When going from a serial to a parallel version of our code, we first examined the major loops to identify which could be parallelized. In these loops, we had to ensure that all accesses to shared variables were safely parallelizable. This is easy to do, since the XMT compiler will indicate unsafe parallelization (i.e., situations where there are accesses to shared locations without appropriate synchronization). Full/Empty operations, readfe(.) and writeef(.), as described above, were used in such situations. In a few cases, loops had to be restructured so as to allow efficient parallelization.

One of the major issues when using the XMT is the speed of input/output. When large volumes of data, such as the distance matrices (which are of size 101 million words) are to be input, these have first to be loaded into a special parallel file system and then transferred in parallel into the 1 Terabyte shared memory of the XMT. Ordinary C input/output functions are executed serially and cannot be used for large data volumes.

Our code occupied about 120 GB of memory and ran at the rate of about 1.1 hours per stage, for a total of 35 hours for a 32-stage problem, using 128 processors.

3. S-OIV Evolution Models

A number of research groups have presented models of the emergence of influenza in human hosts from influenza in swine hosts, including S-OIV. These include [4], [27], [28], [29], [30]. Figs. 3, 4, and 5 present the salient features of these models.

In work by Dawood et al. [4], Olsen [27], Trifonov et al. [28], and Kingsford et al. [29], “classes” of viruses such as “Classical H1N1 swine,” “Eurasian H1N1 swine,” etc. are mentioned. We have been unable to find enumerated lists of these classes of viruses in the literature. This lack of specificity complicates the comparison of the results of our algorithm with those of other research groups.

Consider the class “Eurasian Swine Influenza” shown in [29, Fig. 3], which is a phylogenetic tree for the NA segment and includes many avian viruses, e.g., A/duck/Nanchang/1904/1992, A/goose/Italy/296426/2003, A/chicken/Hebei/718/2001, etc. The rationale behind this inclusion is not explained and appears to be based only on the similarity between the NA segments and not on any clinical evidence of infection of swine with avian viruses.

4. Validation of Model

We ran our reassortment network for different target viruses. To start with, we present the results of two runs with non-S-OIV viruses as targets. The objective here is to test our algorithm against prior results by Karasin et al. [31]. The following validation covers virus evolution over the period 1977-1999, while the S-OIV investigation (Section 5, below) covers the years 1930-2009. The same master database of intersegment distances was used in both cases; the reassortment algorithm is insensitive to the year of the target virus.

4.1. Constraints

We used a threshold of 500: if two viruses differed by a distance of more than 500 in any one segment, the mutation or reassortment event was ignored. This is necessary to get useful information from our reassortment network, because the algorithm searches for shortest paths in terms of sums of edge weights. If a high threshold (or no threshold) is used, all evolutionary paths will be only a few edges long, thus obscuring fine grained information on mutations and reassortments. If a zero threshold is used, no path will be found (as the target will be disconnected from the rest of the graph). For our 5,016-virus, 32-stage problem, we found that 500 was a choice of threshold that gave useful information. This phenomenon is discussed in Section 5.4 and Fig. 8 of our previously published paper [19].

We allowed reassortments between viruses $i$ and $j$ to yield $k$ only if the years of $i$ and $j$ were ≤ the year of $k$. Under these constraints, some viruses may not participate in paths to the target.
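The two constraints can be expressed as simple predicates. This is a sketch under one stated assumption: the threshold is applied to the segment distances between the source virus (after any segment replacement) and the resulting virus. The function names are hypothetical; the sample distances are the rows for A/swine/Guangxi/13/2006 in Table 2.

```python
THRESHOLD = 500  # per-segment threshold quoted in the text


def within_threshold(segment_distances, threshold=THRESHOLD):
    """True if no single segment distance exceeds the threshold."""
    return all(d <= threshold for d in segment_distances)


def years_consistent(parent_years, child_year):
    """True if every parent virus is no more recent than the resulting virus."""
    return all(year <= child_year for year in parent_years)


def event_allowed(segment_distances, parent_years, child_year):
    """An event is kept only if it passes both constraints."""
    return (within_threshold(segment_distances)
            and years_consistent(parent_years, child_year))


# Distances from A/swine/Guangxi/13/2006 to A/California/04/2009 (Table 2):
# first the direct comparison, then with NA taken from the 2009 donor.
guangxi_direct = [114, 124, 143, 111, 70, 1064, 149, 59]
guangxi_reassorted = [114, 124, 143, 111, 70, 160, 149, 59]

print(event_allowed(guangxi_direct, [2006], 2009))            # False: NA > 500
print(event_allowed(guangxi_reassorted, [2006, 2009], 2009))  # True
```

Under this reading, the direct mutation is discarded because of its NA distance, while the reassortment that replaces NA passes both tests.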

4.2. Interpreting In-Trees

As described in Fig. 6, a black box represents a virus with the number on the right of the box indicating the length of the shortest path from that virus to the target. Two black boxes connected by a black line indicate a mutation event. Each dashed red box indicates a reassortment, with the segment name and number indicated on the right hand side. In Fig. 6, A/PuertoRico/8/34, which has total distance 2,348 from the target (not shown), mutates into A/Alaska/1935, which is at distance 2,407 from the target. A/Albany/1618/1951 reassorts: it obtains segment 1 (PB2) from A/HongKong/117/1977 to become A/Tientsin/78/1977.

Virus names that are too long to fit in available space are truncated (and marked as such), and full names are available in the corresponding digital annotations in the pdf files in the online Supplemental Material.

4.3. A/Swine/Colorado/1/1977

Fig. 7 shows part of the tree resulting from a run with A/swine/Colorado/1/1977(H3N2) as target. The full tree is in Supplemental file ASwineColorado.pdf. The blue cut associates (on its right hand side) the viruses that are at low distance from the target virus. These viruses are exclusively of human origin. Nonhuman origin viruses, on the left side of the cut, have weight greater than 1,800. There is a clear differentiation between human and nonhuman viruses. This validates the result by Karasin et al. [31] that A/swine/Colorado/1/1977 is wholly human in origin.

4.4. A/Swine/Nebraska/209/98

The in-tree from a run with target A/swine/Nebraska/209/98(H3N2) is shown in Fig. 8. The full tree is in Supplemental file ASwineNebraska.pdf. According to Karasin et al. [31], the PA and PB2 segments should be derived from avian viruses and the remaining from human. In the tree of Fig. 8, the lowest cost paths (blue box) pass through reassortments with avian viruses to obtain PA and PB2 just before the target. This partially supports Karasin et al. [31], since these are two disjoint sets of paths and not one path with two reassortments, as we would have expected. We conjecture that this is due to missing data—a richer data set might have yielded the expected path.

5. Results

5.1. Parameters of 32-Stage Run

We now present the results of a run with the exemplar S-OIV virus A/California/04/2009 as target. This 32-stage run took 35 hours on a 128 processor Cray XMT. The reassortment network generates an in-tree of shortest paths from 5,015 viruses to A/California/04/2009.

In addition to the constraints stated in Section 4.1, we used a reassortment overhead of 10 units to suppress trivial, small distance reassortments that would otherwise clutter up our paths. Such reassortments are indistinguishable from point mutations and drive interesting large-distance reassortments out of the 32-stage range of our experiment. The overhead is incorporated by adding 10 units to the reassortment edges of our network.
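The effect of the overhead on a single edge can be sketched as follows. The function name `event_edge_weight` is hypothetical, and the two sample rows are the reassorted-virus-to-target distances given in Table 2 for A/swine/Shanghai/1/2007 and A/Iowa/CEID23/2005 (NA taken from A/swine/HongKong/NS29/2009).

```python
REASSORTMENT_OVERHEAD = 10  # units, as stated in the text


def event_edge_weight(segment_distances, is_reassortment,
                      overhead=REASSORTMENT_OVERHEAD):
    """Summed segment distances, plus the overhead for reassortment events."""
    return sum(segment_distances) + (overhead if is_reassortment else 0)


# Reassorted-virus-to-target rows copied from Table 2.
shanghai_2007 = [126, 143, 123, 202, 66, 160, 144, 56]
iowa_2005 = [124, 129, 118, 184, 76, 160, 153, 54]

print(event_edge_weight(shanghai_2007, is_reassortment=True))  # 1030
print(event_edge_weight(iowa_2005, is_reassortment=True))      # 1008
```

These two values appear consistent with the distances of 1,030 and 1,008 quoted for these viruses in Section 5.3. A mutation edge with the same segment distances would carry no overhead, so a reassortment is preferred only when it saves more than 10 units.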

5.2. Suppressing Intra-S-OIV Events

As a result of the intense interest in the pandemic strains, the number of S-OIV viruses in our database is a disproportionately large fraction of the total viruses. Consequently, the target virus is surrounded by a very large subtree that represents intra-S-OIV evolution and obscures other evolutionary events. To concentrate on our immediate objective of tracing the origin of the exemplar S-OIV virus A/California/04/2009, we suppressed paths through all other S-OIVs by temporarily setting their distances from everyone else to infinity.

5.3. Viewing the In-Trees

Despite the constraints mentioned in Section 4.1, the size of the database is such that very large trees are still generated and it is a challenge to visualize the results. After trying several approaches, we have chosen to separate the in-trees by the years of their source viruses. Thus, Fig. 14 in the online Supplemental Material, Appendix J, shows the in-tree for viruses from the 1930s, 1940s, and 1950s, while Fig. 15 shows those from the 1960s (trees for subsequent decades are available in the online Supplemental Material). For clarity, even with this approach, we needed to prune paths in which the leaf nodes were highly similar in terms of their distance to target. As the number of sequenced viruses has been increasing dramatically since the 1960s, it is impossible to show results from the 1970s onward on paper. Full information from the run is available as a spreadsheet as described below.

In Supplemental Fig. 14, we show part of the in-tree corresponding to source viruses from the 1930-59 time period. The paths in this figure indicate how these viruses could possibly have evolved into the S-OIV A/California/04/2009.

Although the source viruses (leaf nodes) are 60 or more years old, several have low-distance paths to the S-OIV.

In particular, A/swine/1931 could have reassorted with A/swine/Kansas/015252 to eventually become A/California/04/2009. A/chicken/Germany/N/1949 has high distance to the target, but has a path that transforms it into A/California/04/2009 after three reassortments. The intermediate viruses in this path are evenly spaced in terms of distance.

Note that many of the paths pass through A/swine/HongKong/1562/2005(H1N2), A/swine/Guangxi/13/2006(H1N2), and A/swine/Kansas/77778/2007(H1N1). These viruses are two edges away from the target, have low path weight to the target, and form part of an important “bottleneck” set that we will discuss below. The annotated tree is in Supplemental file tree304050.pdf.

The in-tree for the 1960s is shown in Supplemental Fig. 15. It is noteworthy that the bottleneck set has now expanded to include A/swine/HongKong/1110/2006(H1N2) and that most paths pass through this set. The annotated tree is in Supplemental file tree60.pdf. Supplemental file tree70.pdf holds the in-tree for the 1970s. The bottleneck viruses are the same as for the 1960s, with similarly low path weights to the target. A number of additional viruses now occur at distance two edges from the target, but their path weights are significantly greater than those of the bottleneck viruses. For the 1980s, Supplemental file tree80.pdf has a new bottleneck virus: A/swine/Shanghai/1/2007. This has slightly greater distance, i.e., 1,030, than the others, but is still significantly smaller than other viruses at distance two edges from target. Supplemental file tree90.pdf shows a new bottleneck virus A/Iowa/CEID23/2005(H1N1), with distance 1,008 to target.

For viruses that were identified in 2000 and later, an annotated tree is available in Supplemental file tree00.pdf. Fig. 9 shows a selection of paths that pass through a set of six bottleneck viruses, five of which were encountered in previous decades. The new bottleneck is A/swine/Shanghai/1/2007(H1N2).

The virus A/duck/NC/91347/01(H1N2) is located in the same position in the tree as the bottleneck viruses and has distance to target of 954, which is smaller than any of the bottlenecks. However, in our run, only one path is found through it, unlike the bottleneck viruses which have hundreds of paths, as described below. We, therefore, do not include this in our list of bottleneck viruses, but do consider it worthy of future analysis in a molecular biology context because of its low distance.

5.4. Bottleneck Viruses

Analysis of the results of the 32-stage run reveals that 3,600 out of 3,926 paths pass through a set of bottleneck viruses before reaching the S-OIV A/California/04/2009, as shown in Table 1. These viruses are:

1. A/swine/Shanghai/1/2007(H1N2)
2. A/swine/Guangxi/13/2006(H1N2)
3. A/swine/HongKong/1110/2006(H1N2)
4. A/swine/HongKong/1562/2005(H1N2)
5. A/swine/Kansas/77778/2007(H1N1)
6. A/Iowa/CEID23/2005(H1N1)

All of the above reassort with

7. A/swine/HongKong/NS29/2009(H1N1)

to obtain the NA segment before reaching A/California/04/2009. Viruses 1-6 are obtained by reassortments with a number of different viruses. Some of these are the bottlenecks themselves. For example, one of the possible evolutionary paths in Fig. 9 shows A/mallard/Maryland/182/2006 reassorting with A/swine/HongKong/1110/2006 (bottleneck no. 3) to yield A/swine/Guangxi/13/2006 (bottleneck no. 2). The nonbottleneck viruses that donate segments are:

1. A/swine/OH/511445/2007(H1N1)
2. A/turkey/Ontario/31232/2005(H3N2)
3. A/swine/Shanghai/1/2005(H1N1)

TABLE 1. Numbers of Paths through Bottleneck Viruses.

No.	Virus	Paths
1	A/swine/Shanghai/1/2007	292
2	A/swine/Guangxi/13/2006	1252
3	A/swine/HongKong/1110/2006	199
4	A/swine/HongKong/1562/2005	919
5	A/swine/Kansas/77778/2007	736
6	A/Iowa/CEID23/2005	202
Paths through bottleneck viruses		3600
Total paths in tree		3926

Fig. 10 illustrates the bottleneck viruses. This subtree was obtained from the full tree by only including paths that pass through bottleneck viruses 1-6, listed above, and excluding everything beyond the second reassortment.

A very large proportion of paths reach the S-OIV target through the bottleneck viruses, as shown in Table 1. This strongly suggests that these paths represent actual evolutionary events. Note, however, that only a subset of these paths is shown in the figures discussed above. The fact that these six bottleneck viruses occur repeatedly and at the lowest distance found from the target A/California/04/2009 suggests that they are important in the evolutionary history of S-OIV influenza A.
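Counts of this kind can be obtained by walking each source virus's path toward the root of the in-tree. The sketch below assumes the in-tree is available as a child-to-parent map; the function name `count_paths_through` and the toy tree are hypothetical, while the per-virus counts and the total are those of Table 1.

```python
from collections import Counter


def count_paths_through(parent, sources, root):
    """Count, for every intermediate node, the source paths passing through it."""
    counts = Counter()
    for source in sources:
        node = parent[source]
        while node != root:
            counts[node] += 1
            node = parent[node]
    return counts


# Toy in-tree: sources a, b, c, d; intermediates b1, b2; root t.
parent = {"a": "b1", "b": "b1", "c": "b2", "d": "t", "b1": "t", "b2": "t"}
counts = count_paths_through(parent, ["a", "b", "c", "d"], "t")
print(counts["b1"], counts["b2"])  # 2 1

# Path counts per bottleneck virus, copied from Table 1.
TABLE_1 = {
    "A/swine/Shanghai/1/2007": 292,
    "A/swine/Guangxi/13/2006": 1252,
    "A/swine/HongKong/1110/2006": 199,
    "A/swine/HongKong/1562/2005": 919,
    "A/swine/Kansas/77778/2007": 736,
    "A/Iowa/CEID23/2005": 202,
}
TOTAL_PATHS = 3926
through_bottlenecks = sum(TABLE_1.values())
print(through_bottlenecks)                          # 3600
print(round(through_bottlenecks / TOTAL_PATHS, 3))  # 0.917
```

The per-virus counts in Table 1 sum to the stated 3,600, which is about 92 percent of all paths in the tree.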

5.5. Spreadsheet

The Supplemental Material includes a spreadsheet SOIVspread.xls that lists all paths to A/California/04/2009, from the 32-stage run, sorted by year of source virus and then by path cost. Bottleneck viruses are specially marked in this spreadsheet and in the tables that follow below. Explanation of the notation used in this spreadsheet appears in the Supplemental Appendix I.

6. Discussion

Our analysis yields detailed information on the series of reassortments and mutations required to transform a set of viruses to S-OIV viruses. We have identified six bottleneck viruses that almost invariably occur on shortest paths found to the target S-OIV virus.

To investigate these bottleneck viruses further, Table 2 shows details of the reassortments that occur in the final stage of the network. The entries in this table can be interpreted using the notation given in Section 2.2. As an example, the first block shows virus 1, A/swine/Shanghai/1/2007(H1N2), reassorting with virus 7, A/swine/HongKong/NS29/2009(H1N1), to yield the target A/California/04/2009(H1N1). The row 472 422 396 574 335 1029 143 171 (3542) indicates the distances between the eight segments of viruses 1 and 7, with the number in parentheses (3542) giving the sum of these distances. Similarly, the next two rows of numbers indicate the distances between virus 7 and the target and between virus 1 and the target, respectively. The last row of the block shows the distances between the reassorted virus (i.e., the virus obtained by replacing segment 6 (NA) of virus 1 with the corresponding segment of virus 7) and the target virus. Further elucidation of the reassortment notation is available in the Supplemental Appendix C.

TABLE 2. Reassortments of Bottlenecks 1-6 with Virus 7 to Obtain the Target.

	PB2	PB1	PA	HA	NP	NA	MP	NS	(Sum)
1	A/swine/Shanghai/1/2007(H1N2)
	472	422	396	574	335	1029	143	171	(3542)
7	A/swine/HongKong/NS29/2009(H1N1)
	472	410	397	601	331	160	53	192	(2616)
	A/California/04/2009(H1N1)
	126	143	123	202	66	1041	144	56	(1901)
	126	143	123	202	66	160	144	56	(1020)
2	A/swine/Guangxi/13/2006(H1N2)
	475	449	455	632	379	1047	148	177	(3762)
7	A/swine/HongKong/NS29/2009(H1N1)
	472	410	397	601	331	160	53	192	(2616)
	A/California/04/2009(H1N1)
	114	124	143	111	70	1064	149	59	(1834)
	114	124	143	111	70	160	149	59	(930)
3	A/swine/HongKong/1110/2006(H1N2)
	458	416	397	593	337	1026	148	174	(3549)
7	A/swine/HongKong/NS29/2009(H1N1)
	472	410	397	601	331	160	53	192	(2616)
	A/California/04/2009(H1N1)
	125	128	131	116	81	1031	143	62	(1817)
	125	128	131	116	81	160	143	62	(946)
4	A/swine/HongKong/1562/2005(H1N2)
	463	411	394	584	338	1059	147	175	(3571)
7	A/swine/HongKong/NS29/2009(H1N1)
	472	410	397	601	331	160	53	192	(2616)
	A/California/04/2009(H1N1)
	108	123	124	102	73	1076	148	65	(1819)
	108	123	124	102	73	160	148	65	(903)
5	A/swine/Kansas/77778/2007(H1N1)
	458	395	389	597	338	361	146	192	(2876)
7	A/swine/HongKong/NS29/2009(H1N1)
	472	410	397	601	331	160	53	192	(2616)
	A/California/04/2009(H1N1)
	135	132	139	127	95	362	148	64	(1202)
	135	132	139	127	95	160	148	64	(1000)
6	A/Iowa/CEID23/2005(H1N1)
	465	414	395	592	324	372	151	175	(2888)
7	A/swine/HongKong/NS29/2009(H1N1)
	472	410	397	601	331	160	53	192	(2616)
	A/California/04/2009(H1N1)
	124	129	118	184	76	363	153	54	(1201)
	124	129	118	184	76	160	153	54	(998)
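
The arithmetic behind each block of Table 2 can be checked directly. The following is a minimal sketch, not the authors' software: the function name `reassort` and the variable names are illustrative assumptions. It takes the segment-wise distances to the target for bottleneck virus 1 and donor virus 7, swaps in the donor's NA entry, and recomputes the total.

```python
SEGMENTS = ["PB2", "PB1", "PA", "HA", "NP", "NA", "MP", "NS"]

# Segment-wise distances to the target A/California/04/2009, copied from Table 2.
virus1_to_target = [126, 143, 123, 202, 66, 1041, 144, 56]  # A/swine/Shanghai/1/2007
virus7_to_target = [472, 410, 397, 601, 331, 160, 53, 192]  # A/swine/HongKong/NS29/2009


def reassort(recipient, donor, segment):
    """Return the recipient's distance vector with one segment taken from the donor."""
    i = SEGMENTS.index(segment)
    result = list(recipient)   # copy so the input is left unchanged
    result[i] = donor[i]       # the copied segment inherits the donor's distance
    return result


reassorted = reassort(virus1_to_target, virus7_to_target, "NA")
print(sum(virus1_to_target))  # total distance before the reassortment
print(sum(reassorted))        # total distance after receiving NA from virus 7
```

The same call with the distance rows of bottlenecks 2 through 6 should reproduce the last row of the corresponding block.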


In Table 2, all bottleneck viruses have small differences from the target virus in all segments except NA. A reassortment with A/swine/HongKong/NS29/2009, which has a low distance (160) for the NA segment, results in a virus that is very close to the target. Furthermore, the years of the three viruses in each possible reassortment are consistent (as a result of constraints built into the reassortment network). Finally, most of these viruses are remarkably close to each other, as Table 3 demonstrates. This distance matrix gives the absolute distances (sums of the segment-wise distances) between all pairs of bottleneck viruses. The four H1N2 viruses are very close to each other, as are the two H1N1 viruses.

TABLE 3. Absolute Pairwise Distances between Bottleneck Viruses.

	1	2	3	4	5	6
1	0	554	541	487	1706	1662
2		0	473	426	1735	1753
3			0	349	1643	1682
4				0	1625	1666
5					0	654
6						0
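
The grouping visible in Table 3 can be made explicit with a short check. This is an illustrative sketch only: the dictionary layout and the helper `same_group` are assumptions made here, and the subtype assignment of viruses 1-6 is taken from Table 2. It compares the largest distance inside a subtype group with the smallest distance between the two groups.

```python
# Upper triangle of Table 3 (absolute pairwise distances between bottlenecks 1-6).
DIST = {
    (1, 2): 554, (1, 3): 541, (1, 4): 487, (1, 5): 1706, (1, 6): 1662,
    (2, 3): 473, (2, 4): 426, (2, 5): 1735, (2, 6): 1753,
    (3, 4): 349, (3, 5): 1643, (3, 6): 1682,
    (4, 5): 1625, (4, 6): 1666,
    (5, 6): 654,
}

H1N2 = {1, 2, 3, 4}  # Shanghai, Guangxi, and the two Hong Kong isolates
H1N1 = {5, 6}        # Kansas and Iowa isolates


def same_group(a, b):
    """True if both viruses belong to the same subtype group."""
    return {a, b} <= H1N2 or {a, b} <= H1N1


within = [d for (a, b), d in DIST.items() if same_group(a, b)]
between = [d for (a, b), d in DIST.items() if not same_group(a, b)]

print("largest within-group distance:", max(within))
print("smallest between-group distance:", min(between))
```

If the largest within-group value is well below the smallest between-group value, the two subtype groups form clearly separated clusters under this distance.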


Tables 4 and 5 show the shortest paths from our run, organized by number of edges, and permit us to identify unusual events. We would expect the path length to increase with the number of edges in a path, both across paths and within paths, and this does, in general, happen. However, the weight of the shortest path for one of the edge counts in Table 4 is remarkably small. This path starts in A/duck/NC/91347/01, a virus that we have already noted, in Section 5.3, as being oddly similar to the bottleneck viruses (though only having a single path through it). Olsen et al. [32] described this strain in 2003 and showed that it has great similarity with swine origin viruses. Similarly, for paths with three edges, A/swine/OH/511445/2007 (not a bottleneck) has an unusually low distance to the target, compared with other viruses with the same number of edges.

TABLE 4. The Four Shortest Paths with Two, Three, and Four Edges Each (*** = Bottleneck) (All Reassortments Match Those in Figs. 3, 4, and 5).


TABLE 5. Shortest Paths with 5–10 Edges (*** = Bottleneck) (All Reassortments Except Those Noted Match Figs. 3, 4, and 5).

edges=5
2002
start	H5N2	A/mallard/Maryland/789/2002 cost 2667
becomes	H5N2	A/mallard/MD/790/2002 cost 2662
gets HA	H1N2	from A/swine/HongKong/1110/2006¹
becomes	H1N2	A/swine/Guangxi/13/2006 cost 940^***
gets NA	H1N1	from A/swine/HongKong/NS29/2009
becomes	H1N1	A/California/04/2009 cost 0
edges=6
1998
start	H3N2	A/Swine/Nebraska/209/98 cost 1441
gets NP	H3N2	from A/Swine/Minnesota/593/99
becomes	H3N2	A/Swine/Iowa/533/99 cost 1417
gets HA	H1N1	from A/swine/Shanghai/1/2005
becomes	H1N2	A/swine/HongKong/1562/2005 cost 913^***
gets NA	H1N1	from A/swine/HongKong/NS29/2009
becomes	H1N1	A/California/04/2009 cost 0
edges=7
1995
start	H3N2	A/NewYork/687/1995 cost 2643
becomes	H3N2	A/NewYork/678/1995 cost 2637
gets PA	H3N2	from A/Swine/Minnesota/593/99
becomes	H3N2	A/Swine/Iowa/533/99 cost 1417
gets HA	H1N1	from A/swine/Shanghai/1/2005
becomes	H1N2	A/swine/HongKong/1562/2005 cost 913^***
gets NA	H1N1	from A/swine/HongKong/NS29/2009
becomes	H1N1	A/California/04/2009 cost 0
edges=8
1995
start	H3N2	A/NewYork/623/1995 cost 2656
becomes	H3N2	A/NewYork/612/1995 cost 2652
becomes	H3N2	A/NewYork/635/1996 cost 2647
gets PA	H3N2	from A/Swine/Minnesota/593/99
becomes	H3N2	A/Swine/Iowa/533/99 cost 1417
gets HA	H1N1	from A/swine/Shanghai/1/2005
becomes	H1N2	A/swine/HongKong/1562/2005 cost 913^***
gets NA	H1N1	from A/swine/HongKong/NS29/2009
becomes	H1N1	A/California/04/2009 cost 0
edges=9
2003
start	H3N2	A/NewYork/213/2003 cost 2948
becomes	H3N2	A/NewYork/215/2003 cost 2945
gets PA	H3N1	from A/swine/MI/PU243/04³
becomes	H3N2	A/turkey/Ohio/313053/2004 cost 1545
gets NA	H3N1	from A/swine/IN/PU542/04³
becomes	H3N1	A/swine/MI/PU243/04 cost 1327
gets HA	H1N1	from A/swine/OH/511445/2007
becomes	H1N1	A/swine/Kansas/77778/2007 cost 1010^***
gets NA	H1N1	from A/swine/HongKong/NS29/2009
becomes	H1N1	A/California/04/2009 cost 0
edges=10
1977
start	H1N6	A/mallard/Alberta/42/1977 cost 3382
gets HA	H4N6	from A/gadwallduck/ALB/53/1977³
becomes	H4N6	A/canvasbackduck/ALB/274/1977 cost 3336
gets NS	H1N1	from A/mallard/Alberta/46/1977³
becomes	H4N6	A/bluewingedteal/ALB/243/1977 cost 3270
gets NA	H1N1	from A/mallard/Alberta/127/1977²
becomes	H4N1	A/mallardduck/Alberta/291/1977 cost 2847
gets HA	H1N1	from A/swine/OH/511445/2007
becomes	H1N1	A/swine/Kansas/77778/2007 cost 1010^***
gets NA	H1N1	from A/swine/HongKong/NS29/2009
becomes	H1N1	A/California/04/2009 cost 0


¹ Matches only Olsen and Kingsford, Nagarajan, and Salzberg (Fig. 3).

² Matches only Smith et al. (Fig. 5).

³ Does not match any of Figs. 3, 4, or 5.

In Table 5, it is immediately noticeable that the nonbottleneck A/Swine/Nebraska/209/98 has path length 1,441 (with edges = 6), which is markedly lower than the path lengths of 2,667, 2,643, and 2,656 for edges = 5, 7, and 8, respectively.
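
The comparison across edge counts can be tabulated from Table 5 itself. The sketch below is illustrative; the tuple layout and variable names are assumptions made here. It records, for each edge count, the starting virus, its path cost, and the bottleneck through which the path reaches the target, then reports how far the six-edge path falls below its neighbours and how often each bottleneck is used.

```python
from collections import Counter

# (edges, starting virus, path cost at start, bottleneck on the path), from Table 5.
PATHS = [
    (5, "A/mallard/Maryland/789/2002", 2667, "A/swine/Guangxi/13/2006"),
    (6, "A/Swine/Nebraska/209/98", 1441, "A/swine/HongKong/1562/2005"),
    (7, "A/NewYork/687/1995", 2643, "A/swine/HongKong/1562/2005"),
    (8, "A/NewYork/623/1995", 2656, "A/swine/HongKong/1562/2005"),
    (9, "A/NewYork/213/2003", 2948, "A/swine/Kansas/77778/2007"),
    (10, "A/mallard/Alberta/42/1977", 3382, "A/swine/Kansas/77778/2007"),
]

costs = {edges: cost for edges, _, cost, _ in PATHS}

# Gap between the six-edge path and the cheapest of its neighbours (5, 7, 8 edges).
gap = min(costs[e] for e in (5, 7, 8)) - costs[6]
print("six-edge path is shorter by", gap)

# How many of the listed paths pass through each bottleneck.
bottleneck_counts = Counter(bottleneck for *_, bottleneck in PATHS)
for virus, count in bottleneck_counts.most_common():
    print(count, virus)
```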

Turning to Table 6, we note that the oldest shortest path is 2,066 for A/swine/USA/1976-MA/1931 and that the shortest paths for the following decades (2,073, 2,056, and 2,023 for the 1940s, 1950s, and 1960s) are not significantly different. In the 1970s, some biological event results in a large multiplicity of shortest paths of roughly equal length (2,005 to 2,023 for the five listed in Table 6). In subsequent decades the shortest path lengths decrease significantly, dropping to 950-1500 in the 2000s. Once again we see A/duck/NC/91347/01 standing out because it is dated 2001, unlike the remaining viruses from this decade that are from 2004 or later.

TABLE 6. The Five Shortest Paths by Decade†.

Subtype	Virus (grouped by decade)	Path length
	1930–1939
H1N1	A/swine/USA/1976-MA/1931	2066
H1N1	A/swine/USA/1976/1931	2072
H1N1	A/swine/1931	2100
H1N1	A/swine/Ohio/23/1935	2108
H1N1	A/Phila/1935	2308
	1940–1949
H1N1	A/swine/Jamesburg/1942	2073
H1N1	A/AA/Marton/1943	2367
H1N1	A/Bellamy/1942	2401
H1N1	A/Weiss/1943	2405
H1N1	A/Hickox/1940	2415
	1950–1959
H1N1	A/swine/Wisconsin/1/1957	2056
H1N1	A/Albany/13/1951	2704
H1N1	A/FortWorth/1950	2764
H1N1	A/Malaya/302/1954	2824
H1N1	A/Malaysia/1954	2824
	1960–1969
H1N1	A/swine/Wisconsin/1/1961	2023
H3N2	A/HongKong/1-8-MA21-1/1968	2894
H3N2	A/HongKong/1-8-MA21-3/1968	2911
H3N2	A/Albany/10/1968	2912
H3N2	A/Beijing/1/1968	2912
	1970–1979
H1N1	A/swine/Minnesota/5892-7/1979	2005
H1N1	A/swine/Tennessee/10/1976	2015
H1N1	A/swine/Nebraska/123/1977	2021
H1N1	A/swine/Minnesota/27/1976	2021
H1N1	A/swine/Tennessee/17/1976	2023
	1980–1989
H1N1	A/swine/Iowa/17672/1988	1921
H1N1	A/turkey/NC/17026/1988	1934
H1N1	A/swine/Wisconsin/1915/1988	1953
H1N1	A/Swine/Indiana/1726/1988	1958
H1N1	A/swine/Kansas/3024/1987	1972
	1990–1999
H3N2	A/Swine/Iowa/533/99	1417
H3N2	A/Swine/Minnesota/593/99	1425
H3N2	A/Swine/Nebraska/209/98	1441
H1N1	A/turkey/IA/21089-3/1992	1910
H1N1	A/swine/Maryland/23239/1991	1911
	2000–2009 (excludes bottlenecks)
H1N2	A/duck/NC/91347/01	954
H1N1	A/swine/OH/511445/2007	1053
H3N1	A/swine/MI/PU243/04	1327
H3N1	A/swine/IN/PU542/04	1384
H1N1	A/swine/Ontario/53518/03	1478


† Only the first of any set of paths originating from the same location.
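
The decade-by-decade trend in Table 6 can be summarized with a few lines of arithmetic. This is a sketch for illustration; the dictionary `BEST_BY_DECADE` is an assumption of this example and simply holds the first (shortest) row of each decade block, with the 2000s entry excluding bottlenecks as in the table.

```python
# Shortest path length per decade: the first row of each block of Table 6.
BEST_BY_DECADE = {
    1930: 2066, 1940: 2073, 1950: 2056, 1960: 2023,
    1970: 2005, 1980: 1921, 1990: 1417, 2000: 954,
}

decades = sorted(BEST_BY_DECADE)

# Change in the best path length from each decade to the next.
changes = {
    later: BEST_BY_DECADE[later] - BEST_BY_DECADE[earlier]
    for earlier, later in zip(decades, decades[1:])
}

for decade, delta in changes.items():
    print(f"{decade}s: {delta:+d}")
```

Small changes through the 1970s followed by large drops in the 1990s and 2000s would correspond to the pattern described in the text.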

6.1. Significance of Bottleneck Viruses

As far as we are aware, the discovery of bottleneck viruses is new to the field and has not been reported elsewhere. How do these bottleneck viruses arise and what is their significance? Reviewing Table 2, we see that

1. five of the six bottlenecks were isolated from swine (the sixth, A/Iowa/CEID23/2005, was isolated from a swine farm worker [33]),
2. all bottlenecks obtained the NA segment from A/swine/HongKong/NS29/2009,
3. the NA segments of the bottlenecks are very distant from the NA of the target A/California/04/2009 (from 363 to 1,076 units), and
4. the NA of the donor A/swine/HongKong/NS29/2009 is only 160 units from the target’s NA.

We can, therefore, argue that there are six potential immediate sources from which the target S-OIV could have emerged. A/swine/Shanghai/1/2007 and A/swine/Guangxi/13/2006(H1N2) are discussed by Yu et al. [34]. All four Chinese bottleneck viruses are included in the table entitled “Recent ancestral swine influenza A viruses of pandemic (H1N1) 2009 viruses” in [35, Supplementary Material]. Interestingly, this table also includes A/swine/OH/511445/2007, which is not a bottleneck, but appears as a donor to the bottlenecks in Fig. 10 (see Section 6.2). A/swine/Shanghai/1/2007 is also included in Table 1 of Trifonov et al. [28]. Both Hong Kong viruses are listed in Table S3 in Smith et al. [30].

A/swine/Kansas/77778/2007 is a particularly virulent strain discussed by Ma et al. [36]. This strain also appears in the phylogenetic trees given by Kingsford et al. [29]. The donor A/swine/HongKong/NS29/2009 is listed by Smith et al. [30, Supplementary Information] and Vijaykrishna et al. [37, Online Supporting Material].

The bottleneck viruses are, thus, the key members of the large sets that have been identified by other researchers as being important to the SOIV evolution. Our algorithmic technique highlights these six viruses as being crucial to the SOIV pandemic. We were able to discover these viruses via search of a comprehensive database without reliance on preconceived notions of lineages to sample. It is now for biologists, virologists, and epidemiologists to apply molecular biology techniques to establish the functional reasons for the prominence of these viruses. Our research, which is of a purely algorithmic nature, will proceed in the directions given in Section 6.4.

One of the main accomplishments of our research is to provide means for identification of specific viral isolates that are likely similar to ancestors of epidemic viruses. By concentrating on actual viral isolates rather than inferred ancestors as in phylogenetics, we assure that our results are functionally plausible. In phylogenetics, the goal is to infer median states at ancestors to optimize an objective function over edit costs—there are typically no functional constraints attempted or implied (but see Wheeler [38] for a counterexample). The specificity and functional constraints inherent to our method are important because they permit other researchers, especially in molecular biological domains, to use our results to choose in vitro and in vivo models.

6.2. Nonbottleneck Viruses of Interest

In addition to the bottleneck viruses, A/duck/NC/91347/01, A/swine/OH/511445/2007, and A/Swine/Nebraska/209/98 are worthy of detailed study because of their unusually low distance to the target.

6.3. Validation of Reassortments

It is of great interest to compare the reassortments in the paths of Tables 4 and 5 against the models described in Section 3. We consider a reassortment indicated in our in-tree as matching a reassortment in Figs. 3, 4, or 5, if there is a path in the figures matching (in terms of subtype, host) the reassortment from the in-tree. For example, in Table 4 the three-edge path for 1977 gets PB1 from H3N2 swine. There is a path from H3N2 swine to SOIV in each of Figs. 3, 4, and 5.
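
The matching rule described above amounts to a reachability test on a directed graph whose nodes are (subtype, host) classes. The sketch below illustrates that idea only. `TOY_MODEL` is a hypothetical stand-in and is not a transcription of Figs. 3, 4, or 5; its intermediate node and the unmatched avian node are invented for the example, and the function name `matches` is likewise an assumption.

```python
from collections import deque

TARGET = ("H1N1", "S-OIV")

# Hypothetical toy model: edges point from a donor class toward the target.
TOY_MODEL = {
    ("H3N2", "swine"): [("H1N2", "swine")],
    ("H1N2", "swine"): [TARGET],
    ("H4N6", "avian"): [],  # a class with no route to the target in this toy model
}


def matches(model, donor):
    """True if the model contains a directed path from the donor class to TARGET."""
    seen = {donor}
    queue = deque([donor])
    while queue:
        node = queue.popleft()
        if node == TARGET:
            return True
        for nxt in model.get(node, []):  # classes absent from the model have no edges
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


print(matches(TOY_MODEL, ("H3N2", "swine")))
print(matches(TOY_MODEL, ("H4N6", "avian")))
```

A breadth-first search is used here for simplicity; any reachability method would serve, since only the existence of a path matters for the match.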

All reassortments in Table 4 and most in Table 5 match the models of Olsen and Kingsford, Nagarajan, and Salzberg (Fig. 3), Trifonov, Khiabanian, and Rabadan (Fig. 4), and Smith et al. (Fig. 5). Those that do not match are indicated by footnotes in Table 5.

As stated in Section 3, the unavailability of enumerated lists of classes of viruses (e.g., “Classical H1N1 swine,” etc.) limits the granularity of our validation. Nevertheless, we see a broad agreement between our reassortments and the results of other researchers, which indicates that our research is consistent with, and a useful addition to, the existing knowledge and methods base. The in-trees generated by our algorithm provide an alternate and functionally actionable model for analyzing the evolution of S-OIV and other viruses.

6.4. Future Research

Some issues that present themselves for future work are:

1. Incorporate the temporal, geographic, and host constraints that Bokhari and Janies [19] have proposed for reassortment networks. These would result in more precise analyses.
2. Monitor intra-S-OIV evolution, given that fine-grained data are available for the present pandemic.
3. Refine the algorithm to repeat this analysis at a finer granularity, using distances between specific proteins encoded in segments, rather than global distances between segments.

Acknowledgments

The authors wish to thank Jeffrey Parvin for his encouragement of this research. They also thank David Mizell and Kristyn Maschhoff for providing access to “Egret,” the XMT at Cray Inc. and for assistance with performance and programming issues on the machine. All of the reassortment networks discussed in this paper were run on “Cougar,” the XMT at the Center for Adaptive Supercomputing Software (CASS), Pacific Northwest Laboratory (PNL). We thank John Feo and Andrés Marquez for their support and Michael Peterson for his efforts in assuring uninterrupted access to the machine. The lavish computational resources provided by Ohio Supercomputer Center (OSC) and by the Department of Biomedical Informatics (BMI) at The Ohio State University are gratefully acknowledged. Hideaki Kikuchi provided valuable advice on the BMI “Bucki” cluster. We thank Ambreen Bokhari for her help with the figures. We are grateful to Saniyah Bokhari and Saba Bokhari for their careful editing of the manuscript. This work was supported by Pacific Northwest National Laboratory [Contract 97499 to Shahid H. Bokhari]; Ohio Supercomputer Center [allocation of time to Shahid H. Bokhari and Daniel A. Janies]; US Army Research Laboratory and Office [Grants W911NF-05-1-0271 and HR-0011-09-2-009 to Daniel A. Janies].

Biographies

Shahid H. Bokhari (S’75-M’78-SM’84-F’97) received the BSc degree in electrical engineering from the University of Engineering and Technology, Lahore, Pakistan, in 1974, and the MS and PhD degrees in electrical and computer engineering from the University of Massachusetts, Amherst, in 1976 and 1978, respectively. His current research is directed toward the applications of high-performance computing to problems in bioinformatics. He is a fellow of the IEEE and of the ACM.

Laura W. Pomeroy received the BS degree in molecular biology from Purdue University in 2002 and the PhD degree in biology from the Pennsylvania State University in 2008. She was a postdoctoral researcher in the Ohio State University Medical Center (2008-2010) and is currently a postdoctoral researcher in the College of Veterinary Sciences at the Ohio State University. Her current research focuses on the evolutionary and ecological dynamics of viruses.

Daniel A. Janies received the BS degree in biology from the University of Michigan in 1988 and the PhD degree in zoology from the University of Florida in 1995. He joined the Department of Biomedical Informatics at The Ohio State University in 2003. Currently, he is an associate professor and teaches computational analysis of genomic information for biomedical research. He worked as a postdoctoral fellow (1996-1999) and a principal investigator (2000-2002) at the American Museum of Natural History in New York. His current research concerns the evolution of animals and their development as well as the global spread of emergent infectious diseases.

Footnotes

1. The Supplemental Material to this paper (which can be found on the Computer Society Digital Library at http://doi.ieeecomputersociety.org/10.1109/TCBB.2011.95) includes a set of detailed Appendices. It also contains several large PDF files that are electronically annotated. The process of uploading may not have preserved the annotations. All Supplemental Material is duplicated at bmi.osu.edu/~shahid/SOIV, where the files have been verified to behave correctly. Annotations can only be viewed with the Adobe Reader.

Contributor Information

Shahid H. Bokhari, Email: shahid@bmi.osu.edu.

Laura W. Pomeroy, Email: pomeroy.26@osu.edu.

Daniel A. Janies, Email: Daniel.Janies@osumc.edu.

References

[1] Fraser C., Donnelly C.A., Cauchemez S., Hanage W.P., Van Kerkhove M.D., Hollingsworth T.D., Griffin J., Baggaley R.F., Jenkins H.E., Lyons E.J., Jombart T., Hinsley W.R., Grassly N.C., Balloux F., Ghani A.C., Ferguson N.M., Rambaut A., Pybus O.G., Lopez-Gatell H., Alpuche-Aranda C.M., Chapela I.B., Zavala E.P., Guevara D.M., Checchi F., Garcia E., Hugonnet S. and Roth C., “Pandemic Potential of a Strain of Influenza A (H1N1): Early Findings,” Science, vol. 324, no. 5934, pp. 1557–1561, 2009.
[2] Neumann G., Noda T. and Kawaoka Y., “Emergence and Pandemic Potential of Swine-Origin H1N1 Influenza Virus,” Nature, vol. 459, no. 7249, pp. 931–939, 2009.
[3] Gibbs A.J., Armstrong J.S. and Downie J.C., “From Where did the 2009 “Swine-Origin” Influenza A Virus (H1N1) Emerge?” Virology J., vol. 6, p. 207, 2009.
[4] Dawood F.S., Jain S., Finelli L., Shaw M.W., Lindstrom S., Garten R.J., Gubareva L.V., Xu X., Bridges C.B. and Uyeki T.M., “Emergence of a Novel Swine-Origin Influenza A (H1N1) Virus in Humans,” The New England J. Medicine, vol. 360, no. 25, pp. 2605–2615, 2009.
[5] Chan M., “World Now at the Start of 2009 Influenza Pandemic,” [Online]. Available: www.who.int/mediacentre/news/statements/2009/h1n1_pandemic_phase6_20090611/en/index.html, 2009.
[6] WHO, “WHO Pandemic (H1N1) 2009 Update 98,” [Online]. Available: www.who.int/csr/don/2010_04_30a/en/index.html, 2009.
[7] Webster R.G., Bean W.J., Gorman O.T., Chambers T.M. and Kawaoka Y., “Evolution and Ecology of Influenza A Viruses,” Microbiological Rev., vol. 56, no. 1, pp. 152–179, 1992.
[8] Nelson M.I. and Holmes E.C., “The Evolution of Epidemic Influenza,” Nature Rev. Genetics, vol. 8, no. 3, pp. 196–205, 2007.
[9] Fouchier R.A., Munster V., Wallensten A., Bestebroer T.M., Herfst S., Smith D., Rimmelzwaan G.F., Olsen B. and Osterhaus A.D., “Characterization of a Novel Influenza A Virus Hemagglutinin Subtype (H16) Obtained from Black-Headed Gulls,” J. Virology, vol. 79, no. 5, pp. 2814–2822, 2005.
[10] Guan Y., Vijaykrishna D., Bahl J., Zhu H., Wang J. and Smith G.J.D., “The Emergence of Pandemic Influenza Viruses,” Protein and Cell, vol. 1, no. 1, pp. 9–13, Jan. 2010.
[11] Webster R., Bean W., Gorman O., Chambers T. and Kawaoka Y., “Evolution and Ecology Of Influenza-A Viruses,” Microbiological Rev., vol. 56, pp. 152–179, Mar. 1992.
[12] Zhou N., Senne D., Landgraf J., Swenson S., Erickson G., Rossow K., Liu L., Yoon K., Krauss S. and Webster R., “Genetic Reassortment of Avian, Swine, and Human Influenza A Viruses in American Pigs,” J. Virology, vol. 73, pp. 8851–8856, Oct. 1999.
[13] Campitelli L., Di Martino A., Spagnolo D., Smith G.J., Di Trani L., Facchini M., De Marco M.A., Foni E., Chiapponi C., Martin A.M., Chen H., Guan Y., Delogu M. and Donatelli I., “Molecular Analysis of Avian H7 Influenza Viruses Circulating in Eurasia in 1999-2005: Detection of Multiple Reassortant Virus Genotypes,” J. General Virology, vol. 89, pt. 1, pp. 48–59, 2008.
[14] Rabadan R., Levine A.J. and Krasnitz M., “Non-Random Reassortment in Human Influenza A Viruses,” Influenza and Other Respiratory Viruses, vol. 2, no. 1, pp. 9–22, 2008.
[15] Khiabanian H., Trifonov V. and Rabadan R., “Reassortment Patterns in Swine Influenza Viruses,” PLoS One, vol. 4, no. 10, p. e7366, 2009.
[16] Huson D.H. and Bryant D., “Application of Phylogenetic Networks in Evolutionary Studies,” Molecular Biology and Evolution, vol. 23, no. 2, pp. 254–267, 2006.
[17] Makarenkov V. and Legendre P., “From a Phylogenetic Tree to a Reticulated Network,” J. Computational Biology, vol. 11, no. 1, pp. 195–212, 2004.
[18] Huson D.H. and Scornavacca C., “A Survey of Combinatorial Methods for Phylogenetic Networks,” Genome Biology and Evolution, vol. 3, pp. 23–35, 2011.
[19] Bokhari S.H. and Janies D.A., “Reassortment Networks for Investigating the Evolution of Segmented Viruses,” IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 288–298, Apr-Jun 2010.
[20] Bao Y., Bolotov P., Dernovoy D., Kiryutin B., Zaslavsky L., Tatusova T., Ostell J. and Lipman D., “The Influenza Virus Resource at the National Center for Biotechnology Information,” J. Virology, vol. 82, no. 2, pp. 596–601, Apr-Jun 2008.
[21] Gusfield D., Algorithms on Strings, Trees, and Sequences. Cambridge Univ. Press, 1997.
[22] Cornish-Bowden A., “IUPAC-IUB Symbols for Nucleotide Nomenclature,” Nucleic Acids Research, vol. 13, pp. 3021–3030, 1985.
[23] Alverson R., Callahan D., Cummings D., Koblenz B., Porterfield A. and Smith B., “The Tera Computer System,” in Proc. Fourth Int’l Conf. Supercomputing, pp. 1–6, 1990.
[24] Snavely A., Carter L., Boisseau J., Majumdar A., Gatlin K.S., Mitchell N., Feo J. and Koblenz B., “Multi-Processor Performance on the Tera MTA,” in Proc. IEEE/ACM Conf. Supercomputing, pp. 1–8, 1998.
[25] Bokhari S.H. and Sauer J.R., “Sequence Alignment on the Cray MTA-2,” Concurrency and Computation, vol. 16, pp. 823–839, 2004.
[26] Bokhari S. and Sauer J., “Parallel Algorithms for Bioinformatics,” Parallel Computing for Bioinformatics, Zomaya A., Ed., pp. 509–529, Wiley, 2006.
[27] Olsen C.W., “The Emergence of Novel Swine Influenza Viruses in North America,” Virus Research, vol. 85, no. 2, pp. 199–210, 2002.
[28] Trifonov V., Khiabanian H. and Rabadan R., “Geographic Dependence, Surveillance, and Origins of the 2009 Influenza A (H1N1) Virus,” The New England J. Medicine, vol. 361, no. 2, pp. 115–119, 2009.
[29] Kingsford C., Nagarajan N. and Salzberg S.L., “2009 Swine-Origin Influenza A (H1N1) Resembles Previous Influenza Isolates,” PLoS One, vol. 4, no. 7, p. e6402, 2009.
[30] Smith G.J., Vijaykrishna D., Bahl J., Lycett S.J., Worobey M., Pybus O.G., Ma S.K., Cheung C.L., Raghwani J., Bhatt S., Peiris J.S., Guan Y. and Rambaut A., “Origins and Evolutionary Genomics of the 2009 Swine-Origin H1N1 Influenza A Epidemic,” Nature, vol. 459, no. 7250, pp. 1122–1125, 2009.
[31] Karasin A.I., Schutten M.M., Cooper L.A., Smith C.B., Subbarao K., Anderson G.A., Carman S. and Olsen C.W., “Genetic Characterization of H3N2 Influenza Viruses Isolated from Pigs in North America, 1977-1999: Evidence for Wholly Human and Reassortant Virus Genotypes,” Virus Research, vol. 68, no. 1, pp. 71–85, 2000.
[32] Olsen C.W., Karasin A. and Erickson G., “Characterization of a Swine-Like Reassortant H1N2 Influenza Virus Isolated from a Wild Duck in the United States,” Virus Research, vol. 93, no. 1, pp. 115–121, 2003.
[33] Gray G.C., McCarthy T., Capuano A.W., Setterquist S.F., Olsen C.W., Alavanja M.C. and Lynch C.F., “Swine Workers and Swine Influenza Virus Infections,” Emerging Infectious Diseases, vol. 13, no. 12, pp. 1871–1878, Dec. 2007.
[34] Yu H., Zhang P.-C., Zhou Y.-J., Li G.-X., Pan J., Yan L.-P., Shi X.-X., Liu H.-L. and Tong G.-Z., “Isolation and Genetic Characterization of Avian-Like H1N1 and Novel Reassortant H1N2 Influenza Viruses from Pigs in China,” Biochemical and Biophysical Research Comm., vol. 386, no. 2, pp. 278–283, 2009.
[35] Chen G. and Shih S., “Genomic Signatures of Influenza A Pandemic (H1N1) 2009 Virus,” Emerging Infectious Diseases, vol. 15, no. 12, pp. 1897–1903, [Online]. Available: www.cdc.gov/EID/content/15/12/1897.htm, Dec. 2009.
[36] Ma W., Vincent A.L., Lager K.M., Janke B.H., Henry S.C., Rowland R.R.R., Hesse R.A. and Richt J.A., “Identification and Characterization of a Highly Virulent Triple Reassortant H1N1 Swine Influenza Virus in the United States,” Virus Genes, vol. 40, no. 1, pp. 28–36, Feb. 2010.
[37] Vijaykrishna D., Poon L.L., Zhu H.C., Ma S.K., Li O.T., Cheung C.L., Smith G.J., Peiris J.S. and Guan Y., “Reassortment of Pandemic H1N1/2009 Influenza A Virus in Swine,” Science, vol. 328, no. 5985, p. 1529, June 2010.
[38] Wheeler W., “Fixed Character States and the Optimization of Molecular Sequence Data,” Cladistics, vol. 15, pp. 379–385, 1999.

[ref1] [1].Fraser C., Donnelly C.A., Cauchemez S., Hanage W.P., Van Kerkhove M.D., Hollingsworth T.D., Griffin J., Baggaley R.F., Jenkins H.E., Lyons E.J., Jombart T., Hinsley W.R., Grassly N.C., Balloux F., Ghani A.C., Ferguson N.M., Rambaut A., Pybus O.G., Lopez-Gatell H., Alpuche-Aranda C.M., Chapela I.B., Zavala E.P., Guevara D.M., Checchi F., Garcia E., Hugonnet S. and Roth C., “Pandemic Potential of a Strain of Influenza A (H1N1): Early Findings,” Science, vol. 324, no. 5934, pp. 1557–1561, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref2] [2].Neumann G., Noda T. and Kawaoka Y., “Emergence and Pandemic Potential of Swine-Origin H1N1 Influenza Virus”, Nature, vol. 459, no. 7249, pp. 931–999, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref3] [3].Gibbs A.J., Armstrong J.S. and Downie J.C., “From Where did the 2009 “Swine-Origin” Influenza A Virus (H1N1) Emerge?”, J. Virology, vol. 6, p. 207, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref4] [4].Dawood F.S., Jain S., Finelli L., Shaw M.W., Lindstrom S., Garten R.J., Gubareva L.V., Xu X., Bridges C.B. and Uyeki T.M., “Emergence of a Novel Swine-Origin Influenza A (H1N1) Virus in Humans”, The New England J. Medicine, vol. 360, no. 25, pp. 2605–2615, 2009. [DOI] [PubMed] [Google Scholar]

[ref5] [5].Chan M., “World Now at the Start of 2009 Influenza Pandemic”, [Online]. Available: www.who.int/mediacentre/news/statements/2009/h1n1_pandemic_phase6_20090611/en/index.html, 2009.

[ref6] [6].WHO “WHO Pandemic (H1N1) 2009 Update 98,” [Online]. Available: www.who.int/csr/don/2010_04_30a/en/index.html, 2009.

[ref7] [7].Webster R.G., Bean W.J., Gorman O.T., Chambers T.M. and Kawaoka Y., “Evolution and Ecology of Influenza A Viruses”, Microbiological Rev., vol. 56, no. 1, pp. 152–159, 1992. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref8] [8].Nelson M.I. and Holmes E.C., “The Evolution of Epidemic Influenza”, Nature Rev. Genetics, vol. 8, no. 3, pp. 196–205, 2007. [DOI] [PubMed] [Google Scholar]

[ref9] [9].Fouchier R.A., Munster V., Wallensten A., Bestebroer T.M., Herfst S., Smith D., Rimmelzwaan G.F., Olsen B. and Osterhaus A.D., “Characterization of a Novel Influenza A Virus Hemagglutinin Subtype (H16) Obtained from Black-Headed Gulls”, J. Virology, vol. 79, no. 5, pp. 2814–2822, 2005. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref10] [10].Guan Y., Vijaykrishna D., Bahl J., Zhu H., Wang J. and Smith G.J.D., “The Emergence of Pandemic Influenza Viruses”, Protein and Cell, vol. 1, no. 1, pp. 9–13, Jan. 2010. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref11] [11].Webster R., Bean W., Gorman O., Chambers T. and Kawaoka Y., “Evolution and Ecology Of Influenza-A Viruses”, Microbiological Rev., vol. 56, pp. 152–179, Mar. 1992. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref12] [12].Zhou N., Senne D., Landgraf J., Swenson S., Erickson G., Rossow K., Liu L., Yoon K., Krauss S. and Webster R., “Genetic Reassortment of Avian, Swine, and Human Influenza A Viruses in American Pigs”, J. Virology, vol. 73, pp. 8851–8856, Oct. 1999. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref13] [13].Campitelli L., Di Martino A., Spagnolo D., Smith G.J., Di Trani L., Facchini M., De Marco M.A., Foni E., Chiapponi C., Martin A.M., Chen H., Guan Y., Delogu M. and Donatelli I., “Molecular Analysis of Avian H7 Influenza Viruses Circulating in Eurasia in 1999-2005: Detection of Multiple Reassortant Virus Genotypes”, J. General Virology, vol. 89, pt. 1, pp. 48–59, 2008. [DOI] [PubMed] [Google Scholar]

[ref14] [14].Rabadan R., Levine A.J. and Krasnitz M., “Non-Random Reassortment in Human Influenza A Viruses”, Influenza and Other Respiratory Viruses, vol. 2, no. 1, pp. 9–22, 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref15] [15].Khiabanian H., Trifonov V. and Rabadan R., “Reassortment Patterns in Swine Influenza Viruses”, PLoS One, vol. 4, no. 10, p. e7366, 2009. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref16] [16].Huson D.H. and Bryant D., “Application of Phylogenetic Networks in Evolutionary Studies”, Molecular Biology and Evolution, vol. 23, no. 2, pp. 254–267, 2006. [DOI] [PubMed] [Google Scholar]

[ref17] [17].Makarenkov V. and Legendre P., “From a Phylogenetic Tree to a Reticulated Network”, J. Computational Biology, vol. 11, no. 1, pp. 195–212, 2004. [DOI] [PubMed] [Google Scholar]

[ref18] [18].Huson D.H. and Scornavacca C., “A Survey of Combinatorial Methods for Phylogenetic Networks”, Genome Biology and Evolution, vol. 3, pp. 23–35, 2011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[ref19] [19].Bokhari S.H. and Janies D.A., “Reassortment Networks for Investigating the Evolution of Segmented Viruses”, IEEE/ACM Trans. Computational Biology and Bioinformatics, vol. 7, no. 2, pp. 288–298, Apr-Jun 2010. [DOI] [PubMed] [Google Scholar]

[ref20] [20].Bao Y., Bolotov P., Dernovoy D., Kiryutin B., Zaslavsky L., Tatusova T., Ostell J. and Lipman D., “The Influenza Virus Resource at the National Center for Biotechnology Information”, J. Virology, vol. 82, no. 2, pp. 596–601, Apr-Jun 2008. [DOI] [PMC free article] [PubMed] [Google Scholar]

[21] Gusfield D., Algorithms on Strings, Trees, and Sequences. Cambridge Univ. Press, 1997.

[22] Cornish-Bowden A., “IUPAC-IUB Symbols for Nucleotide Nomenclature”, Nucleic Acids Research, vol. 13, pp. 3021–3030, 1985.

[23] Alverson R., Callahan D., Cummings D., Koblenz B., Porterfield A. and Smith B., “The Tera Computer System”, in Proc. Fourth Int’l Conf. Supercomputing, pp. 1–6, 1990.

[24] Snavely A., Carter L., Boisseau J., Majumdar A., Gatlin K.S., Mitchell N., Feo J. and Koblenz B., “Multi-Processor Performance on the Tera MTA”, in Proc. IEEE/ACM Conf. Supercomputing, pp. 1–8, 1998.

[25] Bokhari S.H. and Sauer J.R., “Sequence Alignment on the Cray MTA-2”, Concurrency and Computation, vol. 16, pp. 823–839, 2004.

[26] Bokhari S. and Sauer J., “Parallel Algorithms for Bioinformatics”, Parallel Computing for Bioinformatics, Zomaya A., Ed., pp. 509–529, Wiley, 2006.

[27] Olsen C.W., “The Emergence of Novel Swine Influenza Viruses in North America”, Virus Research, vol. 85, no. 2, pp. 199–210, 2002.

[28] Trifonov V., Khiabanian H. and Rabadan R., “Geographic Dependence, Surveillance, and Origins of the 2009 Influenza A (H1N1) Virus”, The New England J. Medicine, vol. 361, no. 2, pp. 115–119, 2009.

[29] Kingsford C., Nagarajan N. and Salzberg S.L., “2009 Swine-Origin Influenza A (H1N1) Resembles Previous Influenza Isolates”, PLoS One, vol. 4, no. 7, p. e6402, 2009.

[30] Smith G.J., Vijaykrishna D., Bahl J., Lycett S.J., Worobey M., Pybus O.G., Ma S.K., Cheung C.L., Raghwani J., Bhatt S., Peiris J.S., Guan Y. and Rambaut A., “Origins and Evolutionary Genomics of the 2009 Swine-Origin H1N1 Influenza A Epidemic”, Nature, vol. 459, no. 7250, pp. 1122–1125, 2009.

[31] Karasin A.I., Schutten M.M., Cooper L.A., Smith C.B., Subbarao K., Anderson G.A., Carman S. and Olsen C.W., “Genetic Characterization of H3N2 Influenza Viruses Isolated from Pigs in North America, 1977-1999: Evidence for Wholly Human and Reassortant Virus Genotypes”, Virus Research, vol. 68, no. 1, pp. 71–85, 2000.

[32] Olsen C.W., Karasin A. and Erickson G., “Characterization of a Swine-Like Reassortant H1N2 Influenza Virus Isolated from a Wild Duck in the United States”, Virus Research, vol. 93, no. 1, pp. 115–121, 2003.

[33] Gray G.C., McCarthy T., Capuano A.W., Setterquist S.F., Olsen C.W., Alavanja M.C. and Lynch C.F., “Swine Workers and Swine Influenza Virus Infections”, Emerging Infectious Diseases, vol. 13, no. 12, pp. 1871–1878, Dec. 2007.

[34] Yu H., Zhang P.-C., Zhou Y.-J., Li G.-X., Pan J., Yan L.-P., Shi X.-X., Liu H.-L. and Tong G.-Z., “Isolation and Genetic Characterization of Avian-Like H1N1 and Novel Reassortant H1N2 Influenza Viruses from Pigs in China”, Biochemical and Biophysical Research Comm., vol. 386, no. 2, pp. 278–283, 2009.

[35] Chen G. and Shih S., “Genomic Signatures of Influenza A Pandemic (H1N1) 2009 Virus”, Emerging Infectious Diseases, vol. 15, no. 12, pp. 1897–1903, Dec. 2009. [Online]. Available: www.cdc.gov/EID/content/15/12/1897.htm

[36] Ma W., Vincent A.L., Lager K.M., Janke B.H., Henry S.C., Rowland R.R.R., Hesse R.A. and Richt J.A., “Identification and Characterization of a Highly Virulent Triple Reassortant H1N1 Swine Influenza Virus in the United States”, Virus Genes, vol. 40, no. 1, pp. 28–36, Feb. 2010.

[37] Vijaykrishna D., Poon L.L., Zhu H.C., Ma S.K., Li O.T., Cheung C.L., Smith G.J., Peiris J.S. and Guan Y., “Reassortment of Pandemic H1N1/2009 Influenza A Virus in Swine”, Science, vol. 328, no. 5985, p. 1529, June 2010.

[38] Wheeler W., “Fixed Character States and the Optimization of Molecular Sequence Data”, Cladistics, vol. 15, pp. 379–385, 1999.


