A Kernelisation Approach for Multiple d-Hitting Set and Its Application in Optimal Multi-Drug Therapeutic Combinations

Drew Mellor; Elena Prieto; Luke Mathieson; Pablo Moscato

doi:10.1371/journal.pone.0013055

PLoS One. 2010 Oct 18;5(10):e13055.


Editor: Maria A Deli

PMCID: PMC2956629 PMID: 20976188

Abstract

Therapies consisting of a combination of agents are an attractive proposition, especially in the context of diseases such as cancer, which can manifest with a variety of tumor types in a single case. However, uncovering usable drug combinations is expensive both financially and temporally. By employing computational methods to identify candidate combinations with a greater likelihood of success, we can avoid these problems, even when the amount of data is prohibitively large. Hitting Set is a combinatorial problem with useful applications across many fields. However, because it is NP-complete, it is traditionally considered hard to solve exactly. We introduce a more general version of the problem, (α,β,d)-Hitting Set, which allows more precise control over how and what the hitting set targets. Employing the framework of Parameterized Complexity, we show that despite being NP-complete, the (α,β,d)-Hitting Set problem is fixed-parameter tractable with a kernel of size O(αdk^d). Here we parameterize by the size k of the hitting set and by the maximum number α of hits required by any target set, and we take the maximum degree d of the target sets as a constant. We demonstrate the application of this problem to multiple drug selection for cancer therapy, showing the flexibility of the problem in tailoring such drug sets. The fixed-parameter tractability result indicates that for low values of the parameters the problem can be solved quickly using exact methods. We also demonstrate that the problem is indeed practical. Even with relatively relaxed notions of what constitutes a low value for the parameters, computation times are on the order of 5 seconds. By comparison, previous Hitting Set applications using the same dataset exhibited times on the order of 1 day. Furthermore, the existence of a kernelization for (α,β,d)-Hitting Set indicates that the problem is readily scalable to large datasets.

Introduction

Typically the selection of a drug therapy for a disease is limited to a single drug. However, diseases such as cancer may present as a heterogeneous mix of subtypes of the general disease. In such cases multi-drug therapies may prove more effective than single-drug therapies, and many trials have been conducted to this end [1]–[3]. Furthermore, combinations of drugs may allow a more targeted approach to a selection of subtypes of a disease, while minimizing effects on unaffected cells. Unfortunately, many compounds are available for the treatment of many conditions of interest. As a result, the time and expense of testing even all two-drug combinations may be prohibitive, and a smarter approach is needed. Vazquez [4] introduces the Hitting Set problem for this task in the context of oncological drug therapy. Hitting Set is a combinatorial problem that proves extremely useful in modeling a large variety of problems in many domains. These include protein network discovery [5], metabolic network analysis [6], diagnostics [7]–[9], gene ontology [10] and gene expression analysis [11], [12].

The Hitting Set Problem

Hitting Set is a combinatorial problem that models the task of selecting a small group of elements to represent, or cover, a collection of sets. A group that covers every set in the collection is called a hitting set. Finding such a set without any constraint is simple. However, if we require the hitting set to be relatively small, the problem becomes computationally challenging (NP-complete, in a formal sense). This difficulty in obtaining solutions with desirable qualities calls for more thoughtful approaches.

We now give some technical details and formal definitions of the problems of interest.

Hitting Set is equivalent to the Set Cover problem [13]. When otherwise unrestricted, it is also equivalent to the Red/Blue Dominating Set problem [14], and it is related to the $k$-Feature Set problem [15].
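The equivalence with Set Cover comes from swapping the roles of elements and sets. The sketch below maps each element $s$ to the indices of the members of $C$ that $s$ hits. A choice of elements then hits every member of $C$ exactly when the corresponding index sets cover all of $C$. The function names are hypothetical and serve only as an illustration.

```python
from typing import Dict, FrozenSet, List, Set


def to_set_cover(universe: Set[str], collection: List[FrozenSet[str]]) -> Dict[str, Set[int]]:
    """For each element s, return the indices of the sets in `collection` containing s.

    Choosing elements that hit every set in `collection` is the same as choosing
    index sets that together cover {0, ..., len(collection) - 1}.
    """
    return {s: {i for i, c in enumerate(collection) if s in c} for s in universe}


def is_cover(family: Dict[str, Set[int]], chosen: Set[str], n_sets: int) -> bool:
    """True if the index sets of the chosen elements cover every set index."""
    covered: Set[int] = set()
    for s in chosen:
        covered |= family[s]
    return covered == set(range(n_sets))
```

In the Set Cover view, a small hitting set corresponds directly to a small cover of the index universe.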

The decision version of the Hitting Set problem is defined as follows:

Hitting Set

Instance: A set $S$, a collection $C$ of subsets of $S$, and an integer $k$.

Question: Is there a set $S' \subseteq S$ with $|S'| \leq k$ such that for every $c \in C$ we have $S' \cap c \neq \emptyset$?
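A "yes" certificate for this decision problem is easy to verify. The sketch below checks the three conditions of the question directly. The function name is hypothetical.

```python
from typing import FrozenSet, Iterable, Set


def is_hitting_set(
    universe: Set[str],
    collection: Iterable[FrozenSet[str]],
    candidate: Set[str],
    k: int,
) -> bool:
    """Decide whether `candidate` witnesses a 'yes' answer to Hitting Set."""
    return (
        candidate <= universe                        # S' is a subset of S
        and len(candidate) <= k                      # |S'| <= k
        and all(candidate & c for c in collection)   # S' meets every c in C
    )
```

Verification takes polynomial time. The hardness lies entirely in finding a small candidate.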

The set $S'$ is called a hitting set for $C$, or simply a hitting set. For an element $s \in S$ and an element $c \in C$, if $s \in c$ we say that $s$ hits $c$. This problem is NP-complete even when the maximum size of each element of $C$ is two (by equivalence with Vertex Cover [13]). It is also W[2]-complete for parameter $k$. Cotta and Moscato [16] give a parameterized proof via $k$-Feature Set. Paz and Moran [17] give a proof that, together with the equivalence of Hitting Set and Set Cover, leads to the same result, although it predates the parameterized complexity framework. However, suppose we restrict the cardinality of the elements of $C$ to at most $d$, where $d$ is a constant. The problem then remains NP-complete but becomes fixed-parameter tractable with parameter $k$ [18]. In this case the problem is known as Hitting Set for Sets of Size $d$, or $d$-Hitting Set. Hitting Set has several equivalent formulations. We choose the bipartite graph representation $G = (S \cup C, E)$, where $S$ and $C$ form the two partite vertex sets of the graph and an edge $sc \in E$ indicates that the element $s$ is an element of $c$. This lets us employ some simplifying graph-theoretic terminology and techniques. We generalize the problem to include the case where we may want the elements of $C$ to be hit more than once. This includes asking whether every set in $C$ can be hit some fixed number of times. It also extends to the case where each set may be required to be hit a different number of times, up to some maximum. We encode this with a hitting function $\alpha : C \to \mathbb{N}$. Our problem then becomes $\alpha$-Multiple $d$-Hitting Set (or (α,d)-Hitting Set):

(α,d)-Hitting Set

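Instance: A bipartite graph $G = (S \cup C, E)$ where for all $c \in C$ we have $\deg(c) \leq d$, a hitting function $\alpha : C \to \mathbb{N}$ and an integer $k$.

Question: Is there a set $S' \subseteq S$ with $|S'| \leq k$ such that for every $c \in C$ we have $|N(c) \cap S'| \geq \alpha(c)$, where $N(c)$ denotes the neighborhood of $c$ in $G$?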
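In the bipartite view, checking a candidate solution comes down to counting, for each class vertex $c$, how many of its neighbors lie in $S'$. The sketch below assumes that the graph is given as a dictionary mapping each $c$ to $N(c)$. That representation and the function name are our own choices for illustration.

```python
from typing import Dict, Set


def is_alpha_d_hitting_set(
    neighbours: Dict[str, Set[str]],  # N(c): elements of S adjacent to class vertex c
    alpha: Dict[str, int],            # alpha(c): how many times c must be hit
    candidate: Set[str],              # proposed solution S'
    k: int,
    d: int,
) -> bool:
    """Check a candidate solution for (alpha,d)-Hitting Set."""
    # The degree bound deg(c) <= d is a property of the instance itself.
    if any(len(nbrs) > d for nbrs in neighbours.values()):
        raise ValueError("instance violates deg(c) <= d")
    if len(candidate) > k:
        return False
    # Every class vertex c needs at least alpha(c) neighbours inside S'.
    return all(len(neighbours[c] & candidate) >= alpha[c] for c in neighbours)
```

With $\alpha(c) = 1$ for every $c$, this reduces to the ordinary $d$-Hitting Set check.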

When $\alpha(c) = 1$ for all $c \in C$, (α,d)-Hitting Set can be $d$-approximated in polynomial time [19]. However, it cannot be approximated within a factor of $d - 1 - \epsilon$ for any $\epsilon > 0$ unless $\mathrm{P} = \mathrm{NP}$ [20].
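A factor-$d$ guarantee for the single-hit case can be illustrated with the classical argument sketched below. This is not necessarily the algorithm of [19]. The algorithm scans the sets, and whenever one is not yet hit, it adds all of that set's (at most $d$) elements. Each set that triggers an addition is disjoint from all earlier triggering sets. Any hitting set therefore needs a distinct element for each of them, so the output is at most $d$ times the optimum. The function name is hypothetical.

```python
from typing import Hashable, Iterable, Set


def d_approx_hitting_set(collection: Iterable[Set[Hashable]]) -> Set[Hashable]:
    """Hitting set at most d times larger than optimal when every set has <= d elements."""
    solution: Set[Hashable] = set()
    for c in collection:
        if not c:
            raise ValueError("an empty set can never be hit")
        if not c & solution:
            # c is still unhit: take all of its (at most d) elements.
            # Sets taken this way are pairwise disjoint, so an optimal
            # solution must spend at least one element on each of them.
            solution |= c
    return solution
```

On the sets $\{1,2\}, \{2,3\}, \{4,5\}, \{5\}$ it returns $\{1,2,4,5\}$, while an optimal hitting set is $\{2,5\}$. This illustrates the factor of $d = 2$.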

Results and Discussion

The Fixed-Parameter Tractability of (α,β,d)-Hitting Set

As we prove in the Materials and Methods section, the (α,d)-Hitting Set problem is fixed-parameter tractable. Indeed, the more general (α,β,d)-Hitting Set problem is also fixed-parameter tractable when we take the maximum degree $d$ of the class vertices as a constant and use the size $k$ of the hitting set together with the maximum desired coverage as a joint parameter. The problem is formally hard, which would normally suggest that an exact solution is too expensive to compute. Fixed-parameter tractability, however, indicates that we can likely obtain an exact solution efficiently when the parameters are small. Armed with this knowledge, we proceed with the experiments of the following section. There we use the drug response data of the NCI60 anti-tumor drug screening program to determine sets of drugs that hit cancerous cell lines multiple times. These drug sets are then mathematically supportable candidates for combination chemotherapies. Moreover, we can tune the nature of the hitting sets via the functions α and β, which bound from below and above the number of times each cell line is hit. This lets us control which cell lines are targeted (and which are specifically not) and how many times each cell line is hit in the solution.
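Fixed-parameter tractability can be illustrated with the textbook bounded search tree for $d$-Hitting Set, adapted here to the requirements $\alpha(c)$. This is a swapped-in classical technique, not the kernelization proved in Materials and Methods. Any valid extension of a partial solution must add an unchosen neighbor of some class vertex that is still under-hit. Branching over those at most $d$ neighbors for at most $k$ levels gives at most $d^k$ leaves. The running time is therefore $O(d^k)$ times a polynomial in the input size. Names are hypothetical.

```python
from typing import Dict, FrozenSet, Optional, Set


def bounded_search(
    neighbours: Dict[str, Set[str]],   # N(c) for every class vertex c (|N(c)| <= d)
    alpha: Dict[str, int],             # alpha(c): required number of hits
    k: int,                            # remaining budget of elements we may add
    chosen: FrozenSet[str] = frozenset(),
) -> Optional[Set[str]]:
    """Return some S' with |S'| <= k meeting every alpha(c), or None if none exists."""
    # Pick any class vertex that is still hit fewer than alpha(c) times.
    under_hit = next(
        (c for c in neighbours if len(neighbours[c] & chosen) < alpha[c]), None
    )
    if under_hit is None:
        return set(chosen)  # every requirement is met
    if k == 0:
        return None  # budget exhausted while some requirement is unmet
    # Branch on each unchosen neighbour of the under-hit vertex (at most d branches).
    for s in sorted(neighbours[under_hit] - chosen):
        result = bounded_search(neighbours, alpha, k - 1, chosen | {s})
        if result is not None:
            return result
    return None
```

Calling the function with $k = 0, 1, 2, \ldots$ and stopping at the first success yields a minimum-size solution.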

A Comparative Application

The NCI60 human tumor anti-cancer drug screen dataset [21] was established in the 1980s as an enabling tool for anti-cancer drug development. Included in this dataset is response data for over Inline graphic drugs against the cell lines of the dataset. Vazquez [4] highlights the utility of a hitting set approach in developing multi-drug therapies for heterogeneous malignancies. Given the plethora of available compounds, testing multi-drug combinations exhaustively is prohibitive, if not impossible. Applying Hitting Set to efficacy data measured individually for each compound allows us to determine possible drug combinations that would provide the best chance of efficacy against many cancer types. Vazquez uses the GI50 response NCI60 dataset (available from the DTP website [22]) to uncover a minimum hitting set of three compounds that together give a good response in all cell lines in the dataset. A response is considered good if it is more than two standard deviations above the mean of the z-transformed response data. Vazquez first uses a greedy highest-degree-first approach to obtain an upper bound on the size of a minimum hitting set. This is followed by either an exhaustive search or simulated annealing, depending on the size of the hitting set. Vazquez reports times for such approaches on the order of one day on a desktop computer.
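The preprocessing and greedy estimate described above can be sketched as follows. Several details are assumptions rather than Vazquez's specification. First, response values are taken to be oriented so that larger means a stronger response. Second, the z-transformation here uses the mean and standard deviation of all values jointly, although it could equally be applied per compound or per cell line. Third, "highest-degree-first" is read as repeatedly picking the compound that hits the most cell lines not yet hit. The function names are hypothetical.

```python
import statistics
from typing import Dict, List, Set


def good_responses(
    response: Dict[str, Dict[str, float]], threshold: float = 2.0
) -> Dict[str, Set[str]]:
    """Map each drug to the cell lines whose z-score exceeds `threshold`.

    Assumption: larger values mean stronger responses, and the z-transform
    uses the mean and standard deviation of all values jointly.
    """
    values = [v for row in response.values() for v in row.values()]
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        raise ValueError("all responses are identical; z-scores are undefined")
    return {
        drug: {line for line, v in row.items() if (v - mu) / sigma > threshold}
        for drug, row in response.items()
    }


def greedy_hitting_set(hits: Dict[str, Set[str]], cell_lines: Set[str]) -> List[str]:
    """Repeatedly pick the drug that hits the most not-yet-hit cell lines."""
    uncovered = set(cell_lines)
    chosen: List[str] = []
    while uncovered:
        best = max(hits, key=lambda drug: len(hits[drug] & uncovered))
        if not hits[best] & uncovered:
            raise ValueError("some cell line has no good response to any drug")
        chosen.append(best)
        uncovered -= hits[best]
    return chosen
```

The size of the greedy solution bounds the size of a minimum hitting set from above, so an exact search need only consider sets no larger than it.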

We revisit Vazquez's experiment using data reduction, although the more complex rules given in the kernelization proof are not needed here. We frame the problem as an integer programming problem and use IBM ILOG CPLEX [23] as the kernel solver. We use the same z-transformation threshold to identify significant response levels. With this approach the time to solve the instance drops to less than Inline graphic seconds. Most of this time is spent loading and reducing the data, and CPLEX itself solves the integer programming instance in a matter of milliseconds. Furthermore, this approach guarantees that the hitting set found is of minimum size.
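One natural 0/1 integer program for this task is sketched below. It reflects our own assumptions, since the exact model passed to CPLEX is not given here. The binary variable $x_s$ indicates whether compound $s$ is selected, and $N(c)$ is the set of compounds with a good response in cell line $c$. For the single-hit experiment, $\alpha(c) = 1$ for every cell line and the upper-bound constraints are omitted. The bounds $\beta(c)$ show how the (α,β,d) variant could be expressed in the same framework.

```latex
\begin{aligned}
\min \quad & \sum_{s \in S} x_s \\
\text{subject to} \quad & \sum_{s \in N(c)} x_s \ge \alpha(c) && \text{for all } c \in C, \\
& \sum_{s \in N(c)} x_s \le \beta(c) && \text{for all } c \in C \text{ with } \beta(c) < \infty, \\
& x_s \in \{0, 1\} && \text{for all } s \in S.
\end{aligned}
```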

From here on we employ a more recent version of the NCI60 dataset (2009, as compared with Vazquez's 2006 version). At the time of writing, the latest NCI60 dataset includes 14 additional cell lines. We remove these because the dataset contains insufficient response data for them, which would inflate hitting set sizes. The latest data also include a further Inline graphic compounds. Using the new GI50 response data, we are able to uncover three-element hitting sets involving compounds not available in the earlier dataset (an example is given in Table 1 and Figure 1). One such compound is Everolimus (NSC 733504), a drug now used for the treatment of advanced renal cancer that is also giving positive results in phase II trials for metastatic melanoma [24], [25]. However, there have recently been some concerns over the provenance of some of the cell lines in the NCI60 dataset. In particular, Lorenzi et al. [26] suggested that the MDA-N cell line, nominally a breast cancer cell line, is similar to the M14 and MDA-MB-435 cell lines and is thus in fact a melanoma cell line. Chambers [27], however, suggests that although M14 and MDA-MB-435 are identical cell lines, they may not in fact be melanoma cell lines. We do not attempt to resolve this dispute. Instead, as an indication of the flexibility of our method, we consider both the case where MDA-N is a breast cancer cell line and the case where it is a melanoma cell line.

Table 1. Minimal hitting set using 2009 NCI60 data.

NSC Number	Compound Name
174121	Methotrexate Derivative
691039	(S)-7-Hydroxy-1,2,3-trimethoxy-10-methylsulfanyl-6,7-dihydro-5H-benzo[a]heptalen-9-one
733504	Everolimus/Afinitor


Minimal hitting set for NCI60 GI50 response data from 2009.

This hitting set hits all cell lines at least once and is further optimized to hit the target cell lines as many times as possible. Of particular note are NSC 174121, a methotrexate derivative, and NSC 733504, Everolimus/Afinitor, both known anti-cancer agents.

Employing the (α,β,d)-Hitting Set model gives more flexibility in the kind of therapy we would like to pursue. For instance, by choosing $\alpha(c) = 2$ for all vertices $c \in C$, we can find a hitting set that hits every cell line at least twice (see Table 2). However, this hitting set has six elements, which is likely beyond the point where the trade-off between anti-cancer efficacy and side effects is acceptable. Fortunately, we can exploit (α,β,d)-Hitting Set more intelligently. For example, we may wish to find a hitting set that specifically targets breast cancer cell lines. To do so, we set $\alpha(c) = 1$ for all breast cancer cell line vertices and $\beta(c) = 0$ for all other cell lines, where $\beta(c)$ is the maximum number of times $c$ may be hit. The resulting hitting set has three elements and hits only breast cancer cell lines, which may help minimize unwanted peripheral damage to non-breast-cancer cells. When we consider MDA-N to be a breast cancer cell line (see Table 3 and Figure 2), this set includes the compound deoxypodophyllotoxin, which is known to induce apoptosis [28]. If we consider MDA-N to be a melanoma cell line, we obtain a different hitting set (see Table 4 and Figure 3). If we relax our requirements and allow other cell lines to be hit at most once, we can obtain a hitting set that hits the breast cancer cell lines more often (Table 5 and Figure 4). The results when we set Inline graphic to for all breast cancer lines are given in Table 6 and Figure 5 (including MDA-N) and in Table 7 and Figure 6 (excluding MDA-N). Notably, when MDA-N is included, the optimal hitting set includes Docetaxel, a well-known anti-cancer agent [29] used against several cancer types, including breast cancer. Interestingly, Docetaxel is also currently included in several clinical trials examining its potential as part of a multi-drug therapy [30]–[34].
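The targeting scheme can be made concrete with the sketch below, which solves tiny instances exactly by enumerating candidate drug sets in order of size. It is not the method used in the experiments, which rely on data reduction and an integer programming solver, and its running time grows exponentially. The drug and cell-line names in `HITS` are hypothetical toy data, in which B1 and B2 play the role of breast cancer lines. A cell line `c` counts as satisfied when it is hit at least `alpha[c]` and at most `beta[c]` times. Ties among minimum-size solutions are broken by the total number of hits on lines with `alpha[c] > 0`, which is one possible reading of the "maximal number of times" criterion.

```python
import math
from itertools import combinations
from typing import Dict, Iterable, Optional, Set

# Hypothetical toy data: the cell lines each made-up drug hits.
HITS: Dict[str, Set[str]] = {
    "d1": {"B1", "M1"},
    "d2": {"B1"},
    "d3": {"B2"},
    "d4": {"B1", "B2", "O1"},
    "d5": {"B2"},
}


def hit_counts(hits: Dict[str, Set[str]], chosen: Iterable[str], lines) -> Dict[str, int]:
    """Number of chosen drugs hitting each cell line (every hit line must be in `lines`)."""
    counts = {c: 0 for c in lines}
    for drug in chosen:
        for c in hits[drug]:
            counts[c] += 1
    return counts


def solve_alpha_beta(
    hits: Dict[str, Set[str]],
    alpha: Dict[str, int],    # minimum hits per cell line (must list every line)
    beta: Dict[str, float],   # maximum hits per cell line (math.inf = no cap)
) -> Optional[Set[str]]:
    """Smallest drug set meeting every bound, or None if no drug set does."""
    def target_hits(chosen) -> int:
        # Tie-break score: total hits on lines that must be hit at least once.
        counts = hit_counts(hits, chosen, alpha)
        return sum(n for c, n in counts.items() if alpha[c] > 0)

    drugs = sorted(hits)
    for size in range(len(drugs) + 1):
        feasible = [
            chosen
            for chosen in combinations(drugs, size)
            if all(alpha[c] <= n <= beta[c]
                   for c, n in hit_counts(hits, chosen, alpha).items())
        ]
        if feasible:
            return set(max(feasible, key=target_hits))
    return None
```

With `alpha` set to 1 on B1 and B2 and `beta` set to 0 on M1 and O1, the toy instance needs two drugs, `{"d2", "d3"}`. Relaxing `beta` to 1 on the other lines admits the single drug `d4`, which mirrors the trade-off between strict and relaxed targeting discussed above.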

Table 2. Minimal double hitting set.

NSC Number	Compound Name
147340	Anisomycin hydrochloride
174121	Methotrexate derivative
314018	Ansamitocin derivative TN-006
691039	(7S)-7-hydroxy-1,2,3-trimethoxy-10-methylsulfanyl-6,7-dihydro-5H-benzo[a]heptalen-9-one
712807	Capecitabine
733504	Everolimus/Afinitor


Minimal hitting set hitting each cell line at least twice.