Open pipelines for integrated tumor genome profiles reveal differences between pancreatic cancer tumors and cell lines

Jeremy Goecks; Bassel F El-Rayes; Shishir K Maithel; H Jean Khoury; James Taylor; Michael R Rossi

doi:10.1002/cam4.360

. 2015 Jan 4;4(3):392–403. doi: 10.1002/cam4.360

Open pipelines for integrated tumor genome profiles reveal differences between pancreatic cancer tumors and cell lines

Jeremy Goecks ^1,^✉, Bassel F El-Rayes ², Shishir K Maithel ³, H Jean Khoury ², James Taylor ⁴, Michael R Rossi ⁵

PMCID: PMC4380965 PMID: 25594743

Abstract

We describe open, reproducible pipelines that create an integrated genomic profile of a cancer and use the profile to find mutations associated with disease and potentially useful drugs. These pipelines analyze high-throughput cancer exome and transcriptome sequence data together with public databases to find relevant mutations and drugs. The three pipelines that we have developed are: (1) an exome analysis pipeline, which uses whole or targeted tumor exome sequence data to produce a list of putative variants (no matched normal data are needed); (2) a transcriptome analysis pipeline that processes whole tumor transcriptome sequence (RNA-seq) data to compute gene expression and find potential gene fusions; and (3) an integrated variant analysis pipeline that uses the tumor variants from the exome pipeline and tumor gene expression from the transcriptome pipeline to identify deleterious and druggable mutations in all genes and in highly expressed genes. These pipelines are integrated into the popular Web platform Galaxy at http://usegalaxy.org/cancer to make them accessible and reproducible, thereby providing an approach for doing standardized, distributed analyses in clinical studies. We have used our pipeline to identify similarities and differences between pancreatic adenocarcinoma cancer cell lines and primary tumors.

Keywords: Analysis pipelines, bioinformatics, galaxy, genomic tumor profiles, pancreatic cancer

Background

A promising path toward personalizing cancer treatment is using genomic features of tumors to guide treatment. Tumor features such as gene mutations ¹^,², differential gene expression ³^,⁴, and structural variation ⁵^,⁶ have proven useful in predicting and personalizing cancer treatment. For the majority of tumors, though, finding a single feature that leads to a definitive treatment with durable response has been elusive. Therefore, developing effective treatments for many tumor types requires multiple targeted approaches informed by comprehensive tumor profiles merged with public and private patient data to identify precise targets ⁷.

Comprehensive genomic profiles of tumors derived from high-throughput sequencing data holds significant promise for better understanding the biology which drives their growth and resistance to standard therapies ⁸^,⁹. New information can be derived by combining data from multiple characteristics. For instance, mutations in overexpressed genes can be found by combining mutations from exome resequencing with gene expression computed from transcriptome sequencing. Activating mutations in genes that drive growth and proliferation are often promising drug targets and many current cancer therapies have been based on the concept of oncogene addiction ¹⁰^–¹².

For cancers with poor outcomes, using a multi-faceted tumor profile to identify better targeted agents or combinations of drugs is required. Large public databases that include cancer genome information such as COSMIC ¹³, the Drug–Gene Interaction (DGI) Database ¹⁴, and the Cancer Cell Line Encyclopedia ¹⁵, are making this task more feasible. Current approaches match a known genomic aberration to a known drug, such as the BRAF p.V600E mutation and vemurafenib ¹⁶^,¹⁷, but there is an increasing need to test combinations of drugs in clinical trials. However, many of these trials require preclinical models for evidence of efficacy, and most of these models currently fail to account for multiple somatic events that contribute to therapeutic response. Because the use of cell lines for personalized oncology appears to be a more cost effective approach than xenograph models ¹⁸, we have chosen to develop a tool that aligns individual tumor data with available cell lines in an effort to help accelerate precision investigation of preclinical models of therapeutic response.

Realizing an approach to personalized oncology that creates an integrated genomic profile of a tumor and then uses the profile together with large public databases is a challenging endeavor that requires pipelines with many steps and tools. Ensuring that these pipelines and their output are accessible to research-clinicians, especially those with limited computational skills, is critical. It is also important that these pipelines yield reproducible analyses so that their results can be used and also serve as a foundation for future work ¹⁹. For these reasons, a pipeline/workflow platform is ideal. Pipeline platforms such as GenePattern ²⁰, Taverna ²¹, and Synapse ²² have been used for cancer genomics but, to the best of our knowledge, not for personalized oncology.

We have developed three pipelines for personal oncology and integrated them into Galaxy, a popular Web-based genomic workbench that supports pipelines ²³^–²⁶. Collectively, these pipelines—an exome analysis pipeline, a transcriptome (RNA-seq) analysis pipeline, and an integrated variant analysis pipeline—analyze a tumor sample to identify rare and deleterious mutations, druggable mutations, and drugs which may be effective for a tumor. Integrating the pipelines into Galaxy makes the pipelines and the data produced from them widely accessible, reproducible, and sharable. Galaxy's Web interface ensures accessibility and reproducibility of the pipelines for a wide audience, especially those with limited programming skills. Galaxy's collaboration framework provides a channel for widely sharing the pipelines and ensuring that others can easily use them. Together, Galaxy and the pipelines facilitate standardization of a data analysis platform that can run locally with appropriate securities but the analyses can be easily shared and collated across sites in multicenter clinical trials.

We have validated our pipelines by analyzing high-throughput sequencing data from three well-characterized pancreatic cancer cell lines. Finally, we have used the pipelines to identify mutational similarities and differences between the cell lines and six primary pancreatic adenocarcinoma (PAC) tumors.

Implementation

We have created three general pipelines that work together (Fig.1):

An exome processing pipeline analyzes whole or targeted tumor exome resequencing data and identifies small variants (SNPs and indels).
A whole transcriptome (RNA-seq) processing pipeline analyzes tumor RNA-seq data (a) to find small variants and gene fusions and (b) computes gene expression.
An integrated variant analysis pipeline that processes variants from the exome or transcriptome pipelines, together with public databases, to identify (i) rare and deleterious (RD) variants; (ii) druggable RD variants and associated drugs. When gene expression data are available from the transcriptome pipeline, an integrated analysis is performed to identify (iii) RD variants in highly expressed genes; (iv) druggable RD variants in highly expressed genes and associated drugs.

The integrated analysis in the final pipeline focuses on variants in highly expressed mutant transcripts likely to be druggable targets. The tools chosen for these pipelines are widely used and well maintained, ensuring that they perform well on a variety of different data. However, alternative tools can also be incorporated to these pipelines as required.

Exome pipeline

This pipeline generates a list of small variants (SNPs, insertions, and deletions) from either whole or targeted tumor exome resequencing data. In order, reads are mapped using BWA ²⁷, PCR duplicates are removed using Picard (http://picard.sourceforge.net/), and variants are called using VarScan2 ²⁸. This approach—mapping reads, remove duplicates, and calling variants—is well established for obtaining variants from exome data. No matched normal exome sequencing data are required to run this pipeline.

Transcriptome (RNA-seq) pipeline

This pipeline uses RNA-seq data to characterize a tumor in three ways: small variants, gene fusions, and gene expression. The first step in the pipeline is mapping RNA-seq reads using Tophat2 ²⁹. Obtaining small variants from mapped reads is done the same way as it is in exomes: PCR duplicates are removed and variants are called. This approach for calling variants from RNA-seq has been validated previously ³⁰. The pipeline uses the popular Tophat-Cufflinks protocol ³¹ produce gene expression results and Tophat-Fusion ³² to detect potential fusions.

Integrated variant analysis pipeline

This pipeline analyzes variants from the exome or transcriptome pipeline to identify rare, deleterious (RD) variants and also druggable RD variants together with associated drugs. When transcriptome data are available, variants and drugs in highly expressed genes are computed as well.

The first step in this pipeline is removing variants resulting from sequencing errors, and this is done by removing variants with a low allele frequency. The minimum required allele frequency can be set when the workflow is run. We found that using a minimum allele frequency of 10% worked well for variants called from exome sequencing of a homogenous population of cells, such as a cultured cell line. Conversely, for variants called from transcriptome sequencing of a tumor, which has a mixed population of cells, we found that an allele frequency of 30% worked well.

Next, ANNOVAR ³³ is used to annotate variants with mutation type (e.g. synonymous, stop-gain, frameshift), allele frequencies in common public databases such as 1000 Genomes ³⁴ and the Exome Sequencing Project ³⁵, and COSMIC ¹³ annotations. To obtain rare and deleterious variants, variants with a minor allele frequency greater than 0.01% in either 1000 Genomes and the Exome Sequencing Project data are removed and only nonsynonymous mutations, frameshifts, and stop-gain/losses are kept. Additional annotations can be added and more aggressive annotation filtering can be applied as required by changing the filtering criteria.

Next, the rare and deleterious mutations are annotated using the DGI database, a meta-database that includes gene–drug interaction results from many expert-curated and automatically generated drug databases ¹⁴. For each gene that has a rare and deleterious mutation, the DGI database provides a list of drugs that are thought to be effective, along with the database source of the interaction and, if available, interaction type (e.g., inhibitor). By default, only expert-curated results are included in the pipeline's results.

The final outputs of the pipeline are (a) a list of rare and deleterious mutations; (b) a list of druggable rare and deleterious mutations; and (c) a list of potential drugs associated with the druggable mutations. When gene expression data are available from the transcriptome analysis pipeline, variants are annotated with the expression level for the gene where they occur. Then, variants with expression greater than 10 FPKM are labeled as highly expressed and the pipeline identifies rare, deleterious variants in highly expressed genes, the subset of these variants that are druggable, and associated drugs. As noted earlier, variants in expressed genes and drugs targeting variants in expressed genes may be especially interesting to clinician-researchers. The goal is to use these lists of rare, deleterious and druggable mutations to in tumors to direct rational preclinical testing of combination therapies in cell lines and eventually treatment of patients.

Pipeline separation

Variant analysis and interpretation is a challenging part of using molecular profiles to personalize cancer treatment, and we anticipate that investigators may try experimenting with and adapting the variant analysis workflow to meet their needs. The separation in our pipelines affords experimentation with the variant analysis workflow. The exome and transcriptome analysis pipelines perform operations that are self-contained, resource-intensive, and slow. Read mapping, variant calling, and quantifying gene expression are most resource-intensive steps, each requiring two or more hours on typical computing clusters to complete. On the other hand, the variant analysis pipeline integrates the outputs of the exome and transcriptome pipeline with public databases, requires few resources, and is very fast. Using a personal computer, the whole pipeline typically finishes in less than 30 min. By placing the slow steps in the exome and transcriptome analysis pipelines, it is fast and easy to experiment with the variant analysis workflow, such as by changing the allele frequency or the databases used.

Advantages of galaxy integration

We have implemented these pipelines as workflows in Galaxy (http://galaxyproject.org), a Web-based workbench for doing genomic analyses. Galaxy integration offers many benefits for investigators using these pipelines.

Galaxy can be accessed in a variety of different ways, depending on an investigator's bioinformatics skills. Investigators with limited bioinformatics experience can use Galaxy via a graphical Web-based interface for running workflows and visualizing and sharing data. Only a Web browser is required to use all of Galaxy's features and a novice bioinformatician can easily upload FASTQ or BAM files, execute workflows, and generate summary tables and graphics. For investigators with significant bioinformatics experience, Galaxy's API can be used to run workflows from scripts. Using the Galaxy API and Bioblend ³⁶, we developed Python scripts to automatically execute the pipelines on sequencing data from numerous pancreatic cancer samples, the results of which we discuss in detail below. We used Galaxy's Web interface to experiment with different settings for our workflows, visualize results, and share our workflows and data.

Galaxy records the inputs and parameters used for workflows and tools, so every pipeline run is recorded and reproducible. Data produced from our pipelines can be visualized in Galaxy's visual analysis framework ³⁷^,³⁸. Investigators can visualize the very large data sets produced by the pipelines in their Web browser using a genome browser, Circos plot ³⁹, and other visualizations. Investigators can also experiment with and visualize tool output using different parameter values in order to choose parameters best suited to their analyses.

We have used Galaxy's sharing features to make our pipelines widely available. We created a Galaxy Page (http://usegalaxy.org/cancer) as an online, interactive supplement for this work. The page briefly describes the workflows, and embedded in the page are the workflows themselves, analysis histories generated from the pipelines using cancer cell line data, and visualizations of data generated from the pipeline. From the page, investigators can copy embedded histories, workflows, and visualizations into their workspace and immediately start using them.

The workflows can also be downloaded and run on a local Galaxy instance. Because Galaxy workflows are portable, investigators in a large, distributed clinical trial can use the same standardized workflows in multiple locations. Using the same workflows is advantageous both for sharing data and for reproducing analyses. In addition, workflows that have proven successful in previous clinical trials can be widely disseminated and used in the future.

Finally, our workflows can be copied and modified to suit individual analysis needs. Using Galaxy's Web interface, any investigator can edit a workflow, regardless of their programming experience. Potential workflow edits include changing parameter settings and substituting a new tool into a workflow. For example, instead of using VarScan as a variant caller in, another variant caller could be used. As better performing tools become available in Galaxy, we intend to introduce them into our curated pipelines to ensure that our pipelines use robust algorithms.

Design philosophy

Our pipeline development approach is motivated by a few key principles. We used open-source tools to make our pipelines widely available and transparent. When available, established and/or best practices are used. We designed the pipelines to be modular so that different components could be substituted or added and parameters could be modified. For example, instead of automatically filtering (reducing) variants based on certain criteria such as minor allele frequency, variants are annotated and then an explicit filtering operation is applied. This is very useful within the context of Galaxy because investigators can easily modify workflows, such as by changing variant filtering criteria, using its graphical editor. Modularity ensures that the pipelines can evolve and incorporate new tools as they become available rather than requiring the development of new pipelines.

The exome and transcriptome analysis pipelines require vastly more time and computing resources than the variant analysis pipeline: the exome/transcriptome processing pipelines require about a day to complete on a small computing cluster, while the integrated variant analysis pipeline can be run in less than an hour. Also, there are established protocols for exome and transcriptome processing but less so for variant analysis. Hence, by splitting the pipelines up as we have and putting the pipelines in Galaxy, it is simple and fast to experiment with different settings in the variant analysis pipeline and find settings that are most useful for a particular set of samples.

Results

Validation using cell line data

To validate our pipelines, we analyzed targeted exome and whole transcriptome sequencing data from three well-characterized pancreatic cancer cell lines: MIA PaCa2 (MP), HPAC, and PANC-1. Exonic regions of 577 genes that are commonly included in cancer gene panels were sequenced. All three cell lines are included in the Cancer Cell Line Encyclopedia (CCLE) ¹⁵; the CCLE includes a mutational profile for known oncogenes and drug response information for each cell line. The goal of this analysis is to use our pipelines to process the cell line sequence data, compare the output from our pipelines to CCLE entries, and determine whether our pipelines produce results that concur with known findings. Concordance with known findings will validate our pipelines’ performance Figure2A shows an interactive Galaxy-Circos plot of data generated from analysis of the MIA PaCa2 cell line.

Galaxy Circos plot showing data produced from (A; at top) exome and transcriptome analysis of Mia PaCa2 cell line and (B; at bottom) transcriptome analysis of a pancreatic adenocarcinoma tumor. Starting at the innermost track, the data are: (i) mapped read coverage; (ii) mapped read coverage after PCR duplicates removed; (iii) called variants; (iv) rare and deleterious variants; (v) rare, deleterious, and druggable variants; (vi) rare and deleterious variants in highly expressed genes; (vii) rare, deleterious, and druggable variants in highly expressed genes. Read coverage data shown are for mapped exome reads for cell line and mapped transcriptome reads for tumor.

Table S1 summarizes the results obtained from the exome pipeline for each cell line's exome sequencing data, and Table S2 summarize results from the transcriptome pipeline for each cell line's transcriptome sequencing data. Between 6200 and 7000 variants were identified in each cell line's targeted exome, and 27-33% of genes in each cell line were expressed at greater than 10 FPKM, the threshold for highly expressed genes in the variant analysis pipeline.

Table1 lists gene fusions found by the transcriptome analysis pipeline and summarizes the results obtained from the variant analysis pipelines for each cell line. The CCLE includes 84 mutations—single-nucleotide polymorphisms (SNPs) and small insertions/deletions—in MP and 2 each for HPAC and PANC-1. 23 MP mutations and all mutations for the other cell lines fall within our targeted exome regions. All CCLE mutations, including SNPs and small insertions and deletions, were found in our cell lines. About 30 rare, deleterious mutations were found in each cell line, with 4-6 found in the COSMIC cancer database for each line. Rare, deleterious, and druggable mutations were reported in many genes associated with cancer, including ALK, CDKN2A, KRAS, NOTCH1, TOP1 (topoisomerase 1), and TP53. Reported druggable mutations in highly expressed genes occurred in CDKN2A, KRAS, NOTCH1, and TP53.

Table 1.

Results obtained using molecular profiling and drug targeting pipeline on three common pancreatic cancer cell lines

	MIA PaCa2	HPAC	PANC-1
Gene fusions	CRIM1-IQCA1	IRAK3-RBMS1	None
Gene fusions	BCAR3-GCLM	IRAK3-RBMS1	None
Variants (ts/tv ratio)	6214 (2.14)	6990 (2.13)	6821 (2.15)
Rare and deleterious (RD) variants	31	31	25
RD variants in COSMIC	6:	6:	4:
	516	521	521
	10656	12479	10660
	28763	132780	28763
	132780	256119	1133963
	256119	1133963
	431727	1182405
Genes with RD variants	20	21	18
RD and druggable variants [in COSMIC]	5 [3 in COSMIC]	4 [2]	4 [3]
Druggable genes	5:	4:	4:
	BCR	CDKN2A	ALK
	BIRC3	KCNH2	KRAS
	KRAS	KRAS	NOTCH1
	NOTCH1	TP53	TP53
	TP53
Potential drugs	31 drugs	28	29
RD variants (expression filtered)	8	10	10
RD variants in COSMIC (expression filtered)	3:	3:	3:
	516	521	521
	10656	12479	10660
	28763	1133963
Genes with RD variants (expression filtered)	5	9	8
RD, and druggable variants [in COSMIC] (expression filtered)	3 [3 in COSMIC]	2 (2)	2 (2)
Druggable genes (expression filtered)	3:	2:	2:
	KRAS	CDKN2A	KRAS
	NOTCH1	KRAS	TP53
	TP53
Potential drugs (expression filtered)	22 drugs	19	18

Open in a new tab

Molecular profiling includes mutations and gene expression data obtained by analyzing high-throughput sequencing data from targeted exome (577 genes often included in cancer panels) and whole transcriptome sequencing assays. The ts/tv metric is the ratio between mutation transitions versus transversions. RD, rare and deleterious.

These mutation results are consistent with the CCLE data. The CCLE includes drug response data for MP, HPAC, and for two cell lines that have a mutational profile similar to PANC-1: KP-1N and KP-1NL. As expected, all cell lines show deleterious mutations in KRAS ⁴⁰^,⁴¹. Although KRAS has long be considered an undruggable target, new strategies that look beyond canonical Ras-Raf-MEK-ERK pathway signaling to target mutant KRAS are promising ⁴²^–⁴⁴. CCLE drug response data indicate that all cell lines appear responsive to MEK inhibitors, which are supported by animal models ⁴⁵. Curiously, the outputs for CDKN2A and TP53, known tumor suppressor genes, was contrary to the expectation of loss of function and indication of their druggability highlights the requirement for tertiary filtering of actionable changes by knowledgeable end users.

For MIA PaCa2, rare and deleterious variants were found in five potential druggable genes, with variants in three expressed genes: KRAS, TP53, and NOTCH1. CCLE drug response data for MIA PaCa2 indicate that it is also sensitive to compounds that target MEK. Comparing overall drug response profiles, the KP-1N and KP-1NL cell lines show less response than MIA PaCa2 and HPAC, which agree with the data in Table1 showing fewer known druggable mutations and genes in PANC-1 as compared to MIA PaCa2 and HPAC. Gene fusions were found in MIA PaCa2, and a single fusion was found in HPAC.

Comparing primary PAC tumors with cell lines

We have applied our pipelines to compare six primary PAC tumors with the three cell lines discussed previously. We sequenced six primary PAC tumors using whole transcriptome sequencing. Exome sequencing was not performed for these tumors, which provided an opportunity to use RNA-seq exclusively for characterizing PAC tumors. Figure3 shows data generated from the analysis of one tumor using a Circos plot generated in Galaxy.

Shared mutations between three pancreatic cancer cell lines—MIA PaCa2, HPAC, and PANC-1—and three primary pancreatic adenocarcinoma (PAC) tumors. From top to bottom, left to right: (A) all shared rare and deleterious mutations; (B) shared rare, deleterious, and druggable mutations; (C) shared rare and deleterious mutations in highly expressed genes; and (D) shared rare, deleterious, and druggable mutations in highly expressed genes. Asterisks (*) indicate that multiple cell lines or tumors showed mutations in the gene. Shared mutations were found in genes specifically associated with PAC (*KRAS*, *PDE4DIP*) and also genes implicated in many cancers (*CDKN2A, TP53*). Potentially druggable mutations in highly expressed genes shared between cell lines and tumors reside in two oncogenes: *KRAS* and *TP53*. Lastly, druggable mutations are significantly more limited in the cell lines as compared to tumor mutations.

Table S2 summarizes results obtained from using the transcriptome pipeline to analyze tumor sequence data, including mapped reads, gene expression, and called variants. Table2 lists gene fusions found and summarizes results obtained from the variant analysis pipelines for each tumor; input to this pipeline was the gene expression and variant datasets produced from the tumor transcriptome pipeline. Figure2B shows an interactive Galaxy-Circos plot of data generated from analysis of tumor 2.

Table 2.

Results obtained using molecular profiling and drug targeting pipeline on RNA-seq data for six primary pancreatic cancer tumors

	T1	T2	T3	T4	T5	T6
Gene fusions	KANSL1-ARL17A	TRIM2-ANXA2	None	None	None	None
Gene fusions	KANSL1-ARL17A	KANSL1-ARL17A	None	None	None	None
Variants (ts/tv ratio)	57,388 (3.02)	119,188 (2.88)	96,212 (2.91)	54,960 (2.88)	77,385 (2.84)	62,532 (2.87)
Rare and deleterious (RD) variants	454	664	665	516	521	372
RD variants in COSMIC	13	15	12	11	8	8
Genes with RD variants	358	522	513	401	395	292
RD and druggable variants [in COSMIC]	4 [0]	15 [2]	16 [2]	11 [0]	10 [1]	6 [0]
Druggable genes	4:	14:	15:	9:	8:	6:
	CEACAM1	CDH1	ERBB2	CA4	ATM	F5
	HDAC7	CDKN2A	F8	CLCN2	CHEK2	FGF2
	MYC	CENPE	HCAR2	COL1A1	CSF2RA	NOTCH3
	UGCG	COL1A1	JAK2	CSF2RA	ITGAL	SCNN1D
		ERBB2 F8	KRAS	FN1	MAPK14	SIRT1
		HDAC9	MMP13	HDAC1	SCTR	TGFB1
		KDR	MUC16	HTR2B	TP53
		KRAS	MYC	PSENEN	XIAP
		MUC16	PRKAB1	ZHX2
		NOTCH4	PSMC6
		PIK3C2B	RICTOR
		PRKCA	SCNN1D
		PSMD1	SCTR
			TP53
			TRPV6
Potential drugs	14 drugs	117	65	31	21	9
RD variants (expression filtered)	149	181	196	175	176	89
RD variants in COSMIC (expression filtered)	1	2	5	3	6	2
Genes with RD variants (expression filtered)	126	151	155	141	135	71
RD, and druggable variants (in COSMIC) (expression filtered)	3 (0)	5 (1)	6 (2)	5 (0)	5 (1)	0 (0)
Druggable genes (expression filtered)	3:	5:	6:	5:	4:	0
	CEACAM	CDH1	ERBB2	COL1A1	ITGAL
	HDAC7	COL1A1	KRAS	FN1	MAPK14
	MYC	ERBB2	PRKAB1	HDAC1	SCTR
		KRAS	PSMC6	HTR2B	TP53
		PSMD1	SCTR	PSENEN
Potential drugs (expression filtered)	13 drugs	42	43	24	6	0

Open in a new tab

The ts/tv metric is the ratio between mutation transitions versus transversions. RD, rare and deleterious.

These results show the challenges inherent in sequencing PAC tumors. PAC tumors are very difficult to biopsy or remove cleanly, and sequenced tumor samples nearly always include significant amounts of stromal (normal) tissue. Sequence data obtained from a mixed population of tumor and stromal cells often masks signals, and we found this to be true for our tumors as well. Although the majority of PAC tumors show KRAS mutations ⁴⁶, we found KRAS mutations in only two tumors analyzed. This appeared to be due to lack of read coverage for KRAS in the other tumors sequenced, and we asked for re-evaluation of the tumor sections by a certified pathologist. This re-review by a pathologist identified very low tumor percentage in two of the four KRAS wild-type samples that we sequenced, which emphasizes the invaluable caveat of appropriate sample quality control assessment by a trained pathologist prior to nucleic acid preparation and sequencing ⁴⁷. Finally, false positives for mutations in homologous genes are common when using RNA-seq because mapping spliced reads is difficult. However, the stringent variant filtering in our pipeline is designed to effectively remove the great majority of these false positives.

Comparing mutations between cell lines and sequenced PAC tumors (Fig.3) shows important similarities and differences. All three cell lines and two tumors share KRAS mutations, with the HPAC and PANC-1 cell lines and two tumors sharing the exact mutation (COSMIC521). This observation aligns with general consensus about the importance of the KRAS pathway in PAC. Tumors and cell lines also share rare, deleterious mutations in the following genes: AKAP9, CDKN2A, PDE4DIP, PMS1, TP53, TSC2, and ZMYM2. TP53 mutations appeared in two cell lines and three tumors, and rare and deleterious mutations in PDE4DIP were most prevalent, appearing in all tumors and MIA PaCa2 and PANC-1 cell lines. A recent whole-genome sequencing study of PAC also found evidence of mutations in PDE4DIP ⁴⁸, which has been reported as being highly expressed in esophageal squamous cell carcinoma ⁴⁹. Also, an intronic SNP (rs2863344) in PDE4DIP has been associated with response to capecitabine ⁵⁰, a common therapy for gastric and breast cancers, which indicates that expression and mutation status of this gene may be relevant to an informed treatment strategy for pancreatic cancer.

Mutations in the genes COL1A1, ERBB2 (aka HER2), and SCTR were found in two tumors each but not in any cell lines. ERBB2 mutations may be of particular interest because the two tumors that include KRAS mutations also include ERBB2 mutations, and KRAS and ERBB2 are thought to function jointly to drive tumor growth ⁵¹. Because ERBB2 is a druggable target ⁵², cell lines that exhibit only KRAS but not ERBB2 mutations may not be appropriate models for PAC tumors with both KRAS and ERBB2 mutations. Two tumors also show evidence of a KANSL1-ARL17A gene fusion, but there is no evidence of this fusion in the cell lines. Other gene fusions in cancer cell lines include ARL17A ³², so the presence of a KANSL1-ARL17A in PAC tumors may warrant additional investigation.

Performance and scalability

We analyzed the tumor cell line sequence data using the public Galaxy server (http://usegalaxy.org). To ensure privacy of patient sequence data, patient tumor data were analyzed using Galaxy installed on a local computing cluster. Galaxy integrates well with many different high-performance computing clusters and can scale to use all available computing resources to process very large tumor sequencing datasets. Given high-performance computing resources, then, Galaxy and our pipelines can analyze arbitrarily large tumor sequence data sets.

The main public Galaxy instance (http://usegalaxy.org) runs jobs on TACC, which is part of the XSEDE national high-performance computing environment (https://www.xsede.org). The largest cell line data set is the PANC-1 exome sequence, which includes ∽151 million 100bp paired-end reads or ∽30 billion bases. The exome analysis pipeline is perhaps the most time-intensive pipeline with three long processes—read alignment, duplicate removal, and variant calling. On the main public server, this pipeline ran in ∽36 h, although 50% of this time was spent waiting for computing resources to become available. Thus, compute time for exome analysis pipeline was ∽18 h. The largest RNA-seq data set came from Mia PaCa2, which contains ∽31 million 100bp paired-end reads or ∽6 billion bases, and the transcriptome analysis pipeline ran in ∽24 h, including waiting time. Compute time for the transcriptome pipeline, then, is ∽12 h. The integrated variant analysis pipeline runs in ∽15 min for all cell line data sets and has no waiting time because analysis steps are not compute intensive.

Patient tumor RNA-seq data are smaller than the cell line RNA-seq data, averaging ∽28 million 100bp paired-end reads or ∽5.6 billion bases. On two dedicated compute nodes, each with 24 compute processors, the transcriptome analysis pipeline and variant calling completed in ∽18 h and the integrated variant analysis ran in ∽15 min.

Discussion

Using Galaxy as a platform for cancer genome analysis pipelines has important advantages for translational cancer research and applications. Galaxy pipelines provide completely specified analyses that can be used as standardized analysis protocols to generate uniform data. Standardized analyses and uniform data can improve clinical studies by making it possible to do reproducible analyses across different sites and to share and aggregate data.

Galaxy also provides other features necessary for doing high-quality cancer genome clinical studies. Galaxy can serve as an analysis hub for clinical studies, which often include a mixture of personnel, only some of which have programming expertize. Bioinformaticians can automate analyses using Galaxy's API, while investigators without programming knowledge can use Galaxy's Web interface to view and run pipelines using only a small number of mouse clicks. Galaxy, then, makes analysis tools and workflows available to all personnel in a clinical study.

Galaxy pipelines are modular so that investigators can update pipelines as new tools become available; however, pipelines are also versioned so that previous iterations are saved and recoverable. Finally, Galaxy provides infrastructure for visualizing, reproducing, and sharing analyses, all of which are essential for clinical studies.

Despite the value of these computational tools, this investigation also highlights the challenges in interpreting and using tumor genomic features to guide treatment. Our pipelines identified KRAS and TP53 as potentially druggable targets for both the cell lines and tumors, which can be misleading, particularly in a clinical context. This emphasizes the critical need to have knowledgeable end users to interpret the data as well as the necessity for more robust and comprehensive druggable mutation databases. However, we are confident that this tool can be used to find actionable targets as well as identify the most appropriate cell lines, particularly those that are nontraditional research models, to be used as preclinical models that more closely match tumor genotypes. In fact, using CCLE data with our tumor data and this analysis pipeline (data not shown), it appeared that the less common KP-1N and KP-1NL cells, may be better preclinical models for the tumors that we tested than the more commonly used MIA PaCa2, HPAC, and PANC-1 cell lines.

Conclusions

There are many computational challenges that arise when developing translational cancer genomics applications, particularly those with the goal of personalized oncology. Multi-tool pipelines are required to analyze and integrate different types of -omic data and to combine private patient data with public databases. These pipelines require appropriate securities to protect patient privacy and need to be accessible to investigators without programming experience. Finally, they should be completely reproducible and transparent so that the pipelines can be used for standardized analysis and data produced from the pipelines can be readily compared across clinical settings.

The pipelines discussed in this paper are a first attempt to meet these criteria for tumor variant analysis. Our pipelines provide end-to-end support for analyzing variants from high-throughput exome and transcriptome tumor sequencing. The exome analysis pipeline call variants, and the transcriptome analysis pipeline call variants, computes gene expression, and identifies fusion genes. The variant analysis pipeline annotates and filters variants to identify rare, deleterious variants that are likely associated with disease, and further provides lists of rare, deleterious variants in expressed genes as well as those that are druggable. These pipelines are made widely accessible and reproducible via their integration with Galaxy. Galaxy also provides useful visualization and sharing features for pipelines and produced data.

We used these pipelines to analyze sequence data from six PAC tumors and three common cell lines. We validated previously published mutational and drug response data for the cell lines. Our analysis of the tumors showed that they shared common KRAS mutations with the cell lines. However, the tumors also exhibited ERBB2 mutations not found in the MIA PaCa2, HPAC, and PANC-1 cell lines, indicating the need to re-evaluate preclinical models of therapeutic response in the context of genomic medicine.

Methods

Cell Line and tumor tissue acquisition and processing

The MIA PaCa2, HPAC, and PANC-1 cell lines were obtained as frozen aliquots from ATCC (http://www.atcc.org/). A total of six de-identified pancreatic tumor frozen specimens were available for this study through an IRB approved tissue banking protocol. Genomic DNA and total RNA were isolated using Omega BioTek (http://www.omegabiotek.com/) chemistries according to the manufacturer's protocols. DNA was quantitated using NanoDrop and Qubit, and RNA was quantitated using NanoDrop and Agilent BioAnalyzer.

Library preparation and sequencing

Total RNA from pancreatic cell lines and tumor tissue all had RIN>8.0 and were prepared using the Illumina TruSeq RNA kit (v1) according to manufacture's protocols. Final RNA-Seq libraries were quantitated using qPCR and Agilent BioAnalyzer and sequenced using 100 bp paired-end reads at 100,000 reads per sample with an Illumina HiSeq 2000 instrument.

Custom cancer exome sequencing (WES) was performed using genomic DNA prepared from the three pancreatic cell lines. Libraries were prepared using a 577 gene cancer exome panel designed and run in duplicate using Agilent SureSelect and NimbleGen SeqCapEZ library preparation methods. All three cell lines were run as SureSelect and SeqCapEZ libraries, for a total of six libraries that were sequenced in a single lane of a 100 bp paired-end run on a HiSeq 2000.

FASTQ file generation and initial data QC were performed using a CASAVA v1.8.1 software (Illumina) for both the RNA-Seq and cancer exome data sets. Uniformity of coverage and overall data quality for the cancer exomes was consistent with what has been reported previously for Agilent SureSelect and NimbleGen SeqCapEZ whole exome sequencing kits ⁵². FASTQ files were used as the input data for the Galaxy analysis pipeline. Cell line exome and transcriptome sequencing data is available in two places: (a) in the NCBI SRA under accessible numbers SRX472933 and SRX472980 (Mia PaCa2), SRX472944 and SRX473000 (HPAC), and SRX472948 and SRX473014 (PANC-1); and (b) in the main public Galaxy instance at http://usegalaxy.org in a data library named ‘Cancer Cell Lines.’

Acknowledgments

Efforts of the Galaxy Team (E. Afgan, D. Baker, D. Blankenberg, D. Bouvier, M. Cech, J. Chilton, D. Clements, N. Coraor, Carl Eberhard, D. Francheteau, S. Guerler, J. Jackson, G. Von Kuster, N. Stoler, R. Lazarus) were instrumental in making this work happen. This project was supported in part by American Recovery and Reinvestment Act (ARRA) funds through grant number HG005542 from the National Human Genome Research Institute, National Institutes of Health, as well as grants HG005133 and HG006620.

Conflict of Interest

None declared.

Supporting Information

Table S1. Analysis results from targeted exome sequencing of cancer cell lines. Targeted exome sequenced 577 genes commonly found in cancer panels.

Table S2. Analysis results from whole transcriptome sequencing (RNA-seq) of cancer cell lines.

Table S3. Analysis results from whole transcriptome sequencing (RNA-seq) from six primary pancreatic cancer tumors. Variant analysis and expression analyses were performed.

cam40004-0392-sd1.docx^{(112.9KB, docx)}

References

Paez JG, Janne PA, Lee JC, Tracy S, Greulich H, Gabriel S, et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science. 2004;304:1497–1500. doi: 10.1126/science.1099314. [DOI] [PubMed] [Google Scholar]
Chapman PB, Hauschild A, Robert C, Haanen JB, Ascierto P, Larkin J, et al. Improved survival with vemurafenib in melanoma with BRAF V600E mutation. N. Engl. J. Med. 2011;364:2507–2516. doi: 10.1056/NEJMoa1103782. [DOI] [PMC free article] [PubMed] [Google Scholar]
Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 2004;351:2817–2826. doi: 10.1056/NEJMoa041588. [DOI] [PubMed] [Google Scholar]
Chang JC, Wooten EC, Tsimelzon A, Hilsenbeck SG, Gutierrez MC, Elledge R, et al. Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet. 2003;362:362–369. doi: 10.1016/S0140-6736(03)14023-8. [DOI] [PubMed] [Google Scholar]
Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa S, et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature. 2007;448:561–566. doi: 10.1038/nature05945. [DOI] [PubMed] [Google Scholar]
Slamon DJ, Leyland-Jones B, Shak S, Fuchs H, Paton V, Bajamonde A, et al. Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. N. Engl. J. Med. 2001;344:783–792. doi: 10.1056/NEJM200103153441101. [DOI] [PubMed] [Google Scholar]
Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764. [DOI] [PMC free article] [PubMed] [Google Scholar]
Stratton MR. Exploring the genomes of cancer cells: progress and promise. Science. 2011;331:1553–1558. doi: 10.1126/science.1204040. [DOI] [PubMed] [Google Scholar]
Roychowdhury S, Iyer MK, Robinson DR, Lonigro RJ, Wu YM, Cao X, et al. Personalized oncology through integrative high-throughput sequencing: a pilot study. Sci. Transl. Med. 2011;3:111ra121. doi: 10.1126/scitranslmed.3003161. [DOI] [PMC free article] [PubMed] [Google Scholar]
Weinstein IB. Joe A. Oncogene addiction. Cancer Res. 2008;68:3077–3080. doi: 10.1158/0008-5472.CAN-07-3293. discussion 3080. [DOI] [PubMed] [Google Scholar]
Torti D. Trusolino L. Oncogene addiction as a foundational rationale for targeted anti-cancer therapy: promises and perils. EMBO Mol. Med. 2011;3:623–636. doi: 10.1002/emmm.201100176. [DOI] [PMC free article] [PubMed] [Google Scholar]
Luo J, Solimini NL. Elledge SJ. Principles of cancer therapy: oncogene and non-oncogene addiction. Cell. 2009;136:823–837. doi: 10.1016/j.cell.2009.02.024. [DOI] [PMC free article] [PubMed] [Google Scholar]
Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39:D945–D950. doi: 10.1093/nar/gkq929. [DOI] [PMC free article] [PubMed] [Google Scholar]
Griffith M, Griffith OL, Coffman AC, Weible JV, McMichael JF, Spies NC, et al. DGIdb: mining the druggable genome. Nat. Methods. 2013;10:1209–1210. doi: 10.1038/nmeth.2689. [DOI] [PMC free article] [PubMed] [Google Scholar]
Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–607. doi: 10.1038/nature11003. [DOI] [PMC free article] [PubMed] [Google Scholar]
Jang S. Atkins MB. Treatment of BRAF-mutant melanoma: the role of vemurafenib and other therapies. Clin. Pharmacol. Ther. 2014;95:24–31. doi: 10.1038/clpt.2013.197. [DOI] [PubMed] [Google Scholar]
Jang S. Atkins MB. Which drug, and when, for patients with BRAF-mutant melanoma? Lancet Oncol. 2013;14:e60–e69. doi: 10.1016/S1470-2045(12)70539-9. [DOI] [PubMed] [Google Scholar]
Dudley JT, Chen R. Butte AJ. Matching cancer genomes to established cell lines for personalized oncology. Pac. Symp. Biocomput. 2011:243–252. doi: 10.1142/9789814335058_0026. [DOI] [PMC free article] [PubMed] [Google Scholar]
Collins FS. Tabak LA. Policy: NIH plans to enhance reproducibility. Nature. 2014;505:612–613. doi: 10.1038/505612a. [DOI] [PMC free article] [PubMed] [Google Scholar]
Reich M, Liefeld T, Gould J, Lerner J, Tamayo P. Mesirov JP. GenePattern 2.0. Nat. Genet. 2006;38:500–501. doi: 10.1038/ng0506-500. [DOI] [PubMed] [Google Scholar]
Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, et al. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics. 2004;20:3045–3054. doi: 10.1093/bioinformatics/bth361. [DOI] [PubMed] [Google Scholar]
Omberg L, Ellrott K, Yuan Y, Kandoth C, Wong C, Kellen MR, et al. Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas. Nat. Genet. 2013;45:1121–1126. doi: 10.1038/ng.2761. [DOI] [PMC free article] [PubMed] [Google Scholar]
Goecks J, Nekrutenko A, Taylor J. Galaxy T. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11:R86. doi: 10.1186/gb-2010-11-8-r86. [DOI] [PMC free article] [PubMed] [Google Scholar]
Blankenberg D, Taylor J, Nekrutenko A. Galaxy T. Making whole genome multiple alignments usable for biologists. Bioinformatics. 2011;27:2426–2428. doi: 10.1093/bioinformatics/btr398. [DOI] [PMC free article] [PubMed] [Google Scholar]
Blankenberg D, Coraor N, Von Kuster G, Taylor J, Nekrutenko A. Galaxy T. Integrating diverse databases into an unified analysis framework: a Galaxy approach. Database (Oxford) 2011;2011:bar011. doi: 10.1093/database/bar011. [DOI] [PMC free article] [PubMed] [Google Scholar]
Afgan E, Baker D, Coraor N, Goto H, Paul IM, Makova KD, et al. Harnessing cloud computing with Galaxy Cloud. Nat. Biotechnol. 2011;29:972–974. doi: 10.1038/nbt.2028. [DOI] [PMC free article] [PubMed] [Google Scholar]
Li H. Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]
Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22:568–576. doi: 10.1101/gr.129684.111. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R. Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36. doi: 10.1186/gb-2013-14-4-r36. [DOI] [PMC free article] [PubMed] [Google Scholar]
Quinn EM, Cormican P, Kenny EM, Hill M, Anney R, Gill M, et al. Development of strategies for SNP detection in RNA-seq data: application to lymphoblastoid cell lines and evaluation using 1000 Genomes data. PLoS One. 2013;8:e58815. doi: 10.1371/journal.pone.0058815. [DOI] [PMC free article] [PubMed] [Google Scholar]
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 2012;7:562–578. doi: 10.1038/nprot.2012.016. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim D. Salzberg SL. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol. 2011;12:R72. doi: 10.1186/gb-2011-12-8-r72. [DOI] [PMC free article] [PubMed] [Google Scholar]
Wang K, Li M. Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603. [DOI] [PMC free article] [PubMed] [Google Scholar]
Genomes Project C. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME. McVean GA. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337:64–69. doi: 10.1126/science.1219240. [DOI] [PMC free article] [PubMed] [Google Scholar]
Sloggett C, Goonasekera N. Afgan E. BioBlend: automating pipeline analyses within Galaxy and CloudMan. Bioinformatics. 2013;29:1685–1686. doi: 10.1093/bioinformatics/btt199. [DOI] [PMC free article] [PubMed] [Google Scholar]
Goecks J, Eberhard C, Too T, Galaxy T, Nekrutenko A. Taylor J. Web-based visual analysis for high-throughput genomics. BMC Genom. 2013;14:397. doi: 10.1186/1471-2164-14-397. [DOI] [PMC free article] [PubMed] [Google Scholar]
Goecks J, Coraor N, Galaxy T, Nekrutenko A. Taylor J. NGS analyses by visualization with Trackster. Nat. Biotechnol. 2012;30:1036–1039. doi: 10.1038/nbt.2404. [DOI] [PMC free article] [PubMed] [Google Scholar]
Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109. [DOI] [PMC free article] [PubMed] [Google Scholar]
Cox AD. Der CJ. Ras history: the saga continues. Small GTPases. 2010;1:2–27. doi: 10.4161/sgtp.1.1.12178. [DOI] [PMC free article] [PubMed] [Google Scholar]
Schubbert S, Shannon K. Bollag G. Hyperactive Ras in developmental disorders and cancer. Nat. Rev. Cancer. 2007;7:295–308. doi: 10.1038/nrc2109. [DOI] [PubMed] [Google Scholar]
Baker NM. Der CJ. Cancer: drug for an ‘undruggable’ protein. Nature. 2013;497:577–578. doi: 10.1038/nature12248. [DOI] [PMC free article] [PubMed] [Google Scholar]
Tse MT. Anticancer drugs: a new approach for blocking KRAS. Nat. Rev. Drug Discov. 2013;12:506. doi: 10.1038/nrd4054. [DOI] [PubMed] [Google Scholar]
Zimmermann G, Papke B, Ismail S, Vartak N, Chandra A, Hoffmann M, et al. Small molecule inhibition of the KRAS-PDEdelta interaction impairs oncogenic KRAS signalling. Nature. 2013;497:638–642. doi: 10.1038/nature12205. [DOI] [PubMed] [Google Scholar]
Mackenzie GG, Bartels LE, Xie G, Papayannis I, Alston N, Vrankova K, et al. A novel Ras inhibitor (MDC-1016) reduces human pancreatic tumor growth in mice. Neoplasia. 2013;15:1184–1195. doi: 10.1593/neo.131368. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kim ST, Lim do H, Jang KT, Lim T, Lee J, Choi YL, et al. Impact of KRAS mutations on clinical outcomes in pancreatic cancer patients treated with first-line gemcitabine-based chemotherapy. Mol. Cancer Ther. 2011;10:1993–1999. doi: 10.1158/1535-7163.MCT-11-0269. [DOI] [PubMed] [Google Scholar]
McDonald SA, Mardis ER, Ota D, Watson MA, Pfeifer JD. Green JM. Comprehensive genomic studies: emerging regulatory, strategic, and quality assurance challenges for biorepositories. Am. J. Clin. Pathol. 2012;138:31–41. doi: 10.1309/AJCPXBA69LNSCVMH. [DOI] [PMC free article] [PubMed] [Google Scholar]
Liang WS, Craig DW, Carpten J, Borad MJ, Demeure MJ, Weiss GJ, et al. Genome-wide characterization of pancreatic adenocarcinoma patients using next generation sequencing. PLoS One. 2012;7:e43192. doi: 10.1371/journal.pone.0043192. [DOI] [PMC free article] [PubMed] [Google Scholar]
Shimada H, Kuboshima M, Shiratori T, Nabeya Y, Takeuchi A, Takagi H, et al. Serum anti-myomegalin antibodies in patients with esophageal squamous cell carcinoma. Int. J. Oncol. 2007;30:97–103. [PubMed] [Google Scholar]
O'Donnell PH, Stark AL, Gamazon ER, Wheeler HE, McIlwee BE, Gorsic L, et al. Identification of novel germline polymorphisms governing capecitabine sensitivity. Cancer. 2012;118:4063–4073. doi: 10.1002/cncr.26737. [DOI] [PMC free article] [PubMed] [Google Scholar]
Kelber JA, Reno T, Kaushal S, Metildi C, Wright T, Stoletov K, et al. KRas induces a Src/PEAK1/ErbB2 kinase amplification loop that drives metastatic growth and therapy resistance in pancreatic cancer. Cancer Res. 2012;72:2554–2564. doi: 10.1158/0008-5472.CAN-11-3552. [DOI] [PMC free article] [PubMed] [Google Scholar]
Montemurro F. Scaltriti M. Biomarkers of drugs targeting HER-family signalling in cancer. J. Pathol. 2014;232:219–229. doi: 10.1002/path.4269. [DOI] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Table S1. Analysis results from targeted exome sequencing of cancer cell lines. Targeted exome sequenced 577 genes commonly found in cancer panels.

Table S2. Analysis results from whole transcriptome sequencing (RNA-seq) of cancer cell lines.

Table S3. Analysis results from whole transcriptome sequencing (RNA-seq) from six primary pancreatic cancer tumors. Variant analysis and expression analyses were performed.

cam40004-0392-sd1.docx^{(112.9KB, docx)}

[b1] Paez JG, Janne PA, Lee JC, Tracy S, Greulich H, Gabriel S, et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science. 2004;304:1497–1500. doi: 10.1126/science.1099314. [DOI] [PubMed] [Google Scholar]

[b2] Chapman PB, Hauschild A, Robert C, Haanen JB, Ascierto P, Larkin J, et al. Improved survival with vemurafenib in melanoma with BRAF V600E mutation. N. Engl. J. Med. 2011;364:2507–2516. doi: 10.1056/NEJMoa1103782. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b3] Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, et al. A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N. Engl. J. Med. 2004;351:2817–2826. doi: 10.1056/NEJMoa041588. [DOI] [PubMed] [Google Scholar]

[b4] Chang JC, Wooten EC, Tsimelzon A, Hilsenbeck SG, Gutierrez MC, Elledge R, et al. Gene expression profiling for the prediction of therapeutic response to docetaxel in patients with breast cancer. Lancet. 2003;362:362–369. doi: 10.1016/S0140-6736(03)14023-8. [DOI] [PubMed] [Google Scholar]

[b5] Soda M, Choi YL, Enomoto M, Takada S, Yamashita Y, Ishikawa S, et al. Identification of the transforming EML4-ALK fusion gene in non-small-cell lung cancer. Nature. 2007;448:561–566. doi: 10.1038/nature05945. [DOI] [PubMed] [Google Scholar]

[b6] Slamon DJ, Leyland-Jones B, Shak S, Fuchs H, Paton V, Bajamonde A, et al. Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. N. Engl. J. Med. 2001;344:783–792. doi: 10.1056/NEJM200103153441101. [DOI] [PubMed] [Google Scholar]

[b7] Weinstein JN, Collisson EA, Mills GB, Shaw KR, Ozenberger BA, Ellrott K, et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 2013;45:1113–1120. doi: 10.1038/ng.2764. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b8] Stratton MR. Exploring the genomes of cancer cells: progress and promise. Science. 2011;331:1553–1558. doi: 10.1126/science.1204040. [DOI] [PubMed] [Google Scholar]

[b9] Roychowdhury S, Iyer MK, Robinson DR, Lonigro RJ, Wu YM, Cao X, et al. Personalized oncology through integrative high-throughput sequencing: a pilot study. Sci. Transl. Med. 2011;3:111ra121. doi: 10.1126/scitranslmed.3003161. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b10] Weinstein IB. Joe A. Oncogene addiction. Cancer Res. 2008;68:3077–3080. doi: 10.1158/0008-5472.CAN-07-3293. discussion 3080. [DOI] [PubMed] [Google Scholar]

[b11] Torti D. Trusolino L. Oncogene addiction as a foundational rationale for targeted anti-cancer therapy: promises and perils. EMBO Mol. Med. 2011;3:623–636. doi: 10.1002/emmm.201100176. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b12] Luo J, Solimini NL. Elledge SJ. Principles of cancer therapy: oncogene and non-oncogene addiction. Cell. 2009;136:823–837. doi: 10.1016/j.cell.2009.02.024. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b13] Forbes SA, Bindal N, Bamford S, Cole C, Kok CY, Beare D, et al. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 2011;39:D945–D950. doi: 10.1093/nar/gkq929. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b14] Griffith M, Griffith OL, Coffman AC, Weible JV, McMichael JF, Spies NC, et al. DGIdb: mining the druggable genome. Nat. Methods. 2013;10:1209–1210. doi: 10.1038/nmeth.2689. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b15] Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, et al. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012;483:603–607. doi: 10.1038/nature11003. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b16] Jang S. Atkins MB. Treatment of BRAF-mutant melanoma: the role of vemurafenib and other therapies. Clin. Pharmacol. Ther. 2014;95:24–31. doi: 10.1038/clpt.2013.197. [DOI] [PubMed] [Google Scholar]

[b17] Jang S. Atkins MB. Which drug, and when, for patients with BRAF-mutant melanoma? Lancet Oncol. 2013;14:e60–e69. doi: 10.1016/S1470-2045(12)70539-9. [DOI] [PubMed] [Google Scholar]

[b18] Dudley JT, Chen R. Butte AJ. Matching cancer genomes to established cell lines for personalized oncology. Pac. Symp. Biocomput. 2011:243–252. doi: 10.1142/9789814335058_0026. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b19] Collins FS. Tabak LA. Policy: NIH plans to enhance reproducibility. Nature. 2014;505:612–613. doi: 10.1038/505612a. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b20] Reich M, Liefeld T, Gould J, Lerner J, Tamayo P. Mesirov JP. GenePattern 2.0. Nat. Genet. 2006;38:500–501. doi: 10.1038/ng0506-500. [DOI] [PubMed] [Google Scholar]

[b21] Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, et al. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics. 2004;20:3045–3054. doi: 10.1093/bioinformatics/bth361. [DOI] [PubMed] [Google Scholar]

[b22] Omberg L, Ellrott K, Yuan Y, Kandoth C, Wong C, Kellen MR, et al. Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas. Nat. Genet. 2013;45:1121–1126. doi: 10.1038/ng.2761. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b23] Goecks J, Nekrutenko A, Taylor J. Galaxy T. Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010;11:R86. doi: 10.1186/gb-2010-11-8-r86. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b24] Blankenberg D, Taylor J, Nekrutenko A. Galaxy T. Making whole genome multiple alignments usable for biologists. Bioinformatics. 2011;27:2426–2428. doi: 10.1093/bioinformatics/btr398. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b25] Blankenberg D, Coraor N, Von Kuster G, Taylor J, Nekrutenko A. Galaxy T. Integrating diverse databases into an unified analysis framework: a Galaxy approach. Database (Oxford) 2011;2011:bar011. doi: 10.1093/database/bar011. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b26] Afgan E, Baker D, Coraor N, Goto H, Paul IM, Makova KD, et al. Harnessing cloud computing with Galaxy Cloud. Nat. Biotechnol. 2011;29:972–974. doi: 10.1038/nbt.2028. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b27] Li H. Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b28] Koboldt DC, Zhang Q, Larson DE, Shen D, McLellan MD, Lin L, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22:568–576. doi: 10.1101/gr.129684.111. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b29] Kim D, Pertea G, Trapnell C, Pimentel H, Kelley R. Salzberg SL. TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol. 2013;14:R36. doi: 10.1186/gb-2013-14-4-r36. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b30] Quinn EM, Cormican P, Kenny EM, Hill M, Anney R, Gill M, et al. Development of strategies for SNP detection in RNA-seq data: application to lymphoblastoid cell lines and evaluation using 1000 Genomes data. PLoS One. 2013;8:e58815. doi: 10.1371/journal.pone.0058815. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b31] Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat. Protoc. 2012;7:562–578. doi: 10.1038/nprot.2012.016. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b32] Kim D. Salzberg SL. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol. 2011;12:R72. doi: 10.1186/gb-2011-12-8-r72. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b33] Wang K, Li M. Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res. 2010;38:e164. doi: 10.1093/nar/gkq603. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b34] Genomes Project C. Abecasis GR, Altshuler D, Auton A, Brooks LD, Durbin RM, Gibbs RA, Hurles ME. McVean GA. A map of human genome variation from population-scale sequencing. Nature. 2010;467:1061–1073. doi: 10.1038/nature09534. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b35] Tennessen JA, Bigham AW, O'Connor TD, Fu W, Kenny EE, Gravel S, et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science. 2012;337:64–69. doi: 10.1126/science.1219240. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b36] Sloggett C, Goonasekera N. Afgan E. BioBlend: automating pipeline analyses within Galaxy and CloudMan. Bioinformatics. 2013;29:1685–1686. doi: 10.1093/bioinformatics/btt199. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b37] Goecks J, Eberhard C, Too T, Galaxy T, Nekrutenko A. Taylor J. Web-based visual analysis for high-throughput genomics. BMC Genom. 2013;14:397. doi: 10.1186/1471-2164-14-397. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b38] Goecks J, Coraor N, Galaxy T, Nekrutenko A. Taylor J. NGS analyses by visualization with Trackster. Nat. Biotechnol. 2012;30:1036–1039. doi: 10.1038/nbt.2404. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b39] Krzywinski M, Schein J, Birol I, Connors J, Gascoyne R, Horsman D, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b40] Cox AD. Der CJ. Ras history: the saga continues. Small GTPases. 2010;1:2–27. doi: 10.4161/sgtp.1.1.12178. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b41] Schubbert S, Shannon K. Bollag G. Hyperactive Ras in developmental disorders and cancer. Nat. Rev. Cancer. 2007;7:295–308. doi: 10.1038/nrc2109. [DOI] [PubMed] [Google Scholar]

[b42] Baker NM. Der CJ. Cancer: drug for an ‘undruggable’ protein. Nature. 2013;497:577–578. doi: 10.1038/nature12248. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b43] Tse MT. Anticancer drugs: a new approach for blocking KRAS. Nat. Rev. Drug Discov. 2013;12:506. doi: 10.1038/nrd4054. [DOI] [PubMed] [Google Scholar]

[b44] Zimmermann G, Papke B, Ismail S, Vartak N, Chandra A, Hoffmann M, et al. Small molecule inhibition of the KRAS-PDEdelta interaction impairs oncogenic KRAS signalling. Nature. 2013;497:638–642. doi: 10.1038/nature12205. [DOI] [PubMed] [Google Scholar]

[b45] Mackenzie GG, Bartels LE, Xie G, Papayannis I, Alston N, Vrankova K, et al. A novel Ras inhibitor (MDC-1016) reduces human pancreatic tumor growth in mice. Neoplasia. 2013;15:1184–1195. doi: 10.1593/neo.131368. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b46] Kim ST, Lim do H, Jang KT, Lim T, Lee J, Choi YL, et al. Impact of KRAS mutations on clinical outcomes in pancreatic cancer patients treated with first-line gemcitabine-based chemotherapy. Mol. Cancer Ther. 2011;10:1993–1999. doi: 10.1158/1535-7163.MCT-11-0269. [DOI] [PubMed] [Google Scholar]

[b47] McDonald SA, Mardis ER, Ota D, Watson MA, Pfeifer JD. Green JM. Comprehensive genomic studies: emerging regulatory, strategic, and quality assurance challenges for biorepositories. Am. J. Clin. Pathol. 2012;138:31–41. doi: 10.1309/AJCPXBA69LNSCVMH. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b48] Liang WS, Craig DW, Carpten J, Borad MJ, Demeure MJ, Weiss GJ, et al. Genome-wide characterization of pancreatic adenocarcinoma patients using next generation sequencing. PLoS One. 2012;7:e43192. doi: 10.1371/journal.pone.0043192. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b49] Shimada H, Kuboshima M, Shiratori T, Nabeya Y, Takeuchi A, Takagi H, et al. Serum anti-myomegalin antibodies in patients with esophageal squamous cell carcinoma. Int. J. Oncol. 2007;30:97–103. [PubMed] [Google Scholar]

[b50] O'Donnell PH, Stark AL, Gamazon ER, Wheeler HE, McIlwee BE, Gorsic L, et al. Identification of novel germline polymorphisms governing capecitabine sensitivity. Cancer. 2012;118:4063–4073. doi: 10.1002/cncr.26737. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b51] Kelber JA, Reno T, Kaushal S, Metildi C, Wright T, Stoletov K, et al. KRas induces a Src/PEAK1/ErbB2 kinase amplification loop that drives metastatic growth and therapy resistance in pancreatic cancer. Cancer Res. 2012;72:2554–2564. doi: 10.1158/0008-5472.CAN-11-3552. [DOI] [PMC free article] [PubMed] [Google Scholar]

[b52] Montemurro F. Scaltriti M. Biomarkers of drugs targeting HER-family signalling in cancer. J. Pathol. 2014;232:219–229. doi: 10.1002/path.4269. [DOI] [PubMed] [Google Scholar]

PERMALINK

Open pipelines for integrated tumor genome profiles reveal differences between pancreatic cancer tumors and cell lines

Jeremy Goecks

Bassel F El-Rayes

Shishir K Maithel

H Jean Khoury

James Taylor

Michael R Rossi

Abstract

Background

Implementation

Figure 1.

Exome pipeline

Transcriptome (RNA-seq) pipeline

Integrated variant analysis pipeline

Pipeline separation

Advantages of galaxy integration

Design philosophy

Results

Validation using cell line data

Figure 2.

Table 1.

Comparing primary PAC tumors with cell lines

Figure 3.

Table 2.

Performance and scalability

Discussion

Conclusions

Methods

Cell Line and tumor tissue acquisition and processing

Library preparation and sequencing

Acknowledgments

Conflict of Interest

Supporting Information

References

Associated Data

Supplementary Materials

ACTIONS

PERMALINK

RESOURCES

Similar articles

Cited by other articles

Links to NCBI Databases