Abstract
Paired DNA and RNA profiling is increasingly employed in genomics research to uncover molecular mechanisms of disease and to explore personal genotype and phenotype correlations. Here, we introduce Simul-seq, a technique for the production of high-quality whole-genome and transcriptome sequencing libraries from small quantities of cells or tissues. We apply the method to laser-capture-microdissected esophageal adenocarcinoma tissue, revealing a highly aneuploid tumor genome with extensive blocks of increased homozygosity and corresponding increases in allele-specific expression. Among this widespread allele-specific expression, we identify germline polymorphisms that are associated with response to cancer therapies. We further leverage this integrative data to uncover expressed mutations in several known cancer genes as well as a recurrent mutation in the motor domain of KIF3B that significantly affects kinesin–microtubule interactions. Simul-seq provides a new streamlined approach for generating comprehensive genome and transcriptome profiles from limited quantities of clinically relevant samples.
Integration of both DNA and RNA sequencing data enables a variety of analyses that are useful for exploring the genetics of normal phenotypic variation and disease. In addition to enumerating global patterns of gene expression, RNA sequencing data provides an orthogonal verification of DNA variant calls and can be used to prioritize expressed candidates, which are more likely to exert biologic effects. In cancer, for example, roughly a third of the somatic single-nucleotide variants (SNVs) that fall within coding regions can also be observed in the RNA1, providing a biologic filter for candidate driver mutations. Furthermore, combined DNA and RNA profiling is useful for characterizing regulatory variation2–4, RNA editing5 and allele-specific expression6–8, important contributors to phenotypic diversity and disease.
Currently, most integrative experiments are performed in parallel and on distinct cell populations, a strategy that requires lengthy library preparation times and potentially exacerbates variability on account of sample heterogeneity. Single-cell integrative sequencing approaches, genome and transcriptome sequencing (G&T-seq)9 and gDNA and mRNA sequencing (DR-seq)10, have recently produced the first genome-wide glimpses of the correlation between copy number and expression at a cellular level. However, due to the large technical variance and coverage gaps inherent in current single-cell sequencing approaches, these new methods have limited utility in contexts where more comprehensive genomes and transcriptomes are required. Moreover, both methods still require the DNA and RNA libraries to be generated independently.
Our simultaneous DNA and RNA sequencing method, Simul-seq, leverages the enzymatic specificities of the Tn5 transposase and RNA ligase to produce whole-genome and transcriptome libraries without physical separation of the nucleic acid species (Fig. 1a), reducing the library preparation time compared with that of standard independent library approaches (Supplementary Fig. 1a). Simul-seq also employs a ribosomal depletion step, thereby maintaining many biologically relevant classes of noncoding RNAs. Additionally, Simul-seq incorporates dual 5′ and 3′ indices specific for both DNA and RNA molecules, minimizing cross contamination caused by spurious ligation and tagmentation or by template switching during pooled PCR. Finally, differential amplification from distinct RNA and DNA adapter sequences can be used to adjust the read outputs derived from either library.
Results
Simul-seq efficiently produces distinct RNA-seq and DNA-seq data
To rigorously assess the specificity of the Simul-seq method, we first produced libraries derived from a mixture of 50 ng of human genomic DNA and 100 ng of yeast mRNA (Supplementary Fig. 1b). We quantified the presence of both DNA-seq and RNA-seq libraries in the pool using droplet digital PCR (ddPCR; Supplementary Fig. 1c,d). Subsequent sequencing and alignment of the dual-indexed reads to the yeast and human genomes revealed cross-species mapping rates that were similar to those observed in yeast RNA-seq and human DNA-seq libraries produced independently (Fig. 1b), indicating that the Simul-seq method specifically barcodes the DNA and RNA with distinct adapters. Next, we leveraged these adapters to optimize read outputs for various applications and starting material inputs using differential PCR. To verify this approach, we varied the number of PCR cycles with RNA primers alone while holding the number of cycles with both DNA and RNA primers constant. Inclusion of RNA-specific cycles increased the fraction of the total library derived from RNA, as measured by ddPCR (Fig. 1c). Moreover, ddPCR quantification of the DNA and RNA constituents before sequencing was also highly correlated with subsequent read outputs (Fig. 1d), enabling users to perform quality control on the mixed libraries before high-throughput sequencing.
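The effect of RNA-specific cycles on library composition can be illustrated with a simple model. The sketch below assumes ideal doubling at every cycle, which real PCR does not achieve, and uses hypothetical starting amounts; the function name and the default of five shared cycles (taken from the protocol in Online Methods) are illustrative choices, not part of the published analysis.

```python
def rna_fraction(dna_molecules, rna_molecules, rna_only_cycles, shared_cycles=5):
    """Expected RNA-derived fraction of the final library under ideal doubling."""
    # RNA-adapter molecules are amplified in every cycle.
    rna = rna_molecules * 2 ** (rna_only_cycles + shared_cycles)
    # DNA-adapter molecules are amplified only after the DNA primers are added.
    dna = dna_molecules * 2 ** shared_cycles
    return rna / (rna + dna)


# Hypothetical starting amounts: 1,000 DNA templates and 10 RNA templates.
for cycles in (0, 2, 4, 7):
    print(cycles, round(rna_fraction(1000, 10, cycles), 3))
```

Under this idealized model each RNA-only cycle doubles the RNA:DNA ratio, which is the qualitative trend measured by ddPCR in Fig. 1c.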
Simul-seq DNA sequencing data is of high quality
To benchmark Simul-seq against established library preparation methods, we next applied the approach to fibroblasts derived from an individual who had previously been subjected to whole-genome sequencing11. In parallel, we also prepared independent RNA-seq libraries from these cells using an analogous RNA-ligase-based protocol. For the Simul-seq library, we obtained 560,218,621 and 57,091,162 dual-indexed DNA and RNA 101-bp paired-end reads, respectively (Supplementary Table 1). 93% of Simul-seq DNA reads mapped to the genome, producing an average genomic depth of 31.9 × (Fig. 2a). Although the Simul-seq coverage distribution was consistent with the distribution obtained from a library previously generated using an established DNA-seq method11 (Fig. 2a), the distribution exhibited some sequencing bias characteristic of the Tn5 transposase12. To further explore potential coverage biases, we generated Lorenz curves comparing the cumulative fraction of mapped bases with the cumulative fraction of the genome covered. Both the Simul-seq and the DNA-seq control genomes exhibited comparable read distributions (Fig. 2b), indicating that pooled DNA and RNA library preparation and sequencing does not introduce sequencing bias in excess of standard methods.
Whole-genome sequencing is generally performed to identify variants that are polymorphic among populations or associated with disease. Therefore, we next compared variant calls between the Simul-seq and control DNA-seq genomes. Of the 3,635,954 SNVs determined in the Simul-seq genome, 95.6% were concordant with SNVs called in the standard DNA-seq genome (Fig. 2c). In addition, the identity and size distribution of small insertions and deletions (indels) identified in the Simul-seq genome were similar to those obtained from the DNA-seq genome, with 87.5% of Simul-seq-derived indels exhibiting concordance with the standard genome (Fig. 2d). These degrees of concordance were comparable to those observed from previously published biologic replicates using a standard DNA-seq approach11 (Supplementary Fig. 2a,b), demonstrating that Simul-seq produces high-quality whole-genome data.
Simul-seq RNA sequencing data is of high quality
Next, we examined the quality of the RNA sequencing data. Similar to RNA-seq control data, Simul-seq RNA reads were effectively depleted for ribosomal sequences and mapped primarily to transcribed regions of the genome (Fig. 3a). Simul-seq RNA reads were also highly strand specific and evenly distributed across the length of transcripts (Fig. 3b,c), enabling accurate transcriptome quantification and isoform analysis. As a control, Simul-seq DNA reads mapped primarily to intronic and intergenic regions of the genome and were evenly distributed between each DNA strand, as expected (Fig. 3a,b). To rigorously assess the technical variation of transcript quantification, External RNA Controls Consortium (ERCC) RNA standards13 were spiked into the total nucleic acid mixture. Simul-seq produced ERCC transcript measurements that were highly correlated both with the known ERCC concentrations and with RNA-seq control ERCC measurements (Fig. 3d). The Simul-seq-derived transcriptome contained 7,992 protein-coding genes as well as an additional 1,123 noncoding genes that would be largely undetected with poly-A enrichment (Fig. 3e and Supplementary Fig. 3). Moreover, fragments per kilobase of transcript per million fragments mapped (FPKM) measurements were both reproducible and well correlated with RNA-seq control FPKMs (Fig. 3f and Supplementary Fig. 4). Taken together, these experiments demonstrate that the Simul-seq protocol efficiently produces high-quality whole-genome sequencing data and RNA sequencing data, allowing for the comprehensive profiling of genomic and transcriptomic variation from the same cell population. In addition, we have applied the method to as few as 50,000 fibroblasts, obtaining coverage distributions and variant calls (Supplementary Fig. 5a,b) as well as FPKM and ERCC expression data (Supplementary Fig. 5c,d) that were both reproducible and well correlated with our previous results.
Application of Simul-seq to cancer
Integrative DNA and RNA profiling is increasingly employed in cancer genomics to distinguish driver mutations of various types (e.g., protein coding, regulatory, structural variants, etc.) from the multitude of passenger mutations1,14,15. To test Simul-seq in this tissue context, we applied the method to laser-capture-microdissected material (∼150 μg) isolated from a male subject with metastatic esophageal adenocarcinoma (EAC). Deep sequencing of the Simul-seq EAC library produced 727,341,682 DNA and 191,398,961 RNA 101-bp dual-indexed paired-end reads, with 95.1% and 79.4% of the reads mapping to the genome and transcriptome, respectively (Supplementary Table 1). Similarly to the data acquired from fibroblasts, the Simul-seq RNA reads primarily mapped to transcribed regions, were highly strand specific and evenly distributed over transcripts (Supplementary Fig. 6a,b). However, the percentage of reads mapping to introns was increased for this library, suggesting an increased rate of intron retention and/or number of unspliced transcripts in this tumor specimen (Supplementary Fig. 6c). The tumor genome was sequenced to an average coverage of 38× and displayed a skewed coverage distribution indicative of large-scale copy-number alterations (Fig. 4a).
Comparing the Simul-seq tumor genome with a DNA-seq paired normal genome revealed a highly aneuploid genomic landscape, with somatic evidence for 142 structural variants and 9 expressed gene fusions as well as 15,607 SNVs and 2,904 indels (Fig. 4b and Supplementary Tables 2–5). Globally, the ratio of heterozygous to homozygous SNPs for the tumor genome was 0.49, an exceptional deviation from the typically observed ratio of ∼1.5 (Fig. 2c) that indicated widespread loss of heterozygosity (LOH) (Fig. 4c). Analysis of allele-specific expression using the Simul-seq EAC transcriptome data provided further support for extensive LOH, with 92.9% of the identified allele-specific transcripts exhibiting average major allele frequencies of greater than or equal to 0.9 (Fig. 4c and Supplementary Table 6). Given the high levels of LOH-induced allele-specific expression (ASE) in the tumor, we hypothesized that damaging germline variants in tumor suppressor genes might be specifically expressed in the tumor. Indeed, we identified eight nonsynonymous variants in tumor suppressor genes (as defined by the TSGene 2.0 database16) where a PolyPhen-2 (ref. 17)- and SIFT18-predicted damaging allele was predominantly expressed (Supplementary Table 7).
To distill the 15,607 somatic SNVs into potential oncogenic mutations, we integrated the Simul-seq DNA and RNA data to identify 29 expressed nonsynonymous somatic mutations (Table 1 and Supplementary Table 4). In addition to representing potential driver mutations, these expressed protein-altering mutations are also possible neoantigens from which patient-specific immunotherapies may be derived19–21. Notably, three Cosmic Cancer census genes22 (TP53, ATM and EWSR1) were found to harbor expressed somatic missense mutations. While EWSR1 is typically a constituent of an oncogenic fusion protein, and the R45W mutation in the ATM serine/threonine kinase tumor suppressor is not yet characterized, the Y220C mutation is a known TP53 hotspot that decreases protein stability23,24. Moreover, we found that the TP53 locus exclusively expressed the damaging allele (Table 1), exacerbating the loss of TP53 function and likely underpinning the widespread genomic instability observed in this tumor specimen. Interestingly, this patient also exhibited ASE for common germline polymorphisms in the epidermal growth factor receptor gene (EGFR, rs2227983) as well as the cyclin D1 gene (CCND1, rs9344) (Supplementary Table 6), polymorphisms that are associated with response to chemotherapeutic treatments25–28.
Table 1. Selected expressed somatic nonsynonymous variants in cancer-related genes.
Gene | DNA (ref/alt) | RNA counts (ref/alt) | Protein | Cosmic census |
---|---|---|---|---|
TP53 | T/C | 0/76 | Y220C | Yes |
ATM | C/T | 102/37 | R45W | Yes |
EWSR1 | C/T | 26/9 | P122L | Yes |
KIF3B | C/T | 170/64 | R293W | No |
MCM3AP | G/A | 5/127 | R1207C | No |
FAT1 | C/T | 11/44 | V1274I | No |
MADD | G/A | 59/19 | R225Q | No |
LRP1 | G/T | 16/3 | D2106Y | No |
H2AFY | G/A | 13/43 | R4C | No |
ZNF615 | T/C | 0/10 | N154S | No |
CSTF1 | G/A | 51/68 | G26S | No |
Characterization of a recurrent mutation in a kinesin family gene
In addition to discovering clinically relevant alterations in known cancer genes, we observed an expressed arginine-to-tryptophan mutation in KIF3B (R293W), a type II kinesin motor protein. Although several kinesin family members have established roles in cancer29, KIF3B somatic coding mutations have not been previously described. KIF3B has been linked to the intracellular trafficking of several tumor suppressor genes29,30, and biochemical data have shown that substitution of specific arginine and lysine residues within the kinesin motor domain negatively impacts kinesin-microtubule association31. To further explore KIF3B mutation frequency in EAC, we performed targeted resequencing of the KIF3B locus in a cohort of 49 EAC samples, with 25 paired normals. Overall, KIF3B harbored verified nonsynonymous mutations in ∼6% of the tumor samples, and the R293W mutation was observed in a second independent patient (Fig. 5a and Supplementary Fig. 7a,b). To investigate the functional consequences of this recurrent R293W mutation, we purified recombinant wild-type and mutant KIF3B motor domains (Supplementary Fig. 8a,b). When compared with the wild-type domain, the mutant motor domain displayed a significantly reduced rate of ATP hydrolysis upon incubation with various concentrations of microtubules, suggesting that the R293W mutation abrogates kinesin–microtubule binding (Fig. 5b). Together, these results demonstrate the benefits of Simul-seq in providing comprehensive DNA and RNA data sets, leading to the annotation of several clinically important variants as well as the description of a functionally significant recurrent mutation.
Discussion
As sequencing technologies advance and more individuals are profiled in both clinical and research settings, straightforward methods for generating comprehensive and accurate whole-genome and transcriptome sequencing data will become increasingly valuable. The combined sequencing of both DNA and RNA from single cells was recently enabled by the development of two methods, DR-seq32 and G&T-seq33. Simul-seq provides a complementary approach that focuses on producing comprehensive DNA and RNA profiles from limited quantities of tissues or cells rather than single cells. In contrast to previous dual-sequencing approaches, Simul-seq generates a single pooled library, and thus both reduces the library preparation time and keeps paired data sets physically linked. Importantly, whereas DR-seq and G&T-seq depend upon polyadenylation to distinguish RNA transcripts from genomic DNA, the use of RNA ligase in Simul-seq allows for a ribosomal RNA depletion step. Therefore, Simul-seq retains biologically and clinically important nonpolyadenylated RNA transcripts and may reduce 3′ bias for samples with lower RNA quality34,35. Overall, Simul-seq produces high-quality DNA and RNA sequencing data, enabling genotype and phenotype comparisons in a single workflow.
Cancer genome interpretation is one scenario where integration of precise and comprehensive DNA and RNA landscapes has proven useful but can be challenging on account of limited starting material. Moreover, tumor heterogeneity increases the likelihood of discrepancies between genome and transcriptome profiles prepared in parallel on separate cell populations. Applying Simul-seq to laser-capture-microdissected tumor tissue revealed a highly aneuploid somatic landscape, including a recurrent R293W mutation in KIF3B that dramatically reduced kinesin–microtubule interaction. Although the ∼6% mutation frequency that we observed is consistent with recently published data from whole-genome sequencing of 22 esophageal adenocarcinomas36, KIF3B has not been classified as a cancer gene in large-scale EAC exome sequencing studies37,38. These efforts, however, are still largely statistically underpowered14. Intriguingly, overexpression of C-terminal truncations of KIF3B induced aneuploidy in NIH3T3 cells39. Moreover, KIF3B has been linked to the intracellular trafficking of several tumor suppressors, including the adenomatous polyposis coli (APC)30 and von Hippel–Lindau (VHL)29 proteins. Together, our findings suggest that additional experiments are warranted to delineate specific functional roles for KIF3B mutation in esophageal tumorigenesis.
In addition to the novel KIF3B mutation, we also identified a number of clinically relevant variants in this EAC patient sample. We observed a known TP53 hotspot mutation (Y220C) that destabilizes the TP53 protein at body temperatures24 and is also a target of several small molecules designed to restore TP53 function in tumors23,40. TP53 inactivation followed by whole-genome duplication and chromosomal catastrophe is a frequent trajectory for EAC development36,41 and is consistent with our observations for this tumor. Among the widespread LOH induced by this genomic instability, we detected ASE for germline variants with pharmacogenomic links to the efficacy of cancer therapies used in EAC. The EGFR polymorphism (rs2227983) observed in this patient is associated with increased survival of colorectal cancer patients treated with Cetuximab27,28, perhaps via attenuation of EGFR pathway signaling42. In contrast, the patient harbored a second variant in CCND1 (rs9344) that is inversely correlated with overall survival in colorectal cancer patients treated with Cetuximab43. In both cases, however, the beneficial allele was predominantly expressed in the tumor, suggesting a positive overall response. Taken together, our results in this EAC patient highlight both the utility of Simul-seq and the many benefits of acquiring combined DNA and RNA profiles for genome interpretation and personalized medicine.
Online Methods
Sample acquisition
The male-patient-derived fibroblasts used in this study were collected and derived with informed patient consent under a protocol approved by the Institutional Review Board at Stanford University Medical Center (IRB17576). Cells tested negative for mycoplasma and were cultured with DMEM supplemented with 10% fetal bovine serum (FBS). The deidentified male esophageal cancer sample was obtained from Stanford Cancer Institute's Tissue Repository and was exempt from IRB requirements by the Stanford Research Compliance Office. Investigators were not blinded to experimental groups, and no power calculation was performed before experiments to ensure detection of a prespecified effect size.
DNA/RNA extraction
For the mixing experiments, yeast mRNA was obtained from Clontech (Clontech: 636312) and human genomic DNA was isolated using the DNA Mini kit (Qiagen: 51304). For all other Simul-seq experiments, total nucleic acids were extracted using the RNeasy Mini kit (Qiagen: 74104) per manufacturer's instructions, except the optional DNase I treatment was not performed. DNA and RNA were then quantified using the Qubit DNA HS and RNA HS (Thermo Fisher: Q32851, Q32852), respectively. For fibroblast experiments, extraction began with 1 × 10⁶ cells, whereas the laser-capture-microdissected (LCM) tumor library started with approximately 150 μg of tissue (based on isolating ∼150 × 10⁶ μm³ and assuming an average tissue density of 1.0 g/cm³). The quality of the starting total RNA was measured using Bioanalyzer, with RNA integrity number (RIN) values ranging from 8 for LCM-isolated tissue to 10 for LCM-isolated cells. For Simul-seq library preparations, ERCC spike in mixture A (Life Technologies: 4456740) was added per manufacturer's instructions before the ribosomal RNA depletion step.
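The volume-to-mass estimate for the LCM tissue can be checked with a few lines of arithmetic. The sketch below is only a unit conversion under the stated density assumption; the function and constant names are illustrative.

```python
UM3_PER_CM3 = 1e12  # (1e4 micrometers per centimeter) cubed


def tissue_mass_ug(volume_um3, density_g_per_cm3=1.0):
    """Convert a dissected tissue volume in cubic micrometers to micrograms."""
    volume_cm3 = volume_um3 / UM3_PER_CM3
    mass_g = volume_cm3 * density_g_per_cm3
    return mass_g * 1e6  # grams to micrograms


# 150 million cubic micrometers at 1.0 g per cubic centimeter is about 150 micrograms.
print(round(tissue_mass_ug(150e6), 1))
```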
Ribosomal depletion
Ribosomal RNA sequences were depleted from the total nucleic acid mixture using Ribo-Zero gold (Illumina: MRZG126) and following the manufacturer's instructions. To reduce potential hybridization to genomic DNA sequences, however, the standard 70 °C hybridization step was changed to 65 °C. Ribosomal RNA depletion began with the recommended amount of total RNA (1 μg for LCM tissue to 5 μg for fibroblasts). For 50,000 fibroblast experiments, ∼400 ng of total RNA was used. Following ribosomal RNA depletion, the total nucleic acid mixture was purified using RNA Clean and Concentrator 5 columns (Zymo Research: R1015) and quantified using high-sensitivity DNA and RNA Qubit reagents as above.
Simul-seq protocol
Unless otherwise noted, reagents were from New England Biolabs (NEB: E7330S) or Illumina (Illumina: FC-121-1031). Simultaneous RNA fragmentation and DNA tagmentation was achieved by mixing 25 μl of TD buffer, 5 μl of TDE, 1 μl RNase III (0.5 U, NEB: E6146S) and 19 μl of DNA/RNA consisting of 30–50 ng of genomic DNA and 10–100 ng of ribodepleted RNA. This reaction was incubated for 5 min at 55 °C, and the thermocycler was cooled to 10 °C before the reaction was placed on ice. 100 μl Ampure XP RNAclean beads (Beckman Coulter: A63987), or 2× the reaction volume, were then added to the reaction and incubated for 10–15 min to bind the nucleic acids. The beads were placed on a magnet stand until clear, washed twice with 400 μl of 80% ethanol and dried for 10 min at room temperature. The total nucleic acids were eluted from the dried beads using 7 μl of H2O. To remove secondary RNA structure, 6 μl of the eluate and 1 μl of the 3′ ligation adapter were first heated to 65 °C for 5 min and then immediately placed on ice. For ligation of the 3′ adapter to the RNA molecules, 10 μl of 3′ ligation buffer and 3 μl of 3′ ligation enzyme mix were added and incubated for 1 h at 25 °C in a thermal cycler with the lid heated to 50 °C. To reduce adapter–adapter ligation products, 1 μl of the reverse transcription primer (SR RT primer) and 4.5 μl of H2O were added to the 3′ adapter ligation reaction and incubated in a PCR machine for 5 min at 65 °C, 15 min at 37 °C, 15 min at 25 °C and held at 4 °C until the next step. To ligate the 5′ adapter, 1 μl of 5′ SR adapter, which had been previously heated to 70 °C and then placed on ice, along with 1 μl of 5′ ligation buffer and 2.5 μl of 5′ ligase enzyme mix were added to the 3′ adapter-ligated and SR-RT-primer-hybridized RNA. This reaction was incubated for 1 h at 25 °C with the lid heated to 50 °C and then placed on ice.
First-strand cDNA synthesis was performed by adding 8 μl of first-strand reaction buffer, 1 μl of murine RNase inhibitor and 1 μl of ProtoScript II reverse transcriptase to the previous mixture and incubating the reaction for 1 h at 42 °C with the lid heated to 50 °C. 48 μl of Ampure XP beads (Beckman Coulter: A63880), or 1.2× of the reaction volume, were then used to clean up the cDNA and transposed genomic DNA. The beads were incubated for 5–10 min with the DNA, washed twice with 80% ethanol and mixed with 26.5 μl of H2O to elute the DNA. PCR conditions varied depending on whether differential PCR was performed. DNA libraries were amplified using standard Nextera indexing primers. RNA libraries were amplified with a custom I5 indexing primer AATGATACGGCGACCACCGAGATCTACACTATCCTCTGTTCAGAGTTCTACAGTCCG-s-A, where -s- indicates a phosphorothioate bond, and a standard I7 indexing primer. For differential PCR, 25.5 μl of the eluate was combined with 1.25 μl of each RNA indexing primer (10 mM stock) and 12 μl Nextera PCR Master Mix (NPM) and then thermocycled as follows: 72 °C for 3 min; 98 °C for 30 s; then two to seven cycles of 98 °C for 10 s, 62 °C for 30 s and 72 °C for 3 min; before a final hold at 4 °C. After this hold, the reaction was removed from the thermocycler and combined with 15 μl of a master mix comprising 2.5 μl of each DNA indexing PCR primer (5 mM stock), 5 μl of PPC and 5 μl NPM. This combined reaction was then subjected to five additional cycles using the same program described above. The fibroblast, LCM and 50,000 fibroblast Simul-seq libraries used two, four and seven cycles of RNA-specific PCR, respectively. The final libraries were cleaned using 66 μl Ampure XP beads as described above and eluted in 12 μl of H2O. To quality control the dual-indexed libraries, we performed high-sensitivity Qubit DNA and Bioanalyzer assays prior to sequencing of paired-end 101 bp reads on Illumina HiSeq or MiSeq machines.
A typical Simul-seq library will be approximately 10 ng/ml, with an average size distribution of ∼350 bp (Supplementary Fig. 1b). A detailed description of Simul-seq reagents, equipment and a step-by-step protocol can be found in the Supplementary Note.
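The bead volumes quoted for the first two cleanups follow from the running reaction volumes. The sketch below tallies the component volumes listed in the protocol and applies the stated 2× and 1.2× ratios; the helper name and the grouping of components into lists are illustrative.

```python
def bead_volume_ul(component_volumes_ul, ratio):
    """Bead volume needed for a cleanup at a given bead-to-reaction ratio."""
    return ratio * sum(component_volumes_ul)


# Tagmentation/fragmentation: TD buffer, TDE, RNase III, DNA/RNA input.
tagmentation = [25, 5, 1, 19]

# Through first-strand synthesis: eluate, 3' adapter, 3' ligation buffer,
# 3' ligation enzyme mix, SR RT primer, water, 5' SR adapter, 5' ligation
# buffer, 5' ligase enzyme mix, first-strand buffer, RNase inhibitor,
# reverse transcriptase.
cdna = [6, 1, 10, 3, 1, 4.5, 1, 1, 2.5, 8, 1, 1]

print(sum(tagmentation), round(bead_volume_ul(tagmentation, 2.0), 1))
print(sum(cdna), round(bead_volume_ul(cdna, 1.2), 1))
```

The two reactions come to 50 μl and 40 μl, matching the 100 μl and 48 μl of beads given in the protocol.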
Read processing and alignment
For both DNA and RNA reads, Cutadapt v1.8.1 (ref. 44) was used to trim the paired-end adapter sequences. Only trimmed reads longer than 30 bases and with a quality score >20 were aligned. For the DNA barcoded reads, 5′-CTGTCTCTTATACACATCTCCGAGCCCACGAGAC-3′ and 5′-CTGTCTCTTATACACATCTGACGCTGCCGACGA-3′ sequences were used to trim the adapter sequences. For RNA barcoded reads, 5′-AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3′ and 5′-GATCGTCGGACTGTAGAACTCTGAACGTGTAGATC-3′ sequences were used to trim the adapter sequences.
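To illustrate the trimming and length filtering step, the sketch below removes a 3′ adapter by exact string matching and keeps reads longer than 30 bases. It is a simplified stand-in, not Cutadapt: Cutadapt tolerates mismatches and handles paired reads and quality values, none of which is modeled here. The function names, the minimum overlap of 3 bases and the toy read are assumptions.

```python
DNA_ADAPTER = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"


def trim_adapter(read, adapter, min_overlap=3):
    """Remove a 3' adapter found in full, or as a partial prefix at the read end."""
    index = read.find(adapter)
    if index != -1:
        return read[:index]
    # Check progressively shorter adapter prefixes against the end of the read.
    longest = min(len(adapter), len(read)) - 1
    for k in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


def passes_length_filter(read, min_length=31):
    """Keep trimmed reads longer than 30 bases."""
    return len(read) >= min_length


insert = "ACGT" * 10                 # toy 40-base genomic insert
read = insert + DNA_ADAPTER[:12]     # read that runs into the adapter
trimmed = trim_adapter(read, DNA_ADAPTER)
print(len(read), len(trimmed), passes_length_filter(trimmed))
```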
DNA libraries were processed and analyzed using the Bina Technologies whole-genome analysis workflow with default settings. Briefly, libraries were mapped with BWA mem 0.7.5 software45 to hg19 and then realigned around indels with GATK IndelRealigner46. Next, base recalibration was performed with GATK BaseRecalibrator taking into account the read group, quality scores, cycle and context covariates. Variants were called with GATK HaplotypeCaller with the parameters --variant_index_type LINEAR --variant_index_parameter 128000. VQSR was used to recalibrate the variants, first with GATK VariantRecalibrator and then ApplyRecalibration. For the cross-contamination analysis shown in Figure 1b, Simul-seq DNA-seq-indexed reads were mapped to hg19 and SacCer3 using Bowtie2 (ref. 47) with default settings.
RNA libraries were also processed and analyzed using Bina Technologies RNA analysis using default settings. Briefly, TopHat 2.0.11 (ref. 48) was used to map libraries to hg19, and Cufflinks49 was then used to perform per-sample gene expression analysis. Finally, Cuffdiff was used to find differential expression between replicates and different library types. For cross-contamination analysis shown in Figure 1b, Simul-seq RNA-indexed reads were mapped with TopHat to hg19 and SacCer3 using default settings.
DNA and RNA QC analysis
Coverage plots were calculated from the Bina output. SNV and indel concordance between sequencing libraries was calculated using VCFtools v0.1.12 (ref. 50) on all variants annotated with a ‘passed’ filter. Summary statistics for SNVs were also calculated with VCFtools. Read fractions were calculated with Picard v1.92 (http://broadinstitute.github.io/picard) for the DNA and RNA sequencing libraries. Strand specificity and gene-body coverage were calculated with RSeQC 2.6.2 (ref. 51). For the analysis of transcript biotypes, the Simul-seq RNA data was mapped with TopHat using the Ensembl GENCODE annotations and quantitated with Cufflinks. Genes with FPKM values ≥5 were counted. Cuffdiff was used to compare log10(FPKM + 1) expression values between Simul-seq RNA libraries and control RNA-seq libraries.
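The concordance figures reported for SNVs and indels are, in essence, the fraction of calls in one library that are also present in the other. The sketch below shows that calculation on toy data; it does not use VCFtools, and the tuple representation of a variant (chromosome, position, reference allele, alternate allele) is an assumption made for illustration.

```python
def concordance(query_calls, reference_calls):
    """Fraction of query variants that are also present in the reference set."""
    query = set(query_calls)
    if not query:
        return 0.0
    return len(query & set(reference_calls)) / len(query)


# Toy call sets; each variant is (chromosome, position, ref, alt).
simul_seq = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "A"), ("chr3", 900, "T", "C")]
control = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
           ("chr2", 40, "G", "A"), ("chr5", 10, "A", "T")]

print(concordance(simul_seq, control))
```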
Lorenz curves
Duplicates were removed from hg19-aligned reads using Picard v1.92, and Bedtools v2.18.0 (ref. 52) was used to calculate the coverage at every position in the genome. The file was then sorted by coverage, and cumulative sums for the fraction of the covered genome and the fraction of total mapped bases were calculated using custom scripts.
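The Lorenz curve calculation described above can be sketched as follows. This is not the custom script used in the study; it assumes per-base depths are already available as a list of integers (for example, parsed from per-position coverage output), and the function name is illustrative.

```python
def lorenz_curve(depths):
    """Return (fraction of genome, fraction of mapped bases) points, lowest coverage first."""
    ordered = sorted(depths)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no mapped bases")
    points = [(0.0, 0.0)]
    running = 0
    for i, depth in enumerate(ordered, start=1):
        running += depth
        points.append((i / len(ordered), running / total))
    return points


# Perfectly uniform coverage falls on the diagonal; skewed coverage bows below it.
print(lorenz_curve([1, 1, 1, 1]))
print(lorenz_curve([4, 0, 0, 0]))
```

For a whole genome the same cumulative sums would normally be computed on a histogram of depths rather than on one value per base, to keep memory use manageable.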
ERCC analysis
TopHat was used to align reads to the ERCC reference using default settings. Next, duplicate reads were removed using Picard MarkDuplicates, and FeatureCounts53 was used to determine the total read counts for each ERCC transcript. Read counts were then normalized across transcripts and libraries using the RPKM methodology (i.e., reads per kb of transcript per million mapped reads). ERCC RPKM measurements for Simul-seq and RNA-seq replicates were averaged, zero values were set to one and then log10 transformed. ERCC transcript data for Simul-seq and RNA-seq replicates is shown (Supplementary Table 8).
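The normalization and transformation steps above can be written out as a short sketch. The function names and example numbers are hypothetical; only the RPKM definition, the replicate averaging, the zero-to-one substitution and the log10 transform come from the text.

```python
import math


def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


def log10_mean_rpkm(replicate_rpkms):
    """Average replicates, set a zero average to one, then log10 transform."""
    mean = sum(replicate_rpkms) / len(replicate_rpkms)
    if mean == 0:
        mean = 1.0
    return math.log10(mean)


# Hypothetical spike-in: 500 reads on a 1,000-bp transcript, 1 million mapped reads.
print(rpkm(500, 1000, 1_000_000))
print(log10_mean_rpkm([50.0, 150.0]), log10_mean_rpkm([0.0, 0.0]))
```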
Droplet digital PCR
DNA:RNA ratios of between 5:1 and 10:1 are optimal for whole-genome and whole-transcriptome sequencing of human samples. ddPCR experiments were performed according to manufacturer's guidelines (Droplet Digital PCR Application Guide, Bulletin 6407 Rev A) using a Bio-Rad QX200 system. Briefly, custom qPCR assays were designed to the unique DNA-seq and RNA-seq library adapter sequences and purchased from IDT as PrimeTime Std qPCR Assays (Supplementary Fig. 1c,d). These assays incorporated HPLC-purified probes with 5′ HEX or 6-FAM fluorophores and internal ZEN and 3′ Iowa Black FQ dual quenchers. 20 μl ddPCR reactions were assembled using diluted Simul-seq libraries (2 μl of a 10⁻⁶ dilution was typically sufficient but will vary depending on the starting library concentration). The ddPCR reactions were then subjected to the following cycling program: 10 min at 95 °C; 40 cycles of 30 s at 95 °C and 1 min at 60 °C; 10 min at 98 °C; and a hold at 4 °C. Triplicate reactions were done for each sample, and quantitation was performed using QuantaSoft version 1.3.2.
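A DNA:RNA ratio can be derived from droplet counts using the standard Poisson correction for digital PCR, in which the mean number of copies per droplet is −ln(1 − p) for a positive-droplet fraction p. The sketch below is not the QuantaSoft calculation; it assumes both assays are read in the same well so that they share a droplet total, and the droplet counts and function names are hypothetical.

```python
import math


def copies_per_droplet(positive_droplets, total_droplets):
    """Poisson-corrected mean template copies per droplet."""
    if not 0 <= positive_droplets < total_droplets:
        raise ValueError("positive droplets must be >= 0 and below the total")
    return -math.log(1 - positive_droplets / total_droplets)


def dna_to_rna_ratio(dna_positive, rna_positive, total_droplets):
    """Ratio of DNA-adapter to RNA-adapter library molecules in one well."""
    return (copies_per_droplet(dna_positive, total_droplets)
            / copies_per_droplet(rna_positive, total_droplets))


def in_target_range(ratio, low=5.0, high=10.0):
    """Check a ratio against the 5:1 to 10:1 window suggested for human samples."""
    return low <= ratio <= high


ratio = dna_to_rna_ratio(dna_positive=5000, rna_positive=1000, total_droplets=15000)
print(round(ratio, 2), in_target_range(ratio))
```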
Laser-capture microdissection
For LCM, 7 μm cryosections were placed onto 76 × 26 PEN glass slides (Leica: 11505158) and stored at −80 °C for up to 4 d. To guide the isolation process, serial sections were immunofluorescently stained with Keratin 8 (1:100; Abcam: ab668-100) and counterstained with Hoechst 33342 dye (2 mg/ml in PBS), marking the tumor epithelium and nuclei, respectively. On the day of laser capture, the LCM slides were stained with Cresyl violet according to the manufacturer's protocol (LCM staining kit, Ambion: AM1935). Immediately following staining, a Leica AS LMD system was used to isolate ∼150 × 10⁶ μm³ (or ∼150 μg) of esophageal adenocarcinoma tumor tissue. The LCM-isolated tissue was then subjected to the Simul-seq protocol, and 727,341,682 DNA and 191,398,961 RNA 101 bp paired-end reads were obtained using an Illumina HiSeq2000 machine. For all transcriptome analyses using Simul-seq RNA tumor data, 116,217,162 reads were analyzed.
Somatic variant analysis
Somatic variant analysis was performed using Bina tumor-normal whole-genome calling workflow. Briefly, somatic variants with a Bina ONCOSCORE of greater than or equal to 5 were considered high confidence and reported. To identify somatic variants and generate the ONCOSCORE, Bina integrates JointSNVMix 0.7.5 (ref. 54), Mutect 2014.3-24-g7dfb931 (ref. 55), Somatic Indel Detector 2014.3-24-g7dfb931, Somatic Sniper 1.0.4 (ref. 56) and Varscan 2.3.7 (ref. 57) outputs. GATK ASEReadCounter was used to determine the variant and reference expression counts for somatic SNV positions in the tumor transcriptome data. The resultant somatic SNVs and indels are annotated in Supplementary Tables 4 and 5.
To determine large somatic structural variants (SVs), CREST58 was run on the tumor-normal paired genomic data. To refine the variant calls, we only reported SVs with greater than five supporting reads on both the 3′ and 5′ arms of the variant, which resulted in 142 total potential genomic SVs (Supplementary Table 2). Somatic SVs resulting in expressed gene fusions were independently determined using the INTEGRATE software package59, which incorporates tumor RNA sequencing data along with paired tumor-normal genome sequencing data. To refine this expressed fusion list, we only reported fusions with no evidence in the normal DNA data and at least one read of evidence for both the tumor DNA and RNA, which resulted in 9 potential expressed gene fusions (Supplementary Table 3). Circos software 0.63 (ref. 60) was used to display somatic variation in Figure 4b.
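The two refinement rules described above can be sketched as simple predicates. The record types and field names are hypothetical and do not reflect the CREST or INTEGRATE output formats; only the thresholds come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructuralVariant:
    """Hypothetical SV record with read support on each arm."""
    name: str
    support_5prime: int
    support_3prime: int


@dataclass(frozen=True)
class FusionCandidate:
    """Hypothetical fusion record with supporting read counts per data type."""
    name: str
    normal_dna_reads: int
    tumor_dna_reads: int
    tumor_rna_reads: int


def keep_sv(sv: StructuralVariant, min_support: int = 5) -> bool:
    """Require more than `min_support` reads on both the 5' and 3' arms."""
    return sv.support_5prime > min_support and sv.support_3prime > min_support


def keep_fusion(fusion: FusionCandidate) -> bool:
    """Require no normal DNA evidence and at least one tumor DNA and one tumor RNA read."""
    return (
        fusion.normal_dna_reads == 0
        and fusion.tumor_dna_reads >= 1
        and fusion.tumor_rna_reads >= 1
    )
```

Note that the SV rule is a strict inequality ("greater than five"), so a variant with exactly five supporting reads on either arm is dropped.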
Loss of heterozygosity
For the LOH analysis, heterozygous positions in the normal were selected in the VCF file using SnpSift61. GATK SelectVariants was then used to interrogate these heterozygous positions in the tumor VCF, classifying them as heterozygous or homozygous alternative. Heterozygous positions in the normal that were not present in the tumor VCF were considered homozygous reference and counted as LOH positions.
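The classification logic can be sketched independently of SnpSift and GATK. In this minimal example, sites are keyed by (chromosome, position) and tumor genotypes are given as the labels "het" or "hom_alt"; both conventions are assumptions. Whether homozygous-alternative sites are also counted as LOH is not stated explicitly in the text, so that choice is exposed as a parameter.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Site = Tuple[str, int]  # (chromosome, position)


def classify_tumor_sites(normal_het_sites: Iterable[Site],
                         tumor_genotypes: Dict[Site, str]) -> Counter:
    """Count tumor genotype classes at sites that are heterozygous in the normal.

    Sites absent from the tumor calls are treated as homozygous reference.
    """
    counts: Counter = Counter()
    for site in normal_het_sites:
        counts[tumor_genotypes.get(site, "hom_ref")] += 1
    return counts


def loh_count(counts: Counter, include_hom_alt: bool = True) -> int:
    """Number of LOH sites; counting hom_alt sites is an assumption of this sketch."""
    total = counts["hom_ref"]
    if include_hom_alt:
        total += counts["hom_alt"]
    return total


# Invented example: four sites heterozygous in the normal, two called in the tumor.
sites = [("chr1", 100), ("chr1", 200), ("chr1", 300), ("chr2", 50)]
tumor = {("chr1", 100): "het", ("chr1", 200): "hom_alt"}
site_counts = classify_tumor_sites(sites, tumor)
```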
Allele-specific expression
To examine LOH at the level of gene expression, allele-specific expression (ASE) in the tumor RNA was calculated for heterozygous positions called in the normal using ASEQ62. Briefly, GENOTYPE mode was run on a bam file derived from the paired normal genome with the following options: mbq = 20 mrq = 1 mdc = 5 htperc = 0.2. Next, ASE mode was run using a bam file from the tumor RNA with the following options: mbq = 20 mrq = 20 mdc = 10 pht = 0.01 pft = 0.01. This analysis was performed using an hg19 Ensembl transcript model and identified 21,797 transcripts—corresponding to 6,698 independent gene symbols—as exhibiting ASE (Supplementary Table 6). Circos was used to display the number of ASE transcripts in 100 kb bins in Figure 4b.
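The 100 kb binning used for display can be sketched as follows. This is not the Circos or ASEQ code; assigning each transcript to a bin by its start coordinate is an assumption, since the text does not say how transcripts spanning a bin boundary were handled.

```python
from collections import Counter
from typing import Iterable, Tuple

BIN_SIZE = 100_000  # 100 kb bins, as described above


def ase_transcripts_per_bin(transcripts: Iterable[Tuple[str, int]],
                            bin_size: int = BIN_SIZE) -> Counter:
    """Count transcripts per (chromosome, bin_start) using 0-based start coordinates."""
    counts: Counter = Counter()
    for chrom, start in transcripts:
        bin_start = (start // bin_size) * bin_size
        counts[(chrom, bin_start)] += 1
    return counts


# Invented transcript starts, for illustration only.
example_bins = ase_transcripts_per_bin(
    [("chr1", 50_000), ("chr1", 99_999), ("chr1", 100_000), ("chr2", 250_000)]
)
```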
Targeted resequencing of KIF3B locus
Overlapping primer sets were designed to capture all of the coding exons of the KIF3B locus (Supplementary Tables 9 and 10). Genomic DNA was isolated from 50 formalin-fixed paraffin-embedded (FFPE) tumor samples as well as 26 paired normal samples using an AllPrep DNA/RNA FFPE kit (Qiagen: 80204) according to the manufacturer's instructions. The original sample (02-28923-C9) that was subjected to the Simul-seq protocol was included as a positive control. The gDNA concentrations were normalized to 50 ng/μl and subjected to amplification on a Fluidigm Access Array system, following the manufacturer's recommendations (FC1 Cycler v1.0 User Guide rev A4). The resultant libraries were pooled, sequenced on a single HiSeq2000 lane and mapped using bowtie (see Supplementary Fig. 7a). SAMtools63 was used to generate a pileup, and SNVs were identified using four criteria: mapped to a targeted region, allele read fraction of ≥10%, mapping quality of ≥10 and coverage of ≥500. Using these criteria, three variants in KIF3B were identified and subsequently validated using pyrophosphate sequencing (see Supplementary Fig. 7b). A single tumor-normal pair (00-18224-A2) displayed a substantially higher number of variant calls yet a lower number of uniquely mapped reads, suggesting that these samples harbored increased rates of PCR errors induced by low-quality genomic DNA. Therefore, variants identified in these samples were not reported.
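The four SNV criteria can be expressed as a single predicate over per-site pileup summaries. The record layout, the use of 1-based inclusive target intervals, and the interpretation of mapping quality as a per-site summary value are all assumptions of this sketch; the text does not specify them.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive (assumed)


@dataclass(frozen=True)
class PileupSite:
    """Hypothetical per-site summary derived from a pileup."""
    chrom: str
    pos: int
    depth: int
    alt_reads: int
    mapping_quality: float


def in_targets(site: PileupSite, targets: Iterable[Region]) -> bool:
    """True when the site falls inside any targeted region."""
    return any(c == site.chrom and start <= site.pos <= end for c, start, end in targets)


def passes_snv_criteria(site: PileupSite, targets: Iterable[Region],
                        min_fraction: float = 0.10,
                        min_mapq: float = 10.0,
                        min_depth: int = 500) -> bool:
    """Apply the targeted-region, allele-fraction, mapping-quality and coverage criteria."""
    if site.depth < min_depth or not in_targets(site, targets):
        return False
    allele_fraction = site.alt_reads / site.depth
    return allele_fraction >= min_fraction and site.mapping_quality >= min_mapq
```

Checking coverage first avoids a division by zero at uncovered positions.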
Kinesin-microtubule interaction assays
Full-length kinesin proteins exhibit poor solubility in bacteria64. Therefore, wild-type and R293W mutant motor domains (amino acids 1–365) were amplified using the following primers: CATATGTCAAAGTTGAAAAGCTCAG and CTCGAGCTAGAGCCGAGCAATCTCTTCCT. The PCR products were digested with NdeI/XhoI restriction enzymes and cloned into an NdeI/XhoI-digested pET28a backbone, tagging the KIF3B motor domains on the N terminus. Recombinant KIF3B was purified using nickel affinity purification (Supplementary Fig. 8a,b). Briefly, bacterial pellets were lysed for 30 min on ice in lysis buffer (50 mM PIPES, pH 8.0, 1 mM MgCl2, 250 mM NaCl, 250 μg/ml lysozyme, 250 mM ATP and protease inhibitors (Roche: 04693132001)). Lysates were pulse sonicated for three cycles of 18% amplitude (Bronson) for 5 s (0.5 s on and 1 s off), followed by 1 min on ice. Lysates were then cleared by centrifugation for 10 min at 4 °C and maximum speed. Cleared lysates were incubated with His-tag magnetic beads (Life Technologies: 10103D) for 1 h at 4 °C, washed twice in washing buffer (50 mM PIPES, pH 8.0, 1 mM MgCl2, 250 mM NaCl, 50 mM imidazole) supplemented with 250 mM ATP, followed by an additional two washes in buffer excluding ATP. Beads were subsequently eluted in 25 mM PIPES, pH 8.0, 2 mM MgCl2, 125 mM NaCl and 250 mM imidazole. Kinesin ATPase end-point biochemical assays (Cytoskeleton: BK053) were performed in duplicate according to the manufacturer's instructions with 0.4 μg of recombinant protein and increasing amounts of polymerized microtubules (see Fig. 5b).
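The cloning design can be sanity-checked from the primer sequences alone. The sketch below confirms that the forward primer begins with an NdeI site (CATATG, which contains an ATG), that the reverse primer begins with an XhoI site (CTCGAG) followed by the reverse complement of a stop codon, and that a 365-residue motor domain corresponds to 1,095 coding bases. The reverse primer is written without internal whitespace; variable and function names are illustrative.

```python
NDEI_SITE = "CATATG"
XHOI_SITE = "CTCGAG"

FORWARD_PRIMER = "CATATGTCAAAGTTGAAAAGCTCAG"
REVERSE_PRIMER = "CTCGAGCTAGAGCCGAGCAATCTCTTCCT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def describe_primers() -> dict:
    """Summarize restriction sites and codons encoded by the two primers."""
    return {
        "forward_has_ndei": FORWARD_PRIMER.startswith(NDEI_SITE),
        # The last three bases of the NdeI site form an ATG codon.
        "forward_atg": FORWARD_PRIMER[3:6],
        "reverse_has_xhoi": REVERSE_PRIMER.startswith(XHOI_SITE),
        # The three bases after the XhoI site, read on the coding strand.
        "stop_codon": reverse_complement(REVERSE_PRIMER[6:9]),
        # Amino acids 1-365 at three bases per codon.
        "coding_bp": 365 * 3,
    }


summary = describe_primers()
```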
Supplementary Material
Acknowledgments
We thank C. Araya, C. Cenik, P. Dumesic, D. Phanstiel and D. Webster for many helpful discussions and input regarding the manuscript and analyses. We acknowledge J. Churko from the laboratory of J. Wu at Stanford University for providing the fibroblasts as well as the work of both the sequencing core at the Stanford Center for Genomics and Personalized Medicine and the Genetics Bioinformatics Service Center, with special thanks to G. Euskirchen, L. Ramirez, C. Eastman, N. Watson and N. Hammond. Finally, we would like to thank H. Chen from Bina Technologies.
Footnotes
Author Contributions: J.A.R., D.V.S. and M.P.S. conceived the project, designed experiments and wrote the manuscript. J.A.R. and D.V.S. performed analyses and experiments. R.K.P. provided pathology expertise and formalin-fixed paraffin-embedded esophageal adenocarcinoma specimens. Work in the Snyder lab is supported by NIH grants to M.P.S. (1P50HG00773501 and 8U54DK10255602). J.A.R. was supported by the Damon Runyon Cancer Research Foundation, and D.V.S. was supported by an NIH T32 fellowship (HG000044) and a Genentech Graduate Fellowship.
Competing Financial Interests: The authors declare competing financial interests.
Accession codes. Primary sequencing data files are deposited in the database of Genotypes and Phenotypes (dbGaP) and the Sequence Read Archive (SRA) under accession numbers phs001214.v1.p1 and SRP077004, respectively.
Note: Any Supplementary Information and Source Data files are available in the online version of the paper.
References
- 1. Shah SP, et al. The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature. 2012;486:395–399. doi: 10.1038/nature10933.
- 2. Grubert F, et al. Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell. 2015;162:1051–1065. doi: 10.1016/j.cell.2015.07.048.
- 3. Stranger BE, et al. Population genomics of human gene expression. Nat Genet. 2007;39:1217–1224. doi: 10.1038/ng2142.
- 4. Ongen H, et al. Putative cis-regulatory drivers in colorectal cancer. Nature. 2014;512:87–90. doi: 10.1038/nature13602.
- 5. Li JB, et al. Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science. 2009;324:1210–1213. doi: 10.1126/science.1170995.
- 6. Tuch BB, et al. Tumor transcriptome sequencing reveals allelic expression imbalances associated with copy number alterations. PLoS One. 2010;5:e9317. doi: 10.1371/journal.pone.0009317.
- 7. Su AI, et al. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004;101:6062–6067. doi: 10.1073/pnas.0400782101.
- 8. Lappalainen T, et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature. 2013;501:506–511. doi: 10.1038/nature12531.
- 9. Macaulay IC, et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat Methods. 2015;12:519–522. doi: 10.1038/nmeth.3370.
- 10. Dey SS, Kester L, Spanjaard B, Van A. Integrated genome and transcriptome sequencing from the same cell. Nat Biotechnol. 2015;33:1–19. doi: 10.1038/nbt.3129.
- 11. Lam HYK, et al. Performance comparison of whole-genome sequencing platforms. Nat Biotechnol. 2011;30:78–82. doi: 10.1038/nbt.2065.
- 12. Adey A, et al. Rapid, low-input, low-bias construction of shotgun fragment libraries by high-density in vitro transposition. Genome Biol. 2010;11:R119. doi: 10.1186/gb-2010-11-12-r119.
- 13. Baker SC, et al. The External RNA Controls Consortium: a progress report. Nat Methods. 2005;2:731–734. doi: 10.1038/nmeth1005-731.
- 14. Lawrence MS, et al. Discovery and saturation analysis of cancer genes across 21 tumour types. Nature. 2014;505:495–501. doi: 10.1038/nature12912.
- 15. Weinstein JN, et al. Cancer Genome Atlas Research Network. Comprehensive molecular characterization of urothelial bladder carcinoma. Nature. 2014;507:315–322. doi: 10.1038/nature12965.
- 16. Zhao M, Kim P, Mitra R, Zhao J, Zhao Z. TSGene 2.0: an updated literature-based knowledgebase for tumor suppressor genes. Nucleic Acids Res. 2015;4:D1023–D1031. doi: 10.1093/nar/gkv1268.
- 17. Adzhubei IA, et al. A method and server for predicting damaging missense mutations. Nat Methods. 2010;7:248–249. doi: 10.1038/nmeth0410-248.
- 18. Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4:1073–1081. doi: 10.1038/nprot.2009.86.
- 19. Yadav M, et al. Predicting immunogenic tumour mutations by combining mass spectrometry and exome sequencing. Nature. 2014;515:572–576. doi: 10.1038/nature14001.
- 20. Robbins PF, et al. Mining exomic sequencing data to identify mutated antigens recognized by adoptively transferred tumor-reactive T cells. Nat Med. 2013;19:747–752. doi: 10.1038/nm.3161.
- 21. Schumacher TN, Schreiber RD. Neoantigens in cancer immunotherapy. Science. 2015;348:69–74. doi: 10.1126/science.aaa4971.
- 22. Futreal PA, et al. A census of human cancer genes. Nat Rev Cancer. 2004;4:177–183. doi: 10.1038/nrc1299.
- 23. Joerger AC, Ang HC, Fersht AR. Structural basis for understanding oncogenic p53 mutations and designing rescue drugs. Proc Natl Acad Sci USA. 2006;103:15056–15061. doi: 10.1073/pnas.0607286103.
- 24. Bullock AN, Henckel J, Fersht AR. Quantitative analysis of residual folding and DNA binding in mutant p53 core domain: definition of mutant states for rescue in cancer therapy. Oncogene. 2000;19:1245–1256. doi: 10.1038/sj.onc.1203434.
- 25. Gautschi O, et al. Cyclin D1 (CCND1) A870G gene polymorphism modulates smoking-induced lung cancer risk and response to platinum-based chemotherapy in non-small cell lung cancer (NSCLC) patients. Lung Cancer. 2006;51:303–311. doi: 10.1016/j.lungcan.2005.10.025.
- 26. Absenger G, et al. The cyclin D1 (CCND1) rs9344 G>A polymorphism predicts clinical outcome in colon cancer patients treated with adjuvant 5-FU-based chemotherapy. Pharmacogenomics J. 2014;14:130–134. doi: 10.1038/tpj.2013.15.
- 27. Gonçalves A, et al. A polymorphism of EGFR extracellular domain is associated with progression free-survival in metastatic colorectal cancer patients receiving cetuximab-based treatment. BMC Cancer. 2008;8:169. doi: 10.1186/1471-2407-8-169.
- 28. Hsieh YY, Tzeng CH, Chen MH, Chen PM, Wang WS. Epidermal growth factor receptor R521K polymorphism shows favorable outcomes in KRAS wild-type colorectal cancer patients treated with cetuximab-based chemotherapy. Cancer Sci. 2012;103:791–796. doi: 10.1111/j.1349-7006.2012.02225.x.
- 29. Yu Y, Feng YM. The role of kinesin family proteins in tumorigenesis and progression: potential biomarkers and molecular targets for cancer therapy. Cancer. 2010;116:5150–5160. doi: 10.1002/cncr.25461.
- 30. Jimbo T, et al. Identification of a link between the tumour suppressor APC and the kinesin superfamily. Nat Cell Biol. 2002;4:323–327. doi: 10.1038/ncb779.
- 31. Woehlke G, et al. Microtubule interaction site of the kinesin motor. Cell. 1997;90:207–216. doi: 10.1016/s0092-8674(00)80329-3.
- 32. Dey SS, Kester L, Spanjaard B, Bienko M, van Oudenaarden A. Integrated genome and transcriptome sequencing of the same cell. Nat Biotechnol. 2015;33:285–289. doi: 10.1038/nbt.3129.
- 33. Macaulay IC, et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat Methods. 2015;12:519–522. doi: 10.1038/nmeth.3370.
- 34. Adiconis X, et al. Comparative analysis of RNA sequencing methods for degraded or low-input samples. Nat Methods. 2013;10:623–629. doi: 10.1038/nmeth.2483.
- 35. Zhao W, et al. Comparison of RNA-Seq by poly (A) capture, ribosomal RNA depletion, and DNA microarray for expression profiling. BMC Genomics. 2014;15:419. doi: 10.1186/1471-2164-15-419.
- 36. Nones K, et al. Genomic catastrophes frequently arise in esophageal adenocarcinoma and drive tumorigenesis. Nat Commun. 2014;5:5224. doi: 10.1038/ncomms6224.
- 37. Agrawal N, et al. Comparative genomic analysis of esophageal adenocarcinoma and squamous cell carcinoma. Cancer Discov. 2012;2:899–905. doi: 10.1158/2159-8290.CD-12-0189.
- 38. Dulak AM, et al. Exome and whole-genome sequencing of esophageal adenocarcinoma identifies recurrent driver events and mutational complexity. Nat Genet. 2013;45:478–486. doi: 10.1038/ng.2591.
- 39. Haraguchi K, Hayashi T, Jimbo T, Yamamoto T, Akiyama T. Role of the kinesin-2 family protein, KIF3, during mitosis. J Biol Chem. 2006;281:4094–4099. doi: 10.1074/jbc.M507028200.
- 40. Liu X, et al. Small molecule induced reactivation of mutant p53 in cancer cells. Nucleic Acids Res. 2013;41:6034–6044. doi: 10.1093/nar/gkt305.
- 41. Stachler MD, et al. Paired exome analysis of Barrett's esophagus and adenocarcinoma. Nat Genet. 2015;47:1047–1055. doi: 10.1038/ng.3343.
- 42. Moriai T, Kobrin MS, Hope C, Speck L, Korc M. A variant epidermal growth factor receptor exhibits altered type alpha transforming growth factor binding and transmembrane signaling. Proc Natl Acad Sci USA. 1994;91:10217–10221. doi: 10.1073/pnas.91.21.10217.
- 43. Zhang W, et al. Cyclin D1 and epidermal growth factor polymorphisms associated with survival in patients with advanced colorectal cancer treated with Cetuximab. Pharmacogenet Genomics. 2006;16:475–483. doi: 10.1097/01.fpc.0000220562.67595.a5.
- 44. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet journal. 2011;17:10–12.
- 45. Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics. 2009;25:1754–1760. doi: 10.1093/bioinformatics/btp324.
- 46. DePristo MA, et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat Genet. 2011;43:491–498. doi: 10.1038/ng.806.
- 47. Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–359. doi: 10.1038/nmeth.1923.
- 48. Trapnell C, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562–578. doi: 10.1038/nprot.2012.016.
- 49. Trapnell C, et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28:511–515. doi: 10.1038/nbt.1621.
- 50. Danecek P, et al. The variant call format and VCFtools. Bioinformatics. 2011;27:2156–2158. doi: 10.1093/bioinformatics/btr330.
- 51. Wang L, Wang S, Li W. RSeQC: quality control of RNA-seq experiments. Bioinformatics. 2012;28:2184–2185. doi: 10.1093/bioinformatics/bts356.
- 52. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–842. doi: 10.1093/bioinformatics/btq033.
- 53. Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30:923–930. doi: 10.1093/bioinformatics/btt656.
- 54. Roth A, et al. JointSNVMix: a probabilistic model for accurate detection of somatic mutations in normal/tumour paired next-generation sequencing data. Bioinformatics. 2012;28:907–913. doi: 10.1093/bioinformatics/bts053.
- 55. Cibulskis K, et al. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013;31:213–219. doi: 10.1038/nbt.2514.
- 56. Larson DE, et al. SomaticSniper: identification of somatic point mutations in whole genome sequencing data. Bioinformatics. 2012;28:311–317. doi: 10.1093/bioinformatics/btr665.
- 57. Koboldt DC, et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 2012;22:568–576. doi: 10.1101/gr.129684.111.
- 58. Wang J, et al. CREST maps somatic structural variation in cancer genomes with base-pair resolution. Nat Methods. 2011;8:652–654. doi: 10.1038/nmeth.1628.
- 59. Zhang J, et al. INTEGRATE: gene fusion discovery using whole genome and transcriptome data. Genome Res. 2016;26:108–118. doi: 10.1101/gr.186114.114.
- 60. Krzywinski M, et al. Circos: an information aesthetic for comparative genomics. Genome Res. 2009;19:1639–1645. doi: 10.1101/gr.092759.109.
- 61. Cingolani P, et al. Using Drosophila melanogaster as a model for genotoxic chemical mutational studies with a new program, SnpSift. Front Genet. 2012;3:35. doi: 10.3389/fgene.2012.00035.
- 62. Romanel A, Lago S, Prandi D, Sboner A, Demichelis F. ASEQ: fast allele-specific studies from next-generation sequencing data. BMC Med Genomics. 2015;8:9. doi: 10.1186/s12920-015-0084-2.
- 63. Li H, et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics. 2009;25:2078–2079. doi: 10.1093/bioinformatics/btp352.
- 64. Stock MF, Hackney DD. Expression of kinesin in Escherichia coli. Methods Mol Biol. 2001;164:43–48. doi: 10.1385/1-59259-069-1:43.