Author manuscript; available in PMC: 2016 Sep 23.
Published in final edited form as: Cell Syst. 2015 Sep 23;1(3):210–223. doi: 10.1016/j.cels.2015.08.015

Table 2.

Key findings and recommendations for tumor genome sequencing and analysis.

Sample and case selection
  • Avoid low-yield or heavily degraded DNA from FFPE if possible.
  • Be aware that pathology assessments often overestimate tumor cellularity.

Matched normal samples
  • Sequencing matched normal tissue is essential for removing germline variants and identifying mapping artifacts or sequencing errors.
  • For hematologic cancers, skin normals should be collected at remission to reduce tumor contamination of the normal.
  • For solid tumors, use blood instead of adjacent normals to avoid tumor infiltration.
  • In the absence of a matched normal, use as many unmatched normal samples as possible (e.g. a pool of healthy individuals).

Library construction
  • Improve coverage, reduce amplification-related errors, and improve SV detection by constructing multiple independent libraries per sample. This approach resulted in PCR error rates below those detectable from the assays that were performed (<0.23–0.35%).
  • A large amount (>1 μg) of starting input DNA allows for multiple libraries, decreases duplication rates, and enables adequate sampling of rare subclonal populations.

Sequencing platform
  • Choose a platform that allows for cost-effective generation of high-depth data.
  • Orthogonal sequencing methods have value for confirmation of low-frequency variants.
  • Single-cell sequencing can be useful for resolving tumor phylogeny.

Sequencing depth
  • Greater depth is needed in the case of impure tumors, tumor contamination of the normal sample, aneuploidy, and clonal heterogeneity.
  • Expect non-uniform coverage across the genome. Total coverage levels may need to be increased to ensure adequate depth in certain regions (e.g. GC-rich promoter regions).
  • 30× WGS was insufficient for inferring clonal architecture or identifying variants with <15% VAF, even in a tumor with >90% purity.
  • 50× WGS was insufficient to detect variants at <10% VAF, including many important for relapse.
  • An increase in coverage from ~30× to ~300× (coupled with a less-contaminated normal) resulted in the identification of 4 additional subclones and over 11× as many variants in this case.
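
The relationship between depth, purity, and detectable VAF described above can be illustrated with a simple binomial sampling model. The sketch below is illustrative only and is not the method used in the study: it assumes a heterozygous variant in a copy-neutral diploid region, ignores sequencing error and mapping bias, and uses a hypothetical threshold of 3 variant-supporting reads (`min_alt_reads`). The function names are likewise assumptions made for this example.

```python
from math import comb


def expected_vaf(purity, cancer_cell_fraction=1.0):
    """Expected VAF of a heterozygous variant in a diploid, copy-neutral region.

    purity: fraction of cells in the sample that are tumor cells.
    cancer_cell_fraction: fraction of tumor cells carrying the variant.
    """
    return purity * cancer_cell_fraction / 2.0


def detection_probability(depth, vaf, min_alt_reads=3):
    """P(at least min_alt_reads variant reads), reads ~ Binomial(depth, vaf).

    min_alt_reads is an assumed, hypothetical calling threshold.
    """
    p_below = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


if __name__ == "__main__":
    # Compare the chance of observing enough supporting reads at several depths.
    for depth in (30, 50, 300):
        for vaf in (0.02, 0.05, 0.10):
            p = detection_probability(depth, vaf)
            print(f"{depth:>4}x  VAF {vaf:.2f}  P(detect) = {p:.3f}")
```

Under these assumptions a subclone present in 20% of tumor cells at 90% purity has an expected VAF of only 9% (`expected_vaf(0.9, 0.2)`), which is why low-depth WGS tends to miss such events. Real callers apply stricter filters, so actual sensitivity will be lower than this idealized estimate.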

Whole genome sequencing
  • WGS is essential for detection of CNVs and other SVs.
  • Difficult-to-capture coding regions may be better covered in WGS.
  • WGS enables detection of non-coding mutations that may be biologically relevant or serve as clonal markers.

Targeted sequencing
  • WGS should be accompanied by either commercial exome or custom capture for increased coverage of key cancer genes.
  • “Spiking in” oligonucleotide probes allows for more coverage (>1,000×) and improved sensitivity in critical ‘hotspot’ regions (can be cancer specific or pan-cancer). We achieved ~5-fold greater coverage across 264 genes recurrently mutated in AML with little exome-wide loss of coverage.

Sequence alignment
  • The choice of reference sequence and alignment algorithm impacts variant calling. VAFs calculated from the same data aligned with alternate algorithms had Spearman correlations that varied from 0.56 to 0.99.
  • Local assembly of indels and realignment can produce more accurate VAF estimates, especially for multi-basepair events.
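
A comparison of VAFs between alignment strategies, as summarized above, can be sketched as a Spearman rank correlation over the same set of variant sites. The example below is a minimal pure-Python sketch; the VAF values are made up for illustration, and the function names are assumptions rather than part of any pipeline used in the study.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical VAFs for the same five sites under two aligners.
    vafs_aligner_a = [0.42, 0.05, 0.31, 0.12, 0.48]
    vafs_aligner_b = [0.40, 0.09, 0.33, 0.07, 0.47]
    print(f"Spearman rho = {spearman(vafs_aligner_a, vafs_aligner_b):.2f}")
```

Rank correlation is insensitive to a uniform shift in VAF between aligners, so a high value does not by itself show that the absolute VAF estimates agree.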

Variant calling
  • Current SNV callers are not optimized for detecting low VAF events in high-depth data. Optimization of parameters may help, but new algorithms are probably needed in the long term.
  • Using multiple variant callers is a viable strategy for improving performance. Intersections improve PPV, while unions improve sensitivity.
  • Match the goals of a project to algorithms that provide the right balance of sensitivity and specificity.
  • Indels and SVs are harder to detect; expect poorer performance.
  • Samples from multiple time points increase confidence and enable detection of key low-VAF variants that are enriched during clonal evolution.
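
The trade-off between intersecting and taking the union of call sets from multiple callers can be illustrated with plain set operations. This is a toy sketch: the call sets, the "truth" set, and the caller names are all hypothetical, and variants are keyed by a simple (chromosome, position, ref, alt) tuple chosen for this example.

```python
def union_calls(call_sets):
    """Variants reported by at least one caller."""
    return set().union(*call_sets)


def intersection_calls(call_sets):
    """Variants reported by every caller (empty if no call sets are given)."""
    call_sets = list(call_sets)
    if not call_sets:
        return set()
    return set.intersection(*call_sets)


def ppv_and_sensitivity(called, truth):
    """Positive predictive value and sensitivity against a validated truth set."""
    true_positives = len(called & truth)
    ppv = true_positives / len(called) if called else float("nan")
    sensitivity = true_positives / len(truth) if truth else float("nan")
    return ppv, sensitivity


# Hypothetical validated variants and two hypothetical caller outputs.
truth = {("1", 100, "A", "T"), ("1", 200, "G", "C"), ("2", 300, "C", "T")}
caller_a = {("1", 100, "A", "T"), ("1", 200, "G", "C"), ("3", 50, "T", "G")}
caller_b = {("1", 100, "A", "T"), ("2", 300, "C", "T"), ("4", 75, "G", "A")}

if __name__ == "__main__":
    for name, calls in (
        ("union", union_calls([caller_a, caller_b])),
        ("intersection", intersection_calls([caller_a, caller_b])),
    ):
        ppv, sens = ppv_and_sensitivity(calls, truth)
        print(f"{name:>12}: PPV = {ppv:.2f}, sensitivity = {sens:.2f}")
```

In this toy data the union recovers every true variant but carries both callers' false positives, while the intersection keeps only the shared true call. Which end of that trade-off is preferable depends on the goals of the project.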

Subclonal inference
  • Accurate estimation of tumor VAFs requires high depths to overcome sampling error. Plan for 500–1,000× coverage or more if detailed inference of subclonal populations is important.
  • Exomes or targeted assays may not provide enough variants for accurate clonal clustering, especially in cancers with low mutation rates.
  • Temporally and/or spatially separated samples aid in subclonal inference and tracking tumor evolution.
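
The sampling error that motivates the high-depth recommendation above can be approximated by treating the variant read count at a site as a binomial draw. The sketch below is a simplification that ignores sequencing error, copy number, and overdispersion; the function names are assumptions made for this example. It reports the standard error of a VAF estimate and a Wilson score interval at several depths.

```python
from math import sqrt


def vaf_standard_error(vaf, depth):
    """Binomial standard error of a VAF estimate at a given read depth."""
    return sqrt(vaf * (1 - vaf) / depth)


def wilson_interval(alt_reads, depth, z=1.96):
    """Wilson score interval for the true VAF (z=1.96 gives roughly 95%)."""
    p = alt_reads / depth
    denom = 1 + z * z / depth
    center = (p + z * z / (2 * depth)) / denom
    half_width = z * sqrt(p * (1 - p) / depth + z * z / (4 * depth * depth)) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # The same observed VAF of 10% at increasing depth.
    for alt_reads, depth in ((3, 30), (30, 300), (100, 1000)):
        low, high = wilson_interval(alt_reads, depth)
        se = vaf_standard_error(alt_reads / depth, depth)
        print(f"{depth:>5}x  SE = {se:.3f}  interval = ({low:.3f}, {high:.3f})")
```

Under this model the interval narrows roughly with the square root of depth, so separating subclones whose VAFs differ by only a few percentage points requires depths in the hundreds or more.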

RNA sequencing
  • Variants detected in both DNA-seq and RNA-seq have high confidence because they are confirmed by orthogonal library and alignment strategies.
  • RNA-seq may be used to assess expression status of coding somatic variants and fusions as well as the functional impact of regulatory variants.

Overall recommended strategy

Sequencing strategy will always be dependent on the goals of the project and budget, but an ideal tumor profiling study might include:
  • WGS to a depth of 200–300×
  • Exome sequencing to a depth of 1,000× (possibly with spike-in probes for mutational hotspots)
  • Analysis with multiple alignment strategies and variant callers
  • Validation of variants with custom capture and deep sequencing (1,000–10,000×)
  • RNA-seq with 250–300 million mapped 2×100 reads or greater for robust integration with DNA-seq data