Transcriptional profiling of Dictyostelium with RNA sequencing

Edward Roshan Miranda; Gregor Rot; Marko Toplak; Balaji Santhanam; Tomaz Curk; Gad Shaulsky; Blaz Zupan

doi:10.1007/978-1-62703-302-2_8

. Author manuscript; available in PMC: 2014 Jan 15.

Published in final edited form as: Methods Mol Biol. 2013;983:139–171. doi: 10.1007/978-1-62703-302-2_8

Transcriptional profiling of Dictyostelium with RNA sequencing

Edward Roshan Miranda ^1,^2,^*, Gregor Rot ^3,^*, Marko Toplak ³, Balaji Santhanam ^1,⁴, Tomaz Curk ³, Gad Shaulsky ^1,^2,⁴, Blaz Zupan ^1,³

PMCID: PMC3892559 NIHMSID: NIHMS543724 PMID: 23494306

Summary

Transcriptional profiling methods have been utilized in the analysis of various biological processes in Dictyostelium. Recent advances in high-throughput sequencing have increased the resolution and the dynamic range of transcriptional profiling. Here we describe the utility of RNA-sequencing with the Illumina technology for production of transcriptional profiles. We also describe methods for data mapping and storage as well as common and specialized tools for data analysis, both online and offline.

Keywords: Dictyostelium, RNA-sequencing, multiplexing, web-based applications, visual programming, data mining, differential expression, Orange, dictyExpress, PIPA

1. INTRODUCTION

In the past decade a significant understanding of Dictyostelid transcriptomes has been achieved thanks to techniques such as RACE (rapid amplification of cDNA ends), Sanger sequencing of cDNAs, and microarrays (1–4). The recent development of RNA-sequencing (RNAseq) has lead to further appreciation of the complexity of Dictyostelid transcriptomes and to vast improvements in transcriptome quantification (5). RNAseq is a high throughput method that employs massive parallel sequencing of cDNA fragments generated from RNA (6). The method generates millions of short sequencing reads that represent fragments of the transcriptome. These fragments are then mapped to the genome of interest or assembled de novo. The number of fragments that map to a specific gene is directly proportional to the abundance of the respective RNA in the sample. The large number of sequencing reads enables the landscaping of transcriptomes at unprecedented depth and resolution.

RNAseq has been used to improve existing gene models, including predicting exon-intron boundaries and untranslated regions, identify alternative splicing of transcripts and to discover new genes (7, 8). Determination of quantitative and qualitative changes in RNA is possible at a wide dynamic range. RNAseq has supplanted microarrays as the technique of choice for understanding genome wide expression patterns. It yields a digital output of RNA quantity, as opposed to the analog output of microarrays, and it is free of some microarray limitations, including variable hybridization kinetics and cross hybridization among different hybridization targets. Due to the high reproducibility of RNAseq, technical replications are no longer needed – only biological replications are required.

Next generation sequencing technologies have improved appreciably since their introduction, yielding improved read quality and quantity. Currently, each sequencing run yields more reads than needed for most applications, so multiplexing is employed as a means of cost reduction (9). In this chapter we describe the techniques of RNAseq, with and without multiplexing, using the Illumina platform.

mRNA accounts for about 2% of the total RNA in Dictyostelium cells so it must be enriched before the analysis. Here we describe a method that begins with the isolation of polyA⁺ mRNA by hybridization to oligo dT beads. We describe the preparation of cDNA from the enriched mRNA and the preparation of either single-sample libraries or pools of samples with multiplexing.

Analysis of RNAseq data consists of deconvolution in the case of multiplexed data, mapping the reads to the genome, and processing the data into values that represent transcript abundance. We describe the process of data analysis and storage as well as several examples of downstream data analysis, such as differential gene expression.

2. MATERIALS

2.1. Reagents

2.1.1. RNA purification and cDNA synthesis

The reagents must be RNAse free. Use disposable sterile plasticware and clean the work areas and the pipettors with RNAseZap (Ambion) before each procedure. Always wear gloves, mask and lab coat when handling RNA (see ^{Note 1}).

Water and aqueous solutions used for RNA work should be treated with diethylpyrocarbonate (DEPC) to inactivate RNAse. Add 0.1% DEPC to the solution, incubate overnight at room temperature and autoclave (15–25 min, liquid cycle). Do not DEPC-treat solutions that contain Tris.

Trizol® (Life Technologies).
10× MOPS buffer: 0.1 M MOPS, 5 mM EDTA, 25 mM sodium acetate; adjust to pH7.0 with acetic acid and treat with DEPC.
Dynabeads mRNA Purification Kit (Life Technologies) supplied with oligo(dT) beads, binding buffer, washing buffer and 10 mM Tris-HCl.
10× Fragmentation buffer (Ambion).
Stop buffer (Ambion).
Glycogen (Ambion): 5 μg/μL.
3 M sodium acetate, pH5.2, DEPC treated.
100% and 70% Ethanol.
Random hexamer primers (Invitrogen): 3 μg/μL.
100 mM dNTP set (Life Technologies).
10 mM dNTP mix. Mix 10 μL of each dNTP from the 100 mM dNTP set and 60 μL of water.
RNaseOUT (Invitrogen): 40 U/μL.
SuperScript II (Invitrogen): 200 U/μL, supplied with 5× first strand buffer and 100 mM DTT.
10× second strand buffer: 500 mM Tris-HCl, pH7.8, 50 mM MgCl₂, 10 mM DTT.
RNaseH (Invitrogen): 2 U/μL.
E. coli DNA polymerase I (Invitrogen): 10 U/μL.
Microcentrifuge test tubes (1.5-mL, 0.5-mL, 2-mL) and sterile aerosol-resistant pipette tips (10-μL, 200-μL, 1-mL) (see ^{Note 2}).

2.1.2. Single sample library preparation

Genomic DNA Sample Prep Kit (Illumina). Components of this kit can be replenished using the reagents mentioned below. Adapter oligonucleotides and PCR primers can also be ordered separately. Their sequences are available from the manufacturer.
100 mM ATP (Sigma Aldrich): in water.
10 mM dNTP mix. See Subheading 2.1.1, item 11.
T4 DNA polymerase (Invitrogen): 5 U/μL.
Klenow DNA polymerase (Invitrogen): 5 U/μL, supplied with 10× Klenow buffer.
T4 Polynucleotide kinase (Invitrogen): 10 U/μL.
1 mM dATP: dilute from the 100 mM dNTP set (see Subheading 2.1.1, item 10) in water.
DNA ligase (Invitrogen): 5 U/μL, supplied with 5× DNA ligase buffer.
25 mM dNTPs: mix equal volumes of all four dNTPs from the 100 mM dNTP set (see Subheading 2.1.1, item 10).
Phusion DNA polymerase (New England BioLabs): 2 U/μL, supplied with 5× Phusion-HF buffer.
QIAquick PCR spin kit (Qiagen) supplied with EB solution.
QIAquick MinElute kit (Qiagen) supplied with EB solution.
QIAquick gel extraction kit (Qiagen) supplied with EB solution.
100 bp DNA ladder (Life Technologies).
Agarose (Calbiochem)
1xTAE buffer (50x stock solution, per liter: 242g Tris base, 57.1 mL glacial acetic acid, 100 mL 0.5M EDTA pH 8.0)
Ethidium bromide (Sigma, 10 mg/ml stock solution)
Bioanalyzer DNA 1000 chip (Agilent).

2.1.3. Multiplexed library preparation

Agencourt AMPure XP 60 mL Kit (Beckman Coulter; this kit includes carboxyl-coated magnetic beads).
100% and 70% Ethanol.
Tween 20 (Fisher Scientific).
10× Buffer Tango (Thermo Scientific).
25 mM dNTPs (mix equal volumes of all four dNTPs from the 100 mM dNTP set (see Subheading 2.1.1, item 10).
100 mM ATP (Sigma Aldrich) in water.
T4 DNA ligase (Fermentas): 5 U/μL, supplied with 10× T4 DNA ligase buffer and 50% PEG-4000 solution.
T4 DNA polymerase (Fermentas): 5 U/μL.
T4 polynucleotide kinase (Fermentas):10 U/μL.
Bst DNA polymerase, large fragment (New England BioLabs) supplied with 10× ThermoPol reaction buffer.
Agarose (Calbiochem)
1xTAE buffer (50x stock solution, per liter: 242g Tris base, 57.1 ml glacial acetic acid, 100 ml 0.5M EDTA ph 8.0)
Ethidium bromide (Sigma, 10 mg/ml stock solution)
Quantitative PCR kit with SYBRE green such as SYBR® Green PCR Master Mix (Life Technologies).
Phusion Hot Start high-fidelity DNA polymerase (New England BioLabs) supplied with 5× Phusion HF buffer.
25 bp DNA Ladder (Life Technologies).
EB buffer, supplied with QIAquick PCR spin kit (Qiagen) or QIAquick gel extraction kit (Qiagen). This buffer can be prepared as 10 mM Tris-HCl, pH 8.5.
EBT: EB with 0.05% (v/v) Tween 20.
Oligonucleotide hybridization buffer: 500 mM NaCl, 10 mM Tris-HCl, pH 8.0, 1 mM EDTA in water.
Wide orifice pipette tips (VWR).
Hard-shell Thin walled 96 well skirted PCR plates for Quantitative-PCR (Bio-Rad)
Microseal ‘B’ Film PCR sealers ((Bio-Rad)
Kit and reagents for DNA sequencing (Illumina).
Cluster generation kit (Illumina)
Multiplexing sequencing primer kit (Illumina). Alternatively, the following primers may be used for sequencing:
1. Read 1 sequencing primer: 5′-ACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′
2. Index read sequencing primer: 5′-GATCGGAAGAGCACACGTCTGAACTCCAGTCAC-3′
3. Read 2 sequencing primer: 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′
Oligonucleotides for library preparation:
- Adapter_A1: A*C*A*C*TCTTTCCCTACACGACGCTCTTCCG*A*T*C*T
- Adapter_A2: G*T*G*A*CTGGAGTTCAGACGTGTGCTCTTCCG*A*T*C*T
- Adapter_A3: A*G*A*T*CGGAA*G*A*G*C
- Primer_P1: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTT

The sequences correspond to the order from 5′ to 3′ from left to right; * indicates a phosphothioate bond. All of the oligonucleotides should be ordered as HPLC purified and dissolved in water. Ask the supplier to synthesize and purify each primer in a separate batch to avoid cross contamination. Adapters A1, A2 and A3 are dissolved at 500 μM, and Primer_P1 at 10 μM. Order the primers in a 96 well plate to facilitate multichannel pipetting. The sequences of the primers, the criteria used to design them and additional information are available in ref. (10).

2.2. Equipment

Two water incubators, one at 65ºC and one at 80ºC.
Heating blocks at different temperatures
DynaMag™-2 magnet (Life Technologies).
Agencourt SPRIPlate Super Magnet Plate (Beckman Coulter) for 96-well plates or DynaMag™-2 magnet (Life Technologies) for individual microcentrifuge tubes.
Rotisserie-style shaker/rotator with clamps for microcentrifuge test tubes (e.g. DiaMag Rotator, Diagenode).
Nanodrop spectrophotometer (Thermo scientific)
Thermal cycler such as PTC 100 (MJ Research) capable of holding 0.2-mL PCR test tubes or 0.5-mL test tubes.
Pipettors capable of dispensing 0.2 μL, 20 μL, 200 μL and 1 mL (see ^{Note 1}).
Agarose gel elctrophoresis equipment (e.g. BioRad)
UV transiluminator (e.g. Kodak)
96-well plate centrifuge (e.g. Eppendorf 5810R).
Microcentrifuge (e.g. Eppendorf 5415D)
Illumina Cluster Station
Bioanalyzer
Real time PCR machine (e.g. DNA engine Opticon 2, MJ Research)

2.3. Analysis software

PIPA (http://pipa.biolab.si), a web-based tool for sequencing data management and bioinformatics analysis.
dictyExpress (http://dictyexpress.biolab.si), a web-based interactive gene expression analysis program.
Orange (http://pipa.biolab.si), a general purpose interactive data analysis environment.

3. METHODS

3.1. RNA purification and cDNA synthesis

3.1.1. Preparation of total RNA

Dictyostelium cells are grown and developed under standard conditions (11) or as required by the desired experimental design.
Prepare total RNA using the Trizol® reagent according to the manufacturer’s recommendations (see ^{Note 3}).
Store the cell lysates in the Trizol reagent at −80ºC until all the samples are ready for the next step (see ^{Note 4}).
Dissolve the total RNA in 1× MOPS buffer.
Measure the RNA concentration using a spectrophotometer (1 AU₂₆₀ = 40 μg/μL).
Adjust the concentration to 1 μg/μL.
Store the total RNA samples in aliquots at −80ºC. Do not thaw and re-freeze the samples more than three times.

3.1.2. mRNA purification

mRNA isolation is performed using the Dynabeads mRNA Purification Kit from Life Technologies. Perform two rounds of mRNA purification to ensure that more than 90% of the sequencing reads are from mRNA. Use the same aliquot of beads twice with an intermediate cleaning step to eliminate traces of sample from the first round. We recommend using 5–50 μg of total RNA as the starting material (see ^{Note 5}).

Put 10 μg of total RNA in a 1.5-mL RNAse-free microcentrifuge tube. Adjust the volume to 25 μL with DEPC-treated water.
Incubate the sample at 65ºC for 5 min to disrupt secondary structures. Place the test tube on ice.
Aliquot 50 μL of Dynal oligo(dT) beads into a fresh 1.5-mL RNAse-free microcentrifuge tube.
Wash the beads twice with 50 μL of binding buffer. Place the microcentrifuge tube on the Dynal magnet and allow the beads to settle for 30 seconds. Once the supernatant is clear, remove it by pipetting with a plastic tip.
Resuspend the beads in 25 μL of binding buffer and add the 25 μL of total RNA from step 2. Rotate the tube at room temperature for 5 minutes, remove and discard the supernatant as described in step 4.
Wash the beads twice with 50 μL of washing buffer B as described in step 4.
Prepare for second round of purification by aliquoting 25 μL of binding buffer to a fresh 1.5-mL RNAse-free microcentrifuge tube.
Remove as much of the supernatant as possible from the beads of step 6. It is very important not to leave any supernatant in the test tube.
Add 25 μL of 10 mM Tris-HCl and incubate the samples at 80ºC for 2 min to elute the mRNA. Immediately place the test tube in the Dynal magnet stand and transfer the supernatant (mRNA) to the test tube from step 7. Add 50 μL of washing buffer B to the remaining beads.
Incubate the mRNA sample from step 9 at 65ºC for 5 min and place the test tube on ice.
Resuspend the beads from step 9 by finger flicking the test tube. Place the test tube on the Dynal magnet and remove the supernatant. Wash the beads once with 50 μL of binding buffer as in step 4 and remove the supernatant. Resuspend the beads in 25 μL of binding buffer.
Add 25 μL of the RNA sample from step 10 back into the tube from step 11. Rotate the test tube at room temperature for 5 min and discard the supernatant.
Wash the beads once with 50 μL of washing buffer B as in step 4 and remove the supernatant as in step 8.
Add 12 μL of 10 mM Tris-HCl and incubate the test tube at 80ºC for 2 min to elute the mRNA. Immediately place the test tube in the magnet stand and transfer the supernatant (mRNA) to a fresh microcentrifuge test tube.
Quantify the mRNA with a Nanodrop spectrophotometer (see ^{Note 6}).

Typically, 10 μg of total Dictyostelium RNA yield 100–200 ng of mRNA. Alternatively, one can start with 100 ng of mRNA if any other method of mRNA purification is used. Lower amounts of mRNA are also compatible with the next steps (see ^{Note 7}).

3.1.3. mRNA fragmentation

mRNA fragmentation relies on metal ion based catalysis and high temperature. Other protocols use heat alone but we have observed that Dictyostelium mRNA is surprisingly stable at high temperatures, so we optimized the combination of chemical catalysis and high temperature to produce the desired fragment size of approximately 200 bases (see ^{Note 8}). We process 8 samples at one time for fragmentation and deal with any higher number in batches.

Start with 100 ng of purified mRNA (Subheading 3.1.2). Adjust the volume to 9 μL with water.
Add 1 μL of 10× fragmentation buffer and incubate at 70ºC for 5 min.
Add 1 μL of stop buffer, mix by repeated pipetting, and place the test tube on ice.

3.1.4. Precipitation of fragmented mRNA

Transfer 11 μL of the fragmented mRNA solution from Subheading 3.1.3 into an ice-cold 1.5-mL microcentrifuge test tube.
Add 1 μL of 3M sodium acetate pH 5.2, 2 μL of glycogen (5 μg/μL) and 30 μL of 100% ethanol.
Mix by repeated pipetting and incubate at −80ºC for 30 min.
Centrifuge at 18,000 x g in an Eppendorf centrifuge for 25 min at 4ºC. A pellet should be visible.
Discard the supernatant, wash the pellet once with 70% ethanol (do not disturb the pellet during the addition of 70% ethanol) and centrifuge for 10 min as in step 4.
Discard the supernatant and air-dry the pellet for 2–3 min.
Resuspend the pellet in 10.5 μL of water. The pellet should be easily soluble.

3.1.5. First strand cDNA synthesis

Add 1 μL of random hexamer primers (3 μg/μL) into the sample from Subheading 3.1.4.
Incubate at 65ºC for 5 min; snap cool on ice.
In the meantime prepare the following mix:

Reagent Volume (μL) per sample

5× first strand buffer 4

100 mM DTT 2

10 mM dNTP mix 1

RNaseOUT (40 U/μL) 0.5

Open in a new tab
Add the mixture of step 3 (7.5 μL) to the test tube containing the mRNA sample.
Mix well and incubate at 25ºC for 2 min.
Add 1 μL of SuperScript II (200 U/μL), mix by repeated pipetting.
Incubate in a thermal cycler as follows: 10 min at 25ºC, 50 min at 42ºC, 15 min at 70ºC, and then 4ºC until the next step. If a thermal cycler without a heating bonnet is used, centrifuge the reaction tubes to collect any condensate before proceeding to the next step.

3.1.6. Second strand synthesis

RNaseH is used to partially digest the template RNA. The RNA fragments are then used as primers to initiate the synthesis of the second DNA strand by DNA polymerase I. Since we deal with DNA from here on, the following reagents need not be DEPC-treated.

Place the test tubes from Subheading 3.1.5 on ice and add 61 μL of ice cold water.
Add 10 μL of 10× second strand buffer and 3 μL of 10 mN dNTP mix.
Incubate on ice for 5 min.
Add 1 μL of RNaseH (2 U/μL) and 5 μL of DNA polymerase I (10 U/μL).
Mix gently by repeated pipetting and incubate at 16ºC for 2.5 h.
Purify the resulting double stranded DNA either using a Qiagen PCR purification spin kit or Solid Phase Reversible Immobilization (SPRI) as described below (see Subheading 3.3.2)(see ^{Note 9}).

3.2. Single sample library preparation

In this section we describe the preparation of libraries for RNA-sequencing using the cDNA obtained in Subheading 3.1.6 and the adapters designed and marketed by Illumina. This technique of library preparation can be considered when exceedingly high numbers of reads are desired for a given sample. When the library is prepared using the following method, a single sample library is sequenced per lane in an Illumina flow cell. For applications such as differential expression and transcriptional phenotype analysis, a sufficient number of reads can result from pooling of multiplexed samples, which saves considerable time and money. Preparation of a multiplexed library is described in Subheading 3.3.

We recommend performing all the reactions detailed below with a positive control DNA sample along with the cDNA sample from Subheading 3.1.6. The positive control helps determine the success of the library preparation. It can be 500 ng of a specific 200–300 bp DNA fragment from a PCR reaction dissolved in 10 mM Tris-HCl, pH 8.5. The positive control DNA should be generated with plain (unmodified) primers.

3.2.1. End repair

The cDNA from Subheading 3.1.6 should be eluted in 30 μL of EB solution.

Prepare the following reaction mix:

Reagent	Volume (μL) per sample	Final concentration in 100 μL reaction
Water	27
5× T4 DNA ligase buffer	20	1×
10 mM ATP	10	1 mM
0 mM dNTP mix	4	0.4 mM
T4 DNA polymerase (3 U/μL)	3	0.09 U/μL
Klenow DNA polymerase (5 U/μL)	1	0.05 U/μL
T4 polynucleotide kinase (10 U/μL)	5	0.5 U/μL

Open in a new tab

Add 70 μL of the reaction mix to 30 μL of the purified cDNA and mix by finger flicking the microcentrifuge tube.
Incubate at 20ºC for 30 min.
Purify the end repaired DNA with a QIAquick PCR spin column and elute with 32 μL of EB solution.

3.2.2. Addition of a single A base

Prepare the following reaction mix:

Reagent Volume (μL) per sample Final concentration in 50 μL reaction

10× Klenow buffer 5 1×

1 mM dATP 10 0.5 mM

Klenow DNA polymerase (5 U/μL) 3 0.33 U/μL

Open in a new tab
Add 18 μL of the reaction mix into the 32 μL of end-repaired DNA from Subheading 3.2.1.
Incubate at 37ºC for 30 min.
Purify the resulting DNA with a QIAquick MinElute column, and elute in 24 μL of EB solution.

3.2.3. Adapter ligation

Prepare the following reaction mix (see ^{Note 10}):

Reagent Volume (μL) per sample Final concentration in 50 μL reaction

water 10

5× DNA ligase buffer 10 1×

Adapter oligo mix 1

DNA ligase (1 U/μL) 5 0.1 U/μL

Open in a new tab
Add 26 μL of reaction mix to the microcentrifuge tube containing 24 μL of DNA from Subheading 3.2.2 and mix by finger flicking.
Incubate at room temperature for 15 min.
Purify the adapter-ligated DNA with a QIAquick MinElute column and elute in 15 μL of EB solution.

3.2.4. Gel purification

Prepare a 2% agarose gel in 1× TAE buffer such that the thickness of the gel is about 0.5 cm. Include ethidium bromide in the gel.
Load 15 μL of the sample from Subheading 3.2.3 next to a well containing 100 bp DNA ladder (see ^{Note 11}). For handling multiple samples, leave at least 2 blank wells between samples to prevent cross contamination.
Run the gel at 100 V until the 100 bp and 200 bp bands of the DNA ladder are well separated.
Cut a gel slice at 200 bp +/− 25 bp and purify the cDNA with a QIAquick gel extraction kit.
Elute cDNA in 30 μL of EB.
Dilute the positive control DNA sample in 75 μL of EB.
Prepare a 2% agarose gel in 1× TAE buffer such that the gel thickness is about 0.5 cm. Include ethidium bromide in the gel.
Load 30 μL of the diluted positive control DNA next to 150 ng of positive control DNA that has not been subjected to library preparation.
Load the 100 bp ladder in a separate well, run the gel as in step 3. Successful reactions should result in a 70 bp increase in the size of the treated positive control DNA due to adapter ligation. Alternatively, run the positive control samples on an Agilent Bioanalyzer DNA 1000 chip (see ^{Note 12}).

3.2.5. PCR Enrichment

Set up the following PCR reaction mix and aliquot a 20 μL portion into a PCR tube:

Reagent	Volume (μL) per sample	Final concentration in 50 μL reaction
water	7
5× Phusion-HF Buffer	10	1×
PCR primer 1.1	1
PCR primer 2.1	1
25 mM dNTP mix	0.5	0.25 mM
Phusion DNA polymerase	0.5	0.02 U/μL

Open in a new tab

Add 30 μL of the DNA from the Subheading 3.2.4 and mix by repeated pipetting.
Incubate with the following PCR program: 30 sec at 98ºC; 15 cycles of: 10 sec at 98ºC, 30 sec at 65ºC, 30 sec at 72ºC; a final extension cycle of 5 min at 72ºC.
Purify the resulting DNA with a QIAquick PCR spin column and elute in 30 μL of EB solution.
Prepare a 2% agarose gel containing ethidium bromide in 1x TAE such that the thickness of the gel is about 0.5 cm and load 25 μL of PCR-enriched positive control DNA next to 30 μL of the remaining positive control DNA obtained after adapter ligation and 150 ng of the original positive control DNA. Include a well containing 100 bp DNA ladder (see ^{Note 11}).
Run the gel at 100 V until sufficient resolution is obtained between 100 and 200 bp of the 100 bp ladder. A distinct shift in the positive control DNA size should be visible compared to the adapter ligated positive control DNA after PCR enrichment.
Analyze 1 μL of the PCR-enriched DNA on an Agilent Bioanalyzer DNA 1000 chip to assess the quality of the final product and to determine the DNA concentration. Successful preparations should yield a distinct band at ~200 bp. This material is processed further for cluster generation on the Illumina Cluster Station using the manufacturer’s recommended protocol.

3.3. Multiplexed library preparation

In this section we describe a multiplexing technique in which up to 228 samples can be pooled into one lane for sequencing. We adopted and standardized this method for transcriptomic sequencing from ref. (10), which was originally described for pooling genomic samples. The overall strategy of library preparation is outlined in Fig. 1. In this method, DNA barcodes that label unique samples are attached to one of the adapters. Barcoding is performed at the final step of indexing PCR amplification. These barcodes are identified in a separate short sequencing run after the sequencing of the actual cDNA. We have successfully performed as many as 24-fold multiplexing. Pooling fewer than 4 libraries is not recommended (10).

Lines indicate DNA strands. Grey indicates the target DNA molecules to be sequenced and black indicates adapters. The adapters are ligated to the ends of the target DNA molecules and filled in to make them blunt-ended. Indexing is performed at the last step of library PCR amplification. The indices are depicted as striped segments within the adapters. In the sequencing reaction, they are identified in a separate short sequencing run (Index read) after the initial sequencing of the DNA (Read 1). Adapted from ref. (10).

3.3.1. Preparation of adapter mix

The following reaction produces adapter mixes that are sufficient for 500 samples.

Assemble the following hybridization reactions in separate PCR tubes for each hybridization mix:

Reagent	Volume (μL)	Final concentration in 100 μL reaction
Hybridization mix for adapter A1 (200 μM):
Adapter_A1 (500 μM)	40	200 μM
Adapter_A3 (500 μM)	40	200 μM
Oligo hybridization buffer (10×)	10	1×
Water	10
Hybridization mix for adapter A2 (200 μM):
Adapter_A2 (500 μM)	40	200 μM
Adapter_ A3 (500 μM)	40	200 μM
Oligo hybridization buffer (10×)	10	1×
Water	10

Open in a new tab

Mix the contents by repeated pipetting.
Incubate the reactions in a thermal cycler with a heating bonnet for 10 sec at 95ºC, followed by a ramp down from 95ºC to 12ºC at a rate of 0.1ºC/sec.
Combine both reactions to obtain a ready-to-use adapter mix (100 μM each adapter). Adapters can be aliquoted into 4 tubes, stored at −20ºC and thawed repeatedly for subsequent use.

3.3.2. Reaction cleanup using Solid Phase Reversible Immobilization (SPRI)

Purify the cDNA from Subheading 3.1.6 using carboxyl-coated magnetic beads (SPRI beads) as explained below. The given ratio of the SPRI beads to the volume of DNA solution is ideal for DNA molecules above 150 bp. The size cut off for the cleanup reactions can be controlled by varying the amount of beads (refer to the manufacturer’s protocol). A 25 bp DNA ladder may be used as a control to standardize the purification protocol. This procedure can be performed using 96-well plates or individual microcentrifuge tubes, depending on the application. A magnetic apparatus suitable for tubes, such as a DynaMag™-2 magnet, should be used in place of a magnetic plate if individual microcentrifuge tubes are used.

Resuspend the stock solution of SPRI beads by vortexing. Add 0.05% Tween 20 to the suspension to facilitate subsequent pipetting.
Add the SPRI bead suspension to the reactions as follows, using wide orifice pipette tips.
1. Add 1.8 volumes of the SPRI bead suspension to each cDNA sample.
2. Seal the wells with caps and vortex for several seconds. Ensure that the beads are uniformly suspended.
3. Let the plate stand for 5 min at room temperature.
4. Collect the liquid to the bottom of the wells by brief centrifugation in a plate centrifuge to 800 x g. Avoid cross contamination while opening and closing the caps.
Place the plate on a 96-well magnetic plate, and let it stand for 5 min to separate the beads from the solution. Discard the supernatant without removing the beads.
Leave the plate on the magnetic rack, add 150 μL of 70% ethanol to wash the beads, wait 1 min and then remove the supernatant.
Repeat step 4.
Remove residual traces of ethanol using a multichannel pipette. Allow the beads to air-dry for 20 min at room temperature without caps.
Elute as follows:
1. Add 30 μL of EBT to the wells and seal the plate with caps.
2. Remove the plate from the magnetic rack and resuspend the beads by vortexing.
3. Wait 1 min and then collect the liquid in the bottom of the wells by briefly centrifuging the plate at 800 x g. The beads may become clumpy but this appearance does not affect DNA recovery.
4. Place the plate back on the 96-well magnetic plate, wait 1 min, and transfer the supernatant to a new 96-well reaction plate. Carryover of small amounts of beads will not adversely affect subsequent reactions.

3.3.3. End repair

We recommend performing all the reactions with a positive control DNA sample and a negative control along with the cDNA sample from Subheading 3.3.2. The positive control DNA will help determine the success of library preparation. It can be 300 ng of any DNA of about 200–300 bp dissolved in 10 mM Tris HCl, pH 8.50. If produced by PCR, the positive control DNA should be generated by Taq-DNA polymerase with unmodified primers and purified as in Subheading 3.3.2. The negative control is 30 μL of EB solution.

Prepare the following reaction master mix for the required number of reactions:

Reagent	Volume (μL) per sample	Final concentration in 50 μL reaction
Water	10.8
Buffer Tango (10×)	5	1×
dNTPs (25 mM each)	0.2	100 μM each
ATP (100 mM)	0.5	1 mM
T4 polynucleotide kinase (10 U/μL)	2.5	0.5 U/μL
T4 DNA polymerase (5 U/μL)	1.0	0.1 U/μL

Open in a new tab

Add 20 μL of the reaction mix into 30 μL of each cDNA sample from Subheading 3.3.2.
Mix the solutions thoroughly by repeated pipetting using a multichannel pipette. Avoid vortexing after the addition of enzymes.
Incubate at 25ºC for 15 min followed by incubation at 12ºC for 5 min.
Clean up the reaction using SPRI beads as in Subheading 3.3.2 and elute the end repaired DNA in 20 μL EBT solution.

3.3.4. Adapter ligation

Prepare a master mix of adapter ligation reagents for the required number of reactions. Pipette PEG using a wide orifice pipette tip. Vortex the reaction mix containing all the reaction ingredients before adding the enzyme, to mix the viscous PEG. Dissolve any white precipitate in the ligase buffer by vortexing before adding it to reaction mix. If the amount of template DNA is higher than 100 ng, increase the amount of adapter mix to 1 μL.

Reagent	Volume (μL) per sample	Final concentration in 40 μL reaction
Water	10.6
T4 DNA Ligase buffer (10×)	4	1×
PEG-4000 (50%)	4	5%
Adapter mix from Subheading 3.3.1 (100 μM each)	0.4	1 μM each
T4 DNA ligase (5 U/μL)	1	0.125 U/μL

Open in a new tab

Add 20 μL of master mix to 20 μL of end-repaired DNA from Subheading 3.3.3.
Incubate at 22ºC for 30 min.
Clean up the reaction using SPRI beads as in Subheading 3.3.2 and elute with 20 μL of EBT solution.

3.3.5. Adapter fill-in reaction

Prepare a master mix for the required number of reaction as shown below.

Reagent	Volume (μL) per sample	Final concentration in 40 μL reaction
Water	14.1
ThermoPol reaction buffer (10×)	4	1×
dNTPs (25 mM each)	0.4	250 μM each
Bst polymerase, large fragment (8 U/μL)	1.5	0.3 U/μL

Open in a new tab

Add 20 μL master mix to the adapter-ligated DNA from Subheading 3.3.4.
Incubate at 37ºC for 20 min.
Clean up the reaction using SPRI beads as in Subheading 3.3.2 and elute with 20 μL of EBT solution.

3.3.6. Library quality control and characterization

Prepare a 2% agarose gel in 1× TAE buffer such that the thickness of the gel is about 0.5 cm. Include ethidium bromide in the gel.
Load 10 μL of the treated positive control DNA next to the original positive control DNA to verify the success of the library preparation reactions. Also load the 10 μL of negative control DNA. Include a well containing 100 bp DNA ladder (see ^{Note 11}).
Run the gel at 100 V until sufficient resolution is obtained between 100 and 200 bp of the 100 bp ladder. Successful library preparation will cause the positive control DNA size to shift by 67 bp (see ^{Note 13}). We recommend carrying over the positive control DNA through the next step of indexing PCR and running another 2% gel after the final step. Expect to see a further 36 bp shift in the DNA size after incorporation of the index oligonucleotides.

3.3.7. Library quantification

Quantify the library by measuring the DNA concentration by quantitative PCR. We recommend using a commercially available quantitative PCR kit containing SYBRE green. This step will ensure equal representation of samples during pooling for multiplexed sequencing.

Use a previously quantified indexed library, if available, as a positive control. Dilute this positive control sample in TE buffer to yield an adequate range of concentrations in order to quantify samples that are at least two fold on either side of the probable library concentration. We recommend a range 10⁻⁸ to 10⁻¹⁴ g/μL.
If no such library is available, positive control DNA from Subheading 3.3.5 can be amplified using indexing PCR primers as in Subheading 3.3.8 and purified as in Subheading 3.3.2. Determine the DNA concentration of the positive control using a spectrophotometer.
Use 1 μL of the library for quantification in a 30 or 50 μL reaction condition. Use 1 μL of positive control DNA at different dilutions as mentioned in step 1 in a 30 or 50-μL reaction for producing a standard curve to quantify the samples. Amplify the library, the positive control and the negative control using Primer P1 and one of the indexing primers. Use 60ºC as the annealing temperature during the quantitative PCR cycle.

The negative control mentioned in section 3.3.3., which is processed along with the positive control DNA through every step of the library preparation, should yield at least two fold less DNA than the library samples. The positive control library DNA can be used to measure the degree of DNA carryover from previous reactions and purifications.

3.3.8. Indexing PCR and sample pooling

Use equal amounts of DNA from each sample for the indexing PCR. A small portion of the sample DNA is sufficient since the number of amplification cycles can be altered to suit the amount of starting material. We usually perform PCR using 0.1 to a 10 ng of template DNA. This strategy allows saving template DNA in case the indexing PCR reaction fails with the current barcode and a different barcode has to be chosen. Run positive control DNA side by side with the original positive control DNA and pre-indexed positive control DNA to test the success of the library preparation reactions.

Prepare the master mix for a sufficient number of reactions:

Reagent	Volume (μL) per sample	Final concentration in 50 μL reaction
Water	37.1 – A
Phusion HF buffer (5×)	10	1×
dNTPs (25 mM each)	0.4	200 μM each
Primer_P1 (10 μM)	1	200 nM
Phusion Hot Start High-Fidelity DNA Polymerase (2 U/μL)	0.5	0.02 U/μL
Add separately to each well Indexing primer (10 μM)	1	200 nM
Template DNA (library)	A

Open in a new tab

Add the master mix to each well and perform PCR with the following temperature profile: initial denaturation at 98ºC, 30 sec; denaturation at 98ºC, 10 sec; annealing at 60ºC, 20 sec; elongation at 72ºC, 20 sec; final extension at 72ºC, 10 min. The number of cycles that would result in a plateau of the PCR reaction can be determined from the quantitative PCR step in Subheading 3.3.7. Alternatively, adjust the cycle number depending on the template DNA concentration as follows: ≥100 ng: 10 cycles; ≥10 ng: 12 cycles, ≥1 ng: 15 cycles, ≥100 pg: 18 cycles.
Clean up the reaction using SPRI beads as in Subheading 3.3.2 and elute the indexed DNA in 40 μL of EB buffer (see ^{Note 14}).
Remove any leftover magnetic beads before pooling of the samples.
Quantify the indexed library. Performing quantitative PCR is the best way to quantify the indexed library, but spectrophotometric quantification may suffice. Pool equal quantities (100–300 ng) of library DNA from each sample. Analyze 1 μL of the pooled product on an Agilent Bioanalyzer DNA 1000 chip to assess the quality of the final product and to quantify the DNA concentration. All the samples should yield similar DNA concentrations at the end unless there was a significant difference in fragment size between the samples. This material is processed further for cluster generation on the Illumina Cluster Station using the manufacturer’s recommended protocol. Most laboratories (including ours) submit their materials to a core facility for Illumina sequencing. This material is ready for submission to the sequencing service for the Illumina sequencing procedure.

3.4. Multiplexing: simulation and empirical results

Transcriptome profiling data can be used for investigating multiple patterns of individual gene expression as well as a molecular phenotyping tool (5, 12). The vast amounts of data produced by each sequencing run may sometimes exceed the need, especially for molecular phenotyping and for the analysis of transcript abundance. Multiplexing allows processing of many samples in one sequencing run, thus reducing the cost per sample. The assumption in multiplexing is that the loss of information is uniform across all genes, but we were not sure whether the Dictyostelium transcriptome, with its uniquely high A:T content, may behave differently. We tested this assumption by simulations and empirically.

We first analyzed the potential effect of multiplexing by simulation on previously published non-multiplexed data. We then performed a direct experiment with 24-fold multiplexing, which matched our experimental needs, using the RNA samples that were used to obtain the non-multiplexed data. The non-multiplexed dataset was obtained by collecting RNA samples at 4-h intervals during the 24-h developmental program in two independent replicates in D. discoideum and the mRNA samples were analyzed using RNAseq (5). To calculate the similarity between the transcriptional profiles at different time points, we performed hierarchical clustering on the expression vectors consisting of all the genes from each time point and visualized the results as a dendrogram (Fig. 2) (5). The expression vectors from each of the time points were scaled to one million counts of all the polyA⁺ genes, averaged between the two replicates and log transformed to minimize the effects of outliers. We used Pearson’s correlation (PC) to calculate the distance (D=1-PC) and complete linkage as the clustering criterion. Two objects (individual time points or joints) are joined by a horizontal line if they are more similar to one another than to any other object in the dataset. The vertical distance between objects is inversely proportional to the similarity between them. The horizontal distances in the dendrogram are meaningless.

The dendrograms depict the distances between the transcriptional profiles at each of the time points (hours). (a) Samples analyzed by RNAseq without multiplexing (5). (b) Simulated data at 24× multiplexing. Simulation was performed on the data used to generate panel a. (c) Samples analyzed by RNAseq with 24× multiplexing. The RNA samples used to generate the data for panel a were multiplexed 24 fold and sequenced.

We simulated multiplexed data by assuming equal loss of information from all samples. We performed hierarchical clustering on the simulated multiplexed data and observed that the structures of the dendrograms obtained were essentially identical to those obtained with the non-multiplexed data up to 512-fold multiplexing. Fig. 2b shows the similarity between time points of the simulated multiplexed data with 24-fold multiplexing. Though there is no theoretical limit to multiplexing 512-fold, the protocol allows only up to 228-fold multiplexing.

For the empirical test, the mRNA samples that were previously analyzed without multiplexing were analyzed with 24-fold multiplexing. We performed hierarchical clustering on the multiplexed data and visualized the similarities between the different time points using dendrograms. We observed that the structure of the empirical data (Fig. 2c) was similar to that obtained when no multiplexing was done (Fig. 2a). The only exception was clustering of the 16-h sample with the 8–12-h clade in the original and simulated data, whereas the 16-h sample was clustered with the 20–24-h clade in the empirically multiplexed data. In either case, the temporal order of the time points was correct. These results indicate that multiplexing does not introduce systematic errors into the data.

As the sequencing technology is improving regularly, we are currently able to obtain more data from each one of the multiplexed samples than we were able to obtain from a single sample in the non-multiplexed method just 2 years ago (5). In the future we may be able to increase the fold of multiplexing further.

3.5. Software tools

It is nearly impossible to provide a complete protocol for analyzing RNAseq data because the methods vary with the research needs. We therefore provide a few examples of routine analyses and the tools we use to perform them.

3.5.1. Input data

The pipeline’s principal input are next-generation sequencing (NGS) reads in QSEQ or FASTQ format:

line1	@1
line2	GAGACCCTCTACAATTCAATGAAAAAGATTTTAGCTTTACCAGAGGATGT
line3	+
line4	bbbeeeeegggggiihiagcgiighhdggffhiiaefgcc’ebghffhii

Open in a new tab

where line1 is sequence identifier, line2 is raw nucleotide sequence, line3 is sequence identifier/description and line4 is quality values. If reads are different from the reference data used by the pipeline they can be complemented with the sequence of the reference genome in FASTA format:

line1	>chromosome_1
line2	TTTGGTACAAATGGTTTAACTTCTTCTGGCATACGAAGAGCAATTTCACC...
line3	>chromosome_2
line4	GTTCAAGAAGCCAAACAACAAACCGGCGCTAATGCCACAGTTATTTATGT...

Open in a new tab

and genome annotation (gene features with their locations in GTF format, for example, the position of 3 exons, gene DDB_G0267698):

1 dictyBase exon 624027 624219 . - . gene_id “DDB_G0267698”; transcript_id “DDB0305284”;

1 dictyBase exon 623830 623910 . - . gene_id “DDB_G0267698”; transcript_id “DDB0305284”;

1 dictyBase exon 623530 623627 . - . gene_id “DDB_G0267698”; transcript_id “DDB0305284”;

Open in a new tab

3.5.2. PIPA: a Dictyostelium RNAseq data management pipeline

PIPA (http://pipa.biolab.si, Fig. 3) is a web-based software tool for NGS data management and bioinformatics analysis. Its main task is to manage, map and preprocess the data. PIPA supports data storage and management, experiment annotation and bioinformatics analysis including de-multiplexing, sequence mapping, estimation of transcript abundance, differential expression analysis and quality control. It uses a server-based architecture, in which the data analysis runs on the server and the results are rendered in an interactive web-client with a graphical user interface. PIPA employs standard bioinformatics procedures and implementations, such as FASTQC (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/), Bowtie (13) and Bioconductor (14). The results (mapped reads, counts and transcript abundance) can be either downloaded and analyzed by a third party program or analyzed in dictyExpress (15) or Orange (16), which can access the data directly.

A view of a list of experiments with raw data, mapping information and gene expression (background grids) and a display of the RNA-seq read distribution for a selected data set (center).

3.5.3. dictyExpress: web-based gene expression analytics

The web-based interactive gene expression analysis program dictyExpress (http://dictyexpress.biolab.si) can query PIPA and render either public or proprietary gene expression data. Its analytics toolbox (Fig. 4) includes visualization of expression profiles, enrichment analysis of Gene Ontology (GO) terms, hierarchical clustering, search of co-expressed profiles, and navigation through gene co-expression networks.

Experiment selection (top left), enrichment analysis (top center), co-expression network display (top right), hierarchical clustering (bottom left) and display of gene expression profiles (bottom right).

3.5.4. Orange with a bioinformatics add-on: a visual programming suite for gene expression data analysis

Orange (http://orange.biolab.si, Fig. 5) is a general-purpose interactive data analytics environment, where data flow schemas can be built from computational units called widgets. Gene expression analysis is implemented through the bioinformatics add-on. The bioinformatics widgets implement various data analysis and visualization tasks, including gene selection, enrichment analysis, exploration of KEGG pathways (http://www.genome.jp/kegg), and access to publicly available data such as Biomart (www.biomart.org) and GO (17). Each Orange widget accepts input data and provides output results. Widgets can also interconnect with other visualization, network exploration and data mining widgets from the Orange data mining toolbox to compose sophisticated data analysis schemas.

Wild-type *Dictyostelium* gene expression data from PIPA are fed to the “Gene Selection” widget. The selected genes are analyzed for term enrichment in Gene Ontology, where a subset of genes is chosen and for which a KEGG pathway is displayed. The other branch of the schema computes and displays differences between expression profiles at different stages of development.

3.6. Data processing

3.6.1. Data input and management in PIPA

Login to PIPA, go to the ‘Run PIPA’ link.
Upload raw sequence data in FASTQ/FASTA format from a local data file or specifying a remote server address (using the ‘Upload’ button in the Library pane).
De-multiplex the data, if required (Library pane, ‘De-multiplex’ button).

3.6.2. Annotation in PIPA

Select a single experiment in the Data pane and click ‘Edit’.
Choose an annotation format (e.g. Dictyostelium) and populate the field values (e.g. Experiment name, Time point, Species, etc).
If a new field is required, edit the annotation format in ‘Settings/Annotation formats’. Add the desired field (select its type: string, number, date) and position it in the field list.

3.6.3. Data mapping in PIPA

Select experiments and initiate mapping to the chosen reference genome. Select the desired mapping parameters and features (e.g. iterative trimming of reads from the 3′ end).
Explore mapping statistics including: the number of uniquely mapped reads (single-hits) (N_UNIQUE), the number of unmapped reads (N_NOTMAPPED) and the number of reads with multiple mappings (N_MULTIPLE). The system computes three alternative expression values for each gene:
1. Exp_RAW – the number of reads uniquely mapped to gene exons
2. Exp_RPKM – raw gene expression scaled by exon length (Reads Per Kilobase of exon model per Million mapped reads) where:
  - Exp_RPKM = 10⁹ * Exp_RAW/(N_UNIQUE * Exon_LENGTH)
  - Exon_LENGTH = length of gene exons (nt)
  - N_UNIQUE = total number of all uniquely mapped reads from the experiment, excluding the non-polyadenylated genes
3. Exp_MAP – same as Exp_RPKM, but scaled by the uniquely mapable part of the exons: all possible subsequences of the reference genome (of the same length as reads in raw data) are mapped back to the reference genome, and Exon_MAPPABLE is the number of uniquely mapped sequences to the exons
  - Exp_MAP = 10⁹ * Exp_RAW/(N_UNIQUE * Exon_MAPPABLE)
Explore gene expression values for individual experiments, view read alignments together with gene features (exons, coverage) in gbrowse2 (http://gmod.org/wiki/GBrowse) or download BAM files (includes all mapping results).

3.7. Data analysis

3.7.1. Differential expression analysis in PIPA

Create a new differential expression study by clicking on ‘Differential expression’.
Select experiments for condition A and condition B and choose the analysis method (baySeq, DESeq).
Differentially expressed genes are shown in the results grid.

3.7.2. Expression analyses in dictyExpress

dictyExpress contains 7 interconnected components. Selecting a gene in one of the components highlights it in all the others and pressing the ‘Update’ button in any component results in propagation and commitment of the selected set in all the other components. We describe three options for exploration but there are many other ways to select and analyze genes or groups of genes.

3.7.2.1. Searching for genes by name

In the ‘Gene Expression Query’ component, select an experiment in the upper window (e.g. D. discoideum strain AX4 grown on K.a.)
In the same component, enter the desired gene names in the ‘Gene selection’ window. An interactive menu allows gene selection from a list.
Press the ‘update’ button to propagate the selection to the other components.
Use the green arrow buttons to return to previous selections.

3.7.2.2. Searching for genes by expression pattern

In the ‘Expression Profile’ component press the ‘Freehand’ button.
Place the brush cursor in the graph window and draw the desired pattern of expression.
Press the ‘Freehand’ button again and then press the ‘Find similarly expressed’ button. A new window will appear with gene names.
Select the desired genes (up to 30) and press update.

3.7.2.3. Selecting differentially expressed genes

In the ‘Prespore/Prestalk Differential Expression Analysis’ component, select the desired comparison (the default is D. discoideum prespore cells vs. prestalk cells)
Each spot in the volcano plot represents a gene. The x-axis shows the log₂ of the ratio between the selected samples and the y-axis represents the degree of confidence. Select a few spots of interest by pointing and clicking on a spot or on a group of spots.
Select up to 30 genes from the pop-up box and press the ‘Update’ button.

3.7.3. Accessing PIPA data in Orange

Select the ‘Bioinformatics’ tab in Orange
Place the PIPA widget (Fig. 6) on the canvas and open it by double clicking. The default settings access our published data. To access private data provide your PIPA user name and password at the bottom left corner.
Select the expression type (optional) and specify the type of data transformation. Choose ‘Average Replicates’ to output gene-wise median among replicates, and ‘Logarithmic Transformation’ to log-transform gene expressions (gene expression x is transformed to log₂(x+1)).
Select experiments. You may use the ‘Search’ window to find experiments that match terms such as name, species and strain. After selection, click the ‘Commit’ button to initiate data transfer from the server and place the data in the widget output (see ^{Note 15}).
Optionally connect the output of the PIPA widget to the input of a ‘Data Table’ widget. The Data Table shows gene expression values with the experiment labels on top and gene IDs in the rightmost column (Fig. 6)(see ^{Note 16}).

Data selection and downloading with the PIPA widget (left). The gene expression data for the selected experiments are shown in the Data Table widget (right).

3.7.4. Quality control in Orange

Load the expression data without replicate averaging (see above). Connect the expression data to the ‘Quality Control’widget.
In ‘Quality Control’, select the labels shared by experiments in an experimental group (e.g., wild type, or a specific mutant), and a distance metric to compute the gene expression profile distances between different replications of the experiment groups.
Explore the results. The widget shows distances between one instance (a reference) in the experiment group and the other instances of the same group. For comparison, distances to all other experiments are visualized as well (Fig. 7). Double clicking on one of the experiments changes it to be the reference.
Intuitively, replicates of the same experiment should appear closer to each other than to replicates outside the group. Clear outliers indicate irreproducible samples that can be removed from further analysis either by deselecting them in the PIPA widget or by choosing reproducible experiments in the ‘Select Attributes’ widget. Such experiments should also be annotated accordingly in PIPA (see Subheading 3.6.1).

The tooltip shows the experiment labels.

3.7.5. Gene expression data analysis in Orange

3.7.5.1. Estimation of gene expression profile distances or distances between genes

Connect the expression data (e.g., from the ‘PIPA’ widget) to the ‘Attribute Distance’ widget to compute distances between expression profiles, or to the ‘Example Distance’ widget for distances between genes.
Select a distance measure in the distance widget (e.g., ‘Euclidean distance’, ‘Spearman correlation’, ‘Pearson correlation’).
Connect the output of the distance widget to one of the widgets for visualization of distances (e.g., ‘Distance Map’, ‘Hierarchical Clustering’, ‘MDS’).

3.7.5.2. Estimation of genotype-specific gene expression profile distances

Connect the expression data (e.g. from the ‘PIPA’ widget) to the ‘Genotype Distances’ widget.
In the ‘Genotype Distances’ widget, select the labels shared by the experiments in a group with the same genotype, and labels by which to sort experiments within groups. Select a distance metric. Press ‘Compute’ to initiate distance estimation.
Visualize the distances as in Subheading 3.7.5.1, step 3.

3.7.5.3. Gene Ontology (GO) enrichment analysis

Connect the expression data (e.g. from the ‘PIPA’ widget) to the ‘Gene Selection’ widget, which enables gene selection based on differential expression. Differentially expressed genes can also be selected with the ‘Volcano Plot’ widget. In the ‘Volcano Plot’ widget, select target labels and then select genes on the graph. The marked genes will appear in the widget output.
In the ‘Gene Selection’ widget (Fig. 8), select the scoring method and the target labels. After the scores are computed, a histogram is shown in the widget main area.
Choose a cutoff point, either according to the p-value obtained from permutation tests, or specify a number of highest-ranked genes. Click the ‘Commit’ button to send the data to the widget output.
Connect the ‘Gene Selection’ to the GO Browser. In the GO Browser, choose the correct organism and select a GO aspect to analyze. If no GO terms are found, increase the P-value or reduce the term size threshold (‘Filter’ tab). The default reference set for the computation of enrichment includes all the genes of the given organism. To use a custom reference set, connect a customized reference set of Genes to the ‘Reference’ input of the GO Browser and choose the ‘Reference set (input)’ option.
The enriched terms are displayed in a tree. You may select a term for further analysis. Expression data with genes from the selected terms will appear in the widget output. Analyze these genes through either ‘Gene Info’, ‘KEGG’, ‘Data Table’ or other Orange widgets.

The highest-ranked 1000 genes are selected.

3.7.5.4. Gene Set Enrichment Analysis (GSEA)

Connect the expression data to the ‘GSEA’ widget. Unlike the GO Browser, this widget does not require a preselected subset of genes.
In the ‘GSEA’ widget, choose experiments that belong the two groups you want to compare. Select gene sets in the ‘Gene sets’ tab and click ‘Compute’.
A list of enriched gene sets is displayed. Choose a gene set for additional analysis.

3.7.5.5. Visualization of distances with a distance map

Connect the output of ‘Attribute Distance’, ‘Example Distance’ or ‘Genotype Distances to the ‘Distance Map’ widget (e.g. Fig. 5, lower left).
Observe the distances. Optionally sort the items and display the results of clustering. If an area in the distance map is selected, the widget outputs the respective data subset.

3.7.5.6. Visualization of distances with multi dimensional scaling

Connect the output of ‘Attribute Distance’, ‘Example Distance’ or ‘Genotype Distances’ to the ‘MDS’ widget for multi dimensional scaling (18).
In the ‘MDS’ widget, run optimization (click ‘Optimize’). Adjust the view in the ‘Graph’ tab (see ^{Note 17}).

3.7.5.7. Hierarchical clustering of experiments

Connect the expression data to the ‘Attribute Distance’ or ‘Genotype Distances’ widget and select the appropriate settings.
Connect the output of the ‘Distance’ widget to the ‘Hierarchical Clustering’ widget (Fig. 9). In ‘Hierarchical Clustering’ set the ‘Linkage’ to ‘Ward’s’ and the ‘Annotation’ to ‘label’.

Hierarchical clustering of genes and their expression profiles in Orange.

Footnotes

It is advisable to dedicate an area of the laboratory and a set of pipettors to RNA work. The work area and the pipettors should be cleaned with RNAseZap (Ambion) or a similar product before each procedure.

Plasticware should be sterilized by autoclave in glass beakers covered with aluminum foil and dedicated to RNA work.

In principle, other RNA purification procedures may be used as well but we have not tested their suitability for RNAseq.

⁴

The RNA is stable in Trizol for many months under these conditions and it can be shipped on dry ice if necessary.

⁵

We have not used smaller or larger amounts of total RNA so we cannot comment on samples outside this range.

⁶

The procedure can be stopped at this point and the samples can be stored at −80ºC for several days. We have not experienced problems with samples stored for as long as 7 days. Thaw the samples to room temperature before the next use.

⁷

We have successfully used as little as 30 ng of mRNA for library preparation. If one wants to use even lower amounts mRNA from a precious sample, we suggest testing first by comparing the RNA-sequencing results of a comparably small amount of a less precious and more readily available sample to larger amounts of the same mRNA. This analysis would reveal potential skewing in the observed mRNA species abundance.

⁸

If you wish to fragment the RNA to a different size or to examine the efficiency of fragmentation, we recommend analyzing the samples using an Agilent Bioanalyzer DNA 1000 chip.

⁹

One may stop at this point and store the samples at −20ºC for 2–3 days.

¹⁰

The transcriptomic library requires one-tenth the amount of adapters required for a standard genomic DNA library.

¹¹

The DNA ladder should be loaded at 0.1 μg per mm width of well. We usually load 6 μL of the diluted ladder into each well irrespective of the well width.

¹²

Faint bands of the positive control DNA may indicate loss of DNA through the reactions. In such cases, one may carry the positive control DNA through a PCR enrichment step and run the samples to determine the super shift post-PCR enrichment step. If you do not observe an increase in size after adapter ligation, replace all the enzymes and reagents and try again.

¹³

If there is no shift in the DNA size, one of the enzymes may have gone bad. Replace all the enzymes and repeat the procedure. If there is no band at all, make sure the SPRI beads are working well. Perform SPRI bead purification of a 25 bp DNA ladder to see the efficiency of the purification. Artifact bands in the negative control indicate cross contamination.

¹⁴

The elution is done in EB instead of EBT. In our hands, this gives better readings on the Nanodrop spectrophotometer. Elution with EB may result in some carryover of magnetic beads. This problem can be avoided by collecting only 38 μL of the EB rather than the entire 40 μL.

¹⁵

Sets of experiments that are used frequently can be saved.

¹⁶

If the Data Table is empty, check if the input is connected to the PIPA widget, and whether there were experiments selected and the ‘Commit’ button clicked in the PIPA widget.

¹⁷

If the optimization algorithm is stuck in a local minimum, click the ‘Jitter’ button, which moves the elements slightly, and click ‘Optimize’ again. The ‘Randomize’ button facilitates a complete restart of the MDS optimization.

References

1.Kibler K, Nguyen TL, Svetz J, van Driessche N, Ibarra M, Thompson C, Shaw C, Shaulsky G. A novel developmental mechanism in Dictyostelium revealed in a screen for communication mutants. Dev Biol. 2003;259:193–208. doi: 10.1016/s0012-1606(03)00204-5. [DOI] [PubMed] [Google Scholar]
2.Morio T, Urushihara H, Saito T, Ugawa Y, Mizuno H, Yoshida M, Yoshino R, Mitra BN, Pi M, Sato T, Takemoto K, Yasukawa H, Williams J, Maeda M, Takeuchi I, Ochiai H, Tanaka Y. The Dictyostelium developmental cDNA project: generation and analysis of expressed sequence tags from the first-finger stage of development. DNA Res. 1998;5:335–40. doi: 10.1093/dnares/5.6.335. [DOI] [PubMed] [Google Scholar]
3.Van Driessche N, Shaw C, Katoh M, Morio T, Sucgang R, Ibarra M, Kuwayama H, Saito T, Urushihara H, Maeda M, Takeuchi I, Ochiai H, Eaton W, Tollett J, Halter J, Kuspa A, Tanaka Y, Shaulsky G. A transcriptional profile of multicellular development in Dictyostelium discoideum. Development. 2002;129(7):1543–52. doi: 10.1242/dev.129.7.1543. [DOI] [PubMed] [Google Scholar]
4.Booth EO, Van Driessche N, Zhuchenko O, Kuspa A, Shaulsky G. Microarray phenotyping in Dictyostelium reveals a regulon of chemotaxis genes. Bioinformatics. 2005;21(24):4371–7. doi: 10.1093/bioinformatics/bti726. [DOI] [PubMed] [Google Scholar]
5.Parikh A, Miranda ER, Katoh-Kurasawa M, Fuller D, Rot G, Zagar L, Curk T, Sucgang R, Chen R, Zupan B, Loomis WF, Kuspa A, Shaulsky G. Conserved developmental transcriptomes in evolutionarily divergent species. Genome Biol. 2010;11(3):R35. doi: 10.1186/gb-2010-11-3-r35. [DOI] [PMC free article] [PubMed] [Google Scholar]
6.Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63. doi: 10.1038/nrg2484. [DOI] [PMC free article] [PubMed] [Google Scholar]
7.Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G, Robertson AJ, Perkins AC, Bruce SJ, Lee CC, Ranade SS, Peckham HE, Manning JM, McKernan KJ, Grimmond SM. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods. 2008;5(7):613–9. doi: 10.1038/nmeth.1223. [DOI] [PubMed] [Google Scholar]
8.Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320(5881):1344–9. doi: 10.1126/science.1158441. [DOI] [PMC free article] [PubMed] [Google Scholar]
9.Smith AM, Heisler LE, St Onge RP, Farias-Hesson E, Wallace IM, Bodeau J, Harris AN, Perry KM, Giaever G, Pourmand N, Nislow C. Highly-multiplexed barcode sequencing: an efficient method for parallel analysis of pooled samples. Nucleic Acids Res. 2010;38(13):e142. doi: 10.1093/nar/gkq368. [DOI] [PMC free article] [PubMed] [Google Scholar]
10.Meyer M, Kircher M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc. 2010;2010(6):pdb prot5448. doi: 10.1101/pdb.prot5448. [DOI] [PubMed] [Google Scholar]
11.Fey P, Kowal AS, Gaudet P, Pilcher KE, Chisholm RL. Protocols for growth and development of Dictyostelium discoideum. Nat Protoc. 2007;2(6):1307–16. doi: 10.1038/nprot.2007.178. [DOI] [PubMed] [Google Scholar]
12.Van Driessche N, Demsar J, Booth EO, Hill P, Juvan P, Zupan B, Kuspa A, Shaulsky G. Epistasis analysis with global transcriptional phenotypes. Nat Genet. 2005;37(5):471–7. doi: 10.1038/ng1545. [DOI] [PubMed] [Google Scholar]
13.Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25. [DOI] [PMC free article] [PubMed] [Google Scholar]
14.Gentleman R, Carey V, Huber W, Irisarry R, Dudoit S, editors. Bioinformatics and computational biology solutions using R and Bioconductor. New York, NY: Springer; 2005. [Google Scholar]
15.Rot G, Parikh A, Curk T, Kuspa A, Shaulsky G, Zupan B. dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface. BMC Bioinformatics. 2009;10:265. doi: 10.1186/1471-2105-10-265. [DOI] [PMC free article] [PubMed] [Google Scholar]
16.Curk T, Demsar J, Xu Q, Leban G, Petrovic U, Bratko I, Shaulsky G, Zupan B. Microarray data mining with visual programming. Bioinformatics. 2005;21(3):396–8. doi: 10.1093/bioinformatics/bth474. [DOI] [PubMed] [Google Scholar]
17.Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
18.Torgerson WS. Multidimentional Scaling: I. Theory and Method. Psychometrika. 1952;17(4):401–19. [Google Scholar]

[R1] 1.Kibler K, Nguyen TL, Svetz J, van Driessche N, Ibarra M, Thompson C, Shaw C, Shaulsky G. A novel developmental mechanism in Dictyostelium revealed in a screen for communication mutants. Dev Biol. 2003;259:193–208. doi: 10.1016/s0012-1606(03)00204-5. [DOI] [PubMed] [Google Scholar]

[R2] 2.Morio T, Urushihara H, Saito T, Ugawa Y, Mizuno H, Yoshida M, Yoshino R, Mitra BN, Pi M, Sato T, Takemoto K, Yasukawa H, Williams J, Maeda M, Takeuchi I, Ochiai H, Tanaka Y. The Dictyostelium developmental cDNA project: generation and analysis of expressed sequence tags from the first-finger stage of development. DNA Res. 1998;5:335–40. doi: 10.1093/dnares/5.6.335. [DOI] [PubMed] [Google Scholar]

[R3] 3.Van Driessche N, Shaw C, Katoh M, Morio T, Sucgang R, Ibarra M, Kuwayama H, Saito T, Urushihara H, Maeda M, Takeuchi I, Ochiai H, Eaton W, Tollett J, Halter J, Kuspa A, Tanaka Y, Shaulsky G. A transcriptional profile of multicellular development in Dictyostelium discoideum. Development. 2002;129(7):1543–52. doi: 10.1242/dev.129.7.1543. [DOI] [PubMed] [Google Scholar]

[R4] 4.Booth EO, Van Driessche N, Zhuchenko O, Kuspa A, Shaulsky G. Microarray phenotyping in Dictyostelium reveals a regulon of chemotaxis genes. Bioinformatics. 2005;21(24):4371–7. doi: 10.1093/bioinformatics/bti726. [DOI] [PubMed] [Google Scholar]

[R5] 5.Parikh A, Miranda ER, Katoh-Kurasawa M, Fuller D, Rot G, Zagar L, Curk T, Sucgang R, Chen R, Zupan B, Loomis WF, Kuspa A, Shaulsky G. Conserved developmental transcriptomes in evolutionarily divergent species. Genome Biol. 2010;11(3):R35. doi: 10.1186/gb-2010-11-3-r35. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R6] 6.Wang Z, Gerstein M, Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009;10(1):57–63. doi: 10.1038/nrg2484. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R7] 7.Cloonan N, Forrest AR, Kolle G, Gardiner BB, Faulkner GJ, Brown MK, Taylor DF, Steptoe AL, Wani S, Bethel G, Robertson AJ, Perkins AC, Bruce SJ, Lee CC, Ranade SS, Peckham HE, Manning JM, McKernan KJ, Grimmond SM. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat Methods. 2008;5(7):613–9. doi: 10.1038/nmeth.1223. [DOI] [PubMed] [Google Scholar]

[R8] 8.Nagalakshmi U, Wang Z, Waern K, Shou C, Raha D, Gerstein M, Snyder M. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science. 2008;320(5881):1344–9. doi: 10.1126/science.1158441. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R9] 9.Smith AM, Heisler LE, St Onge RP, Farias-Hesson E, Wallace IM, Bodeau J, Harris AN, Perry KM, Giaever G, Pourmand N, Nislow C. Highly-multiplexed barcode sequencing: an efficient method for parallel analysis of pooled samples. Nucleic Acids Res. 2010;38(13):e142. doi: 10.1093/nar/gkq368. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R10] 10.Meyer M, Kircher M. Illumina sequencing library preparation for highly multiplexed target capture and sequencing. Cold Spring Harb Protoc. 2010;2010(6):pdb prot5448. doi: 10.1101/pdb.prot5448. [DOI] [PubMed] [Google Scholar]

[R11] 11.Fey P, Kowal AS, Gaudet P, Pilcher KE, Chisholm RL. Protocols for growth and development of Dictyostelium discoideum. Nat Protoc. 2007;2(6):1307–16. doi: 10.1038/nprot.2007.178. [DOI] [PubMed] [Google Scholar]

[R12] 12.Van Driessche N, Demsar J, Booth EO, Hill P, Juvan P, Zupan B, Kuspa A, Shaulsky G. Epistasis analysis with global transcriptional phenotypes. Nat Genet. 2005;37(5):471–7. doi: 10.1038/ng1545. [DOI] [PubMed] [Google Scholar]

[R13] 13.Langmead B, Trapnell C, Pop M, Salzberg SL. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009;10(3):R25. doi: 10.1186/gb-2009-10-3-r25. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R14] 14.Gentleman R, Carey V, Huber W, Irisarry R, Dudoit S, editors. Bioinformatics and computational biology solutions using R and Bioconductor. New York, NY: Springer; 2005. [Google Scholar]

[R15] 15.Rot G, Parikh A, Curk T, Kuspa A, Shaulsky G, Zupan B. dictyExpress: a Dictyostelium discoideum gene expression database with an explorative data analysis web-based interface. BMC Bioinformatics. 2009;10:265. doi: 10.1186/1471-2105-10-265. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R16] 16.Curk T, Demsar J, Xu Q, Leban G, Petrovic U, Bratko I, Shaulsky G, Zupan B. Microarray data mining with visual programming. Bioinformatics. 2005;21(3):396–8. doi: 10.1093/bioinformatics/bth474. [DOI] [PubMed] [Google Scholar]

[R17] 17.Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25–9. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]

[R18] 18.Torgerson WS. Multidimentional Scaling: I. Theory and Method. Psychometrika. 1952;17(4):401–19. [Google Scholar]

Reagent	Volume (μL) per sample
5× first strand buffer	4
100 mM DTT	2
10 mM dNTP mix	1
RNaseOUT (40 U/μL)	0.5

Reagent	Volume (μL) per sample	Final concentration in 50 μL reaction
10× Klenow buffer	5	1×
1 mM dATP	10	0.5 mM
Klenow DNA polymerase (5 U/μL)	3	0.33 U/μL

Reagent	Volume (μL) per sample	Final concentration in 50 μL reaction
water	10
5× DNA ligase buffer	10	1×
Adapter oligo mix	1
DNA ligase (1 U/μL)	5	0.1 U/μL

PERMALINK

Transcriptional profiling of Dictyostelium with RNA sequencing

Edward Roshan Miranda

Gregor Rot

Marko Toplak

Balaji Santhanam

Tomaz Curk

Gad Shaulsky

Blaz Zupan

Summary

1. INTRODUCTION

2. MATERIALS

2.1. Reagents

2.1.1. RNA purification and cDNA synthesis

2.1.2. Single sample library preparation

2.1.3. Multiplexed library preparation

2.2. Equipment

2.3. Analysis software

3. METHODS

3.1. RNA purification and cDNA synthesis

3.1.1. Preparation of total RNA

3.1.2. mRNA purification

3.1.3. mRNA fragmentation

3.1.4. Precipitation of fragmented mRNA

3.1.5. First strand cDNA synthesis

3.1.6. Second strand synthesis

3.2. Single sample library preparation

3.2.1. End repair

3.2.2. Addition of a single A base

3.2.3. Adapter ligation

3.2.4. Gel purification

3.2.5. PCR Enrichment

3.3. Multiplexed library preparation

Figure 1. A strategy for preparing a multiplexed sequencing library.

3.3.1. Preparation of adapter mix

3.3.2. Reaction cleanup using Solid Phase Reversible Immobilization (SPRI)

3.3.3. End repair

3.3.4. Adapter ligation

3.3.5. Adapter fill-in reaction

3.3.6. Library quality control and characterization

3.3.7. Library quantification

3.3.8. Indexing PCR and sample pooling

3.4. Multiplexing: simulation and empirical results

Figure 2. Simulated and empirical multiplexing results.

3.5. Software tools

3.5.1. Input data

3.5.2. PIPA: a Dictyostelium RNAseq data management pipeline

Figure 3. PIPA.

3.5.3. dictyExpress: web-based gene expression analytics

Figure 4. dictyExpress.

3.5.4. Orange with a bioinformatics add-on: a visual programming suite for gene expression data analysis

Figure 5. A typical Orange bioinformatics schema.

3.6. Data processing

3.6.1. Data input and management in PIPA

3.6.2. Annotation in PIPA

3.6.3. Data mapping in PIPA

3.7. Data analysis

3.7.1. Differential expression analysis in PIPA

3.7.2. Expression analyses in dictyExpress

3.7.2.1. Searching for genes by name

3.7.2.2. Searching for genes by expression pattern

3.7.2.3. Selecting differentially expressed genes

3.7.3. Accessing PIPA data in Orange

Figure 6. Gene expression data selection with PIPA.

3.7.4. Quality control in Orange

Figure 7. Quality control widget in Orange.

3.7.5. Gene expression data analysis in Orange

3.7.5.1. Estimation of gene expression profile distances or distances between genes

3.7.5.2. Estimation of genotype-specific gene expression profile distances

3.7.5.3. Gene Ontology (GO) enrichment analysis

Figure 8. The Gene Selection widget in Orange.

3.7.5.4. Gene Set Enrichment Analysis (GSEA)

3.7.5.5. Visualization of distances with a distance map

3.7.5.6. Visualization of distances with multi dimensional scaling

3.7.5.7. Hierarchical clustering of experiments

Figure 9.

Footnotes

References

ACTIONS

PERMALINK