2017 May 2;6:e22054. doi: 10.7554/eLife.22054

Figure 1. Three populations of cells from A. maculatum egg capsules containing stage 39 embryos were collected and prepared for mRNA extraction, cDNA sequencing, and differential expression analysis revealing several hundred significantly differentially expressed genes detected for the salamander and alga.

(a) Intracapsular algae (Population 1) were removed from intact eggs using a syringe and hypodermic needle (photo credit: Roger Hangarter). Embryos were decapsulated and washed, and the liver diverticulum region (dashed line), containing high concentrations of algae (red dots), was isolated and dissociated into a single cell suspension (illustration adapted from Harrison, 1969). The dissociated cells were screened for A. maculatum endoderm cells without alga (black arrowheads) and endoderm cells with intracellular alga (green arrowhead). Scale bars on microscope images are 20 µm. (b) Isolated endoderm cell, and isolated endoderm cell with intracellular alga. Scale bars on microscope images are 20 µm. (c) Representative cDNA distribution (bioanalyzer trace) from a population of 50 manually isolated A. maculatum endoderm cells. Peaks at 35 bp and 10380 bp are markers. Because lysed A. maculatum cells were observed in the cell suspension fluid after dissociation of A. maculatum embryos (debris seen in dissociated A. maculatum microscope images in (a) and (b)), that fluid was tested for the presence of contaminating mRNA. mRNA was not detected in the surrounding fluid (Figure 1—figure supplement 1). Lower limit abundance thresholds (Figure 1—figure supplement 2) and correction for low sequencing depth in intracellular algal samples (Figure 1—figure supplement 3) were implemented to obtain the final gene sets used for differential expression analysis. Depth of sequencing was not biased between samples of A. maculatum cells with and without algae (Figure 1—figure supplement 4). Library preparation GC bias affected the completeness of the algal transcriptome obtained from intracapsular and intracellular O. amblystomatis (Figure 1—figure supplement 5). (d and e) Dotplots of log2 fold change vs. expression level. The blue horizontal lines are plus and minus 4-fold change in expression between samples. The red dots are genes with FDR adjusted p-values < 0.05, indicating a significant difference in expression level between conditions. (d) Differentially expressed algal transcripts. (e) Differentially expressed salamander transcripts.
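As a reading aid for panels (d) and (e), the sketch below shows how a single gene would be placed relative to the plot's annotations: a 4-fold change corresponds to ±2 on the log2 axis, and a gene is drawn in red when its FDR adjusted p-value is below 0.05. The function name, its arguments, and the example values are hypothetical; the legend does not specify how fold changes or adjusted p-values were computed, so both are taken here as given inputs.

```python
import math


def classify_gene(mean_a, mean_b, fdr, fold=4.0, alpha=0.05):
    """Place one gene on a log2 fold change dotplot (illustrative only).

    mean_a, mean_b: normalized expression in the two conditions (assumed > 0).
    fdr: FDR adjusted p-value for the gene, computed elsewhere.
    Returns (log2 fold change, significant?, outside the fold-change lines?).
    """
    log2_fc = math.log2(mean_b / mean_a)
    significant = fdr < alpha                         # would be a red dot
    beyond_lines = abs(log2_fc) > math.log2(fold)     # outside the blue lines
    return log2_fc, significant, beyond_lines


# Hypothetical gene: 8-fold higher in condition b, adjusted p-value 0.01.
print(classify_gene(10.0, 80.0, 0.01))
```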

DOI: http://dx.doi.org/10.7554/eLife.22054.003

Figure 1—source data 1. Raw counts matrix with counts for all reads mapped to the total evidence assembly (the assembly of all salamander and algal reads from wild-collected samples).
The data in this file (after filtering and normalization) was used to generate the dotplots in Figure 1D and E, Figure 1—figure supplements 2–4, and Figure 3. This is the raw data that was used for differential expression analysis. Rows are genes. Column names are as follows: S2a-S5a are counts for salamander cells without algae. S2b-S5b are counts for salamander cells with intracellular algae (samples are paired from the same individuals, such that S2a and S2b came from the same salamander). A1-A3 are intracapsular algae samples. RK_* are cultured algal samples.
DOI: 10.7554/eLife.22054.004
Figure 1—source data 2. List of 6,726 algal gene IDs used in differential expression analysis.
Use to filter raw counts matrix to get final algal gene list.
DOI: 10.7554/eLife.22054.005
Figure 1—source data 3. List of 46,549 salamander gene IDs used in differential expression analysis.
Use to filter raw counts matrix to get final salamander gene list.
DOI: 10.7554/eLife.22054.006
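The gene ID lists above are meant to be used as filters on the raw counts matrix. A minimal sketch of that step follows, assuming the matrix is tab-delimited with a header row and the gene ID in the first column; the file layout, the function name, and the gene IDs in the example are assumptions, not details taken from the source data files.

```python
import csv
import io


def filter_counts(counts_file, keep_ids, id_column=0):
    """Keep only the rows whose gene ID is in keep_ids.

    counts_file: open text file (tab-delimited, header row first; assumed layout).
    keep_ids: set of gene IDs, e.g. read from one of the gene ID lists.
    Returns (header, rows) with rows in their original order.
    """
    reader = csv.reader(counts_file, delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row[id_column] in keep_ids]
    return header, rows


# Made-up miniature matrix; real files have many more genes and columns.
demo = io.StringIO(
    "gene\tS2a\tS2b\tA1\n"
    "gene_A\t12\t15\t0\n"
    "gene_B\t0\t0\t240\n"
    "gene_C\t7\t3\t0\n"
)
header, rows = filter_counts(demo, {"gene_A", "gene_C"})
print(header)
print(rows)
```

A set is used for the ID list so that each membership check is constant time, which matters for lists with tens of thousands of entries.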

Figure 1—figure supplement 1. A. maculatum cell lysis during embryo dissociation did not contaminate the cell suspension fluid with significant quantities of mRNA.

(a) Representative cDNA distribution (bioanalyzer trace) from a population of 50 manually isolated A. maculatum endoderm cells. (b) No cDNA was produced when the fluid the cells were suspended in was tested, indicating that the cDNA populations from manually isolated A. maculatum endoderm cells were specific and not contaminated with cDNAs derived from randomly lysed cells. In both (a) and (b), the peaks at 35 bp and 10380 bp are markers.
Figure 1—figure supplement 2. Determining lower limit FPKM thresholds for inclusion in differential expression analysis.

For pairs of experimental conditions (i.e. n = 4 A. maculatum samples without intracellular algae, and n = 4 A. maculatum samples with intracellular algae), gene expression levels were sorted by the mean FPKM value (expression level) in one set of samples (i.e. in (a) expression levels of A. maculatum genes from samples with and without intracellular algae were sorted by mean expression per gene for n = 4 A. maculatum samples without intracellular algae). Using a sliding window of 100 genes, starting with the 100 most lowly expressed genes of the sorted set, median expression levels of the 100 gene bins were calculated for both experimental conditions. Those binned values were plotted with the expectation that on average, gene expression from one experimental condition should be positively correlated with gene expression from the other experimental condition. Vertical red dashed lines indicate the level of expression along the x-axis (in the sorted sample, determined by visual inspection of the plots) where positively correlated expression between the experimental conditions begins. Those values were used as lower limit thresholds in data pre-filtering steps. (a) Salamander cells with endosymbionts vs. salamander cells without endosymbionts; sorted by salamander cells without endosymbionts expression levels. (b) Salamander cells without endosymbionts vs. salamander cells with endosymbionts; sorted by salamander cells with endosymbionts expression levels. (c) Intracellular algae vs. intracapsular algae; sorted by intracapsular algae expression levels. (d) Intracapsular algae vs. intracellular algae; sorted by intracellular algae expression levels.
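The binning procedure described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the inputs are taken to be per-gene mean FPKM values for the two conditions, the window is assumed to advance one gene at a time (the step size is not given in the legend), and the function and parameter names are hypothetical.

```python
from statistics import median


def sliding_window_medians(sort_by, other, window=100, step=1):
    """Median expression per sliding window for two conditions.

    sort_by: per-gene mean FPKM in the condition used for sorting.
    other:   per-gene mean FPKM in the other condition (same gene order).
    step:    genes advanced per window; 1 is an assumption.
    Returns a list of (median in sort_by, median in other), one per window,
    starting with the `window` most lowly expressed genes.
    """
    order = sorted(range(len(sort_by)), key=lambda i: sort_by[i])
    xs = [sort_by[i] for i in order]
    ys = [other[i] for i in order]
    points = []
    for start in range(0, len(xs) - window + 1, step):
        points.append((median(xs[start:start + window]),
                       median(ys[start:start + window])))
    return points


# Tiny made-up example with a window of 2 genes instead of 100.
print(sliding_window_medians([3, 1, 2, 4], [30, 10, 20, 40], window=2))
```

Plotting the returned pairs gives the curves from which the thresholds were read by visual inspection; the sketch does not attempt to automate that choice.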
Figure 1—figure supplement 3. Determining a threshold for absence calls in intracellular algal data.

Intracapsular algae samples had a higher sequencing depth than the intracellular algae. This filtering determined the lower FPKM limit of expression in intracapsular algae for inclusion in differential expression analysis. (a) Algal gene expression levels in intracapsular (red) and intracellular (blue) algae. The vertical dashed lines represent the median expression level of the respective populations. The large blue bar at −5 ln(FPKM) is the overrepresented proportion of genes with no expression in intracellular algal samples due to the low depth of sequencing. (b) Genes with low levels of intracapsular algal expression are detected in 100% of the intracellular algal samples due to pre-filtering inclusion of genes that were detected in all four intracellular algal samples. However, as the expression level of genes in intracapsular algal samples increases, the proportion of genes detected in intracellular algae decreases sharply to a minimum of 40%. Following this minimum, the proportion of genes detected in intracellular samples increases with the intracapsular expression level. The red dashed vertical line is the FPKM value in intracapsular algae at which 95% or more of the genes are detected in intracellular algae. Below this threshold, a gene’s absence in intracellular algae is possibly due to the low sequencing depth; above this threshold, a gene’s absence in intracellular algae is interpreted as potential under-expression. (c) The same plot as in (a), after filtering to remove genes absent in intracellular algae with expression levels in intracapsular algae below threshold. (d) The same plot as in (b), after the dependence of detection on expression level was removed.
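One way to locate a threshold of the kind shown in panel (b) is sketched below. It assumes genes are grouped into equal-sized bins by intracapsular expression and that the threshold is the lower edge of the first bin, at or after the detection minimum, in which at least 95% of genes are detected in intracellular algae. The binning scheme, the search starting from the minimum, and all names are assumptions; the legend does not state how the line was computed.

```python
def detection_by_bin(intracapsular, detected, bin_size=100):
    """Fraction of genes detected in intracellular algae, per expression bin.

    intracapsular: per-gene expression (e.g. FPKM) in intracapsular algae.
    detected:      per-gene booleans, True if seen in intracellular algae.
    Returns a list of (lowest expression in bin, fraction detected).
    """
    order = sorted(range(len(intracapsular)), key=lambda i: intracapsular[i])
    bins = []
    for start in range(0, len(order), bin_size):
        idx = order[start:start + bin_size]
        fraction = sum(1 for i in idx if detected[i]) / len(idx)
        bins.append((intracapsular[idx[0]], fraction))
    return bins


def absence_threshold(bins, min_fraction=0.95):
    """Lower edge of the first bin past the detection minimum that reaches
    min_fraction, or None if no bin does (assumed rule, see lead-in)."""
    fractions = [fraction for _, fraction in bins]
    trough = fractions.index(min(fractions))
    for lowest, fraction in bins[trough:]:
        if fraction >= min_fraction:
            return lowest
    return None


# Made-up data, bins of 2 genes: detection dips and then recovers.
expr = [1, 2, 3, 4, 5, 6, 7, 8]
seen = [True, True, False, False, True, False, True, True]
bins = detection_by_bin(expr, seen, bin_size=2)
print(bins, absence_threshold(bins))
```

Starting the search at the minimum avoids picking the low-expression bins, where detection is 100% only because of the pre-filtering described in the legend.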
Figure 1—figure supplement 4. Determining threshold for absence calls in salamander data.

The algal filtering described in Figure 1—figure supplement 3 was not required for salamander transcripts. (a) Salamander gene expression levels in salamander cells without algae (red) and salamander cells with algal endosymbionts (blue). Data are plotted on a natural log scale. The vertical dashed lines represent the median expression level of the respective populations (overlapping in this case). (b) The proportion of salamander mRNAs detected in alga-containing cells does not depend on the mRNA expression level in salamander cells without algae. More than 95% of all genes are detected in the samples of salamander cells with algae, at all levels of expression in the samples of salamander cells without algae.
Figure 1—figure supplement 5. High GC content algal genes were not detected by the combination of SMARTer cDNA synthesis and Nextera-XT library preparation.

(a) The GC content distribution of algal transcripts generated using TruSeq library preparation of total RNA, sequenced on the MiSeq platform with approximately 30 million 75 bp paired end reads. 79% of eukaryote BUSCOs were detected in this assembly. The median GC content (green dashed line) is 62%. (b) The GC content distribution from (a), split by library preparation method. Red bars represent algal transcripts found in transcriptomes generated by both library preparation methods (SMARTer-Nextera-XT and TruSeq). Blue bars represent transcripts found only in the transcriptome assembly from the TruSeq library preparation method, which are absent from the transcriptome generated using the SMARTer cDNA synthesis-Nextera-XT library preparation method. There is an apparent bias against high GC content algal transcripts in libraries prepared using the SMARTer cDNA synthesis-Nextera-XT protocol (Kolmogorov-Smirnov test, p < 2.2 × 10⁻¹⁶). Both libraries were sequenced to a similar depth: approximately 30 million reads for the alga-only samples in the total-evidence assembly from the SMARTer-cDNA synthesis-Nextera-XT library, and 30 million reads for the TruSeq library from unialgal cultures. Since sequencing depth was equivalent and GC bias is apparent, the data suggest that GC bias in the SMARTer-cDNA synthesis-Nextera-XT library accounts for the low number of detected BUSCOs (49%) in the algal transcriptome generated from wild-collected algal samples associated with salamander eggs and cells. (c) The distribution of GC content in A. maculatum transcripts (gray bars) is centered around much lower GC content (median GC content of 43%) than that of O. amblystomatis (green bars, median GC content of 62%). The A. maculatum assembly contained 88% of eukaryote BUSCOs. Our evidence points to bias against high GC content transcripts in the SMARTer cDNA synthesis and Nextera-XT library prep method that becomes significant above 60% GC content. Transcripts with GC content of 60% or greater are in the tail of the salamander GC content distribution, but near the median of the algal GC content distribution. This offers an explanation for the BUSCO results, where the salamander transcriptome from the wild-collected samples is comprehensive, while the algal transcriptome from the same samples and library prep methods is missing around 40% of the algal transcriptome.
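For readers unfamiliar with the Kolmogorov-Smirnov comparison cited above, the sketch below computes the two-sample KS statistic, which is the largest vertical gap between two empirical cumulative distributions. It assumes the test compared the GC content of transcripts found by both methods with that of transcripts found only by TruSeq; the legend does not state the exact inputs, the values below are invented, and the function name is hypothetical. Only the statistic is computed, not the p-value; in practice a library routine such as `scipy.stats.ks_2samp` would normally be used.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D.

    D is the maximum absolute difference between the empirical CDFs of the
    two samples, evaluated at every observed value.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)   # share of a <= value
        cdf_b = bisect_right(b, value) / len(b)   # share of b <= value
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Invented GC fractions: one group sits entirely below the other.
shared = [0.40, 0.45, 0.50, 0.55]
truseq_only = [0.62, 0.66, 0.70, 0.74]
print(ks_statistic(shared, truseq_only))
```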