Skip to main content
International Journal of Genomics logoLink to International Journal of Genomics
. 2015 Nov 24;2015:695947. doi: 10.1155/2015/695947

De Novo Assembly of the Pea (Pisum sativum L.) Nodule Transcriptome

Vladimir A Zhukov 1,*, Alexander I Zhernakov 1, Olga A Kulaeva 1, Nikita I Ershov 2, Alexey Y Borisov 1, Igor A Tikhonovich 1,3
PMCID: PMC4672141  PMID: 26688806

Abstract

The large size and complexity of the garden pea (Pisum sativum L.) genome hamper its sequencing and the discovery of pea gene resources. Although transcriptome sequencing provides extensive information about expressed genes, some tissue-specific transcripts can only be identified from particular organs under appropriate conditions. In this study, we performed RNA sequencing of polyadenylated transcripts from young pea nodules and root tips on an Illumina GAIIx system, followed by de novo transcriptome assembly using the Trinity program. We obtained more than 58,000 and 37,000 contigs from “Nodules” and “Root Tips” assemblies, respectively. The quality of the assemblies was assessed by comparison with pea expressed sequence tags and transcriptome sequencing project data available from NCBI website. The “Nodules” assembly was compared with the “Root Tips” assembly and with pea transcriptome sequencing data from projects indicating tissue specificity. As a result, approximately 13,000 nodule-specific contigs were found and annotated by alignment to known plant protein-coding sequences and by Gene Ontology searching. Of these, 581 sequences were found to possess full CDSs and could thus be considered as novel nodule-specific transcripts of pea. The information about pea nodule-specific gene sequences can be applied for gene-based markers creation, polymorphism studies, and real-time PCR.

1. Introduction

Pea (Pisum sativum L.), an important crop cultivated worldwide [1], is a valuable model system in plant genetics. Since Gregor Mendel's famous experiments, several scientific discoveries have occurred in modern pea genetics; these new insights include information regarding genetic control of compound leaf development [2, 3] and the molecular basis of symbiotic interactions with beneficial nitrogen-fixing bacteria (rhizobia) [46]. The study of pea gene polymorphism in relation to agronomically important traits is essential to both basic and applied research on this crop plant [7, 8]. Unfortunately, the large size (more than 4 Gb) (http://data.kew.org/cvalues/) and complexity [9] of the pea genome hamper its sequencing as well as the discovery of this crop plant's genetic resources, both of which are desperately needed for molecular and genomics-assisted breeding [8, 10].

As an alternative to whole genome sequencing, analysis of transcriptomes by RNA sequencing can provide extensive information about expressed genes [11, 12]. Because next-generation sequencing technologies are applicable to all organisms, including those for which information about genome organization is insufficient or lacking, considerable progress in pea transcriptome sequencing has been achieved over the last few years. Massive amounts of transcriptomic data have been obtained in the form of high-quality sequence reads that have been used for molecular marker creation, whole genome map construction [1317], and characterization of host-pathogen interactions (pea-Sclerotinia sclerotiorum) [18]. All these data (as well as additional unpublished pea transcriptome sequencing results) have been uploaded to the NCBI Sequence Read Archives (SRA) (http://www.ncbi.nlm.nih.gov/sra/) as raw reads. Assemblies have been created for some of these data and deposited in the NCBI Transcriptome Shotgun Assembly (TSA) database, allowing users to perform data mining (e.g., BLAST searching) and to study pea gene polymorphism.

Several genes have tissue-specific expression, however, and can therefore only be studied through analysis of the appropriate tissue. One such example involves symbiotic genes necessary for the establishment and development of nitrogen-fixing nodules which are predominantly expressed in those temporary plant organs. To date, only a few samples from pea nodules have been sequenced (available as raw SRA archives), and only one assembly built from a mixture of sequencing reads from different organs (including nodules) is present in the TSA database (see Table 1 for available pea nodule transcriptome sequencing results). Because this assembly was based on nodules harvested at a very late stage of symbiotic nodule development (3-month-old plants), it presumably contains insufficient information on nodule-specific transcripts. Consequently, sequences of nodule-specific genes of pea are still limited.

Table 1.

Bioprojects from http://www.ncbi.nlm.nih.gov/ containing pea nodule or root tip transcriptome data.

Bioproject Biosample SRA Platform Submitted by Description TSA
Pea genotype Sample Time of harvesting
PRJNA257308 SAMN02950507 SRX669192 Illumina
HiSeq 2000
University of Minnesota, Nevin Young; 2014-08-01 Little Marvel Nodules 30 days after inoculation NA
SRX669212 Illumina
HiSeq 2000
Little Marvel Nodules 30 days after inoculation NA

PRJNA267198 SAMN03153588 SRS747845 NA Cameor Nodules Development stage, 5-6 open leaves/7-8 nodes (nodules stage A) NA
SAMN03153589 SRS752085 NA INRA, Jonathan Kreplak; 2014-10-30 Cameor Nodules Development stage, flowering (nodules stage B) NA
SAMN03153591 SRS752086 NA Cameor Nodules Development stage, 18 days after sowing, that is, 10 days after inoculation (nodules stage G) NA

PRJNA277074 SAMN03396628 SRX952469 Illumina
HiSeq 2000
Kaspa Nodules 3-month-old plants GCMF00000000.1;
GCMG00000000.1;
GCMH00000000.1;
GCMI00000000.1;
GCMJ00000000.1;
GCMK00000000.1;
GCML00000000.1
SRX952470 Illumina
MiSeq
Kaspa Nodules 3-month-old plants
SAMN03396630 SRX952472 Illumina
HiSeq 2000
Department of Economic Development, Jobs, Transport and Resources, 5 Ring Road, Bundoora, VIC 3083, Australia; Shimna Sudheesh; 2015-03-09 Kaspa Root tips 4-week-old plants
SRX952473 Illumina
MiSeq
Kaspa Root tips 4-week-old plants
PRJNA277076 SAMN03396658 SRX952517 Illumina
HiSeq 2000
Parafield Nodules 3-month-old plant GCKA00000000.1;
GCMM00000000.1;
GCMN00000000.1;
GCMO00000000.1;
GCMP00000000.1;
GCMQ00000000.1
SRX952518 Illumina
MiSeq
Parafield Nodules 3-month-old plant
SAMN03396660 SRX952521 Illumina
HiSeq 2000
Parafield Root tips 4-week-old plants
SRX952522 Illumina
MiSeq
Parafield Root tips 4-week-old plants

PRJNA284856 SAMN03733514 SRS945123 Illumina
GAIIx
All-Russia Research Institute for Agricultural Microbiology, Saint Petersburg, Russia SGE Nodules 12 days Registered
as SUB965211
SAMN03733554 SRS945125 Illumina
GAIIx
SGE Root tips 12 days Registered
as SUB965299

The aim of our work was thus to sequence the transcriptome of young pea nodules, construct an assembly, and analyze the resulting assembly for unique sequences. Along with nodules, we harvested root tips to analyze their transcriptome content as well.

2. Materials and Methods

2.1. Biological Materials

Seeds of pea laboratory line SGE [19] were surface-sterilized with concentrated sulfuric acid (98%) (15 min on a shaker), washed 10 times with autoclaved distilled water, and germinated on Petri dishes containing sterile vermiculite for 3 days. The germinated seeds were then planted individually into 200 mL ceramic pots containing quartz sand, watered with 100 mL of 2x nitrogen-free mineral nutrition solution [20], and inoculated with an aqueous suspension of Rhizobium leguminosarum bv. viciae RCAM1026 [21] (1 × 106 CFU per plant). Plants were harvested 12 days after inoculation; nodules and root tips (5 mm distal portion of the root) were placed in liquid nitrogen, ground into powder, and stored at −80°C. Material was harvested from a total of 10 plants.

2.2. cDNA Library Construction and Sequencing

Total RNA was extracted from 100 μg of material using an RNeasy Plant mini kit (Qiagen, Hilden, Germany). cDNA libraries were constructed and sequenced according to the instructions provided with the Genome Analyzer IIx platform (Illumina, San Diego, CA, USA). After total RNA extraction and DNase-I treatment, mRNAs were captured using oligo (dT) magnetic beads and fragmented. First-strand cDNA was synthesized from these fragments using random hexamer primers; double-stranded cDNA was then generated, purified with magnetic beads, and subjected to end reparation and 3′ single adenylation. Sequencing adaptors were ligated to the adenylated fragments, and DNA fragments having adapter molecules on both ends were then amplified. After a quality control step performed on a 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA), the cDNA library products were sequenced in a single-read run with 75 bp length reads on an Illumina Genome Analyzer IIx platform.

2.3. De Novo Transcriptome Assembly

Preliminary quality control of the raw sequencing data was performed via the FastQC v.0.11.3 application (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/), which indicated that the reads were of acceptable quality. For adapter removing Cutadapt version 1.8.1 [22] was used. Low-quality read removal and trimming were then performed with the assembly program Trinity v.2.0.6 [23] using the “trimmomatic” option with default parameters. Next, contig assembly was performed by Trinity with default assembly parameters, including kmer = 25. As a result, two FASTA files were obtained, one for the nodule sample and one for the root tip sample. Statistical parameters for the assemblies were obtained by running the TrinityStats.pl script included in the Trinity package.

2.4. Assessment of Assembly Quality, Differential Expression Analysis, and Functional Annotation of Contigs

As a step in assessment of assembly quality, the generated reads were mapped to the assemblies with the Bowtie2 program v. 2.2.5 [24]. The contigs were grouped with known pea sequences (obtained from http://www.ncbi.nlm.nih.gov/) using CD-HIT_EST from the CD-HIT package (http://cd-hit.org/) [25] with parameters -c 0.80 -n 6.

In order to distinguish the transcripts enriched in nodules as compared to root tips, the reads of both libraries were mapped to “Nodules” assembly with Bowtie2 v. 2.2.5 [24]. The differential expression was calculated using EdgeR package [26] under a negative binomial model, with biological coefficient of variation 0.2 and FDR cutoff value 0.001.

The assemblies were compared to the NCBI nonredundant (nr) database using BLASTX [27]. The resulting BLAST output was processed using publicly available Blast2GO software (v.2.5.0) (BioBam Bioinformatics SL, Valencia, Spain) [28] to retrieve associated Gene Ontology (GO) terms describing biological processes, molecular functions, and cellular components [29].

To detect transcripts containing reliable full-length CDS regions two approaches based on similarities of either nucleotide or amino acid sequences were used. For each transcript, BLAST search against NCBI RefSeqGene database was performed in order to find orthologous sequences, and then these sequences were aligned by Smith-Waterman algorithm [30] with a “5-0” substitution matrix. Also, as an alternative approach, we used TransDecoder software [23] for CDS region prediction based on homology search against Swiss-Prot protein database [31].

To extend the annotation of the full-length nodule-specific transcripts, the nucleotide sequences were converted into amino acid sequences and then mapped to the Kyoto Encyclopedia of Genes and Genomes (KEGG) Web Server (http://www.genome.jp/kegg/) [32].

2.5. Sanger Sequencing, Primer Design, and Online Computational Tools

Direct sequencing of PCR fragments was performed on an ABI Prism 3500 xL system (Applied Biosystems, USA) at the Genomic Technologies, Proteomics, and Cell Biology Core Center of All-Russia Research Institute for Agricultural Microbiology (ARRIAM, Saint Petersburg, Russia). The online tool OligoCalc [33] was used for primer design. Alignments of small sequence sets were generated using Multalin [34]. Translation initiation site prediction was performed using NetStart 1.0 [35].

3. Results and Discussion

3.1. Sequencing and Assembly

Sequencing generated 52,021,865 reads from the “Nodules” library and 17,684,604 reads from the “Root Tips” library. After removal of adapter and index sequences, 75.1% reads were equal to or longer than 70 bp and 10.2% of reads were shorter than 10 bp for “Nodules” and 74.5% and 11.2% of reads, accordingly, for “Root Tips.” The nodule and root tip read sets were assembled individually. A total of 58,397 contigs belonging to 48,628 genes (as termed by Trinity) were constructed from the nodule set, with a mean contig length of 880.81, a median contig length of 620, and an N50 of 1,282. Isoforms were proposed for 4,550 genes. Root tip reads were assembled into 37,287 contigs of 35,081 genes, with a mean contig length of 841.14, a median contig length of 558, an N50 of 1,260, and 1,055 total isoforms. Length distributions of contigs in the two assemblies are presented in Figure 1, and distributions of isoform numbers are shown in Figure 2.

Figure 1.

Figure 1

Length distribution of contigs obtained from “Nodules” and “Root Tips” assemblies.

Figure 2.

Figure 2

Isoform distribution in “Nodules” and “Root Tips” assemblies.

To our knowledge, the appearance of isoforms can be due to either alternative splicing or the presence of paralogous sequences expressed in the tissue. As an illustration of the first case, we were able to detect two splice variants in the “Nodules” assembly for transcripts of the symbiotic gene Ign1, an ortholog of IGN1 (Ineffective Greenish Nodules 1) of Lotus japonicus (Regel.) K. Larsen [36] (GenBank accession number KR047192; TR2831|c0_g2_i1 and TR2831|c0_g2_i4 in the “Nodules” assembly). The longer transcript retained the first intron and, according to prediction by NetStart 1.0 [35], could be translated into a protein variant lacking the first 21 N-terminal amino acids. An example of the second case involved ENOD6 (for early nodulin 6; GenBank accession number X63700), which encodes a short protein belonging to a group of nodule-specific cysteine-rich (NCR) peptides [37, 38]. BLASTN searching uncovered a group of isoforms (TR2035 from the “Nodules” assembly) derived from paralogous genes encoding cysteine-rich peptides specific for nodules (it should be noted that some of these isoforms could be artificial chimeras containing parts of different transcripts that have extensive segments sharing 100% similarity).

3.2. Quality Assessment

Evaluating the quality of a de novo transcriptome assembly without a reference genome is challenging. We therefore implemented three approaches previously recommended for managing this task [39, 40].

First, we used the pea expressed-sequence tag (EST) sequences represented in GenBank as a standard to estimate assembly quality. We aligned 18,576 ESTs against the “Nodules” assembly. Of these, 2,571 ESTs (13.8%) shared no similarity with any contigs of the assembly. Furthermore, 102 ESTs were filtered out on the basis of an E-value cutoff of 1 × 10−10. From the remaining 15,903 ESTs (85.6%), we chose hits with maximal coverage of the EST (one per EST) and evaluated their coverage and identity distributions. Among these EST-contig pairs, 94.9% of the ESTs shared more than 90% identity with their corresponding contig fragments.

Following the second recommended approach, we mapped back all reads to contigs in both assemblies; as a result, 89% and 91% of reads in “Nodules” and “Root Tips” assemblies were, respectively, aligned back to the contigs, demonstrating that our assemblies were of acceptable quality.

Third, being interested in symbiosis-specific genes, we searched the “Nodules” assembly for previously unknown pea homologs of symbiotic genes EFD (ethylene response factor required for nodule differentiation) [41], VPY (Vapyrin) [42], and NSP1 (nodulation signaling pathway 1) [43] of Medicago truncatula Gaertn. and SEN1 (stationary endosymbiont nodule 1) [44] of L. japonicus. Long transcripts with high identity were found for all four genes (Table 2); these transcripts allowed us to design primers flanking coding sequence (CDS) regions and to amplify the corresponding regions in cDNA synthesized from 4-week-old nodules of pea genotypes SGE and Finale. Except for allelic variations of Vpy, Sen1, and Nsp1 that were found between SGE transcriptome and Finale cDNA genotypes, we observed complete sequence correspondence for all four genes, thereby demonstrating the satisfactory quality of the created assembly.

Table 2.

Pea nodule transcripts corresponding to known symbiotic genes of Medicago truncatula and Lotus japonicus.

Gene Accession number (M. truncatula or L. japonicus) Contig in “Nodules” Forward primer, 5′-3′ Reverse primer, 5′-3′ Identity % (CDS)
EFD EU251063.1 TR2716|c0_g1_i1 ACCTTCACTTCACTTCACTTAAG GGTGTCATGGAGAAATGCTACA 88%
NSP1 AJ972478.1 TR20452|c0_g1_i2 AATGATCCAAGAACACTACTAACC CAGCTCTCTTAATCACAGACAT 88%
VPY GQ423209.1 TR12524|c0_g2_i1 ACCATCATAAACCAAACTGTTGC TCCAAATCACACTCACAACTCC 93%
Sen1 AB573230.1 TR29546|c0_g1_i1 TAAACAGATCAATCAAGCATTCATG ATTGGTTCAACATGAAGTATACG 76%

% identity to the L. japonicus sequence; for others, % identity to M. truncatula sequences is given.

To evaluate the “Root Tips” assembly, we selected genes involved in glutathione biosynthesis: Gsh1 (gamma-glutamylcysteine synthetase precursor [AF128455.1]), Gshs (glutathione synthetase precursor [AF231137.1]), and hGshs (putative homoglutathione synthetase [AF258319.1]). BLASTN searching against the “Root Tips” assembly identified one contig (TR9283|c0_g1_i1) completely identical to full-length Gsh1 and one contig (TR8244|c0_g1_i1) completely identical to full-length Gshs except for a single nucleotide polymorphism in the 3′ untranslated region. This search also revealed four contigs (TR2244|c0_g1_i1, TR3920|c0_g1_i1, TR13401|c0_g1_i1, and TR25033|c0_g1_i1) representing portions of hGshs (each with 100% identity) that had not been assembled into a contig, probably because of insufficient overlapping of reads due to the low expression levels of this gene in pea root tips. Consequently, despite the good quality of the “Root Tips” assembly, its coverage was insufficient for finding full sequences of rare transcripts; nevertheless, the discovery of partial sequences allows primers to be designed for whole-transcript PCR amplification and transcript-end amplification by rapid amplification of cDNA ends (RACE) methodology.

3.3. Annotation of Nodule-Specific Transcripts

To obtain information on nodule-specific genes, we attempted to select portions of previously unknown sequences of the “Nodules” assembly by clustering them together with pea sequences produced from nonnodular tissues. In addition to our “Root Tips” assembly, the pea transcriptome assemblies created by Franssen et al. [13] and Kaur et al. [14] currently meet this requirement. Using Cd-Hit software, 277,211 sequences of these four pea transcriptome assemblies were grouped into 61,521 clusters (where a cluster is defined according to Cd-Hit as a set of similar sequences created to reduce sequence redundancy and to improve the performance of other sequence analyses). Among these clusters, 10,391 (approximately 17%) were common to all assemblies (Figure 3). The 13,305 nodule-specific clusters included 14,998 contigs belonging to 14,171 genes (i.e., without gene isoforms). These sequences were assigned to GO terms to characterize the nodule transcriptome profile.

Figure 3.

Figure 3

Clustering the sequences of the “de novo Nodules” assembly (this study) together with pea sequences produced from nonnodular tissues (“Franssen” [13], “Kaur” [14], and “de novo Root Tips” (this study)).

The transcripts were first aligned against plant protein sequences in the NCBI nr protein database (24.04.15 release). The following parameters were used: an E-value cutoff of 1 × 10−20, the same alignment direction for all high-scoring segment pairs (HSPs) in a hit, and 20 (or more, if of the same E-value) hits for a query. The transcripts were then annotated using Blast2GO software.

The length of a contig is supposedly the critical factor for successful annotation [40]. Only about 30% of sequences less than 1,000 bp long in our “Nodules” assembly were successfully annotated. The efficiency was 56% for sequences ranging from 1,000 to 2,000 bp. Almost all sequences (93%) longer than 2,000 bp were successfully annotated.

One-third of nodule-specific contigs (5,940) were associated with plant proteins in the nr database, with 565,464 total hits. Of these, 3,516 contigs were assigned to 13,697 GO terms. Among biological processes, the most abundant terms were metabolic processes (“organic substance metabolic process”, “primary metabolic process”, “cellular metabolic process”, “single-organism metabolic process”, and “nitrogen compound metabolic process”) along with “single-organism cellular process”, “biosynthetic process”, “establishment of localization”, and “single-organism localization” (Figure 4(a)). This distribution reflects the processes occurring in nodules, such as microsymbiont (rhizobia) hosting within cells and nitrogen compound metabolism. Within the molecular function category, contigs were assigned to “heterocyclic compound binding”, “organic cyclic compound binding”, “ion binding”, “small molecule binding”, “transferase activity”, and “carbohydrate derivative binding” (Figure 4(b)). These terms may be related to metabolite exchange between plants and bacteria, including exchanges with signal molecules. Regarding cellular components, the major GO terms were “cell part”, “membrane-bounded organelle”, “membrane part”, and “organelle part” (Figure 4(c)), which are similarly concerned with the formation and functioning of symbiotic compartments in nodule cells.

Figure 4.

Figure 4

Gene Ontology (GO) classification of nodule-specific contigs. GO subcategories of (a) “biological process,” (b) “molecular function,” and (c) “cellular component” are shown.

It seems also valuable to distinguish the transcripts that are preferentially expressed in nodules as compared to root tips. By mapping the reads of both libraries (“Nodules” and “Root Tips”) to “Nodules” assembly via Bowtie2 v. 2.2.5 and calculating the differential expression via EdgeR package with 0.001 FDR cutoff we selected 1081 contigs that represent genes with significantly higher expression level in nodules (Supplementary File 1 in Supplementary Material available online at http://dx.doi.org/10.1155/2015/695947). Still, more detailed analysis, including verification of the results of such “digital expression” analysis by real-time PCR, is needed, along with addition of more time points to the experiment.

3.4. Sequences of Nodule-Specific Transcripts

To identify sequences of novel unreported, highly reliable transcripts of pea, we analyzed the 14,998 sequences of the “nodule only” clusters using TransDecoder software [23]. As a result, 593 putative full-length ORFs were found in 536 contigs (Supplementary File 2).

As an alternative approach to the identification of full-length transcripts, we aligned the same set of 14,998 sequences of the “nodule only” clusters against plant RNA genes in the NCBI RefSeqGene database. Of these sequences, 9,931 had no significant matches. The remaining sequences were filtered according to the following criteria: minimal query coverage of 0.8, maximum E-value of 1 × 10−10, and the same direction for all HSPs. This step yielded 3,673 contigs and 21,389 hits.

We aligned each pair using the Smith-Waterman algorithm [30] with a “5-0” substitution matrix and identified aligned fragments corresponding to the CDS regions of hit sequences (as determined by GenBank). We selected 427 alignments (comprising 153 unique pea sequences, some of which aligned to multiple GenBank accessions representing gene isoforms or paralogs) with the following characteristics: (1) full coverage of the hit CDS region by a contig; (2) identity higher than 0.8; and (3) the contig having possible start and stop codons within a 50 bp region. Of these 153 contigs, 45 were not detected by TransDecoder.

In total, we identified 581 novel sequences containing putative full-length CDS in pea nodule transcriptome, among which 536 were found by TransDecoder and additional 45 were detected by alternative approach based on BLAST against known plant mRNA sequences. For annotation of these 581 sequences, homologous genes were found by BLASTN search in Medicago truncatula genome (ver. 4.0) [45] (see Supplementary File 2). Also, KO (KEGG Orthology) identifiers were assigned to the novel sequences, and 109 entries out of 581 (18.8%) were successfully annotated (Supplementary File 2).

In our opinion, our generated “Nodules” assembly adds valuable information, especially with respect to nodule-specific sequences, to the existing knowledge about pea transcriptome: some unique sequences of pea symbiosis-related genes can be identified only in our assembly. An example of this case involves CLE genes, some of which were shown to participate in systemic regulation of nodule formation in several legumes such as M. truncatula, L. japonicus, and Glycine max [4648]. The CLAVATA3/Embryo Surrounding Region-Related (CLE) gene family is composed of numerous genes that contain conserved CLE domains in various plant species and encode short regulatory peptides (CLE-peptides) (for review see [49]). In M. truncatula, two CLE genes, MtCLE12 and MtCLE13, have nodulation-related expression patterns that are linked to proliferation and differentiation [46]. In pea, sequences of CLE genes are not known, but it was shown that overexpression of MtCLE13 gene leads to similar effects (severe reduction in nodulation) in both pea and M. truncatula, proving that MtCLE13 is functional in pea [50]. So we sought for the sequences homologous to MtCLE12 and MtCLE13 (Medtr4g079630.1 and Medtr4g079610.1, resp.) in our nodule transcriptome assembly.

BLASTN search using the Medtr4g079610.1 transcript sequence (encoding Cle13 peptide [46]) as a query against our “Nodules” assembly retrieved the contig TR8317|c0_g1_i1, which contains a full open reading frame (ORF) corresponding to Cle13 of P. sativum; the same search against “Organism Pisum sativum (taxid: 3880)” in the NCBI TSA database returned two partial transcripts: (1) gb|GCMK01019899.1| (TSA: “Pisum sativum Ps_029064 transcribed RNA sequence”), containing only part of the Cle13 ORF, and (2) the apparently chimeric gb|GCMO01040960.1| (TSA: “Pisum sativum Ps_150017 transcribed RNA sequence”) containing a portion of the ankyrin repeat gene (similar to Glycine max ankyrin repeat-containing protein At5g02620-like [LOC100812799]) as well as the 5′-part of the Cle13 transcript. A BLASTN search using the Medtr4g079630.1 transcript sequence (encoding Cle12 peptide [46]) as a query found two contigs: TR116|c0_g1_i1, containing the full ORF of P. sativum Cle12, and TR23484|c0_g1_i1, containing the full ORF of an unknown protein similar to P. sativum and M. truncatula Cle12 and therefore presumably a paralog of P. sativum Cle12. We thus tentatively designated these genes as Cle12a and Cle12b, respectively (Table 3). At the same time, the search against the NCBI TSA database retrieved no significant homologs of Cle12 in pea.

Table 3.

Pea transcripts homologous to Medicago truncatula transcripts encoding Cle12 and Cle13 peptides.

M. truncatula gene Contig in “Nodules” Identity % (CDS/protein) Suggested name for pea transcript
Cle12 Medtr4g079630.1 TR116|c0_g1_i1 79/65 Cle12a
TR23484|c0_g1_i1 74/50 Cle12b
Cle13 Medtr4g079610.1 TR8317|c0_g1_i1 80/68 Cle13

The pea transcriptome assembled after Illumina sequencing is thus a good resource for the study of pea transcripts related to nodulation. It can be used in future investigations focused on pea symbiosis-specific genes. Such potential research targets include genes encoding nodule-specific peptides such as NCR- and glycine-rich protein peptides that have been exhaustively described in M. truncatula [38, 51] but not P. sativum, as well as other symbiotic genes expressed in nodules, including Cle peptide-encoding genes.

Also, the present “Nodules” assembly is a convenient tool that can facilitate study of transcription changes in nodules of symbiotic mutants. In pea, several mutant lines with impaired nodule formation were obtained and phenotypically characterized [5254]. RNA sequencing of the whole transcriptome from mutant nodules is considered to be a reasonable approach for further characterization of genes and gene networks that operate during nodule development. In this regard, the present assembly of pea nodule transcriptome can be used as a reference for mapping reads and differential expression analysis. Also, this reference transcriptome is indispensable for annotation of short contigs obtained according to the MACE (Massive Analysis of cDNA Ends) protocol, which implies sequencing of 3′-part of each transcript instead of the whole mRNA [55]. Direct annotation of 3′-parts of transcripts by Gene Ontology or KEGG is often not successful because of dissimilarity of these regions between different species, and the present assembly containing significant number of pea nodule-specific transcripts can therefore serve as a reference for annotation of differentially expressed transcripts revealed by MACE technology.

4. Conclusions

The aim of the present study was the acquisition of nodule-specific transcript sequences via next-generation sequencing. Using an Illumina platform, we obtained 52 million reads from a sample derived from young pea nodules and more than 17 million reads from a root tip sample. We constructed the assemblies (more than 58,000 and 37,000 contigs from nodules and root tips, resp.) and analyzed the “Nodules” assembly for unique sequences. We identified approximately 15,000 nodule-specific contigs associated with different GO biological function terms. Of these, 581 sequences were found to possess full CDSs and could thus be considered as new nodule-specific transcripts of pea.

Because the ability of pea plants to form symbiotic nodules is an agronomically important trait, information about pea nodule-specific gene sequences can be applied by scientists and breeders for primer design, gene-based marker creation, polymorphism studies, and real-time PCR. These findings will thus benefit both fundamental and applied science. The next challenge for researchers is characterization of pea transcripts specific to another symbiosis formed by leguminous plants: arbuscular mycorrhiza, which is also of great importance to both fundamental science and contemporary sustainable agriculture.

Supplementary Material

Supplementary File 1: Contigs representing genes with significantly higher expression level in nodules that were selected by comparison of “Nodules” and “Root Tips” libraries.

Supplementary File 2: Novel pea nodule-specific sequences containing putative full-length CDSs annotated by: 1) comparison with known plant full-length CDSs (GenBank) and M. truncatula genic sequences, and 2) KEGG Orthology mapping.

695947.f1.zip (209.8KB, zip)

Acknowledgments

The work was financially supported by the Russian Science Foundation (Grant no. 14-24-00135). The authors thank Sophie Malakho, Victoria Shtratnikova, and Yury Pekov (Center of Innovations and Technologies “Biological Active Compounds and Their Applications,” Moscow, Russia) for carrying out RNA sequencing on the Illumina GAIIx, Jaroslava Fedorina and Alexander Pinaev (ARRIAM, Saint Petersburg, Russia) for performing Sanger sequencing of pea genes, and Ludmila Dvoryaninova (ARRIAM, Saint Petersburg, Russia) for her excellent technical assistance.

Conflict of Interests

The authors declare that they have no conflict of interests.

References

  • 1.FAOSTAT. Food and Agriculture Organization Corporate Statistical Database. FAOSTAT; 2014. http://faostat.fao.org/ [Google Scholar]
  • 2.Tattersall A. D., Turner L., Knox M. R., Ambrose M. J., Ellis T. H. N., Hofer J. M. I. The mutant crispa reveals multiple roles for PHANTASTICA in pea compound leaf development. Plant Cell. 2005;17(4):1046–1060. doi: 10.1105/tpc.104.029447. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 3.Bar M., Ori N. Compound leaf development in model plant species. Current Opinion in Plant Biology. 2015;23:61–69. doi: 10.1016/j.pbi.2014.10.007. [DOI] [PubMed] [Google Scholar]
  • 4.Oldroyd G. E. D., Downie J. A. Calcium, kinases and nodulation signalling in legumes. Nature Reviews Molecular Cell Biology. 2004;5(7):566–576. doi: 10.1038/nrm1424. [DOI] [PubMed] [Google Scholar]
  • 5.Kouchi H., Imaizumi-Anraku H., Hayashi M., et al. How many peas in a pod? Legume genes responsible for mutualistic symbioses underground. Plant & Cell Physiology. 2010;51(9):1381–1397. doi: 10.1093/pcp/pcq107. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 6.Couzigou J.-M., Zhukov V., Mondy S., et al. NODULE ROOT and COCHLEATA maintain nodule development and are legume orthologs of Arabidopsis BLADE-ON-PETIOLE genes. Plant Cell. 2012;24(11):4498–4510. doi: 10.1105/tpc.112.103747. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 7.Smýkal P., Aubert G., Burstin J., et al. Pea (Pisum sativum L.) in the genomic era. Agronomy. 2012;2(4):74–115. doi: 10.3390/agronomy2020074. [DOI] [Google Scholar]
  • 8.Bohra A., Pandey M. K., Jha U. C., et al. Genomics-assisted breeding in four major pulse crops of developing countries: present status and prospects. Theoretical and Applied Genetics. 2014;127(6):1263–1291. doi: 10.1007/s00122-014-2301-3. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 9.Macas J., Neumann P., Navrátilová A. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula . BMC Genomics. 2007;8, article 427 doi: 10.1186/1471-2164-8-427. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 10.Varshney R. K., Close T. J., Singh N. K., Hoisington D. A., Cook D. R. Orphan legume crops enter the genomics era! Current Opinion in Plant Biology. 2009;12(2):202–210. doi: 10.1016/j.pbi.2008.12.004. [DOI] [PubMed] [Google Scholar]
  • 11.Wang Z., Gerstein M., Snyder M. RNA-Seq: a revolutionary tool for transcriptomics. Nature Reviews Genetics. 2009;10(1):57–63. doi: 10.1038/nrg2484. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 12.Ozsolak F., Milos P. M. RNA sequencing: advances, challenges and opportunities. Nature Reviews Genetics. 2011;12(2):87–98. doi: 10.1038/nrg2934. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 13.Franssen S. U., Shrestha R. P., Bräutigam A., Bornberg-Bauer E., Weber A. P. M. Comprehensive transcriptome analysis of the highly complex Pisum sativum genome using next generation sequencing. BMC Genomics. 2011;12, article 227 doi: 10.1186/1471-2164-12-227. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 14.Kaur S., Pembleton L. W., Cogan N. O. I., et al. Transcriptome sequencing of field pea and faba bean for discovery and validation of SSR genetic markers. BMC Genomics. 2012;13(1, article 104) doi: 10.1186/1471-2164-13-104. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 15.Leonforte A., Sudheesh S., Cogan N. O. I., et al. SNP marker discovery, linkage map construction and identification of QTLs for enhanced salinity tolerance in field pea (Pisum sativum L.) BMC Plant Biology. 2013;13(1, article 161) doi: 10.1186/1471-2229-13-161. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 16.Duarte J., Rivière N., Baranger A., et al. Transcriptome sequencing for high throughput SNP development and genetic mapping in pea. BMC Genomics. 2014;15(1, article 126) doi: 10.1186/1471-2164-15-126. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 17.Sindhu A., Ramsay L., Sanderson L.-A., et al. Gene-based SNP discovery and genetic mapping in pea. Theoretical and Applied Genetics. 2014 doi: 10.1007/s00122-014-2375-y. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 18.Zhuang X., McPhee K. E., Coram T. E., Peever T. L., Chilvers M. I. Rapid transcriptome characterization and parsing of sequences in a non-model host-pathogen interaction; pea-Sclerotinia sclerotiorum . BMC Genomics. 2012;13, article 668 doi: 10.1186/1471-2164-13-668. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 19.Kosterin O. E., Rozov S. M. Mapping of the new mutation blb and the problem of integrity of linkage group I. Pisum Genetics. 1993;25:27–31. [Google Scholar]
  • 20.Borisov A. Y., Rozov S. M., Tsyganov V. E., Morzhina E. V., Lebsky V. K., Tikhonovich I. A. Sequential functioning of Sym-13 and Sym-31, two genes affecting symbiosome development in root nodules of pea (Pisum sativum L.) Molecular & General Genetics. 1997;254(5):592–598. doi: 10.1007/s004380050456. [DOI] [PubMed] [Google Scholar]
  • 21.Safronova V. I., Novikova N. I. Comparison of two methods for root nodule bacteria preservation: lyophilization and liquid nitrogen freezing. Journal of Microbiological Methods. 1996;24(3):231–237. doi: 10.1016/0167-7012(95)00042-9. [DOI] [Google Scholar]
  • 22.Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet Journal. 2011;17(1):10–12. doi: 10.14806/ej.17.1.200. [DOI] [Google Scholar]
  • 23.Grabherr M. G., Haas B. J., Yassour M., et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature Biotechnology. 2011;29(7):644–652. doi: 10.1038/nbt.1883. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 24.Langmead B., Salzberg S. L. Fast gapped-read alignment with Bowtie 2. Nature Methods. 2012;9(4):357–359. doi: 10.1038/nmeth.1923. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 25.Li W., Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22(13):1658–1659. doi: 10.1093/bioinformatics/btl158. [DOI] [PubMed] [Google Scholar]
  • 26.Robinson M. D., McCarthy D. J., Smyth G. K. edgeR: a bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–140. doi: 10.1093/bioinformatics/btp616. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 27.Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J. Basic local alignment search tool. Journal of Molecular Biology. 1990;215(3):403–410. doi: 10.1006/jmbi.1990.9999. [DOI] [PubMed] [Google Scholar]
  • 28.Götz S., García-Gómez J. M., Terol J., et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Research. 2008;36(10):3420–3435. doi: 10.1093/nar/gkn176. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 29.Ashburner M., Ball C. A., Blake J. A., et al. Gene ontology: tool for the unification of biology. Nature Genetics. 2000;25(1):25–29. doi: 10.1038/75556. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 30.Smith T. F., Waterman M. S. Identification of common molecular subsequences. Journal of Molecular Biology. 1981;147(1):195–197. doi: 10.1016/0022-2836(81)90087-5. [DOI] [PubMed] [Google Scholar]
  • 31.The UniProt Consortium. UniProt: a hub for protein information. Nucleic Acids Research. 2015;43:D204–D212. doi: 10.1093/nar/gku989. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 32.Kanehisa M., Goto S., Kawashima S., Okuno Y., Hattori M. The KEGG resource for deciphering the genome. Nucleic Acids Research. 2004;32:D277–D280. doi: 10.1093/nar/gkh063. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 33.Kibbe W. A. OligoCalc: an online oligonucleotide properties calculator. Nucleic Acids Research. 2007;35(supplement 2):W43–W46. doi: 10.1093/nar/gkm234. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 34.Corpet F. Multiple sequence alignment with hierarchical clustering. Nucleic Acids Research. 1988;16(22):10881–10890. doi: 10.1093/nar/16.22.10881. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 35.Pedersen A. G., Nielsen H. Neural network prediction of translation initiation sites in eukaryotes: perspectives for EST and genome analysis. Proceedings of the International Conference on Intelligent Systems for Molecular Biology (ISMB '97); June 1997; Halkidiki, Greece. pp. 226–233. [PubMed] [Google Scholar]
  • 36.Kumagai H., Hakoyama T., Umehara Y., et al. A novel ankyrin-repeat membrane protein, IGN1, is required for persistence of nitrogen-fixing symbiosis in root nodules of Lotus japonicus . Plant Physiology. 2007;143(3):1293–1305. doi: 10.1104/pp.106.095356. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 37.Kardailsky I., Yang W. C., Zalensky A., van Kammen A., Bisseling T. The pea late nodulin gene PsNOD6 is homologous to the early nodulin genes PsENOD3/14 and is expressed after the leghaemoglobin genes. Plant Molecular Biology. 1993;23(5):1029–1037. doi: 10.1007/bf00021817. [DOI] [PubMed] [Google Scholar]
  • 38.Mergaert P., Nikovics K., Kelemen Z., et al. A novel family in Medicago truncatula consisting of more than 300 nodule-specific genes coding for small, secreted polypeptides with conserved cysteine motifs. Plant Physiology. 2003;132(1):161–173. doi: 10.1104/pp.102.018192. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 39.Singhal S. De novo transcriptomic analyses for non-model organisms: an evaluation of methods across a multi-species data set. Molecular Ecology Resources. 2013;13(3):403–416. doi: 10.1111/1755-0998.12077. [DOI] [PubMed] [Google Scholar]
  • 40.Wei C., Tao X., Li M., et al. De novo transcriptome assembly of Ipomoea nil using Illumina sequencing for gene discovery and SSR marker identification. Molecular Genetics and Genomics. 2015;290(5):1873–1884. doi: 10.1007/s00438-015-1034-6. [DOI] [PubMed] [Google Scholar]
  • 41.Vernié T., Moreau S., de Billy F., et al. EFD is an ERF transcription factor involved in the control of nodule number and differentiation in Medicago truncatula . Plant Cell. 2008;20(10):2696–2713. doi: 10.1105/tpc.108.059857. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 42.Murray J. D., Muni R. R. D., Torres-Jerez I., et al. Vapyrin, a gene essential for intracellular progression of arbuscular mycorrhizal symbiosis, is also essential for infection by rhizobia in the nodule symbiosis of Medicago truncatula . The Plant Journal. 2011;65(2):244–252. doi: 10.1111/j.1365-313x.2010.04415.x. [DOI] [PubMed] [Google Scholar]
  • 43.Smit P., Raedts J., Portyanko V., et al. NSP1 of the GRAS protein family is essential for rhizobial nod factor-induced transcription. Science. 2005;308(5729):1789–1791. doi: 10.1126/science.1111025. [DOI] [PubMed] [Google Scholar]
  • 44.Hakoyama T., Niimi K., Yamamoto T., et al. The integral membrane protein SEN1 is required for symbiotic nitrogen fixation in Lotus japonicus nodules. Plant & Cell Physiology. 2012;53(1):225–236. doi: 10.1093/pcp/pcr167. [DOI] [PubMed] [Google Scholar]
  • 45.Tang H., Krishnakumar V., Bidwell S., et al. An improved genome release (version Mt4.0) for the model legume Medicago truncatula . BMC Genomics. 2014;15, article 312 doi: 10.1186/1471-2164-15-312. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 46.Mortier V., Den Herder G., Whitford R., et al. CLE peptides control Medicago truncatula nodulation locally and systemically. Plant Physiology. 2010;153(1):222–237. doi: 10.1104/pp.110.153718. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 47.Okamoto S., Ohnishi E., Sato S., et al. Nod factor/nitrate-induced CLE genes that drive HAR1-mediated systemic regulation of nodulation. Plant and Cell Physiology. 2009;50(1):67–77. doi: 10.1093/pcp/pcn194. [DOI] [PubMed] [Google Scholar]
  • 48.Reid D. E., Ferguson B. J., Gresshoff P. M. Inoculation- and nitrate-induced CLE peptides of soybean control NARK-dependent nodule formation. Molecular Plant-Microbe Interactions. 2011;24(5):606–618. doi: 10.1094/MPMI-09-10-0207. [DOI] [PubMed] [Google Scholar]
  • 49.Miyawaki K., Tabata R., Sawa S. Evolutionarily conserved CLE peptide signaling in plant development, symbiosis, and parasitism. Current Opinion in Plant Biology. 2013;16(5):598–606. doi: 10.1016/j.pbi.2013.08.008. [DOI] [PubMed] [Google Scholar]
  • 50.Osipova M. A., Mortier V., Demchenko K. N., et al. Wuschel-related homeobox 5 gene expression and interaction of CLE peptides with components of the systemic control add two pieces to the puzzle of autoregulation of nodulation. Plant Physiology. 2012;158(3):1329–1341. doi: 10.1104/pp.111.188078. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 51.Alunni B., Kevei Z., Redondo-Nieto M., Kondorosi A., Mergaert P., Kondorosi E. Genomic organization and evolutionary insights on GRP and NCR genes, two large nodule-specific gene families in Medicago truncatula . Molecular Plant-Microbe Interactions. 2007;20(9):1138–1148. doi: 10.1094/mpmi-20-9-1138. [DOI] [PubMed] [Google Scholar]
  • 52.Tsyganov V. E., Voroshilova V. A., Priefer U. B., Borisov A. Y., Tikhonovich I. A. Genetic dissection of the initiation of the infection process and nodule tissue development in the Rhizobium-pea (Pisum sativum L.) symbiosis. Annals of Botany. 2002;89(4):357–366. doi: 10.1093/aob/mcf051. [DOI] [PMC free article] [PubMed] [Google Scholar]
  • 53.Voroshilova V. A., Demchenko K. N., Brewin N. J., Borisov A. Y., Tikhonovich I. A. Initiation of a legume nodule with an indeterminate meristem involves proliferating host cells that harbour infection threads. New Phytologist. 2009;181(4):913–923. doi: 10.1111/j.1469-8137.2008.02723.x. [DOI] [PubMed] [Google Scholar]
  • 54.Ivanova K. A., Tsyganova A. V., Brewin N. J., Tikhonovich I. A., Tsyganov V. E. Induction of host defences by Rhizobium during ineffective nodulation of pea (Pisum sativum L.) carrying symbiotically defective mutations sym40 (PsEFD), sym33 (PsIPD3/PsCYCLOPS) and sym42 . Protoplasma. 2015:1–13. doi: 10.1007/s00709-015-0780-y. [DOI] [PubMed] [Google Scholar]
  • 55.Zawada A. M., Rogacev K. S., Müller S., et al. Massive analysis of cDNA Ends (MACE) and miRNA expression profiling identifies proatherogenic pathways in chronic kidney disease. Epigenetics. 2014;9(1):161–172. doi: 10.4161/epi.26931. [DOI] [PMC free article] [PubMed] [Google Scholar]

Associated Data

This section collects any data citations, data availability statements, or supplementary materials included in this article.

Supplementary Materials

Supplementary File 1: Contigs representing genes with significantly higher expression level in nodules that were selected by comparison of “Nodules” and “Root Tips” libraries.

Supplementary File 2: Novel pea nodule-specific sequences containing putative full-length CDSs annotated by: 1) comparison with known plant full-length CDSs (GenBank) and M. truncatula genic sequences, and 2) KEGG Orthology mapping.

695947.f1.zip (209.8KB, zip)

Articles from International Journal of Genomics are provided here courtesy of Wiley

RESOURCES